DuckDB


DuckDB is increasingly used to analyze large datasets quickly in data analysis, Geo-ICT, AI, and modern analytics workflows. For example, organizations use DuckDB to process Parquet files, analyze millions of records without a heavy-duty database server, and combine SQL with Python and R environments. Thanks to its high performance and ease of use, DuckDB is rapidly emerging as a popular alternative to traditional analytics databases.

Course duration: 2 days

DuckDB Basics Course


DuckDB is a modern, embedded analytics database specifically designed for fast analytical queries. Unlike traditional database servers, DuckDB does not need to be installed or managed as a separate server. This makes DuckDB particularly well-suited for data analysts, GIS specialists, researchers, and developers who want to work directly from files, scripts, or notebooks.

A key advantage of DuckDB is its strong support for modern data formats such as CSV, JSON, and Parquet. Parquet, in particular, is playing an increasingly important role in data engineering, cloud data lakes, GeoAI, and scalable analytics environments. With DuckDB, these files can be queried directly using SQL, without the data first needing to be loaded into a heavy-duty database environment.

DuckDB integrates well with modern workflows using Python, R, Pandas, Polars, and Jupyter notebooks. As a result, DuckDB serves as a practical bridge between traditional SQL analysis and modern data science. Participants will learn how to efficiently analyze, combine, and prepare large datasets for reports, dashboards, models, and further processing.

Why Take the DuckDB Basics Course?

DuckDB is of interest to anyone working in data analysis, data engineering, or Geo-ICT who wants to handle large datasets faster, more easily, and with greater flexibility. The database combines the power of SQL with the convenience of working from files and programming environments.

  • Speed: DuckDB is optimized for analytical queries and can process large datasets very efficiently.
  • Simplicity: No complex server installation is required. DuckDB runs locally, embedded in a script, notebook, or application.
  • Modern data formats: DuckDB works excellently with CSV, JSON, and especially Parquet.
  • Integration: DuckDB integrates well with Python, R, Pandas, Polars, and notebook environments.
  • Applicability: DuckDB is suitable for data analysis, reporting, ETL, AI workflows, and Geo-ICT applications.

Key points when working with DuckDB:

  • Embedded analytics: DuckDB can be used directly in applications, scripts, and notebooks.
  • SQL for analysis: Participants learn to query, filter, aggregate, and combine data using SQL.
  • File-oriented working: Data can be read directly from files without first importing them.
  • Performance: DuckDB uses column-oriented processing and smart optimizations.
  • Data engineering: DuckDB is suitable for building lightweight, fast, and reproducible data workflows.

The Basics: What Is DuckDB and How Does It Work?

DuckDB is a relational database designed for analytical workloads. While traditional databases often focus on transaction processing, DuckDB excels at performing fast analyses on large amounts of data. This makes DuckDB suitable for applications where datasets need to be explored, cleaned, merged, and analyzed.

Features of DuckDB:

  • Embedded database: DuckDB runs directly within an application, script, or notebook and does not require a separate database server.
  • Column-oriented processing: DuckDB is optimized for analytical queries that process large numbers of rows.
  • SQL support: Users can work with familiar SQL concepts such as selections, joins, aggregations, and views.
  • File integration: CSV, JSON, and Parquet files can be queried directly.

Modern Data Formats Highlighted:

  • CSV: Commonly used table files can be quickly imported, checked, and analyzed.
  • JSON: Semi-structured data can be processed and converted into usable analytical formats.
  • Parquet: A columnar storage format that is highly suitable for large datasets and modern data lake workflows.

DuckDB makes it possible to quickly turn raw data into actionable insights. This makes it a valuable tool for professionals who want to work efficiently with large datasets without immediately setting up a complex database infrastructure.

What You'll Learn in the DuckDB Basics Course

Data Analysis with SQL and DuckDB

In this course, participants learn how to use DuckDB for practical data analysis. The focus is on writing clear SQL queries, working with tables and files, and performing analyses on larger datasets.

Key Concepts in DuckDB:

  • Tables and views: Structuring data for analysis and reuse.
  • SQL queries: Selecting, filtering, sorting, and aggregating data.
  • Joins: Combining different datasets into a single analysis environment.
  • Aggregations: Summarizing data with totals, averages, counts, and groups.

Working with Files in DuckDB:

  • CSV files: Importing, verifying, and querying data directly.
  • JSON files: Processing and analyzing semi-structured data.
  • Parquet files: Efficiently storing and analyzing large datasets.

DuckDB in Python, R, and Data Workflows

DuckDB is often used in combination with programming languages and data science environments. In this course, participants learn how DuckDB fits into modern workflows with Python, R, and DataFrames.

  • Python integration: Using DuckDB from Python scripts and notebooks.
  • R integration: Applying DuckDB within R workflows for analysis and reporting.
  • DataFrames: Exchanging data with Pandas, Polars, and similar environments.
  • ETL Workflows: Loading, transforming, and writing data to modern file formats.

Practical Applications:

  • Data analysis: Quickly exploring and summarizing large datasets.
  • Reporting: Preparing data for dashboards, reports, and visualizations.
  • Data engineering: Building lightweight and reproducible pipelines with SQL, Python, or R.
  • Geo-ICT: Preparing data for further analysis within GIS, GeoAI, and GeoParquet workflows.

Using DuckDB effectively helps professionals work with modern data sources faster and more easily. The course provides a solid foundation for anyone who wants to use DuckDB in data analysis, data engineering, AI, or Geo-ICT.

Why Choose Our DuckDB Basics Course?

The DuckDB Basics course at Geo-ICT Training Center combines up-to-date knowledge of modern data analysis with practical applications within Geo-ICT, data engineering, and analytics workflows. The course is hands-on, so participants learn not only what DuckDB is but, more importantly, how to apply it immediately in their own work. Thanks to our focus on GIS, GeoAI, open-source tooling, and modern data standards, this course provides a strong foundation for professionals who want to work with data in a future-oriented way.


€1095
  • Course duration: 2 days, from 9:00 AM to 4:00 PM
  • Location: Apeldoorn or online. On-site training is also possible; please get in touch for a quotation.

Daily Schedule

Day 1 – Fundamentals of DuckDB and Modern Data Analysis

On the first day of the course, participants will gain a thorough understanding of DuckDB as a modern analytics database. The course begins with an overview of DuckDB's architecture and philosophy, as well as how it differs from traditional databases such as PostgreSQL, MySQL, and SQLite. Participants will learn why DuckDB is particularly well-suited for analytical workloads, embedded applications, and modern data engineering workflows.

Next, the course covers working with SQL within DuckDB. Key topics include selections, filters, joins, aggregations, and combining datasets. Additionally, participants learn how to work directly with CSV, JSON, and Parquet files without complex import processes. Attention is also given to performance, column-oriented processing, and the efficient analysis of larger datasets.

Throughout the day, participants will engage in hands-on work with datasets and learn how DuckDB can be used for data analysis, reporting, and light ETL processes.

Day 2 – Advanced Analysis and Working with Modern Data Formats

On the second day of the course, participants will delve deeper into DuckDB's analytical capabilities and working with modern data sources. The focus is on efficiently processing larger datasets and performing more complex SQL analyses within a modern analytics environment.

Participants learn how to combine datasets from different sources and how to use DuckDB for filtering, aggregations, joins, and analytical queries on large volumes of data. The course involves extensive work with CSV, JSON, and especially Parquet files. Attention is also given to optimizing queries and understanding DuckDB's performance with analytical workloads.

In addition, modern data engineering concepts such as columnar storage, embedded analytics, and data lake workflows are explored. Participants discover how DuckDB can work directly on files without heavy database infrastructures and why DuckDB is increasingly being used in analytics, AI, and Geo-ICT environments.

Throughout the day, participants will engage in hands-on work with datasets and perform various analyses. By the end of the course, participants will have a solid foundation to independently use DuckDB for modern data analysis and analytics workflows.


Learning Objectives

  • You will learn how DuckDB can be used as a modern analytics database for analyzing large datasets.
  • You will learn how to work with SQL queries within DuckDB, including filters, joins, aggregations, and analytical operations.
  • You will learn how to directly import, combine, and analyze CSV, JSON, and Parquet files.
  • You will learn how DuckDB achieves high performance through columnar processing and embedded analytics concepts.
  • You will learn how DuckDB can be applied within modern data analysis, data engineering, and Geo-ICT workflows.

Want to know more?

Do you have questions about the course content? Or are you unsure whether the course aligns with your learning goals or preferences? Would you prefer an in-house or private course? We'd be happy to help.

DuckDB FAQs

How does DuckDB differ from PostgreSQL?

PostgreSQL is primarily designed as a server-based relational database for transaction processing and multi-user environments. DuckDB, on the other hand, focuses on analytical workloads and runs embedded without requiring a separate server installation. This makes DuckDB particularly well-suited for fast analysis of large files and datasets.

Why is DuckDB fast for analytical queries?

DuckDB uses column-oriented processing and vectorized execution. Because data is organized by column, a query only needs to read the columns it references, and vectorized execution processes values in batches rather than one row at a time. This makes DuckDB particularly well-suited for aggregations, filtering, and analysis on large datasets.

How does DuckDB differ from SQLite?

SQLite is primarily designed for lightweight, embedded transactional applications. DuckDB is embedded in a similar way, but is specifically optimized for data analysis and analytical workloads. As a result, DuckDB typically offers better performance for complex analytical queries on large datasets.

Why is Parquet a good fit for DuckDB?

Parquet is a columnar file format that works efficiently with analytical databases such as DuckDB. DuckDB can query Parquet files directly without complex import processes. This enables large datasets to be analyzed quickly within modern data engineering, AI, and Geo-ICT workflows.
