Apache Druid: Complete Guide

Course Overview
Embark on a comprehensive journey through the realm of Apache Druid, where you'll uncover its powerful architecture and master the intricacies of Kafka ingestion, schema evolution, tuning, and integration with Hive & Presto. This course is meticulously designed to empower you with the knowledge and skills needed to leverage Druid for real-time analytics at scale.
What You'll Learn
Core Competencies:
- Druid Architecture: Dive deep into the components that make up Apache Druid, understanding its strengths and where it excels in the data ecosystem.
- Real-time Data Ingestion with Kafka: Write your own Twitter Producer app to pull tweets from Twitter in real-time and push them to Apache Kafka. Then, learn how to create a Kafka streaming task that ingests these tweets into Apache Druid.
- Transformation, Filtering, and Schema Configuration: Master the art of applying transformations, filters, and configuring schemas during Kafka ingestion to ensure your data is accurate and useful.
- Batch Ingestion Methods: Explore both Native and SQL Batch ingestion methods in detail, automating the loading process into Druid as part of an ETL pipeline.
- Data Analysis with Spark: Utilize Apache Spark to read from Druid tables, create Spark DataFrames, and harness Spark's predicate pushdown and aggregate pushdown capabilities for more efficient queries.
- Schema Registry Integration: Understand how Druid integrates with Schema Registry for schema validation, particularly with Avro records.
- Hive & Presto Integration: Learn how to integrate Druid with existing Hive or Presto (Trino) environments, enabling seamless data joins between Druid and your organization's data warehouses.
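To give a flavor of what working with Druid looks like before the modules begin: Druid brokers expose a SQL endpoint over HTTP (`POST /druid/v2/sql`). A minimal sketch using only the Python standard library; the broker URL and the `wikipedia` datasource are placeholders, not part of the course materials.

```python
import json
import urllib.request

# Druid brokers accept SQL queries as JSON over HTTP at /druid/v2/sql.
# The host and port below are placeholders for your own cluster.
BROKER_URL = "http://localhost:8888/druid/v2/sql"

def build_sql_request(sql: str) -> urllib.request.Request:
    """Wrap a SQL string into the JSON body Druid's SQL endpoint expects."""
    body = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        BROKER_URL, data=body, headers={"Content-Type": "application/json"}
    )

# Example: top channels by edit count in a hypothetical 'wikipedia' datasource.
request = build_sql_request(
    "SELECT channel, COUNT(*) AS edits "
    "FROM wikipedia GROUP BY channel ORDER BY edits DESC LIMIT 5"
)
# urllib.request.urlopen(request) would execute it against a live cluster.
```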
Module Breakdown
Module 1: Theoretical Foundations & Real-time Kafka Ingestion
- Gain theoretical knowledge of Apache Druid.
- Develop a Twitter Producer app for real-time data ingestion.
- Create a Kafka streaming ingestion task that loads data into Apache Druid.
- Apply transformations, filters, and schema configurations during Kafka ingestion.
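The steps above come together in a Kafka supervisor spec, which Druid receives as JSON (posted to the Overlord at `/druid/indexer/v1/supervisor`). Below is a sketch of such a spec as a Python dict; the topic, column names, and broker address are placeholder assumptions modeled on the course's Twitter pipeline, not the course's actual spec.

```python
# Sketch of a Druid Kafka supervisor spec for a tweets topic.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "tweets",
            "timestampSpec": {"column": "created_at", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["user", "lang", "text_length"]},
            # Transformations and filters are applied at ingestion time:
            "transformSpec": {
                "transforms": [
                    # Derive a new column from the raw 'text' field.
                    {"type": "expression", "name": "text_length",
                     "expression": "strlen(text)"}
                ],
                # Keep only English-language tweets.
                "filter": {"type": "selector", "dimension": "lang",
                           "value": "en"},
            },
            "granularitySpec": {"segmentGranularity": "hour",
                                "queryGranularity": "none"},
        },
        "ioConfig": {
            "topic": "tweets",
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "localhost:9092"},
        },
        "tuningConfig": {"type": "kafka"},
    },
}
```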
Module 2: ETL & Batch Data Loading
- In-depth exploration of Native and SQL Batch ingestion methods.
- Automate loading into Apache Druid as part of an ETL pipeline.
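As a taste of the SQL-based flavor of batch ingestion: Druid can load data with an `INSERT ... SELECT` over an external source declared via `EXTERN`, which makes it easy to script from an ETL job. The sketch below is an assumption of what such a statement might look like for the tweets data; the directory, column names, and target table are placeholders.

```python
# A SQL-based batch ingestion statement for Druid, held as a Python string so
# an ETL job can submit it (e.g. as JSON to Druid's SQL task endpoint).
INGEST_SQL = """
INSERT INTO tweets_daily
SELECT
  TIME_PARSE("created_at") AS __time,
  "user",
  "lang",
  "text"
FROM TABLE(
  EXTERN(
    '{"type": "local", "baseDir": "/data/tweets", "filter": "*.json"}',
    '{"type": "json"}',
    '[{"name": "created_at", "type": "string"},
      {"name": "user", "type": "string"},
      {"name": "lang", "type": "string"},
      {"name": "text", "type": "string"}]'
  )
)
PARTITIONED BY DAY
"""

# The body an ETL job would POST to the Druid router/broker to run the task.
task_payload = {"query": INGEST_SQL}
```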
Module 3: Data Analysis with Spark
- Utilize Apache Spark to read from Druid tables.
- Create Spark DataFrames and understand predicate and aggregate pushdown features.
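One common way to read a Druid table from Spark is through Druid's Avatica JDBC endpoint; the sketch below assumes that setup (the URL, driver class, and `tweets` table are placeholders, and the course may use a different connector).

```python
# JDBC options pointing Spark at Druid's SQL-over-JDBC (Avatica) endpoint.
# Host, port, and table name are placeholders for your own cluster.
jdbc_options = {
    "url": "jdbc:avatica:remote:url=http://localhost:8888/druid/v2/sql/avatica/",
    "driver": "org.apache.calcite.avatica.remote.Driver",
    "dbtable": "tweets",
}

def read_tweets(spark):
    """Build a Spark DataFrame over the Druid 'tweets' table.

    Filters applied afterwards (e.g. .filter("lang = 'en'")) can be pushed
    down through the JDBC source so Druid does the work instead of Spark.
    """
    return spark.read.format("jdbc").options(**jdbc_options).load()
```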
Module 4: Schema Registry & Avro Record Parsing
- Learn the integration of Druid with Schema Registry for schema validation.
- Understand how Druid parses Avro records.
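For Avro data, the relevant piece of the Kafka ingestion spec is the `inputFormat`, which can ask Druid to decode Confluent-framed Avro records by looking their schemas up in Schema Registry. A sketch, assuming the Avro extension is loaded on the cluster; the registry URL is a placeholder.

```python
# inputFormat fragment for a Kafka ingestion spec: decode Avro records using
# schemas fetched from Schema Registry (registry URL is a placeholder).
avro_input_format = {
    "type": "avro_stream",
    "avroBytesDecoder": {
        "type": "schema_registry",
        "url": "http://localhost:8081",
    },
    "binaryAsString": False,
}
```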
Modules 5 & 6: Druid's Out-of-the-Box Capabilities
- Explore Hive and Presto (Trino) integration with Apache Druid.
- Discover how to join your organization's data in Hive or Presto with Druid tables.
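Once Druid is wired in as a catalog (for example via Trino's Druid connector), a Druid table can be joined to a Hive table with ordinary SQL. The sketch below shows the shape of such a query; the catalog, schema, and table names are placeholder assumptions.

```python
# A Trino/Presto query joining a Druid datasource to a Hive table, held as a
# Python string for submission from an application. All names are placeholders.
JOIN_SQL = """
SELECT d.channel, h.region, COUNT(*) AS edits
FROM druid.druid.wikipedia AS d
JOIN hive.default.channel_regions AS h
  ON d.channel = h.channel
GROUP BY d.channel, h.region
"""
```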
Why This Course?
Apache Druid is a column-oriented, distributed, real-time analytics database built for applications that need fast, aggregated access to event-driven data. The ability to ingest data from Kafka, integrate with existing Hive or Presto environments, and handle schema changes without downtime are just a few reasons why mastering Druid is essential for modern analytics workflows.
By the end of this course, you will be equipped to implement Apache Druid within your organization, unlocking the full potential of your real-time data analytics capabilities.
🎉 Enroll Now and transform your data infrastructure with the power of Apache Druid!