This year we have added a half-day tutorial* on 26th February.

Time:

February 26th, 2 pm – 6 pm

Place:

Golden Floor Conference & Workshops Center, Aleje Jerozolimskie 123A, 02-017 Warsaw.

*We will work in a group of no more than 20 people.

Detect, capture, and ingest changed data from RDBMS to Hadoop

26 Feb. – 15th floor, room no. 8

DESCRIPTION

This is a hands-on “how to” tutorial in which we will configure a CDC (change data capture) data-processing pipeline. It addresses the common problem of tracking continuously changing data in an RDBMS using the Hadoop environment. We will deliver reliable data that is optimised for querying and further analytics.

TARGET AUDIENCE

Data engineers who are interested in the change data capture concept.


WHAT YOU WILL LEARN

  • How to configure Debezium with Kafka Connect so that ongoing changes made to the source table are captured and saved to Apache Kafka;
  • How to read the captured data from an Apache Kafka topic and save it to HDFS using Apache NiFi;
  • How to set up a job that updates the content of the target table in Apache Hive.


During the event we will discuss the whole process and what can happen when we change the conditions in our environment.

TIME BOX

The tutorial will last four full hours, with coffee breaks during the training.

AGENDA

Session #1 – Track changes on the source table – Kafka, Kafka Connect, Debezium

  • What are we doing here?
  • Configure Kafka Connect with Debezium (see the sketch after this list)
  • Troubleshooting
  • Discuss possible improvements
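
To give a flavour of the configuration work in this session, below is a minimal sketch of registering a Debezium MySQL connector through the Kafka Connect REST API (default port 8083), written in Python. All host names, credentials and the example table inventory.customers are placeholder assumptions rather than the tutorial's actual setup, and the property names follow Debezium 1.x:

    # Sketch: register a Debezium MySQL source connector via the
    # Kafka Connect REST API. Hosts, credentials and table names
    # are placeholders, not the tutorial's actual environment.
    import json
    import requests

    connector = {
        "name": "inventory-connector",  # hypothetical connector name
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": "mysql",        # assumed source host
            "database.port": "3306",
            "database.user": "debezium",
            "database.password": "dbz",
            "database.server.id": "184054",      # unique replication client id
            "database.server.name": "dbserver1", # logical name, prefixes topic names
            "table.include.list": "inventory.customers",  # table(s) to track
            # Debezium keeps the source schema history in its own Kafka topic
            "database.history.kafka.bootstrap.servers": "kafka:9092",
            "database.history.kafka.topic": "schema-changes.inventory",
        },
    }

    # Kafka Connect exposes its REST API on port 8083 by default.
    resp = requests.post(
        "http://connect:8083/connectors",
        headers={"Content-Type": "application/json"},
        data=json.dumps(connector),
    )
    resp.raise_for_status()

Once the connector is running, every insert, update and delete on the tracked table lands as a change event on the dbserver1.inventory.customers topic.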

Session #2 – Write data to HDFS and update the Hive table – Kafka, NiFi, HDFS, Hive

  • Move data from Kafka to HDFS using NiFi (see the sketches after this list)
  • Update the table data in Hive
  • Troubleshooting
  • Discuss possible improvements
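
In the tutorial this step is built as a NiFi flow (for example a ConsumeKafka processor feeding PutHDFS). As a rough illustration of the same data path, here is the equivalent logic in plain Python, assuming the kafka-python and hdfs client libraries and placeholder host names; it is not the NiFi flow itself:

    # Illustration only: the Kafka -> HDFS data path that the NiFi flow
    # implements, written with kafka-python and the hdfs client.
    import time
    from kafka import KafkaConsumer
    from hdfs import InsecureClient

    consumer = KafkaConsumer(
        "dbserver1.inventory.customers",  # topic written by Debezium above
        bootstrap_servers="kafka:9092",
        auto_offset_reset="earliest",
    )
    hdfs_client = InsecureClient("http://namenode:9870", user="hdfs")

    # Batch events into files, roughly what NiFi would do with
    # proper batching and file rolling.
    batch = []
    for message in consumer:
        batch.append(message.value)  # raw change event (bytes)
        if len(batch) >= 100:
            path = "/data/cdc/customers/changes-%d.json" % int(time.time())
            hdfs_client.write(path, data=b"\n".join(batch) + b"\n")
            batch = []

For the Hive side, one common approach (assumed here; the session may use a different one) is to merge the landed change events into a transactional (ACID) table, keyed on the primary key and on Debezium's op field ('c' insert, 'u' update, 'd' delete). A sketch with hypothetical table and column names, run through PyHive:

    # Sketch: apply staged change events to the target Hive ACID table
    # with a MERGE. Table and column names are hypothetical.
    from pyhive import hive

    conn = hive.connect(host="hiveserver2", port=10000, username="hive")
    cursor = conn.cursor()
    cursor.execute("""
        MERGE INTO customers AS t
        USING customers_changes AS c
        ON t.id = c.id
        WHEN MATCHED AND c.op = 'd' THEN DELETE
        WHEN MATCHED THEN UPDATE SET name = c.name, email = c.email
        WHEN NOT MATCHED THEN INSERT VALUES (c.id, c.name, c.email)
    """)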

Tutorial conducted by:

Bartosz Kotwica

Data Engineer, GetInData
