Workshops*: 26th or 28th of February – date of your choice upon registration
Time:
9 am – 5 pm
Place:
Golden Floor Conference & Workshops Center, Aleje Jerozolimskie 123A, 02-017 Warsaw.
*We will work in a group of no more than 20 people.
Developing a production-ready Spark application
26 Feb. – 15th floor, room no. 2
28 Feb. – 15th floor, room no. 5
DESCRIPTION
During this workshop we will create a fully functioning, production-ready Spark application using day-to-day tools such as Scala, sbt and IntelliJ.
The workshop is aimed at practitioners with at least some programming background. We will provide the necessary project setup, an introduction to the Scala language, and the tools required for building the application. Previous Scala knowledge is not mandatory; general IT skills are enough.
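To give a feel for that project setup, a minimal build.sbt for such an application might look like the sketch below. The project name and library versions are illustrative assumptions, not necessarily those used in the workshop repository:

```scala
// build.sbt -- minimal, illustrative sbt definition for a Spark application.
// Name and versions are examples only; the workshop repository may differ.
name := "spark-workshop-app"
version := "0.1.0"
scalaVersion := "2.11.12"   // Spark 2.x is built against Scala 2.11

libraryDependencies ++= Seq(
  // "provided": the cluster supplies Spark at runtime, so it is not bundled
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  // test-only dependency for exercising the processing logic
  "org.scalatest" %% "scalatest" % "3.0.5" % "test"
)
```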
REQUIREMENTS
- We use Scala as the main programming language for this course. A basic understanding of Scala or of another programming language (e.g. Python or Java) is recommended.
- It would be beneficial to have some knowledge of Spark SQL, Datasets, and DataFrames – this is not an introduction to Apache Spark.
- A laptop with JDK 8 and IntelliJ IDEA (https://www.jetbrains.com/idea/) with the Scala plugin pre-installed.
- We will provide a git repository 1–2 weeks before the scheduled training.
AGENDA
Session #1 Introduction to Scala and Spark; presentation of workshop goals
– brief introduction to Scala programming,
– discuss the workshop's project structure,
– present the end-to-end setup for testing processing logic
Session #2 Write application code to process JSON data from HDFS to Hive with Spark (see the sketch after this agenda)
– implement input data processing and formatting,
– apply custom transformations to the data,
– tune processing logic and performance
Session #3 Implement testing logic to validate processing
– run and test application code,
– exercise testing skills
Session #4 Wrap-up
– quick overview,
– discuss deployment and maintenance of Spark jobs
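As a taste of what we will build in Session #2, a minimal sketch of such a job is shown below. The HDFS path, database, table and column names are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Illustrative sketch: read JSON from HDFS, apply a transformation,
// and write the result to a Hive table. Names and paths are hypothetical.
object JsonToHiveJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("json-to-hive")
      .enableHiveSupport()   // needed to write managed Hive tables
      .getOrCreate()

    val events = spark.read.json("hdfs:///data/raw/events/")

    val cleaned = events
      .filter(col("userId").isNotNull)                    // drop malformed rows
      .withColumn("eventDate", to_date(col("timestamp"))) // derive a partition column

    cleaned.write
      .mode("overwrite")
      .partitionBy("eventDate")
      .saveAsTable("analytics.events")   // Hive table in the "analytics" database

    spark.stop()
  }
}
```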
TIME BOX
This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).
We will work in a group of no more than 20 people.
Workshop trainers:
Paweł Kubit
Data Engineer, GetInData
Patrycjusz Sienkiewicz
Data Engineer, GetInData
Real-time stream processing
26 Feb. – 15th floor, room no. 4
28 Feb. – 15th floor, room no. 6
DESCRIPTION
In this one-day workshop you will learn how to process unbounded streams of data in real time using popular open-source frameworks. We focus mostly on Apache Flink and Apache Kafka; Flink is among the most promising open-source stream processing frameworks and is used in production more and more often.
During the course we simulate a real-world end-to-end scenario – processing, in real time, logs generated by users interacting with a mobile application. The technologies we use include Kafka, Flink, HDFS and YARN. All exercises will be done on remote multi-node clusters.
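To give a flavour of the APIs used during the day, here is a minimal, illustrative sketch of a Flink job counting log events per user, assuming the Flink 1.x Scala DataStream API and the universal Kafka connector. The broker address, topic name, and log format are hypothetical:

```scala
import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Illustrative sketch: count mobile-app log lines per user in 1-minute windows.
object MobileLogCounter {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "kafka:9092") // hypothetical broker
    props.setProperty("group.id", "log-counter")

    val logs = env.addSource(
      new FlinkKafkaConsumer[String]("mobile-logs", new SimpleStringSchema(), props))

    logs
      .map(line => (line.split(",")(0), 1L)) // assume the user id is the first CSV field
      .keyBy(_._1)                           // partition the stream by user
      .timeWindow(Time.minutes(1))           // tumbling 1-minute windows
      .sum(1)                                // count events per user per window
      .print()

    env.execute("mobile-log-counter")
  }
}
```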
TARGET AUDIENCE
Data engineers who are interested in leveraging large-scale and distributed tools to process streams of data in real-time.
REQUIREMENTS
- Experience with programming in Java or Scala
- Basic familiarity with Big Data tools (HDFS, YARN)
- A working laptop (preferably personal rather than company-issued)
- The ability to log in to remote machines over SSH (corporate policies can get in the way here)
- Installed on your machine:
- Java JDK >= 1.8
- IDE – preferably IntelliJ, but Eclipse is also fine
- Maven
- SSH client (e.g. PuTTY for Windows)
- git
- SwitchyOmega plugin in your web browser
PARTICIPANT'S ROI
- Concise and practical knowledge of applying stream processing to solve business problems.
- Hands-on coding experience under the supervision of experienced Flink engineers.
- Tips on real-world applications and best practices.
TRAINING MATERIALS
All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with detailed descriptions of all the exercises. During the workshop, exercises will be done on a remote Hadoop cluster. If you want to redo the exercises later on your own, you can use a virtual machine (e.g. Hortonworks Sandbox or Cloudera QuickStart, which can be downloaded from each vendor's site).
TIME BOX
The workshop lasts a full 8 hours, so you should reserve the whole day. There will, of course, be coffee and lunch breaks during the training.
We will work in a group of no more than 20 people.
AGENDA
8.45 - 9.15
Coffee and socializing
9.15 - 10.15
Session #1 - Introduction to Apache Kafka + hands-on exercises
10.15 - 10.30
Coffee break
10.30 - 12.30
Session #2 - Apache Flink
- Introduction and key concepts
- Basic Flink API
- Hands-on exercises
12.30 - 13.30
Lunch
13.30 - 15.00
Session #3 - Flink cont.
- Time & Windows
- Integration with Kafka
- Hands-on exercises
15.00 - 15.15
Coffee break
15.15 - 16.45
Session #4 - Flink cont.
- Stateful operations (see the sketch after this agenda)
- Best practices
- Daemons and cluster infrastructure
- Hands-on exercises
16.45 - 17.00
Coffee break
17.00 - 17.30
Session #5 - Summary and comparison with other stream processing engines
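For the stateful operations covered in Session #4, a minimal sketch of Flink's managed keyed state might look as follows; the event type and its fields are hypothetical, invented for illustration:

```scala
import org.apache.flink.api.common.state.{ValueState, ValueStateDescriptor}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.KeyedProcessFunction
import org.apache.flink.util.Collector

// Hypothetical event type for illustration.
case class LogEvent(userId: String, message: String)

// Illustrative sketch: keep a per-user counter in Flink keyed state and emit
// the running total with every incoming event.
class RunningCountPerUser
    extends KeyedProcessFunction[String, LogEvent, (String, Long)] {

  // Managed keyed state: Flink checkpoints it and restores it after failures.
  @transient private var count: ValueState[java.lang.Long] = _

  override def open(parameters: Configuration): Unit = {
    count = getRuntimeContext.getState(
      new ValueStateDescriptor[java.lang.Long]("count", classOf[java.lang.Long]))
  }

  override def processElement(
      event: LogEvent,
      ctx: KeyedProcessFunction[String, LogEvent, (String, Long)]#Context,
      out: Collector[(String, Long)]): Unit = {
    // The state is scoped to the current key (the user id),
    // so no manual per-user lookup map is needed.
    val updated = Option(count.value()).map(_.longValue()).getOrElse(0L) + 1L
    count.update(updated)
    out.collect((event.userId, updated))
  }
}
```

In a job, it would be attached to a keyed stream, e.g. `logs.keyBy(_.userId).process(new RunningCountPerUser)`.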
Keywords: Kafka, Flink, Real-Time Processing, Low-Latency Stream Processing
Workshop trainers:
Grzegorz Kołakowski
Data Engineer, GetInData
Krzysztof Zarzycki
Big Data Architect, CTO and Co-founder, GetInData
Big Data on Kubernetes
26 Feb. – 15th floor, room no. 5
28 Feb. – 15th floor, room no. 8
DESCRIPTION
This one-day workshop teaches participants how to use Kubernetes on AWS and how to run different Big Data tools on top of it.
During the course we simulate a real-world architecture – a real-time data processing pipeline: reading data from web applications, processing it, and storing the results in distributed storage.
The technologies that we will be using include Kafka, Spark and S3.
All exercises will be done on remote Kubernetes clusters.
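As a rough sketch of what the final stage of such a pipeline could look like in code, here is a minimal Spark Structured Streaming job reading from Kafka and persisting to S3. It assumes the spark-sql-kafka connector and an S3 (s3a) filesystem are configured on the cluster; the broker, topic, and bucket names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: stream events from a Kafka topic and persist them
// to S3 as Parquet. Broker, topic, and bucket names are hypothetical.
object KafkaToS3Pipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-s3")
      .getOrCreate()

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092")
      .option("subscribe", "web-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS json") // Kafka values arrive as bytes

    events.writeStream
      .format("parquet")
      .option("path", "s3a://example-bucket/web-events/")
      // the checkpoint lets the query resume exactly where it left off
      .option("checkpointLocation", "s3a://example-bucket/checkpoints/web-events/")
      .start()
      .awaitTermination()
  }
}
```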
TARGET AUDIENCE
Engineers who are interested in Big Data and Kubernetes.
REQUIREMENTS
- Some experience with Docker and programming
- A working laptop (preferably personal rather than company-issued)
- A working SSH client (on Windows, PuTTY will do)
- The ability to log in to remote machines over SSH (corporate policies can get in the way here)
PARTICIPANT'S ROI
- Concise and practical knowledge of using Kubernetes
- Hands-on experience with simulated real-life use cases
- Tips on real-world applications and best practices from experienced professionals.
TRAINING MATERIALS
All participants will receive training materials as PDF files: slides covering the theory and an exercise manual with detailed descriptions of all the exercises. During the workshop, exercises will be done on a remote Kubernetes cluster. If you want to redo the exercises later on your own, you can use minikube.
TIME BOX
This is a one-day event; there will be coffee breaks and a one-hour lunch break (included in the price).
We will work in a group of no more than 20 people.
AGENDA
Session 1 – Introduction to Kubernetes
- Docker recap
- Basic Kubernetes concepts and architecture
- Hands-on exercise: connecting to a Kubernetes cluster
Session 2 – Helm
- Introduction to Helm
- Hands-on exercise: deploying an app with Helm
Session 3 – Apache Kafka
- Running Apache Kafka on Kubernetes
- Using Kafka Connect to migrate data from Kafka to S3
- Leveraging Kafka REST in your web application
- Hands-on exercise: deploying data pipeline on Kubernetes
Session 4 – Apache Spark
- Spark as a stream processing engine
- Deploying Spark on Kubernetes
- Hands-on exercise: Real-time data aggregation using Spark Streaming
Keywords: Kubernetes, Docker, Helm, Kafka, Spark
Workshop trainer:
Maciej Bryński
Big Data Architect, DXC Technology