AGENDA OF THE 2017 EDITION:
8.00 - 9.00
Registration and coffee
9.00 - 9.15
Conference opening

Przemysław Gamdzyk
CEO & Meeting Designer, Evention

Adam Kawa
Data Engineer and Founder, GetInData
9.15 - 9.45
The data horizon 2017: Vision of Google Team
How the cloud can address big data tasks: practical applications of big data analytics and machine learning inside and outside Google, and our vision of the Cloud.

Magdalena Dziewguć

Michał Sapiński
Software engineer, Google
9.45 - 10.15
Meta-Experimentation at Etsy
Experimentation abounds, but how do we test our tests? I’ll share some ways we at Etsy proved our experimentation methods broken, and the approach we took to fixing them. I’ll discuss multiple ways of running A/A tests (as opposed to A/B tests), and a statistical method called bootstrapping, which we used to remedy our experiment analysis.
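For readers new to the technique, here is a minimal bootstrap sketch in Python (an illustration of the general method, not Etsy's actual code): it resamples both arms of an A/A test to build a confidence interval for the difference in means.

import random

def bootstrap_diff_ci(a, b, n_boot=10000, alpha=0.05):
    # Bootstrap a confidence interval for the difference in means
    # between the two arms of an experiment.
    diffs = []
    for _ in range(n_boot):
        resampled_a = [random.choice(a) for _ in a]  # resample with replacement
        resampled_b = [random.choice(b) for _ in b]
        diffs.append(sum(resampled_b) / len(resampled_b) -
                     sum(resampled_a) / len(resampled_a))
    diffs.sort()
    return diffs[int(n_boot * alpha / 2)], diffs[int(n_boot * (1 - alpha / 2))]

# In an A/A test both arms get the identical experience, so the interval
# should contain 0 in roughly 95% of runs; if it does not, the analysis
# method itself is suspect.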

Emily Sommer
Software Engineer, Etsy
10.15 - 10.45
Managing the Margins: Big Data case study - Prescriptive Analysis for Semiconductor Manufacturing
The semiconductor industry is the backbone of the digital age. Sector innovations drive the ability to do more on ever smaller machines, but perhaps equally important is the ability to optimize the manufacturing processes. For example, in the digital printing of semiconductor components, a one-in-a-billion failure rate for droplets may sound acceptable. It is less so when you consider that up to 50 million droplets can be pushed per second, leading to an unacceptable defect rate of one every 20 seconds. Pre-emptive analytics on streaming sensor and image data play a key role in finding indications of where and when defects are looming. This presentation will focus on an industry use case for combining SAS and open source analytics to tackle these essential big data challenges, and will also provide some insights on applications in other sectors.
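The defect-rate arithmetic is easy to verify with the numbers given in the abstract; a quick check in Python:

droplets_per_second = 50e6   # up to 50 million droplets pushed per second
failure_rate = 1e-9          # a one-in-a-billion droplet failure rate
failures_per_second = droplets_per_second * failure_rate  # 0.05 failures/s
print(1 / failures_per_second)  # 20.0 -> one defect every 20 seconds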

Sascha Schubert
Advisory Business Solutions Manager, Global Technology Practice, SAS Institute
10.45 - 11.15
Coffee break
Simultaneous sessions
Operations & Deployment
This track is dedicated to system administrators and people with DevOps skills who are interested in technologies and best practices for planning, installing, managing and securing their Big Data infrastructure in enterprise environments – both on-premises and in the cloud.
Data Application Development
This track is the place for developers to learn about tools, techniques and innovative solutions to collect and process large volumes of data. It covers topics like data ingestion, ETL, process scheduling, metadata and schema management, distributed datastores and more.
Analytics & Data Science
This track includes real case studies demonstrating how Big Data is used to address a wide range of business problems. Here you can find talks about large-scale Machine Learning, A/B tests and visualizing data, as well as various analyses that enable making data-driven decisions and feed personalized features of data-driven products.
Real-Time Processing
This track covers technologies, strategies and use-cases for real-time data ingestion and deriving real-time actionable insights from the flow of events coming from sensors, devices, users, and front-end systems.
Session chairs

Piotr Bednarek
Hadoop Administrator, GetInData

Piotr Krewski
Data Consultant and Co-founder, GetInData

Przemysław Gamdzyk
CEO & Meeting Designer, Evention

Klaudia Zduńczyk
Business Development Specialist, GetInData
11.15 - 11.45
That won’t fit into RAM
SentiOne is one of the leading solutions in Europe for social media listening and analysis. We monitor over 26 European markets including CEE, Scandinavia,

Michał Brzezicki
Vice President of the Board, SentiOne
11.15 - 11.45
Enabling 'Log Everything' at Skyscanner
Skyscanner is a leading global travel search site offering a comprehensive and free flight search service as well as online

Robin Tweedie
Senior Software Engineer, Skyscanner

Arthur Vivian
Software Engineer, Skyscanner
11.15 - 11.45
Alchemists 2.0: Turning data into gold
How to bring money to the table with Data Science. Practical examples of

Paweł Godula
Senior Data Scientist, BCG Gamma
11.15 - 11.45
Real-Time Data Processing at RTB House – Architecture & Lessons Learned
Our platform, which purchases and runs advertisements in the Real-Time Bidding model, processes 250K bid
We will present: 1. our first approach to the problem (end-of-day batch jobs) and the final solution (real-time stream processing), 2. a detailed description of the current architecture, 3. how we tested the new data flow before it was deployed and how it is monitored now, 4. our one-click deployment process, 5. the decisions we made, with their advantages and disadvantages, and our future plans to improve the current solution.
We would like to share our experience scaling this solution over clusters of computers in several data centers. We will focus on the current architecture, but also on testing and monitoring issues and our deployment process. Finally, we would like to provide an overview of the projects involved, such as Kafka, MirrorMaker, Storm, Aerospike, Flume, Docker, etc. We will describe what we have gained from these open source tools and some problems we have come across.
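To make the shift from end-of-day batch jobs to real-time stream processing concrete, here is a toy contrast in Python (a conceptual sketch with an assumed event shape, not RTB House's actual pipeline):

from collections import defaultdict

# End-of-day batch: reprocess the whole day's log; results arrive hours late.
def batch_campaign_spend(day_log):
    totals = defaultdict(float)
    for event in day_log:  # assumed shape: {"campaign": str, "price": float}
        totals[event["campaign"]] += event["price"]
    return totals

# Streaming: update the totals incrementally as each bid event arrives,
# so the current spend is queryable with sub-second staleness.
streaming_totals = defaultdict(float)

def on_bid_event(event):
    streaming_totals[event["campaign"]] += event["price"]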

Bartosz Łoś
Tech Lead, RTB House
11.45 - 11.50
Technical break
11.50 - 12.20
Scalable Analytics for Microservices Architecture
Avito is the third biggest classifieds site in the world after Craigslist and 58.com from China. Avito nowadays
The introduction of microservice architecture at Avito spawned hundreds of new services. In this situation it is critical to implement a common BI infrastructure that is able to collect, process, combine and analyse data from all those microservices, and is resilient to constant change.
Avito's analytics is based on the HP Vertica MPP database, a highly normalized data lake and an asynchronous event bus. These tools give Avito the ability to use all types of Machine Learning and reporting tools, and to manage sites, applications and microservices.
Avito is the Russian OLX. Moreover, Avito and OLX are nowadays both part of the Naspers group; we do the same business in different countries and share experience.

Nikolay Golov
Chief Data Warehousing Architect, Avito
11.50 - 12.20
DataOps or how I learned to love production
A plethora of data processing tools, most of them open source, is available to us. But who actually runs data

Michael Hausenblas
Developer Advocate, Mesosphere
11.50 - 12.20
SAS Viya – the fundamentals of analytics architecture of the future
Since the inception of modern analytical platforms, companies have been trying to out-smart each other to perform analytics faster than ever.

Muhammad Asif Abbasi
Principal Business Solutions Manager, SAS Institute
11.50 - 12.20
Streaming analytics better than batch - when and why
While a lot of problems can be solved in batch, the stream processing approach currently gives you more benefits. And it’s not only sub-second latency at scale, but mainly the possibility…

Adam Kawa
Data Engineer and Founder, GetInData

Krzysztof Zarzycki
Big Data Architect, GetInData

Dawid Wysakowicz
Software Engineer, Ververica (formerly data Artisans)
12.20 - 12.25
Technical break
12.25 - 12.55
Creating an effective, scalable and easily manageable environment for real-time big data processing and analytics
Creating an effective, scalable and easily manageable environment for big data processing is a challenge which touches multiple domains: business ideas, data science, analytic algorithms and analytic software tools, as well as scalable infrastructure which has to fit the specific use case and be open to dynamic changes. Cisco and Alterdata together understand all stages of this process and are able to guide companies through this journey.
During the session we will describe a use case of real-time big data analytics related to location tracking and how it leverages an automated and scalable Cisco platform.
We will also show how to effectively use a C-store DBMS analytics platform, as well as the Cisco Validated Design for Big Data architecture, which combines tools such as Cisco UCS (Unified Computing System), Cisco ACI (Application Centric Infrastructure) and UCS Director for Big Data, providing a single-touch solution that automates Hadoop deployment and a single management pane across both the physical infrastructure and the Hadoop software.

Krzysztof Baczyński
Cisco Big Data Lead for Poland, Cisco

Kamil Ciukszo
Founder and CEO, Alterdata
12.25 - 12.55
One System, One Architecture, Many Applications
AB Initio software is a general-purpose data processing and metadata management platform. It has a single architecture for processing Hadoop data, files, database tables, message queues (Kafka, JMS, etc.), web services, and metadata. This architecture enables virtually any technical or business rule to be graphically defined, shared, and executed in a timely manner. It is a true Big Data architecture: it processes data in parallel across multiple processors, even processors on different servers, such as in Hadoop. It can run the same rules in batch, in real time, and within a service-oriented architecture. It is fully production-ready and supports distributed checkpoint restart with application monitoring and alerting. And it enables end-to-end metadata to be collected, versioned, and analysed by non-technical users.
AB Initio delivers a rich set of software products that work together in a way that makes it easy to rapidly develop big data systems. The building block of these systems is the AB Initio graph, which combines AB Initio processing components, third-party programs, and any necessary custom code into a high-performance parallel and distributed application.

Firat Tekiner
Data Scientist and Big Data Architect, AB Initio
12.25 - 12.55
Anomaly detection made easy
Imagine such situation: you have deployed a service to production and everything seems to work. After some time your phone rings and an analyst says ‘Could you
At Allegro we found a solution for this use case. I am going to tell you how we managed to detect anomalies (heavy web traffic after a successful commercial, a drop in search events, or no clicks on an ad).
We tested the available solutions (Twitter's anomaly detector, HTM algorithms) and came to the conclusion that these machine learning models were too complicated: we didn't understand them. So we created our own simple model. I will show you how we moved from a promising idea in R to the final working solution in Scala.
If you like buzzwords, these might be for you: #MachineLearning, #Scala, #R, #Statistics, #Simplicity, #RealTimeProcessing
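As a hint of how simple such a model can be, here is a rolling mean-and-deviation detector in Python (a sketch of the general idea only; the talk presents Allegro's own model):

from collections import deque

class SimpleAnomalyDetector:
    # Flag values that deviate from the rolling mean by more than
    # k rolling standard deviations.
    def __init__(self, window=60, k=4.0):
        self.values = deque(maxlen=window)
        self.k = k

    def observe(self, x):
        anomalous = False
        if len(self.values) >= 10:  # wait for some history first
            mean = sum(self.values) / len(self.values)
            std = (sum((v - mean) ** 2 for v in self.values) / len(self.values)) ** 0.5
            anomalous = std > 0 and abs(x - mean) > self.k * std
        self.values.append(x)
        return anomalous  # e.g., True for a traffic spike after a commercial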

Piotr Guzik
Software Engineer, Grupa Allegro
12.25 - 12.55
Stream Analytics with SQL on Apache Flink
SQL is undoubtedly the most widely used language for data analytics for many good reasons. It is declarative,
Apache Flink is a distributed stream processing system with very good support for streaming analytics. Flink features two relational APIs, the Table API and SQL. The Table API is a language-integrated relational API with stream-specific features. Flink’s SQL interface implements the plain SQL standard. Both APIs are semantically compatible and share the same optimization and execution path based on Apache Calcite.
In this talk we present the future of Apache Flink’s relational APIs for stream analytics, discuss their conceptual model, and showcase their usage. The central concept of these APIs is the dynamic table. We explain how streams are converted into dynamic tables and vice versa without losing information, thanks to the stream-table duality. Relational queries on dynamic tables behave similarly to materialized view definitions and produce new dynamic tables. We show how dynamic tables are converted back into changelog streams or are written as materialized views to external systems, such as Apache Kafka or Apache Cassandra, and are updated in place with low latency. We conclude our talk by demonstrating the power and expressiveness of Flink’s relational APIs, presenting how common stream analytics use cases can be realized.
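A toy illustration of the stream-table duality in Python (the SQL is an ordinary continuous query; the class below merely mimics what Flink's runtime maintains and is not Flink API code):

# A continuous query such as
#   SELECT user, COUNT(*) FROM clicks GROUP BY user
# defines a dynamic table maintained from a changelog stream.

class DynamicTable:
    # Toy materialization of a GROUP BY count over an insert-only stream.
    def __init__(self):
        self.counts = {}

    def on_insert(self, user):
        self.counts[user] = self.counts.get(user, 0) + 1
        # Emit an upsert record downstream (e.g., to Kafka or Cassandra):
        return ("upsert", user, self.counts[user])

table = DynamicTable()
for user in ["anna", "bob", "anna"]:
    print(table.on_insert(user))  # ends with ('upsert', 'anna', 2)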

Fabian Hueske
Software Engineer, Ververica
12.55 - 13.50
Lunch
Operations & Deployment
Data Application Development
Analytics & Data Science
Real-Time Processing
Session chairs

Piotr Bednarek
Hadoop Administrator, GetInData

Piotr Krewski
Data Consultant and Co-founder, GetInData

Klaudia Zduńczyk
Business Development Specialist, GetInData

Dawid Wysakowicz
Software Engineer, Ververica (formerly data Artisans)
13.50 - 14.20
Creating Redundancy for Big Hadoop Clusters is Hard
Criteo had a Hadoop cluster with 39 PB of raw storage, 13,404 CPUs, 105 TB of RAM, 40 TB of data imported per day and

Stuart Pook
Senior DevOps Engineer, Criteo
13.50 - 14.20
2 Use Cases from Sky Bet
Sky Bet is one of the largest UK online bookmakers and introduced a Hadoop platform 4 years ago. This session

Mark Pybus
Head of Data Engineering, Sky Betting & Gaming
13.50 - 14.20
H2O Deep Water - Making Deep Learning Accessible to Everyone
Deep Water is H2O’s integration with multiple open source deep learning libraries such as TensorFlow, MXNet

Jo-fai Chow
Data Scientist, H2O.ai
13.50 - 14.20
RealTime AdTech reporting & targeting with Apache Apex
AdTech companies need to address data growing at breakneck speed along with customer demands for insights &

Ashish Tadose
Senior Data Architect, PubMatic
14.20 - 14.25
Technical break
14.25 - 14.55
Spotify’s Event Delivery
Spotify is currently one of the most popular music streaming services in the world with over 100 million monthly
Our event delivery system is a key component of our data infrastructure that delivers billions of events per day with predictable latency and a well-defined interface for our developers. This data is used to produce Discover Weekly, Spotify Party, Year in Music and many more Spotify features. This talk will focus on the evolution of the event delivery service and the lessons learned, and will present the design of our new system based on Google Cloud Platform technologies.

Nelson Arapé
Backend Developer, Spotify
14.25 - 14.55
Data Engineering in Facebook Ads teams
Facebook serves ads from over 4 million advertisers to more than a billion people each day. Every day we face the challenge of building the best products for such a large user base. In order to focus on the right ones, we have to make well-informed decisions, which we can back with data. This is why making information easily accessible and understandable is crucial for the success of the whole team. This talk provides an overview of how Facebook uses data to run the Ads product teams. We will discuss embedding Data Engineers' work within engineering teams and their impact on the product, and have a look at techniques which help with the standardization and organization of metrics to manage the complexity of data in a scalable way.

Paweł Koperek
Data Engineer, Facebook
14.25 - 14.55
One Jupyter to rule them all
If you tell your colleagues you develop Hadoop applications, they probably consider you a geek who knows Java,

Mariusz Strzelecki
Senior Data Engineer, Allegro Group
14.25 - 14.55
ING CoreIntel - collect and process network logs across data centers in near realtime
Security is at the core of every bank activity. ING set an ambitious goal to have an insight into the overall network data activity. The purpose is
Since the inception of the CoreIntel project, we knew we were going to face the challenges of capturing, storing and processing vast amounts of data of various types from all over the world. In our session we would like to share our experience in building a scalable, distributed system architecture based on Kafka, Spark Streaming, Hadoop and Elasticsearch to help us achieve these goals.
Why does choosing a good data format matter? How do we manage Kafka offsets? Why is dealing with Elasticsearch a love-hate relationship for us? And how did we manage to put all these pieces together?
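On the Kafka-offsets question, one common answer is to disable auto-commit and commit only after a record has been durably processed; a minimal sketch with the kafka-python client (the topic, broker and group names are hypothetical, and the talk does not prescribe this client):

from kafka import KafkaConsumer

def process(record_value):
    # Placeholder: parse the network log record and index it downstream.
    pass

consumer = KafkaConsumer(
    "network-logs",                   # hypothetical topic
    bootstrap_servers="broker:9092",  # hypothetical broker
    group_id="coreintel-demo",        # hypothetical consumer group
    enable_auto_commit=False,         # take control of offset commits
)

for message in consumer:
    process(message.value)
    consumer.commit()  # commit after successful processing: at-least-once delivery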

Krzysztof Żmij
IT Expert / Hadoop, ING Services Poland

Krzysztof Adamski
Hadoop Team Lead, ING Services Poland; Consultant, GetInData
14.55 - 15.00
Technical break
15.00 - 15.30
Key challenges in building large distributed full-text search systems based on Apache Solr and Elasticsearch
There are large distributed search platforms based on the two most popular search engines: Apache Solr and

Tomasz Sobczak
Senior Consultant & Talent Manager, Findwise
15.00 - 15.30
Orchestrating Big Data pipelines @ Fandom
Fandom is the largest entertainment fan site in the world. With more than 360,000 fan communities and a global

Krystian Mistrzak
Data Engineer, Fandom powered by Wikia

Thejas Murthy
Data Engineer, Fandom powered by Wikia
15.00 - 15.30
Big data in genomics
Genomic population studies incorporate the storing, analysis and interpretation of various kinds of

Marek Wiewiórka
Big Data Architect, GetInData
15.00 - 15.30
Hopsworks: Secure Streaming-as-a-Service with Kafka/Flink/Spark
Since June 2016, Kafka, Spark and Flink-as-a-service have been available to researchers and companies in

Theofilos Kakantousis
Co-founder, Logical Clocks AB
15.30 - 15.55
Coffee break
ROUNDTABLE SESSIONS
15.55 - 16.00
Intro
Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First of all, participants have the opportunity to exchange their opinions and experiences about a specific issue that is important to that group. Secondly, participants can meet and talk with the leader/host of the roundtable discussion – selected professionals with vast knowledge and experience.
16.00 - 16.45
Round I
1. Best tools for alerting and monitoring of the clusters

Tomasz Sujkowski
Big Data Administrator, Agora SA
2. Machine Learning and Big Data: perfect solution for all problems?

Andrzej Dydyński
Data Scientist, Samsung
3. Latest advances in machine learning and their impact on our industries
During the discussion we will focus on the latest advancements in machine learning – mostly in the area of artificial neural networks – and their impact on the landscape of industries, tools and IT professions. Should we expect another AI winter, or is it different this time, with us well on the way to solving intelligence? https://www.youtube.com/watch?v=aygSMgK3BEM

Michał Sapiński
Software engineer, Google
4. Effective tools and environment for data scientists

Artur Maliszewski
Head of Business Intelligence, Currency One
5. How to hire data scientists?

Przemysław Biecek
Co-founder, SmarterPoland.pl
6. Major challenges in projects based on the Hadoop environment (lack of measurable results, staffing problems, the high cost of keeping source code up to date, the necessity of dealing with many different and fast-changing technologies). Data Governance in Big Data.

Konrad Hoszowski
Technical Account Manager, AB Initio

Firat Tekiner
Data Scientist and Big Data Architect, AB Initio
7. Building an EDW using the Big Data technologies – challenges and opportunities
How to successfully build an EDW using the Big Data technology stack. Adapting EDW methodologies, techniques and best practices (Kimball, Inmon, Data Vault, Anchor, Hub and Spoke) to Big Data realities. How to plan the program, build the team, choose technologies and infrastructure (cloud vs. on-prem), model and process the data, etc.

Marcin Choiński
Head of Big Data & Analytics Ecosystem, TVN
8. Being an efficient data engineer. Tools, ecosystem, skills, ways of learning
Big Data Engineer is quite a new profession. Yet the Big Data ecosystem is big, growing rapidly and changing fast. There are a lot of frameworks and tools which are supposed to make us efficient. Some of them can help, while others are obsolete. There are specific use cases where we should apply different tools and approaches. I would like to talk about the usage of common frameworks like Spark, Kafka, Hadoop, Camus, Oozie, AirBnB Workflow and others in order to make our life easier. We can discuss typical issues that occur in daily work and the way we handled them at Allegro. We might also talk about different ways of learning Big Data technologies. To sum up, the questions that should be asked at this table are: Where can we find good learning materials? How can we improve? What skills do we need to succeed? How to write custom tools, and is it worth the effort?

Piotr Guzik
Software Engineer, Grupa Allegro
9. How to overcome the challenges you can expect while designing and managing an environment, both software and hardware, for big data analytics.

Krzysztof Baczyński
Cisco Big Data Lead for Poland, Cisco

Kamil Ciukszo
Founder and CEO, Alterdata
10. Real-time stream processing frameworks – available technologies, their pros & cons, deployment techniques, interesting features.

Fabian Hueske
Software Engineer, Ververica
11. Beyond pre-computed answers – interactive, sub-second OLAP queries with Druid / Kylin
Big Data stands for volume, velocity and, last but not least, variety. Variety in data translates to a variety of business use cases and questions we may want to ask about it. One of the major challenges in modern data engineering is how to build systems which not only satisfy the needs of our businesses today, but are also capable of keeping up with the ever-increasing pace of evolving business requirements – at a palatable cost. One emerging segment of Big Data technologies enabling us to build such systems is distributed OLAP engines such as Druid and Kylin. Let’s chat about: ideal and not-so-ideal use cases, success and failure stories, operational trade-offs and issues, scaling and optimising, getting data in (also in real time) and out (quickly), our experiences and ideas. Let’s share and learn from each other.

Piotr Turek
Big Data Software Architect, DreamLab
16.45 - 17.30
Round II
1. Enterprise requirements for clusters: security, audit, encryption, backups *

Artur Szymański
Hadoop Administrator, Vodafone
2. Machine Learning and Big Data: perfect solution for all problems?

Andrzej Dydyński
Data Scientist, Samsung
3. Latest advances in machine learning and their impact on our industries
During the discussion we will focus on the latest advancements in machine learning – mostly in the area of artificial neural networks – and their impact on the landscape of industries, tools and IT professions. Should we expect another AI winter, or is it different this time, with us well on the way to solving intelligence? https://www.youtube.com/watch?v=aygSMgK3BEM

Michał Sapiński
Software engineer, Google
4. Effective tools and environment for data scientists

Artur Maliszewski
Head of Business Intelligence, Currency One
5. Expensive mistakes to avoid when building a data platform
6. Release process when deploying production data applications

Paweł Cejrowski
Big Data Engineer, Grupa Wirtualna Polska
7. One-click deployment – how to automate the platform properly and efficiently

Piotr Bednarek
Hadoop Administrator, GetInData
8. Being an efficient data engineer. Tools, ecosystem, skills, ways of learning
Big Data Engineer is quite a new profession. Yet the Big Data ecosystem is big, growing rapidly and changing fast. There are a lot of frameworks and tools which are supposed to make us efficient. Some of them can help, while others are obsolete. There are specific use cases where we should apply different tools and approaches. I would like to talk about the usage of common frameworks like Spark, Kafka, Hadoop, Camus, Oozie, AirBnB Workflow and others in order to make our life easier. We can discuss typical issues that occur in daily work and the way we handled them at Allegro. We might also talk about different ways of learning Big Data technologies. To sum up, the questions that should be asked at this table are: Where can we find good learning materials? How can we improve? What skills do we need to succeed? How to write custom tools, and is it worth the effort?

Piotr Guzik
Software Engineer, Grupa Allegro
9. Fast SQL solutions for Hadoop
Hadoop was developed as a batch processing solution, but it quickly became important also for data scientists and analysts. There are plenty of products that give you the opportunity to do fast ad-hoc analysis on big data, like Spark, Impala, Presto or Drill, to mention just a few of them. In this session we will share our experience with various “SQL on Hadoop” solutions, hear some success stories and also discuss common pitfalls.
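For instance, a fast ad-hoc query through Spark SQL can be as short as the following PySpark sketch (the dataset path and schema are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adhoc-analysis").getOrCreate()

# Hypothetical dataset: page-view events stored as Parquet on HDFS.
events = spark.read.parquet("hdfs:///data/events/2017/")
events.createOrReplaceTempView("events")

spark.sql("""
    SELECT page, COUNT(*) AS views
    FROM events
    GROUP BY page
    ORDER BY views DESC
    LIMIT 10
""").show()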

Jakub Pieprzyk
Data Science Developer, Ryanair
10. Data visualisation – why, how and when?

Przemysław Biecek
Co-founder, SmarterPoland.pl
11. Large-scale data collection and ingestion – Kylo and other projects (Gobblin, NiFi, Kafka Connect, Camus)

Tomasz Domański
Senior Data Engineer, ThinkBig (a Teradata company)
* this table will be hosted in Polish only
17.30 - 17.45
Coffee break
17.45 - 18.15
Panel discussion - Big Data implementations – how to achieve a justified ROI
Big Data brings a lot of promises about potential benefits, but life proves it’s not always so easy. How to make Big Data projects great? How to get quick wins? How to avoid expensive mistakes? How to communicate with others – the business side or a client – to make it a viable project? What are the major success factors, and where are the easily missed obstacles that can derail Big Data projects?
Hosts:

Przemysław Gamdzyk
CEO & Meeting Designer, Evention

Krzysztof Zarzycki
Big Data Architect, GetInData

Grzegorz Bartler
Head of Business Intelligence Department, Polkomtel, Cyfrowy Polsat

dr hab. Piotr Gawrysiak
Chief Data Scientist, mBank S.A.

Derek Yeung
Head of Platform Engineering, Nordea

Olaf Piotrowski
Head of Data and Machine Learning, Allegro
18.15 - 18.30
Closing & Summary

Przemysław Gamdzyk
CEO & Meeting Designer, Evention

Adam Kawa
Data Engineer and Founder, GetInData