AGENDA 2018
Changes in the order of presentations might occur.
8.00 - 9.00
Registration, coffee and networking session
During registration, we cordially invite you to participate in a networking session, which aims to help attendees get to know each other and exchange experiences.
9.00 - 9.15
Conference opening
Przemysław Gamdzyk
CEO & Meeting Designer, Evention
Adam Kawa
CEO and Co-founder, GetInData
9.15 – 10.45 Plenary session
9.15 - 9.45
Transforming our relationship with clients by AI
Bolke de Bruin
Head of Advanced Analytics Technology, ING
9.45 - 10.15
Never Underestimate the Power of a Single Node
Recent developments in GPU hardware and storage technology have changed how we do data analysis and machine learning. The capabilities of these technologies on a single node have grown manyfold in the last five years, while the growth in network speed has lagged behind. I will talk about the overall ML lifecycle and the challenges we face in doing ML at scale, from protecting your Uber account to making self-driving cars a reality. Then I want to focus on an important part of the ML lifecycle: data/ML exploration and experimentation. In large companies like Uber, data scientists are inclined to use shared Hadoop infrastructure for all their needs. For data exploration, this is inefficient for the user and also makes the cluster run slow. I will talk about our new solution to this problem: a high-powered node that lets us work with hundreds of gigabytes to a few terabytes of data interactively, without paying the overhead of a distributed system. I will also talk about some of the interesting machine learning and infrastructure problems that I face in my new role on Uber’s self-driving team.
Karthik Ramasamy
Machine Learning Engineer, Google
10.15 - 10.45
Assisting millions of active users in real-time
Nowadays many companies are becoming data-rich and data-intensive: they have millions of users generating billions of interactions and events per day.
These massive streams of complex events can be processed and reacted upon to, e.g., offer new products and next best actions, communicate with users, or detect fraud – and the quicker we can do it, the more value we can generate.
In this talk we will present how, in a joint development with our client and in just a few months of effort, we built a complex event processing platform for their intensive data streams from the ground up. We will share how the system runs marketing campaigns and detects fraud by following the behavior of millions of users in real time and reacting to it instantly. The platform, designed and built with Big Data technologies to scale infinitely and cost-effectively, already ingests and processes billions of messages, or terabytes of data, per day on a still small cluster. We will share how we leveraged the current best-of-breed open-source projects, including Apache Flink, Apache NiFi and Apache Kafka, but also what interesting problems we needed to solve. Finally, we will share where we’re heading next, what use cases we’re going to implement, and how.
Alexey Brodovshuk
Software Development Supervisor, Kcell
Krzysztof Zarzycki
Big Data Architect, GetInData
Krzysztof Zarzycki
Big Data Architect, CTO and Co-founder, GetInData
10.45 - 11.15
Coffee break
11.15 – 15.30 Simultaneous sessions
GALAXY I, 1st floor
GALAXY II, 1st floor
GALAXY III, 1st floor
CARAVELE, ground floor
Architecture, Operations & Deployment
This track is dedicated to system architects, administrators and people with DevOps skills who are interested in technologies and best practices for planning, building, installing, managing and securing their Big Data infrastructure in enterprise environments – both on-premises and in the cloud.
Data Engineering
This track is the place for developers to learn about tools, techniques and innovative solutions to collect, store and process large volumes of data. It covers topics like data ingestion, ETL, distributed engines, process scheduling, metadata and schema management, distributed datastores and more.
Analytics & Data Science
This track includes real case studies demonstrating how Big Data is used to address a wide range of business problems. You can find here talks about large-scale Machine Learning, A/B tests and visualizing data, as well as various analyses that enable making data-driven decisions and feed the personalized features of data-driven products.
Real-Time Analytics
This track covers technologies, strategies and use-cases for real-time data ingestion and deriving real-time actionable insights from the flow of events coming from sensors, devices, users, and front-end systems.
Host:
Piotr Bednarek, GetInData
Host:
Łukasz Suchenek, Evention
Host:
Klaudia Zduńczyk, GetInData
Host:
Dawid Wysakowicz, GetInData
11.15 - 11.45
Edge to Enterprise analytics platform – a case study
During the presentation you will learn about real-life usage scenarios of the Edge to Enterprise analytics platform, and how it simplifies the implementation and maintenance of all the needed components.
The platform provides a cohesive guidance layer that aligns the expertise of partners and providers of sensors, applications, data analysis, security and services oversight, and gets them and their assets all operating effectively together, to promote timely decisions supporting revenue generation and cost control. This architecture enables multiple use cases, such as energy and utilities, connected automobiles, smart manufacturing and many more. Some of them will be covered during this presentation.
Ernst Kratky
Big Data Analytics & AI Sales Lead – Datacenter EMEA, Cisco
Michał Kudelski
Senior Business Solutions Manager, SAS Institute
11.15 - 11.45
Building a Modern Data Pipeline: Lessons Learned
Adform is one of the biggest European ad-tech companies – for example, our RTB engine at peak handles ~1m requests per second, each in under 100 ms, producing ~20TB of data daily.
Keywords: stream processing, kafka, event sourcing, big data
Saulius Valatka
Technical Lead, Adform
11.15 - 11.45
Executing the Data 180, moving from explaining surprises to predicting the future
The explosion in data and data technologies in the last decade has opened an opportunity for traditional enterprises to exploit their legacy. Nordea bank can trace its history back over 100 years.
Alasdair Anderson
Executive Vice President, Nordea
11.15 - 11.45
Apache Flink: Better, Faster & Uncut
This talk will start with a brief introduction to stream processing and Flink itself. Next, we will take a look at some of the most interesting recent improvements in Flink, such as incremental checkpointing…
Keywords: Apache Flink, streaming, data processing engine
Piotr Nowojski
Software Engineer, data Artisans
11.45 - 11.50
Technical break
11.50 - 12.20
Data Fabric Bridging On-Premise and Cloud
Ab Initio’s approach is to provide a consistent set of capabilities and applications which can span diverse systems hosted either on-premises or off-premises. In other words, Ab Initio acts as the data fabric which simplifies and tightly integrates data movement across systems.
… and all of this presented on the basis of our real-world customers’ cases from the banking and media sectors.
Firat Tekiner
Data Scientist and Big Data Architect, AB Initio
11.50 - 12.20
Time Series Jobs Scheduling at Criteo With Cuttle
At Criteo we run something like 300k jobs, processing around 4PB of logs to produce trillions of new records each day. We do that using several frameworks such as Hive, raw Map/Reduce, Scalding or Spark.
Keywords: workflow, scheduling, hadoop, scala
Guillaume Bort
Technical Lead, Data Reliability Engineering, Criteo
11.50 - 12.20
7 Days of Playing Minesweeper, or How to Shut Down Whistleblower Defense with Analytics
The next time you find yourself thinking there isn’t enough time in a week, consider what Drinker Biddle did for their client in 7 days.
Keywords: machine learning, analytics, workflow
Elise Tropiano
Senior Technical Product Manager, Relativity
11.50 - 12.20
Thinking in Data Flows
In this presentation we’ll look at how far one can push the notion of batch = streaming, and how processor-oriented architectures like Apache NiFi and Apache Streams work.
Keywords: streaming, data flow, NiFi, Streams
Joey Frazee
Solutions Engineer, Hortonworks
Steve Blackmon
VP Technology, People Pattern
12.20 - 12.25
Technical break
12.25 - 12.55
Elephants in the cloud or how to become cloud ready
The way you operate your Big Data environment is not going to be the same anymore. This session is based on our experience managing on-premises environments.
Keywords: hadoop, private cloud, google cloud platform, migration, hybrid platforms
Krzysztof Adamski
Data Infrastructure Architect, ING
12.25 - 12.55
Privacy by Design
Privacy and personal integrity have become a focus topic due to the upcoming GDPR deadline in May 2018 and its requirements for data storage, retention, and access. This talk provides an engineering perspective on privacy and highlights pitfalls and topics that require early attention.
Keywords: Privacy, GDPR, data pipelines, data engineering
Lars Albertsson
Founder & data engineering consultant, Mapflat
12.25 - 12.55
The Factorization Machines algorithm for building recommendation system
Recommendation systems are one of the most successful examples of data science applications in the Big Data domain. The goal of my talk is to present the Factorization Machines algorithm, available in the SAS Viya platform.
Keywords: SAS Viya, Factorization Machines, recommendation system, sparse data
Paweł Łagodziński
Sr Business Solutions Manager, SAS Institute
12.25 - 12.55
Deriving Actionable Insights from High Volume Media Streams
In this talk we describe how to analyze high volumes of real-time streams of news feeds, social media and blogs in a scalable and distributed way using Apache Flink.
Keywords: nlp, streaming, news, machine learning
Jörn Kottmann
Senior Software Developer, Sandstone SA
Peter Thygesen
Partner & Senior Software Engineer, Paqle A/S
12.55 - 13.50
Lunch
13.50 - 14.20
Bringing Druid to production; the possibilities and pitfalls
Druid is a high-performance, column-oriented, distributed data store. This database allows you to query petabytes of columnar data in a real-time fashion.
First, an introduction to Druid’s architecture, the many components within the database system and their roles. Second, the two ways (batch/real-time) of ingesting data into Druid and their pros and cons. Finally, a case of bringing Druid into production will be presented. The focus is a cost-effective implementation that allows Druid to scale using an OpenStack private cloud. The take-aways of the session are insights into when to use Druid and help in identifying common pitfalls when running Druid in production.
Keywords: Druid, Databases, Scale
Fokko Driesprong
Data Engineer, GoDataDriven
13.50 - 14.20
Software Engineer in the world of Machine Learning
Using the example of one of Ocado’s ML projects, called Order Forecasting, I will explain how good old software engineering enables the success of ML projects.
Keywords: machine learning, software engineering, google cloud platform, user story
Przemysław Pastuszka
Machine Learning Engineer, Ocado Technology
13.50 - 14.20
Machine learning security
Despite the rapid progress of tools and methods, security has been almost entirely overlooked in mainstream machine learning. Unfortunately, even the most sophisticated and carefully crafted models can fall victim to the so-called adversarial examples.
Keywords: machine learning, security, adversarial examples
13.50 - 14.20
Near Real-Time Fraud Detection in Telecommunication Industry
In general, fraud is a common pain point in the telecom sector, and detecting fraud is like finding a needle in a haystack due to the volume and velocity of the data. There are 2 key factors in detecting fraud:
(1) Speed: if you can’t detect fraud in time, you’re doomed to lose, because the fraudsters have already got what they need. Simbox detection is one use case for this situation: fraudsters use simboxes to bypass interconnection fees. For this use case, we will talk about our real-time architecture, which uses Spark SQL to detect simboxes within 5 minutes.
(2) Accuracy: fraudsters change their methods all the time, but our job is to accurately identify their behaviour using machine learning algorithms. Anomaly detection is one use case for this situation. Here we will talk about our data mining architecture, which builds fraud models using Spark ML within 1 hour. We also discuss the performance of some ML algorithms on Spark, such as K-means, the three-sigma rule, t-digest and so on. To deliver on these two factors, we process 8-10 billion records, 4-5 TB in size, every day. Our solution combines end-to-end ingestion, processing and mining of high-volume data to detect several fraud use cases in near real time using CDR and IPTDR, saving millions and improving the user experience.
Keywords: fraud detection, realtime processing, Spark SQL, Spark ML, Machine Learning Algorithms
Burak Işıklı
Software Engineer, Turkcell
14.20 - 14.25
Technical break
14.25 - 14.55
Cloud operations with streaming analytics using Apache NiFi and Apache Flink
The amount of information coming from a Cloud deployment that can be used to gain better situational awareness and operate it efficiently is huge.
Keywords: Apache Flink, Apache NiFi, Cloud monitoring, Apache Kafka
Suneel Marthi
Principal Technologist - AI/ML, Amazon Web Services
14.25 - 14.55
Big data serving with Vespa
Offline processing with big data sets can be done with tools such as Hadoop or Spark and streams of data processed with Storm. But what do you do when you need to process data at the time a user is making a request?
Keywords: Vespa, recommendations, targeting, search
Jon Bratseth
Distinguished Architect, Oath (formerly Yahoo)
14.25 - 14.55
A/B testing powered by Big data
At Booking.com we have more than a million properties selling their rooms to our customers. We receive approximately 1000 events per minute from them, leading to a total of 500 GB of data for partner events alone.
In my talk I’ll cover A/B testing at Booking.com, the different technologies – like Hadoop, HBase, Cassandra and Kafka – that we use to store and process large volumes of data, and how we build up metrics to measure the success of our experiments.
Saurabh Goyal
Backend Developer, Booking.com
14.25 - 14.55
Enhancing Spark - increase streaming capabilities of your applications
During this session we’ll discuss the pros and cons of a new structured streaming data processing model in Spark and a nifty way of enhancing Spark with SnappyData, an open-source framework providing great features for both persistent and in-motion data analysis.
Based on a real-life use case, where we designed and implemented a streaming application filtering, consuming and aggregating tons of events, we will talk about the role of the persistent back-end and stream processing integration in real-time applications, in terms of the performance, robustness and scalability of the solution.
Keywords: Spark, structured streaming, snappy, in-memory
Kamil Folkert
CTO, Member of the Board, 3Soft
Tomasz Mirowski
IT Architect, 3Soft
14.55 - 15.00
Technical break
15.00 - 15.30
Big Data Journey at a Big Corp
We will present the journey of Orange Polska evolving from a proprietary ecosystem towards a largely open-source ecosystem based on Hadoop and friends.
Keywords: Enterprise Adoption, Hadoop integration in BI ecosystem, scaling solutions in the enterprise, data teams organization
Tomasz Burzyński
Business Insights Director, Orange
Maciej Czyżowicz
Technical Leader for Analytics Stream, Orange
15.00 - 15.30
Airflow as a Service
Oozie is still a popular workflow scheduler for Hadoop. It is a good choice if you like programming within an XML file. Engineers at Allegro don’t.
Keywords: Workflow, Automation, Orchestration, Docker
Robert Mroczkowski
Data Platform Engineer and Technical Owner of Hadoop Cluster, Grupa Allegro
15.00 - 15.30
Data Science Lessons I have learned in 5 years
Since 2013 I have been working as a Data Scientist – one of today’s hottest jobs in the IT industry. During this time, I got the opportunity to experience the evolution of the data science landscape – to see what worked and what didn’t.
Keywords: Data Science, Data Scientist, teamwork, work skills
Boxun Zhang
Sr. Data Scientist, GoEuro
15.00 - 15.30
Design Patterns for Calculating User Profiles in Real Time
At mobile.de, Germany’s largest online vehicle marketplace, we calculate user profiles in real time to optimize the user journey on the e-marketplace platform by presenting relevant products to the user…
Keywords: Big Data, Stateful Stream Processing
Igor Mazor
Senior Data Engineer, mobile.de
15.30 - 16.00
Coffee break
16.00 – 17.25 Roundtable sessions
16.00 - 16.05
Intro
Parallel roundtable discussions are the part of the conference that engages all participants. They serve a few purposes. First of all, participants have the opportunity to exchange their opinions and experiences about a specific issue that is important to the group. Secondly, participants can meet and talk with the leaders/hosts of the roundtable discussions – selected professionals with vast knowledge and experience.
There will be 2 rounds of discussion, so every conference participant can take part in 2 discussions.
16.05 – 16.45 1st round
16.50 – 17.25 2nd round
16.05 - 16.45
1st ROUND
Paweł Leszczyński
Hadoop Product Owner, Grupa Allegro
A data lake is like a snowball. Most of us have started with proofs of concept that filled the data lake with stream data and batch imports from external data sources: Camus, Gobblin, Spark ingestion, Sqoop, NiFi and more. They all start as shiny snowflakes which double within the blink of an eye. How to survive the flood on a data lake and successfully solve problems like small files on HDFS, data retention, auditing and monitoring imports, (near) real-time ingestion, and late and out-of-order events?
Grzegorz Łyczba
Lead software engineer, OpenX
Adam Karwan
Senior Data Scientist, Groupon
During this panel we are going to discuss the best techniques of powerful data visualization. Classifying data visualization tools according to their strengths and weaknesses will be another stage of the discussion.
We will also talk about storytelling and targeting presentations at the audience, e.g. customers, stakeholders, students, etc. https://www.youtube.com/watch?v=AdSZJzb-aX8
Plan of discussion:
- Present yourself: name, current position, experience in data visualization
- Describe tools for data manipulation and visualization: your favorite ones and the most painful.
- What are the features of good data visualization?
- Has it ever happened that a customer incorrectly interpreted your visualization?
- What should the correct presentation look like from the perspective of storytelling?
- Dirty and missing data: how to deal with that issue?
- Reality, Complexity, Simplicity – what is the best strategy for visualizations?
Radosław Kita
Team Lead, Adform
Becoming a data scientist seems temptingly easy: finish a specialisation at Coursera and wait for job offers. I would like to share my observations on which skills are worth having and what challenges reality will pose in the practical implementation of large data science projects.
Marek Wiewiórka
Big Data Architect, GetInData
Barbara Rychalska
Senior Data Scientist and Data Science Section Leader, Findwise
In today’s world, whose functioning is practically based on insights drawn from data, the ability to understand data in depth and to communicate findings is an increasingly welcome skill set. Scientific communication is a science in itself, so what does it take to be both a good data scientist/big data engineer AND a good communicator? Should we strain to be both, anyway?
During the discussion we will try to answer this question and touch upon others such as:
– Data visualisation: is it an extra perk or an integral part of a data science project? What makes a good visualisation?
– How to report your data science findings so as to fully convey the result in a persuasive manner
– What is exploratory data analysis (EDA), and how important is it?
With this talk I’d like to increase interest in scientific communication techniques and help bridge the gap between the scientist and the audience. We will explore how to fit data storytelling into a data science project development cycle to make it all more practical. Also, we will talk about how rational humans are – and how this (presumed) lack of total rationality influences the techniques of communication.
Firat Tekiner
Data Scientist and Big Data Architect, AB Initio
Sujatha Subramanian
Data Scientist, Lingaro
In this roundtable, we dive into various topics in the realm of Artificial Intelligence and Big Data: understanding the driving forces and the technologies powering the AI journey – especially the current growth of data, both structured and unstructured, which creates a strong synergy between AI and Big Data. How are AI and Big Data instrumental in the transformation of businesses? We will also discuss applications of deep learning in Big Data analytics and the proliferation of conversational AI, like chatbots and voice assistants, in enterprises.
Theofilos Kakantousis
Co-founder, Logical Clocks AB
Hadoop has evolved into a vast Big Data ecosystem of different frameworks and services, which means selecting a distribution that matches one’s needs has become a tricky task. In this session, we discuss the services and features that users should consider when opting for a modern Hadoop distribution. We focus on the main distributions, namely HDP, CDH, MapR, Hops, AWS EMR and Google Dataproc, and we discuss how each one would fit their needs based on the following all-important aspects:
- Performance
- Security
- Platform Installation & Administration
- Application Monitoring & Control
- Big Data Processing frameworks
- Data Governance
- SQL & Business Intelligence
- Deep Learning
Vera Matei
Data Engineer, ING
Tal Sliwowicz
Director R&D - Scale, Performance & Data, Taboola
Taboola provides 500 billion fully personalized content recommendations per month to 1.5 billion unique visitors of the most prominent publishers across the world. To do that, we are processing 40TB+ a day in real time using Hadoop, Cassandra, Kafka and Spark. A significant part of the system is running SQL queries in Spark. At this table, we want to talk about people’s experience with running SQL on top of Spark, Presto, Drill, etc. and share our own experience.
Lars Albertsson
Founder & data engineering consultant, Mapflat
16.45 - 16.50
Technical break
16.50 - 17.25
2nd ROUND
Paweł Leszczyński
Hadoop Product Owner, Grupa Allegro
A data lake is like a snowball. Most of us have started with proofs of concept that filled the data lake with stream data and batch imports from external data sources: Camus, Gobblin, Spark ingestion, Sqoop, NiFi and more. They all start as shiny snowflakes which double within the blink of an eye. How to survive the flood on a data lake and successfully solve problems like small files on HDFS, data retention, auditing and monitoring imports, (near) real-time ingestion, and late and out-of-order events?
Grzegorz Łyczba
Lead software engineer, OpenX
Adam Karwan
Senior Data Scientist, Groupon
During this panel we are going to discuss the best techniques of powerful data visualization. Classifying data visualization tools according to their strengths and weaknesses will be another stage of the discussion.
We will also talk about storytelling and targeting presentations at the audience, e.g. customers, stakeholders, students, etc. https://www.youtube.com/watch?v=AdSZJzb-aX8
Plan of discussion:
- Present yourself: name, current position, experience in data visualization
- Describe tools for data manipulation and visualization: your favorite ones and the most painful.
- What are the features of good data visualization?
- Has it ever happened that a customer incorrectly interpreted your visualization?
- What should the correct presentation look like from the perspective of storytelling?
- Dirty and missing data: how to deal with that issue?
- Reality, Complexity, Simplicity – what is the best strategy for visualizations?
Radosław Kita
Team Lead, Adform
Becoming a data scientist seems temptingly easy: finish a specialisation at Coursera and wait for job offers. I would like to share my observations on which skills are worth having and what challenges reality will pose in the practical implementation of large data science projects.
Marek Wiewiórka
Big Data Architect, GetInData
Mateusz Buśkiewicz
Tech Lead, Data Products Team, Base CRM
When Data Science meets large datasets, it can create a very effective mix and allow us to build more powerful data products. However, this is not always easy or effective.
How to be pragmatic about this topic in order to accelerate Data Science while avoiding pitfalls?
At this roundtable, we will discuss a wide range of topics on what makes a Data Scientist effective, from exploratory data analysis to the deployment of finished models in production. The questions we will ask ourselves include: Which Big Data tools are the most Data Scientist-friendly? When should we use Big Data, and when is it more practical to stay with a single machine? When does it make sense to use distributed machine learning algorithms? How do you visualize large datasets? How do you switch from prototyping to deploying scalable models in production?
Let’s share and learn from each other!
Marcin Pękalski
Data Scientist, Kambi Sports Solutions, Kaggler
Many organisations assume that Business Intelligence will be able to answer all their data-related questions. But for that, working with the data must not be a bottleneck, and that requires a proper BI platform – but what is that, and what requirements should it satisfy to provide the most benefit?
During the discussion we will try to answer a couple of questions:
– what is a BI platform?
– who should it serve?
– what are typical requirements on the platform?
– where can we expect the bottlenecks?
Artur Fejklowicz
Data Architect / Data Engineering and Science Team Leader, TVN
I would like to talk with you about your experiences with security issues. Is it possible to implement an AAA Hadoop security stack without commercial software? How to secure rows of data in Hive? Do you think Java Reflections are safe? Where should personal data be processed? How to provide access to logs where the only personal data is a cookie? Who should have access to data hashing? Kerberize it all – can we enable security in several steps, or do we have to start with everything kerberized from the beginning?
Christophe Salperwyck
ABB Ability Platform Engineer, ABB
When dealing with huge amounts of data coming as streams, you might not have the possibility to see the data again; in that case you need to use one-pass incremental algorithms.
These algorithms usually trade accuracy for performance. Real-life examples are “filtering” using Bloom filters (as in Chrome, HBase…), “count distinct” using HyperLogLog (as in Spark, Redis, AtScale…), quantile estimation…
We can have the same trade-off in Machine Learning too: algorithms exist for both supervised and unsupervised learning that can learn incrementally on data streams.
The idea for this round table is to discuss real use cases of streaming algorithms/structures and stream mining.
Tomasz Szczechura
Team Leader of Data Systems Team, Grupa Wirtualna Polska
Business requirements for large data systems are growing, and time costs more and more. We will discuss the tools we can use to minimize loading time and to query this data with sub-second OLAP queries, and how to scale this architecture to petabytes of data. We will discuss tools such as Druid, Kylin and ClickHouse, compare them and exchange experiences.
Arunabh Singh
Lead Data Scientist, HiQ International AB
IoT data often comes with its own unique challenges: lack of structure and the extreme diversity of source devices, from light sensors to cars; unreliability around hardware integration with the “internet”; and a lack of established best practices and an ecosystem of tools to process and analyze the data. However, with many of the consumer-side “big data” problems cracked, processing and harnessing the value of IoT data is the next logical progression for the “big data” discipline to figure out on a large scale. At Springworks, a connected-cars platform based in Stockholm, we work with IoT data from cars using a telematics unit and face many of these challenges. Key questions of this session include:
- What are the best design choices for IoT data, especially to overcome the unreliability of the hardware integration component?
- What are similarities/differences with “regular” big data processing?
- Which organizations are the leaders in leveraging IoT data, and what are their learnings?
Jacek Laskowski
Apache Spark™ is a fast and general engine for distributed in-memory computations at massive scale. Let’s talk about what’s coming in Apache Spark 2.3 and how to use it for large data processing in batch or streaming modes. Bring all your questions about Apache Spark in general, and Spark SQL, Spark MLlib and Spark Structured Streaming in particular. The roundtable is to help you fine-tune existing Spark workloads as well as prepare for future ones.
17.25 - 17.45
Coffee break
17.45 - 18.15
Panel discussion - Getting more out of your data in 2018
Building an efficient Big Data platform and mining large volumes of data seems to be a never-ending story for data-driven companies. It’s an ongoing journey with many pitfalls, twists and an unclear future. Each year, there is something that changes the game, brings new value, promises rewards or wastes our time. During this panel, our experts will talk about their plans and hopes for 2018 – small and big improvements to their big data strategy that will help them get more out of data in 2018. This includes data monetization, new use cases that become mainstream, new technologies that get significant adoption, and new challenges that more and more companies face. The discussion won’t be about the distant future, but about actions that you can take in 2018.
Host:
Adam Kawa
CEO and Co-founder, GetInData
Tomasz Burzyński
Business Insights Director, Orange
Karthik Ramasamy
Machine Learning Engineer, Google
Boxun Zhang
Sr. Data Scientist, GoEuro
18.15 - 18.30
Closing & Summary
Przemysław Gamdzyk
CEO & Meeting Designer, Evention
Adam Kawa
CEO and Co-founder, GetInData
19.00 - 22.00
Networking party for all participants and speakers
At the end of the conference, we would like to invite all attendees to an informal evening meeting in the BOLEK Pub.