Master theses

Taming decentralized Big Data Streams

Keywords: Big Data, Decentralization, Stream Processing

Supervision: Ruben Verborgh, Femke Ongenae

Students: max 1

Big Data is on the verge of a paradigm shift towards decentralization, driven by the ever-growing amount of real-time data streams being generated and by users insisting on regaining control over their own data. As the volume and velocity of these streams keep increasing, it is becoming infeasible to transmit all produced data streams to a centralized cloud infrastructure for processing.

In this thesis, you will work on the upcoming decentralized Big Data paradigm and investigate how to optimally push processing operators as close to the data sources as possible. You will learn how to handle the heterogeneity of data, which becomes very important in a decentralized paradigm. You will develop a deep understanding of current stream processing engines (e.g. Apache Flink, Apache Spark, Google Dataflow) and how their principles can be extended towards taming decentralized Big Data streams.