Master Thesis Kaikai Yang
Scalable Stream Processing
A growing number of internet-capable devices produce vast
amounts of data in the form of streams, e.g. video and audio. This
data poses new challenges for all parts of the underlying
infrastructure, especially when it needs to be processed under real-time
constraints.
Frameworks for massively parallel data processing in large compute clusters have become popular in both industry and academia. These frameworks, such as Hadoop, usually process data as batch jobs. This approach does not meet the requirements of streamed data, where results are often needed within a very short time span, e.g. less than a second.
The scope of this thesis is the design and implementation of a
programming model for parallel, distributed stream processing based on
Stratosphere's PACTs model. The main task is the definition of window
semantics for the existing PACTs operators and their adaptation.
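To illustrate what "window semantics" can mean for a grouping operator, the following minimal sketch assigns timestamped records to fixed-size, non-overlapping (tumbling) windows and reduces each window to a sum. It is an illustration only: the class TumblingWindowSketch, the Event type, and the method sumPerWindow are hypothetical names and not part of Stratosphere or the PACT API, and tumbling windows are just one of several possible window definitions the thesis might consider.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class TumblingWindowSketch {

    /** Hypothetical stream record: an event timestamp in milliseconds and a value. */
    public static final class Event {
        public final long timestampMs;
        public final int value;

        public Event(long timestampMs, int value) {
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /**
     * Assigns each event to a tumbling window of the given size and sums the
     * values per window. The result maps each window start time to its sum,
     * ordered by window start.
     */
    public static Map<Long, Integer> sumPerWindow(List<Event> events, long windowSizeMs) {
        if (windowSizeMs <= 0) {
            throw new IllegalArgumentException("windowSizeMs must be positive");
        }
        Map<Long, Integer> sums = new TreeMap<>();
        for (Event e : events) {
            // The window start is the timestamp rounded down to a multiple of the window size.
            long windowStart = Math.floorDiv(e.timestampMs, windowSizeMs) * windowSizeMs;
            sums.merge(windowStart, e.value, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(100, 1), new Event(900, 2),
                new Event(1000, 3), new Event(2500, 4));
        // One-second windows: [0, 1000), [1000, 2000), [2000, 3000)
        System.out.println(sumPerWindow(events, 1000)); // prints {0=3, 1000=3, 2000=4}
    }
}
```

In a batch setting, a reduce-style operator sees all records of a key at once. On an unbounded stream, the window definition decides which records belong together and when a result can be emitted, which is why the existing operators need adapted semantics.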
Prerequisites for working on this topic are profound knowledge of the Java programming language, an interest in current research topics, and the willingness to become familiar with an existing system.