MA: Data Stream Query Language
Tuesday, 14. February 2017
Data streams appear in many digital applications, such as social networks and media services (Twitter, Netflix, Spotify), sensor networks (IoT, Industry 4.0), and monitoring applications. They consist of high-frequency, live time series data that needs to be analysed and visualised to become understandable to humans.
The data stream processing framework Bitflow, developed at the CIT group, enables real-time processing and analysis of arbitrary data streams. With Bitflow, algorithm pipelines can be created and chained to transform, analyse and visualise data points received from live streaming data sources. Data can be streamed over the network between different instances of algorithm pipelines to scale out and optimise the analysis. In one use case, we use Bitflow to detect anomalies in cloud infrastructures by analysing monitoring data collected from a large number of physical and virtual hosts.
The topic of this thesis is the design and implementation of a query language for transforming live data streams within the Bitflow platform. The query language should be modelled closely on SQL (or other existing query languages) and be optimised for the streaming use case. It will make it possible to build flexible queries on real-time data quickly; the query results can be fed into further analysis steps or live charts.
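To give a rough idea of what a SQL-like query over a live stream could mean, the following minimal sketch evaluates a projection with a filter lazily, one sample at a time. It is an illustration only and does not use the Bitflow API: the sample layout (a mapping from metric name to value), the function names select and sensor_stream, and the metric names are all assumptions made for this example. The actual syntax and semantics of the language are part of the thesis work.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical sample layout: metric name -> value (not Bitflow's data format).
Sample = Dict[str, float]


def select(stream: Iterable[Sample], fields: List[str],
           where: Callable[[Sample], bool] = lambda s: True) -> Iterator[Sample]:
    """Lazily yield the requested fields of every sample that passes the filter."""
    for sample in stream:
        if where(sample):
            yield {name: sample[name] for name in fields}


def sensor_stream() -> Iterator[Sample]:
    """Stand-in for a live source; a real stream would be unbounded."""
    yield {"time": 1.0, "cpu": 0.20, "mem": 0.50}
    yield {"time": 2.0, "cpu": 0.95, "mem": 0.55}
    yield {"time": 3.0, "cpu": 0.97, "mem": 0.60}


# Rough equivalent of: SELECT time, cpu FROM stream WHERE cpu > 0.9
high_cpu = select(sensor_stream(), ["time", "cpu"],
                  where=lambda s: s["cpu"] > 0.9)

# Materialising the result is only possible because the demo source is finite.
result = list(high_cpu)
print(result)
```

Because select is a generator, each sample is processed as soon as it arrives and nothing is buffered, so the output of one query could in principle serve as the input of another query or of a later analysis step.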
Prerequisites for working on this topic are basic knowledge of algorithm design and databases/SQL, as well as advanced programming skills in either Go or Java.