TU Berlin

Publications of the Research Group

The Stratosphere platform for big data analytics
Citation key Alexandrov2014
Author Alexandrov, Alexander and Bergmann, Rico and Ewen, Stephan and Freytag, Johann-Christoph and Hueske, Fabian and Heise, Arvid and Kao, Odej and Leich, Marcus and Leser, Ulf and Markl, Volker and Naumann, Felix and Peters, Mathias and Rheinländer, Astrid and Sax, Matthias J. and Schelter, Sebastian and Höger, Mareike and Tzoumas, Kostas and Warneke, Daniel
Pages 939–964
Year 2014
ISSN 0949-877X
DOI 10.1007/s00778-014-0357-y
Journal The VLDB Journal
Volume 23
Number 6
Abstract We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere's features include "in situ" data processing, a declarative query language, treatment of user-defined functions as first-class citizens, automatic program parallelization and optimization, support for iterative programs, and a scalable and efficient execution engine. Stratosphere covers a variety of "Big Data" use cases, such as data warehousing, information extraction and integration, data cleansing, graph analysis, and statistical analysis applications. In this paper, we present the overall system architecture design decisions, introduce Stratosphere through example queries, and then dive into the internal workings of the system's components that relate to extensibility, programming model, optimization, and query execution. We experimentally compare Stratosphere against popular open-source alternatives, and we conclude with a research outlook for the next years.
