We work on resource management and fault tolerance for distributed systems in the context of cloud computing, grid computing, HPC, Big Data, and IoT. Currently, we mainly focus on the following two areas:
Adaptive Resource Management (ARM)
Many applications today have to deal with large volumes of data, including, for instance, IoT monitoring applications for urban infrastructures and data processing jobs in many scientific fields. Such applications are usually implemented on top of distributed systems and often run on heterogeneous infrastructures. Users regularly have specific expectations for the performance and dependability of these applications. This is why we develop and evaluate methods that allow systems to adapt effectively to specific data-intensive workloads, distributed computing environments, and performance as well as dependability requirements.
Learn more about this sub-group 
Optimization and Fault Tolerance in Distributed Systems
Living in a data-driven, interconnected world undeniably increases the relevance of distributed IT systems. The modern technological discourse is dominated by terms such as Internet of Things (IoT), smart cities, sensor networks, 5G, autonomous transportation, Industry 4.0, and many more, all of which depend on IT. While they bring the potential for great technologies like autonomous driving, virtual reality, or remote surgery, to name just a few, their increased complexity and distribution break traditional system operation concepts. In general, systems with an increasing number of highly interconnected components are hard for human experts to operate alone, which calls for novel solutions in the areas of fault tolerance and distributed system operation.
Learn more about our research projects in this area 
Aside from the projects in our two main research areas, we are also part of other ongoing research efforts, including the ECDF, HEIBRiDS, and TU WimiPlus.