Refine
Year of publication
- 2020 (3) (remove)
Document Type
- Journal article (1)
- Conference proceeding (1)
- Doctoral Thesis (1)
Language
- English (3)
Has full text
- yes (3)
Is part of the Bibliography
- yes (3)
Institute
- Informatik (3) (remove)
Publisher
- Springer (2)
- Universität Stuttgart (1)
High Performance Computing (HPC) enables significant progress in both science and industry. Whereas traditionally parallel applications have been developed to address the grand challenges in science, as of today, they are also heavily used to speed up the time-to-result in the context of product design, production planning, financial risk management, medical diagnosis, as well as research and development efforts. However, purchasing and operating HPC clusters to run these applications requires huge capital expenditures as well as operational knowledge and thus is reserved to large organizations that benefit from economies of scale. More recently, the cloud evolved into an alternative execution environment for parallel applications, which comes with novel characteristics such as on-demand access to compute resources, pay-per-use, and elasticity. Whereas the cloud has been mainly used to operate interactive multi-tier applications, HPC users are also interested in the benefits offered. These include full control of the resource configuration based on virtualization, fast setup times by using on-demand accessible compute resources, and eliminated upfront capital expenditures due to the pay-per-use billing model. Additionally, elasticity allows compute resources to be provisioned and decommissioned at runtime, which allows fine-grained control of an application's performance in terms of its execution time and efficiency as well as the related monetary costs of the computation. Whereas HPC-optimized cloud environments have been introduced by cloud providers such as Amazon Web Services (AWS) and Microsoft Azure, existing parallel architectures are not designed to make use of elasticity. This thesis addresses several challenges in the emergent field of High Performance Cloud Computing. In particular, the presented contributions focus on the novel opportunities and challenges related to elasticity. First, the principles of elastic parallel systems as well as related design considerations are discussed in detail. On this basis, two exemplary elastic parallel system architectures are presented, each of which includes (1) an elasticity controller that controls the number of processing units based on user-defined goals, (2) a cloud-aware parallel execution model that handles coordination and synchronization requirements in an automated manner, and (3) a programming abstraction to ease the implementation of elastic parallel applications. To automate application delivery and deployment, novel approaches are presented that generate the required deployment artifacts from developer-provided source code in an automated manner while considering application-specific non-functional requirements. Throughout this thesis, a broad spectrum of design decisions related to the construction of elastic parallel system architectures is discussed, including proactive and reactive elasticity control mechanisms as well as cloud-based parallel processing with virtual machines (Infrastructure as a Service) and functions (Function as a Service). To evaluate these contributions, extensive experimental evaluations are presented.
Elasticity is considered to be the most beneficial characteristic of cloud environments, which distinguishes the cloud from clusters and grids. Whereas elasticity has become mainstream for web-based, interactive applications, it is still a major research challenge how to leverage elasticity for applications from the high-performance computing (HPC) domain, which heavily rely on efficient parallel processing techniques. In this work, we specifically address the challenges of elasticity for parallel tree search applications. Well-known meta-algorithms based on this parallel processing technique include branch-and-bound and backtracking search. We show that their characteristics render static resource provisioning inappropriate and the capability of elastic scaling desirable. Moreover, we discuss how to construct an elasticity controller that reasons about the scaling behavior of a parallel system at runtime and dynamically adapts the number of processing units according to user-defined cost and efficiency thresholds. We evaluate a prototypical elasticity controller based on our findings by employing several benchmarks for parallel tree search and discuss the applicability of the proposed approach. Our experimental results show that, by means of elastic scaling, the performance can be controlled according to user-defined thresholds, which cannot be achieved with static resource provisioning.
Cloud resources can be dynamically provisioned according to application-specific requirements and are payed on a per-use basis. This gives rise to a new concept for parallel processing: Elastic parallel computations. However, it is still an open research question to which extent parallel applications can benefit from elastic scaling, which requires resource adaptation at runtime and corresponding coordination mechanisms. In this work, we analyze how to address these system-level challenges in the context of developing and operating elastic parallel tree search applications. Based on our findings, we discuss the design and implementation of TASKWORK, a cloud-aware runtime system specifically designed for elastic parallel tree search, which enables the implementation of elastic applications by means of higher-level development frameworks. We show how to implement an elastic parallel branch-and-bound application based on an exemplary development framework and report on our experimental evaluation that also considers several benchmarks for parallel tree search.