Elastic parallel systems for high performance cloud computing
High Performance Computing (HPC) enables significant progress in both science and industry. Whereas parallel applications have traditionally been developed to address grand challenges in science, today they are also heavily used to speed up the time-to-result in product design, production planning, financial risk management, medical diagnosis, and research and development. However, purchasing and operating HPC clusters to run these applications requires huge capital expenditures as well as operational knowledge and is thus reserved for large organizations that benefit from economies of scale. More recently, the cloud has evolved into an alternative execution environment for parallel applications, with novel characteristics such as on-demand access to compute resources, pay-per-use, and elasticity. Whereas the cloud has mainly been used to operate interactive multi-tier applications, HPC users are also interested in the benefits it offers: full control of the resource configuration based on virtualization, fast setup times through on-demand accessible compute resources, and the elimination of upfront capital expenditures due to the pay-per-use billing model. Additionally, elasticity allows compute resources to be provisioned and decommissioned at runtime, which enables fine-grained control of an application's performance in terms of execution time and efficiency as well as the related monetary costs of the computation. Whereas HPC-optimized cloud environments have been introduced by cloud providers such as Amazon Web Services (AWS) and Microsoft Azure, existing parallel architectures are not designed to make use of elasticity.

This thesis addresses several challenges in the emergent field of High Performance Cloud Computing. In particular, the presented contributions focus on the novel opportunities and challenges related to elasticity. First, the principles of elastic parallel systems and related design considerations are discussed in detail. On this basis, two exemplary elastic parallel system architectures are presented, each of which comprises (1) an elasticity controller that controls the number of processing units based on user-defined goals, (2) a cloud-aware parallel execution model that handles coordination and synchronization requirements in an automated manner, and (3) a programming abstraction to ease the implementation of elastic parallel applications. To automate application delivery and deployment, novel approaches are presented that generate the required deployment artifacts from developer-provided source code while considering application-specific non-functional requirements. Throughout the thesis, a broad spectrum of design decisions related to the construction of elastic parallel system architectures is discussed, including proactive and reactive elasticity control mechanisms as well as cloud-based parallel processing with virtual machines (Infrastructure as a Service) and functions (Function as a Service). All contributions are assessed by means of extensive experimental evaluations.
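To make the reactive elasticity control idea mentioned in the abstract more concrete, the following is a minimal, hypothetical Python sketch rather than the thesis implementation: a controller compares measured parallel efficiency against a user-defined goal and provisions or decommissions processing units accordingly. The names `ElasticityGoal`, `reactive_control_step`, and `scaler` are illustrative assumptions, not interfaces from the thesis.

```python
from dataclasses import dataclass


@dataclass
class ElasticityGoal:
    """User-defined goal, e.g. keep parallel efficiency around 80 %."""
    target_efficiency: float
    min_units: int = 1
    max_units: int = 64


def parallel_efficiency(speedup: float, units: int) -> float:
    """Parallel efficiency = speedup / number of processing units."""
    return speedup / units


def reactive_control_step(goal: ElasticityGoal, units: int,
                          speedup: float, scaler) -> int:
    """One observe-compare-act step of a reactive elasticity controller.

    `scaler(delta)` is a placeholder for provisioning (+1) or
    decommissioning (-1) a processing unit, e.g. a cloud VM.
    """
    eff = parallel_efficiency(speedup, units)
    if eff > goal.target_efficiency and units < goal.max_units:
        scaler(+1)   # efficiency headroom left: add a processing unit
        return units + 1
    if eff < goal.target_efficiency and units > goal.min_units:
        scaler(-1)   # efficiency below goal: release a processing unit
        return units - 1
    return units     # goal met: keep the current configuration


if __name__ == "__main__":
    goal = ElasticityGoal(target_efficiency=0.8, max_units=16)
    units = 4
    # Toy speedup values standing in for runtime monitoring data.
    for measured_speedup in (3.8, 4.5, 5.0, 5.2):
        units = reactive_control_step(
            goal, units, measured_speedup,
            scaler=lambda d: print(f"scale by {d:+d} -> request sent"))
        print(f"processing units now: {units}")
```

A proactive controller would instead act on predicted future behavior (e.g., from a performance model) before efficiency degrades; the reactive variant sketched here only responds to observed metrics.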
| Author of HS Reutlingen: | Kehrer, Stefan |
|---|---|
| URN: | urn:nbn:de:bsz:rt2-opus4-29210 |
| DOI: | https://doi.org/10.18419/opus-11058 |
| Publisher: | Universität Stuttgart |
| Place of publication: | Stuttgart |
| Referees: | Wolfgang Blochinger, Frank Leymann |
| Referee of HS Reutlingen: | Blochinger, Wolfgang |
| Document Type: | Doctoral Thesis |
| Language: | English |
| Publication year: | 2020 |
| Date of final exam: | 2020/08/05 |
| Page Number: | 239 |
| City of University: | Stuttgart |
| Dissertation note: | Dissertation, Universität Stuttgart, 2020 |
| DDC classes: | 004 Computer science |
| Open access?: | Yes |
| Licence (German): | Open Access |