OPUS 4 | Search

TASKWORK: a cloud-aware runtime system for elastic task-parallel HPC applications (2019)

With the capability of employing virtually unlimited compute resources, the cloud evolved into an attractive execution environment for applications from the High Performance Computing (HPC) domain. By means of elastic scaling, compute resources can be provisioned and decommissioned at runtime. This gives rise to a new concept in HPC: Elasticity of parallel computations. However, it is still an open research question to which extent HPC applications can benefit from elastic scaling and how to leverage elasticity of parallel computations. In this paper, we discuss how to address these challenges for HPC applications with dynamic task parallelism and present TASKWORK, a cloud-aware runtime system based on our findings. TASKWORK enables the implementation of elastic HPC applications by means of higher level development frameworks and solves corresponding coordination problems based on Apache ZooKeeper. For evaluation purposes, we discuss a development framework for parallel branch-and-bound based on TASKWORK, show how to implement an elastic HPC application, and report on measurements with respect to parallel efficiency and elastic scaling.

Serverless skeletons for elastic parallel processing (2019)

Kehrer, Stefan ; Scheffold, Jochen ; Blochinger, Wolfgang

Serverless computing is an emerging cloud computing paradigm with the goal of freeing developers from resource management issues. As of today, serverless computing platforms are mainly used to process computations triggered by events or user requests that can be executed independently of each other. These workloads benefit from on-demand and elastic compute resources as well as per-function billing. However, it is still an open research question to which extent parallel applications, which comprise most often complex coordination and communication patterns, can benefit from serverless computing. In this paper, we introduce serverless skeletons for parallel cloud programming to free developers from both parallelism and resource management issues. In particular, we investigate on the well known and widely used farm skeleton, which supports the implementation of a wide range of applications. To evaluate our concepts, we present a prototypical development and runtime framework and implement two applications based on our framework: Numerical integration and hyperparameter optimization - a commonly applied technique in machine learning. We report on performance measurements for both applications and discuss the usefulness of our approach.

Model-based generation of self-adaptive cloud services (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

An important shift in software delivery is the definition of a cloud service as an independently deployable unit by following the microservices architectural style. Container virtualization facilitates development and deployment by ensuring independence from the runtime environment. Thus, cloud services are built as container based systems - a set of containers that control the lifecycle of software and middleware components. However, using containers leads to a new paradigm for service development and operation: Self service environments enable software developers to deploy and operate container based systems on their own - you build it, you run it. Following this approach, more and more operational aspects are transferred towards the responsibility of software developers. In this work, we propose a concept for self-adaptive cloud services based on container virtualization in line with the microservices architectural style and present a model-based approach that assists software developers in building these services. Based on operational models specified by developers, the mechanisms required for self-adaptation are automatically generated. As a result, each container automatically adapts itself in a reactive, decentralized manner. We evaluate a prototype which leverages the emerging TOSCA standard to specify operational behavior in a portable manner.

Cost-optimized parallel computations using volatile cloud resources (2019)

Haußmann, Jens ; Blochinger, Wolfgang ; Kuechlin, Wolfgang

In recent years, the parallel computing community has shown increasing interest in leveraging cloud resources for executing parallel applications. Clouds exhibit several fundamental features of economic value, like on-demand resource provisioning and a pay-per-use model. Additionally, several cloud providers offer their resources with significant discounts; however, possessing limited availability. Such volatile resources are an auspicious opportunity to reduce the costs arising from computations, thus achieving higher cost efficiency. In this paper, we propose a cost model for quantifying the monetary costs of executing parallel applications in cloud environments, leveraging volatile resources. Using this cost model, one is able to determine a configuration of a cloud-based parallel system that minimizes the total costs of executing an application.

Container-based module isolation for cloud services (2019)

Kehrer, Stefan ; Riebandt, Florian ; Blochinger, Wolfgang

Due to frequently changing requirements, the internal structure of cloud services is highly dynamic. To ensure flexibility, adaptability, and maintainability for dynamically evolving services, modular software development has become the dominating paradigm. By following this approach, services can be rapidly constructed by composing existing, newly developed and publicly available third-party modules. However, newly added modules might be unstable, resource-intensive, or untrustworthy. Thus, satisfying non-functional requirements such as reliability, efficiency, and security while ensuring rapid release cycles is a challenging task. In this paper, we discuss how to tackle these issues by employing container virtualization to isolate modules from each other according to a specification of isolation constraints. We satisfy non-functional requirements for cloud services by automatically transforming the modules comprised into a container-based system. To deal with the increased overhead that is caused by isolating modules from each other, we calculate the minimum set of containers required to satisfy the isolation constraints specified. Moreover, we present and report on a prototypical transformation pipeline that automatically transforms cloud services developed based on the Java Platform Module System into container-based systems.

A survey on cloud migration strategies for high performance computing (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

The cloud evolved into an attractive execution environment for parallel applications from the High Performance Computing (HPC) domain. Existing research recognized that parallel applications require architectural refactoring to benefit from cloud-specific properties (most importantly elasticity). However, architectural refactoring comes with many challenges and cannot be applied to all applications due to fundamental performance issues. Thus, during the last years, different cloud migration strategies have been considered for different classes of parallel applications. In this paper, we provide a survey on HPC cloud migration research. We investigate on the approaches applied and the parallel applications considered. Based on our findings, we identify and describe three cloud migration strategies.

Open Access

Refine

Author

Year of publication

Document Type

Language

Has full text

Is part of the Bibliography

Institute

Publisher

6 search hits