OPUS 4 | Search

TASKWORK: a cloud-aware runtime system for elastic task-parallel HPC applications (2019)

With the capability of employing virtually unlimited compute resources, the cloud evolved into an attractive execution environment for applications from the High Performance Computing (HPC) domain. By means of elastic scaling, compute resources can be provisioned and decommissioned at runtime. This gives rise to a new concept in HPC: Elasticity of parallel computations. However, it is still an open research question to which extent HPC applications can benefit from elastic scaling and how to leverage elasticity of parallel computations. In this paper, we discuss how to address these challenges for HPC applications with dynamic task parallelism and present TASKWORK, a cloud-aware runtime system based on our findings. TASKWORK enables the implementation of elastic HPC applications by means of higher level development frameworks and solves corresponding coordination problems based on Apache ZooKeeper. For evaluation purposes, we discuss a development framework for parallel branch-and-bound based on TASKWORK, show how to implement an elastic HPC application, and report on measurements with respect to parallel efficiency and elastic scaling.

Serverless skeletons for elastic parallel processing (2019)

Kehrer, Stefan ; Scheffold, Jochen ; Blochinger, Wolfgang

Serverless computing is an emerging cloud computing paradigm with the goal of freeing developers from resource management issues. As of today, serverless computing platforms are mainly used to process computations triggered by events or user requests that can be executed independently of each other. These workloads benefit from on-demand and elastic compute resources as well as per-function billing. However, it is still an open research question to which extent parallel applications, which comprise most often complex coordination and communication patterns, can benefit from serverless computing. In this paper, we introduce serverless skeletons for parallel cloud programming to free developers from both parallelism and resource management issues. In particular, we investigate on the well known and widely used farm skeleton, which supports the implementation of a wide range of applications. To evaluate our concepts, we present a prototypical development and runtime framework and implement two applications based on our framework: Numerical integration and hyperparameter optimization - a commonly applied technique in machine learning. We report on performance measurements for both applications and discuss the usefulness of our approach.

Model-based generation of self-adaptive cloud services (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

An important shift in software delivery is the definition of a cloud service as an independently deployable unit by following the microservices architectural style. Container virtualization facilitates development and deployment by ensuring independence from the runtime environment. Thus, cloud services are built as container based systems - a set of containers that control the lifecycle of software and middleware components. However, using containers leads to a new paradigm for service development and operation: Self service environments enable software developers to deploy and operate container based systems on their own - you build it, you run it. Following this approach, more and more operational aspects are transferred towards the responsibility of software developers. In this work, we propose a concept for self-adaptive cloud services based on container virtualization in line with the microservices architectural style and present a model-based approach that assists software developers in building these services. Based on operational models specified by developers, the mechanisms required for self-adaptation are automatically generated. As a result, each container automatically adapts itself in a reactive, decentralized manner. We evaluate a prototype which leverages the emerging TOSCA standard to specify operational behavior in a portable manner.

Development and operation of elastic parallel tree search applications using TASKWORK (2020)

Kehrer, Stefan ; Blochinger, Wolfgang

Cloud resources can be dynamically provisioned according to application-specific requirements and are payed on a per-use basis. This gives rise to a new concept for parallel processing: Elastic parallel computations. However, it is still an open research question to which extent parallel applications can benefit from elastic scaling, which requires resource adaptation at runtime and corresponding coordination mechanisms. In this work, we analyze how to address these system-level challenges in the context of developing and operating elastic parallel tree search applications. Based on our findings, we discuss the design and implementation of TASKWORK, a cloud-aware runtime system specifically designed for elastic parallel tree search, which enables the implementation of elastic applications by means of higher-level development frameworks. We show how to implement an elastic parallel branch-and-bound application based on an exemplary development framework and report on our experimental evaluation that also considers several benchmarks for parallel tree search.

Cost-optimized parallel computations using volatile cloud resources (2019)

Haußmann, Jens ; Blochinger, Wolfgang ; Kuechlin, Wolfgang

In recent years, the parallel computing community has shown increasing interest in leveraging cloud resources for executing parallel applications. Clouds exhibit several fundamental features of economic value, like on-demand resource provisioning and a pay-per-use model. Additionally, several cloud providers offer their resources with significant discounts; however, possessing limited availability. Such volatile resources are an auspicious opportunity to reduce the costs arising from computations, thus achieving higher cost efficiency. In this paper, we propose a cost model for quantifying the monetary costs of executing parallel applications in cloud environments, leveraging volatile resources. Using this cost model, one is able to determine a configuration of a cloud-based parallel system that minimizes the total costs of executing an application.

Container-based module isolation for cloud services (2019)

Kehrer, Stefan ; Riebandt, Florian ; Blochinger, Wolfgang

Due to frequently changing requirements, the internal structure of cloud services is highly dynamic. To ensure flexibility, adaptability, and maintainability for dynamically evolving services, modular software development has become the dominating paradigm. By following this approach, services can be rapidly constructed by composing existing, newly developed and publicly available third-party modules. However, newly added modules might be unstable, resource-intensive, or untrustworthy. Thus, satisfying non-functional requirements such as reliability, efficiency, and security while ensuring rapid release cycles is a challenging task. In this paper, we discuss how to tackle these issues by employing container virtualization to isolate modules from each other according to a specification of isolation constraints. We satisfy non-functional requirements for cloud services by automatically transforming the modules comprised into a container-based system. To deal with the increased overhead that is caused by isolating modules from each other, we calculate the minimum set of containers required to satisfy the isolation constraints specified. Moreover, we present and report on a prototypical transformation pipeline that automatically transforms cloud services developed based on the Java Platform Module System into container-based systems.

AUTOGENIC: automated generation of self-configuring microservices (2018)

Kehrer, Stefan ; Blochinger, Wolfgang

The state of the art proposes the microservices architectural style to build applications. Additionally, container virtualization and container management systems evolved into the perfect fit for developing, deploying, and operating microservices in line with the DevOps paradigm. Container virtualization facilitates deployment by ensuring independence from the runtime environment. However, microservices store their configuration in the environment. Therefore, software developers have to wire their microservice implementation with technologies provided by the target runtime environment such as configuration stores and service registries. These technological dependencies counteract the portability benefit of using container virtualization. In this paper, we present AUTOGENIC - a model-based approach to assist software developers in building microservices as self configuring containers without being bound to operational technologies. We provide developers with a simple configuration model to specify configuration operations of containers and automatically generate a self-configuring microservice tailored for the targeted runtime environment. Our approach is supported by a method, which describes the steps to automate the generation of self-configuring microservices. Additionally, we present and evaluate a prototype, which leverages the emerging TOSCA standard.

An elasticity description language for task-parallel cloud applications (2020)

Haußmann, Jens ; Blochinger, Wolfgang ; Kuechlin, Wolfgang

In recent years, the cloud has become an attractive execution environment for parallel applications, which introduces novel opportunities for versatile optimizations. Particularly promising in this context is the elasticity characteristic of cloud environments. While elasticity is well established for client-server applications, it is a fundamentally new concept for parallel applications. However, existing elasticity mechanisms for client-server applications can be applied to parallel applications only to a limited extent. Efficient exploitation of elasticity for parallel applications requires novel mechanisms that take into account the particular runtime characteristics and resource requirements of this application type. To tackle this issue, we propose an elasticity description language. This language facilitates users to define elasticity policies, which specify the elasticity behavior at both cloud infrastructure level and application level. Elasticity at the application level is supported by an adequate programming and execution model, as well as abstractions that comply with the dynamic availability of resources. We present the underlying concepts and mechanisms, as well as the architecture and a prototypical implementation. Furthermore, we illustrate the capabilities of our approach through real-world scenarios.

A survey on cloud migration strategies for high performance computing (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

The cloud evolved into an attractive execution environment for parallel applications from the High Performance Computing (HPC) domain. Existing research recognized that parallel applications require architectural refactoring to benefit from cloud-specific properties (most importantly elasticity). However, architectural refactoring comes with many challenges and cannot be applied to all applications due to fundamental performance issues. Thus, during the last years, different cloud migration strategies have been considered for different classes of parallel applications. In this paper, we provide a survey on HPC cloud migration research. We investigate on the approaches applied and the parallel applications considered. Based on our findings, we identify and describe three cloud migration strategies.

Open Access

Refine

Author

Year of publication

Document Type

Language

Has full text

Is part of the Bibliography

Institute

Publisher

9 search hits