OPUS 4 | Informatik

Self-tuning serverless task farming using proactive elasticity control (2021)

Kehrer, Stefan ; Zietlow, Dominik ; Scheffold, Jochen ; Blochinger, Wolfgang

The cloud evolved into an attractive execution environment for parallel applications, which make use of compute resources to speed up the computation of large problems in science and industry. Whereas Infrastructure as a Service (IaaS) offerings have been commonly employed, more recently, serverless computing emerged as a novel cloud computing paradigm with the goal of freeing developers from resource management issues. However, as of today, serverless computing platforms are mainly used to process computations triggered by events or user requests that can be executed independently of each other and benefit from on-demand and elastic compute resources as well as per-function billing. In this work, we discuss how to employ serverless computing platforms to operate parallel applications. We specifically focus on the class of parallel task farming applications and introduce a novel approach to free developers from both parallelism and resource management issues. Our approach includes a proactive elasticity controller that adapts the physical parallelism per application run according to user-defined goals. Specifically, we show how to consider a user-defined execution time limit after which the result of the computation needs to be present while minimizing the associated monetary costs. To evaluate our concepts, we present a prototypical elastic parallel system architecture for self-tuning serverless task farming and implement two applications based on our framework. Moreover, we report on performance measurements for both applications as well as the prediction accuracy of the proposed proactive elasticity control mechanism and discuss our key findings.

Development and operation of elastic parallel tree search applications using TASKWORK (2020)

Kehrer, Stefan ; Blochinger, Wolfgang

Cloud resources can be dynamically provisioned according to application-specific requirements and are payed on a per-use basis. This gives rise to a new concept for parallel processing: Elastic parallel computations. However, it is still an open research question to which extent parallel applications can benefit from elastic scaling, which requires resource adaptation at runtime and corresponding coordination mechanisms. In this work, we analyze how to address these system-level challenges in the context of developing and operating elastic parallel tree search applications. Based on our findings, we discuss the design and implementation of TASKWORK, a cloud-aware runtime system specifically designed for elastic parallel tree search, which enables the implementation of elastic applications by means of higher-level development frameworks. We show how to implement an elastic parallel branch-and-bound application based on an exemplary development framework and report on our experimental evaluation that also considers several benchmarks for parallel tree search.

Equilibrium : an elasticity controller for parallel tree search in the cloud (2020)

Kehrer, Stefan ; Blochinger, Wolfgang

Elasticity is considered to be the most beneficial characteristic of cloud environments, which distinguishes the cloud from clusters and grids. Whereas elasticity has become mainstream for web-based, interactive applications, it is still a major research challenge how to leverage elasticity for applications from the high-performance computing (HPC) domain, which heavily rely on efficient parallel processing techniques. In this work, we specifically address the challenges of elasticity for parallel tree search applications. Well-known meta-algorithms based on this parallel processing technique include branch-and-bound and backtracking search. We show that their characteristics render static resource provisioning inappropriate and the capability of elastic scaling desirable. Moreover, we discuss how to construct an elasticity controller that reasons about the scaling behavior of a parallel system at runtime and dynamically adapts the number of processing units according to user-defined cost and efficiency thresholds. We evaluate a prototypical elasticity controller based on our findings by employing several benchmarks for parallel tree search and discuss the applicability of the proposed approach. Our experimental results show that, by means of elastic scaling, the performance can be controlled according to user-defined thresholds, which cannot be achieved with static resource provisioning.

Serverless skeletons for elastic parallel processing (2019)

Kehrer, Stefan ; Scheffold, Jochen ; Blochinger, Wolfgang

Serverless computing is an emerging cloud computing paradigm with the goal of freeing developers from resource management issues. As of today, serverless computing platforms are mainly used to process computations triggered by events or user requests that can be executed independently of each other. These workloads benefit from on-demand and elastic compute resources as well as per-function billing. However, it is still an open research question to which extent parallel applications, which comprise most often complex coordination and communication patterns, can benefit from serverless computing. In this paper, we introduce serverless skeletons for parallel cloud programming to free developers from both parallelism and resource management issues. In particular, we investigate on the well known and widely used farm skeleton, which supports the implementation of a wide range of applications. To evaluate our concepts, we present a prototypical development and runtime framework and implement two applications based on our framework: Numerical integration and hyperparameter optimization - a commonly applied technique in machine learning. We report on performance measurements for both applications and discuss the usefulness of our approach.

Cost-optimized parallel computations using volatile cloud resources (2019)

Haußmann, Jens ; Blochinger, Wolfgang ; Kuechlin, Wolfgang

In recent years, the parallel computing community has shown increasing interest in leveraging cloud resources for executing parallel applications. Clouds exhibit several fundamental features of economic value, like on-demand resource provisioning and a pay-per-use model. Additionally, several cloud providers offer their resources with significant discounts; however, possessing limited availability. Such volatile resources are an auspicious opportunity to reduce the costs arising from computations, thus achieving higher cost efficiency. In this paper, we propose a cost model for quantifying the monetary costs of executing parallel applications in cloud environments, leveraging volatile resources. Using this cost model, one is able to determine a configuration of a cloud-based parallel system that minimizes the total costs of executing an application.

A survey on cloud migration strategies for high performance computing (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

The cloud evolved into an attractive execution environment for parallel applications from the High Performance Computing (HPC) domain. Existing research recognized that parallel applications require architectural refactoring to benefit from cloud-specific properties (most importantly elasticity). However, architectural refactoring comes with many challenges and cannot be applied to all applications due to fundamental performance issues. Thus, during the last years, different cloud migration strategies have been considered for different classes of parallel applications. In this paper, we provide a survey on HPC cloud migration research. We investigate on the approaches applied and the parallel applications considered. Based on our findings, we identify and describe three cloud migration strategies.

Migrating parallel applications to the cloud: assessing cloud readiness based on parallel design decisions (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

Parallel applications are the computational backbone of major industry trends and grand challenges in science. Whereas these applications are typically constructed for dedicated High Performance Computing clusters and supercomputers, the cloud emerges as attractive execution environment, which provides on-demand resource provisioning and a pay-per-use model. However, cloud environments require specific application properties that may restrict parallel application design. As a result, design trade-offs are required to simultaneously maximize parallel performance and benefit from cloud-specific characteristics. In this paper, we present a novel approach to assess the cloud readiness of parallel applications based on the design decisions made. By discovering and understanding the implications of these parallel design decisions on an application’s cloud readiness, our approach supports the migration of parallel applications to the cloud.We introduce an assessment procedure, its underlying meta model, and a corresponding instantiation to structure this multi-dimensional design space. For evaluation purposes, we present an extensive case study comprising three parallel applications and discuss their cloud readiness based on our approach.

Cost-efficient parallel processing of irregularly structured problems in cloud computing environments (2019)

Haußmann, Jens ; Blochinger, Wolfgang ; Kuechlin, Wolfgang

In this paper, we deal with optimizing the monetary costs of executing parallel applications in cloud-based environments. Specifically, we investigate on how scalability characteristics of parallel applications impact the total costs of computations. We focus on a specific class of irregularly structured problems, where the scalability typically depends on the input data. Consequently, dynamic optimization methods are required for minimizing the costs of computation. For quantifying the total monetary costs of individual parallel computations, the paper presents a cost model that considers the costs for the parallel infrastructure employed as well as the costs caused by delayed results. We discuss a method for dynamically finding the number of processors for which the total costs based on our cost model are minimal. Our extensive experimental evaluation gives detailed insights into the performance characteristics of our approach.

Model-based generation of self-adaptive cloud services (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

An important shift in software delivery is the definition of a cloud service as an independently deployable unit by following the microservices architectural style. Container virtualization facilitates development and deployment by ensuring independence from the runtime environment. Thus, cloud services are built as container based systems - a set of containers that control the lifecycle of software and middleware components. However, using containers leads to a new paradigm for service development and operation: Self service environments enable software developers to deploy and operate container based systems on their own - you build it, you run it. Following this approach, more and more operational aspects are transferred towards the responsibility of software developers. In this work, we propose a concept for self-adaptive cloud services based on container virtualization in line with the microservices architectural style and present a model-based approach that assists software developers in building these services. Based on operational models specified by developers, the mechanisms required for self-adaptation are automatically generated. As a result, each container automatically adapts itself in a reactive, decentralized manner. We evaluate a prototype which leverages the emerging TOSCA standard to specify operational behavior in a portable manner.

TASKWORK: a cloud-aware runtime system for elastic task-parallel HPC applications (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

With the capability of employing virtually unlimited compute resources, the cloud evolved into an attractive execution environment for applications from the High Performance Computing (HPC) domain. By means of elastic scaling, compute resources can be provisioned and decommissioned at runtime. This gives rise to a new concept in HPC: Elasticity of parallel computations. However, it is still an open research question to which extent HPC applications can benefit from elastic scaling and how to leverage elasticity of parallel computations. In this paper, we discuss how to address these challenges for HPC applications with dynamic task parallelism and present TASKWORK, a cloud-aware runtime system based on our findings. TASKWORK enables the implementation of elastic HPC applications by means of higher level development frameworks and solves corresponding coordination problems based on Apache ZooKeeper. For evaluation purposes, we discuss a development framework for parallel branch-and-bound based on TASKWORK, show how to implement an elastic HPC application, and report on measurements with respect to parallel efficiency and elastic scaling.

Open Access

Informatik

Refine

Author

Year of publication

Document Type

Language

Has full text

Is part of the Bibliography

Institute

Publisher

13 search hits