OPUS 4 | Informatik

Self-tuning serverless task farming using proactive elasticity control (2021)

Kehrer, Stefan ; Zietlow, Dominik ; Scheffold, Jochen ; Blochinger, Wolfgang

The cloud evolved into an attractive execution environment for parallel applications, which make use of compute resources to speed up the computation of large problems in science and industry. Whereas Infrastructure as a Service (IaaS) offerings have been commonly employed, more recently, serverless computing emerged as a novel cloud computing paradigm with the goal of freeing developers from resource management issues. However, as of today, serverless computing platforms are mainly used to process computations triggered by events or user requests that can be executed independently of each other and benefit from on-demand and elastic compute resources as well as per-function billing. In this work, we discuss how to employ serverless computing platforms to operate parallel applications. We specifically focus on the class of parallel task farming applications and introduce a novel approach to free developers from both parallelism and resource management issues. Our approach includes a proactive elasticity controller that adapts the physical parallelism per application run according to user-defined goals. Specifically, we show how to consider a user-defined execution time limit after which the result of the computation needs to be present while minimizing the associated monetary costs. To evaluate our concepts, we present a prototypical elastic parallel system architecture for self-tuning serverless task farming and implement two applications based on our framework. Moreover, we report on performance measurements for both applications as well as the prediction accuracy of the proposed proactive elasticity control mechanism and discuss our key findings.
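The deadline-driven choice of parallelism described in this abstract can be illustrated with a minimal sketch. It is not the controller from the paper: the function names, the assumption of equal-length tasks with a known per-task time, and the assumption that fewer concurrent workers is the cheaper choice are all hypothetical simplifications made for illustration.

```python
import math
from typing import Optional


def predicted_runtime(n_tasks: int, task_seconds: float, workers: int) -> float:
    """Predicted makespan when equal-length tasks run in waves of `workers`.

    Hypothetical model: every task takes `task_seconds` and there is no
    startup or communication overhead.
    """
    return math.ceil(n_tasks / workers) * task_seconds


def choose_parallelism(
    n_tasks: int, task_seconds: float, time_limit: float, max_workers: int
) -> Optional[int]:
    """Return the smallest worker count whose predicted runtime meets the limit.

    Assumption (not from the source): fewer concurrent workers is cheaper,
    so the first feasible count is taken as the cost-minimizing one.
    Returns None if no count up to `max_workers` meets the time limit.
    """
    for workers in range(1, max_workers + 1):
        if predicted_runtime(n_tasks, task_seconds, workers) <= time_limit:
            return workers
    return None
```

With 100 tasks of 2 seconds each and a 30-second limit, this model needs at most 15 waves, which first becomes possible at 7 workers. A real controller would replace the fixed per-task time with a runtime prediction.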

Development and operation of elastic parallel tree search applications using TASKWORK (2020)

Kehrer, Stefan ; Blochinger, Wolfgang

Cloud resources can be dynamically provisioned according to application-specific requirements and are paid for on a per-use basis. This gives rise to a new concept for parallel processing: Elastic parallel computations. However, it is still an open research question to which extent parallel applications can benefit from elastic scaling, which requires resource adaptation at runtime and corresponding coordination mechanisms. In this work, we analyze how to address these system-level challenges in the context of developing and operating elastic parallel tree search applications. Based on our findings, we discuss the design and implementation of TASKWORK, a cloud-aware runtime system specifically designed for elastic parallel tree search, which enables the implementation of elastic applications by means of higher-level development frameworks. We show how to implement an elastic parallel branch-and-bound application based on an exemplary development framework and report on our experimental evaluation that also considers several benchmarks for parallel tree search.

Elastic parallel systems for high performance cloud computing (2020)

Kehrer, Stefan

High Performance Computing (HPC) enables significant progress in both science and industry. Whereas traditionally parallel applications have been developed to address the grand challenges in science, as of today, they are also heavily used to speed up the time-to-result in the context of product design, production planning, financial risk management, medical diagnosis, as well as research and development efforts. However, purchasing and operating HPC clusters to run these applications requires huge capital expenditures as well as operational knowledge and thus is reserved to large organizations that benefit from economies of scale. More recently, the cloud evolved into an alternative execution environment for parallel applications, which comes with novel characteristics such as on-demand access to compute resources, pay-per-use, and elasticity. Whereas the cloud has been mainly used to operate interactive multi-tier applications, HPC users are also interested in the benefits offered. These include full control of the resource configuration based on virtualization, fast setup times by using on-demand accessible compute resources, and eliminated upfront capital expenditures due to the pay-per-use billing model. Additionally, elasticity allows compute resources to be provisioned and decommissioned at runtime, which allows fine-grained control of an application's performance in terms of its execution time and efficiency as well as the related monetary costs of the computation. Whereas HPC-optimized cloud environments have been introduced by cloud providers such as Amazon Web Services (AWS) and Microsoft Azure, existing parallel architectures are not designed to make use of elasticity. This thesis addresses several challenges in the emergent field of High Performance Cloud Computing. In particular, the presented contributions focus on the novel opportunities and challenges related to elasticity. 
First, the principles of elastic parallel systems as well as related design considerations are discussed in detail. On this basis, two exemplary elastic parallel system architectures are presented, each of which includes (1) an elasticity controller that controls the number of processing units based on user-defined goals, (2) a cloud-aware parallel execution model that handles coordination and synchronization requirements in an automated manner, and (3) a programming abstraction to ease the implementation of elastic parallel applications. To automate application delivery and deployment, novel approaches are presented that generate the required deployment artifacts from developer-provided source code in an automated manner while considering application-specific non-functional requirements. Throughout this thesis, a broad spectrum of design decisions related to the construction of elastic parallel system architectures is discussed, including proactive and reactive elasticity control mechanisms as well as cloud-based parallel processing with virtual machines (Infrastructure as a Service) and functions (Function as a Service). To evaluate these contributions, extensive experimental evaluations are presented.

Equilibrium : an elasticity controller for parallel tree search in the cloud (2020)

Kehrer, Stefan ; Blochinger, Wolfgang

Elasticity is considered to be the most beneficial characteristic of cloud environments, which distinguishes the cloud from clusters and grids. Whereas elasticity has become mainstream for web-based, interactive applications, it is still a major research challenge how to leverage elasticity for applications from the high-performance computing (HPC) domain, which heavily rely on efficient parallel processing techniques. In this work, we specifically address the challenges of elasticity for parallel tree search applications. Well-known meta-algorithms based on this parallel processing technique include branch-and-bound and backtracking search. We show that their characteristics render static resource provisioning inappropriate and the capability of elastic scaling desirable. Moreover, we discuss how to construct an elasticity controller that reasons about the scaling behavior of a parallel system at runtime and dynamically adapts the number of processing units according to user-defined cost and efficiency thresholds. We evaluate a prototypical elasticity controller based on our findings by employing several benchmarks for parallel tree search and discuss the applicability of the proposed approach. Our experimental results show that, by means of elastic scaling, the performance can be controlled according to user-defined thresholds, which cannot be achieved with static resource provisioning.

Serverless skeletons for elastic parallel processing (2019)

Kehrer, Stefan ; Scheffold, Jochen ; Blochinger, Wolfgang

Serverless computing is an emerging cloud computing paradigm with the goal of freeing developers from resource management issues. As of today, serverless computing platforms are mainly used to process computations triggered by events or user requests that can be executed independently of each other. These workloads benefit from on-demand and elastic compute resources as well as per-function billing. However, it is still an open research question to which extent parallel applications, which most often comprise complex coordination and communication patterns, can benefit from serverless computing. In this paper, we introduce serverless skeletons for parallel cloud programming to free developers from both parallelism and resource management issues. In particular, we investigate the well-known and widely used farm skeleton, which supports the implementation of a wide range of applications. To evaluate our concepts, we present a prototypical development and runtime framework and implement two applications based on our framework: Numerical integration and hyperparameter optimization - a commonly applied technique in machine learning. We report on performance measurements for both applications and discuss the usefulness of our approach.
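The farm skeleton mentioned in this abstract can be sketched generically: independent tasks are handed to a pool of workers and the results are collected. The example below is only an illustration, not the framework from the paper. It uses a local thread pool from Python's standard library as a stand-in for serverless function invocations, and the names `farm`, `integrate_slice`, and `integrate` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def farm(worker: Callable[[T], R], tasks: Iterable[T], parallelism: int) -> List[R]:
    """Apply `worker` to every independent task and return results in task order.

    A local thread pool stands in for serverless functions (assumption).
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(worker, tasks))


def integrate_slice(bounds: Tuple[float, float]) -> float:
    """Midpoint-rule integral of f(x) = x**2 over one sub-interval."""
    lo, hi = bounds
    steps = 1000
    width = (hi - lo) / steps
    return sum(((lo + (i + 0.5) * width) ** 2) * width for i in range(steps))


def integrate(a: float, b: float, n_tasks: int, parallelism: int) -> float:
    """Split [a, b] into `n_tasks` slices, farm them out, and sum the parts."""
    edges = [a + (b - a) * i / n_tasks for i in range(n_tasks + 1)]
    slices = list(zip(edges[:-1], edges[1:]))
    return sum(farm(integrate_slice, slices, parallelism))
```

Calling `integrate(0.0, 1.0, 8, 4)` approximates the integral of x² over [0, 1], whose exact value is 1/3. The caller only supplies the worker function and the task list; how many workers run concurrently is a separate parameter, which is the separation the skeleton is meant to provide.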

A survey on cloud migration strategies for high performance computing (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

The cloud evolved into an attractive execution environment for parallel applications from the High Performance Computing (HPC) domain. Existing research recognized that parallel applications require architectural refactoring to benefit from cloud-specific properties (most importantly elasticity). However, architectural refactoring comes with many challenges and cannot be applied to all applications due to fundamental performance issues. Thus, during the last years, different cloud migration strategies have been considered for different classes of parallel applications. In this paper, we provide a survey on HPC cloud migration research. We investigate the approaches applied and the parallel applications considered. Based on our findings, we identify and describe three cloud migration strategies.

Migrating parallel applications to the cloud: assessing cloud readiness based on parallel design decisions (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

Parallel applications are the computational backbone of major industry trends and grand challenges in science. Whereas these applications are typically constructed for dedicated High Performance Computing clusters and supercomputers, the cloud emerges as an attractive execution environment, which provides on-demand resource provisioning and a pay-per-use model. However, cloud environments require specific application properties that may restrict parallel application design. As a result, design trade-offs are required to simultaneously maximize parallel performance and benefit from cloud-specific characteristics. In this paper, we present a novel approach to assess the cloud readiness of parallel applications based on the design decisions made. By discovering and understanding the implications of these parallel design decisions on an application’s cloud readiness, our approach supports the migration of parallel applications to the cloud. We introduce an assessment procedure, its underlying meta model, and a corresponding instantiation to structure this multi-dimensional design space. For evaluation purposes, we present an extensive case study comprising three parallel applications and discuss their cloud readiness based on our approach.

Model-based generation of self-adaptive cloud services (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

An important shift in software delivery is the definition of a cloud service as an independently deployable unit by following the microservices architectural style. Container virtualization facilitates development and deployment by ensuring independence from the runtime environment. Thus, cloud services are built as container-based systems - a set of containers that control the lifecycle of software and middleware components. However, using containers leads to a new paradigm for service development and operation: Self-service environments enable software developers to deploy and operate container-based systems on their own - you build it, you run it. Following this approach, more and more operational aspects are transferred towards the responsibility of software developers. In this work, we propose a concept for self-adaptive cloud services based on container virtualization in line with the microservices architectural style and present a model-based approach that assists software developers in building these services. Based on operational models specified by developers, the mechanisms required for self-adaptation are automatically generated. As a result, each container automatically adapts itself in a reactive, decentralized manner. We evaluate a prototype which leverages the emerging TOSCA standard to specify operational behavior in a portable manner.

TASKWORK: a cloud-aware runtime system for elastic task-parallel HPC applications (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

With the capability of employing virtually unlimited compute resources, the cloud evolved into an attractive execution environment for applications from the High Performance Computing (HPC) domain. By means of elastic scaling, compute resources can be provisioned and decommissioned at runtime. This gives rise to a new concept in HPC: Elasticity of parallel computations. However, it is still an open research question to which extent HPC applications can benefit from elastic scaling and how to leverage elasticity of parallel computations. In this paper, we discuss how to address these challenges for HPC applications with dynamic task parallelism and present TASKWORK, a cloud-aware runtime system based on our findings. TASKWORK enables the implementation of elastic HPC applications by means of higher-level development frameworks and solves corresponding coordination problems based on Apache ZooKeeper. For evaluation purposes, we discuss a development framework for parallel branch-and-bound based on TASKWORK, show how to implement an elastic HPC application, and report on measurements with respect to parallel efficiency and elastic scaling.

Container-based module isolation for cloud services (2019)

Kehrer, Stefan ; Riebandt, Florian ; Blochinger, Wolfgang

Due to frequently changing requirements, the internal structure of cloud services is highly dynamic. To ensure flexibility, adaptability, and maintainability for dynamically evolving services, modular software development has become the dominating paradigm. By following this approach, services can be rapidly constructed by composing existing, newly developed and publicly available third-party modules. However, newly added modules might be unstable, resource-intensive, or untrustworthy. Thus, satisfying non-functional requirements such as reliability, efficiency, and security while ensuring rapid release cycles is a challenging task. In this paper, we discuss how to tackle these issues by employing container virtualization to isolate modules from each other according to a specification of isolation constraints. We satisfy non-functional requirements for cloud services by automatically transforming the modules comprised into a container-based system. To deal with the increased overhead that is caused by isolating modules from each other, we calculate the minimum set of containers required to satisfy the isolation constraints specified. Moreover, we present and report on a prototypical transformation pipeline that automatically transforms cloud services developed based on the Java Platform Module System into container-based systems.

Elastic parallel systems for high performance cloud computing: state-of-the-art and future directions (2019)

Kehrer, Stefan ; Blochinger, Wolfgang

With on-demand access to compute resources, pay-per-use, and elasticity, the cloud evolved into an attractive execution environment for High Performance Computing (HPC). Whereas elasticity, which is often referred to as the most beneficial cloud-specific property, has been heavily used in the context of interactive (multi-tier) applications, elasticity-related research in the HPC domain is still in its infancy. Existing parallel computing theory as well as traditional metrics to analytically evaluate parallel systems do not comprehensively consider elasticity, i.e., the ability to control the number of processing units at runtime. To address these issues, we introduce a conceptual framework to understand elasticity in the context of parallel systems, define the term elastic parallel system, and discuss novel metrics for both elasticity control at runtime as well as the ex post performance evaluation of elastic parallel systems. Based on the conceptual framework, we provide an in-depth analysis of existing research in the field to describe the state of the art and compile our findings into a research agenda for future research on elastic parallel systems.

AUTOGENIC: automated generation of self-configuring microservices (2018)

Kehrer, Stefan ; Blochinger, Wolfgang

The state of the art proposes the microservices architectural style to build applications. Additionally, container virtualization and container management systems evolved into the perfect fit for developing, deploying, and operating microservices in line with the DevOps paradigm. Container virtualization facilitates deployment by ensuring independence from the runtime environment. However, microservices store their configuration in the environment. Therefore, software developers have to wire their microservice implementation with technologies provided by the target runtime environment such as configuration stores and service registries. These technological dependencies counteract the portability benefit of using container virtualization. In this paper, we present AUTOGENIC - a model-based approach to assist software developers in building microservices as self-configuring containers without being bound to operational technologies. We provide developers with a simple configuration model to specify configuration operations of containers and automatically generate a self-configuring microservice tailored for the targeted runtime environment. Our approach is supported by a method, which describes the steps to automate the generation of self-configuring microservices. Additionally, we present and evaluate a prototype, which leverages the emerging TOSCA standard.

TOSCA-based container orchestration on Mesos (2018)

Kehrer, Stefan ; Blochinger, Wolfgang

Container virtualization evolved into a key technology for deployment automation in line with the DevOps paradigm. Whereas container management systems facilitate the deployment of cloud applications by employing container-based artifacts, parts of the deployment logic have been applied before to build these artifacts. Current approaches do not integrate these two deployment phases in a comprehensive manner. Limited knowledge on application software and middleware encapsulated in container-based artifacts leads to maintainability and configuration issues. Besides, the deployment of cloud applications is based on custom orchestration solutions leading to lock-in problems. In this paper, we propose a two-phase deployment method based on the TOSCA standard. We present integration concepts for TOSCA-based orchestration and deployment automation using container-based artifacts. Our two-phase deployment method enables capturing and aligning all the deployment logic related to a software release leading to better maintainability. Furthermore, we build a container management system, which is composed of a TOSCA-based orchestrator on Apache Mesos, to deploy container-based cloud applications automatically.

Multi-perspective decision management for digitization architecture and governance (2016)

Zimmermann, Alfred ; Jugel, Dierk ; Sandkuhl, Kurt ; Schmidt, Rainer ; Bogner, Justus ; Kehrer, Stefan

The internet of things, enterprise social networks, adaptive case management, mobility systems, analytics for big data, and cloud environments are emerging to support smart connected, i.e., digital, products and services and the digital transformation. Biological metaphors for living and adaptable ecosystems are currently providing the logical foundation for resilient run-time environments with service-oriented digitization architectures and for self-optimizing intelligent business services and related distributed information systems. We are investigating mechanisms for flexible adaptation and evolution of information systems with digital architecture in the context of the ongoing digital transformation. The goal is to support flexible and agile transformations for both business and related information systems through adaptation and dynamical evolution of their digital architectures. The present research paper investigates mechanisms of decision analytics for digitization architectures, putting a spotlight on internet of things micro-granular architectures, by extending original enterprise architecture reference models with digitization architectures and their multi-perspective architectural decision management.

Categorizing requirements for enterprise architecture management in big data literature (2016)

Kehrer, Stefan ; Jugel, Dierk ; Zimmermann, Alfred

Organizations identified the opportunities of big data analytics to support the business with problem-specific insights through the exploitation of generated data. Socio-technical solutions are developed in big data projects to reach competitive advantage. Although these projects are aligned to specific business needs, common architectural challenges are not addressed in a comprehensive manner. Enterprise architecture management is a holistic approach to tackle complex business and IT architectures. The transformation of an organization’s EA is influenced by big data transformation processes and their data-driven approach on all layers. In this paper, we review big data literature to analyze which requirements for the EA management discipline are proposed. Based on a systematic literature identification, conceptual categories of requirements for EA management are elicited utilizing an inductive category formation. These conceptual categories of requirements constitute a category system that facilitates a new perspective on EA management and fosters the innovation-driven evolution of the EA management discipline.

A systematic literature review of big data literature for EA evolution (2016)

Kehrer, Stefan ; Jugel, Dierk ; Zimmermann, Alfred

Many organizations identified the opportunities of big data analytics to support the business with problem-specific insights through the exploitation of generated data. Socio-technical solutions are developed in big data projects to reach competitive advantage. Although these projects are aligned to specific business needs, common architectural challenges are not addressed in a comprehensive manner. Enterprise architecture management is a holistic approach to tackle the complex business and IT architecture. The transformation of an organization's EA is influenced by big data projects and their data-driven approach on all layers. To enable strategy-oriented development of the EA, it is essential to synchronize these projects supported by EA management. In this paper, we conduct a systematic review of big data literature to analyze which requirements for the EA management discipline are proposed. Thereby, a broad overview of existing research is presented to facilitate a more detailed exploration and to foster the evolution of the EA management discipline.

A decision-making case for collaborative enterprise architecture engineering (2015)

Jugel, Dierk ; Kehrer, Stefan ; Schweda, Christian ; Zimmermann, Alfred

In modern times, markets are very dynamic. This situation requires agile enterprises to have the ability to react quickly to market influences. An enterprise’s IT is especially affected, because new or changed business models have to be realized. However, enterprise architectures (EA) are complex structures consisting of many artifacts and relationships between them. Thus, analyzing an EA becomes a complex task for stakeholders. In addition, many stakeholders are involved in decision-making processes, because Enterprise Architecture Management (EAM) aims to provide a holistic view of the enterprise. In this article, we use concepts of Adaptive Case Management (ACM) to design a decision-making case consisting of a combination of different analysis techniques to support stakeholders in decision-making. We exemplify the case with a scenario of a fictitious enterprise.

Goal-oriented decision support in collaborative enterprise architecture (2015)

Hamm, Thomas ; Kehrer, Stefan

Decision-making in the field of Enterprise Architecture (EA) is a complex task. Many organizations establish a set of complex processes and hierarchical structures to enable strategy-driven development of their EA. This leads to slow and inefficient decision-making entailing bad time-to-market and discontented stakeholders. Collaborative EA delineates a lightweight approach to enable EA decisions but often neglects strategic alignment. In this paper, we present an approach to integrate the concept of collaborative EA and goal-driven decision-making through collaborative modeling of goal-oriented information demands based on ArchiMate’s motivation extension to reach a goal-oriented EA decision support in a collaborative EA environment.

Providing EA decision support for stakeholders by automated analyses (2015)

Jugel, Dierk ; Kehrer, Stefan ; Schweda, Christian ; Zimmermann, Alfred

Enterprise architecture management (EAM) is a holistic approach to tackle the complex business and IT architecture. The transformation of an organization’s EA towards a strategy-oriented system is a continuous task. Many stakeholders have to elaborate on various parts of the EA to reach the best decisions to shape the EA towards an optimized support of the organization’s capabilities. Since the real world is too complex, analysis techniques are needed to detect optimization potential and to gather all information needed about an issue. In practice, visualizations are commonly used to analyze EAs. However, these visualizations are mostly static and do not provide analyses. In this article, we combine analysis techniques from the literature and interactive visualizations to support stakeholders in EA decision-making.

19 search hits