Informatik
Document Type
- Conference proceeding (34)
- Journal article (4)
Is part of the Bibliography
- yes (38)
Institute
- Informatik (38)
- Technik (1)
Publisher
- Association for Computing Machinery (38)
Many modern DBMS architectures require data to be transferred from storage before it can be processed. Given the continuously increasing amounts of data, these transfers quickly become a scalability-limiting factor. Near-Data Processing (NDP) and smart/computational storage are emerging as promising trends that allow decoupled in-situ execution of operations, reduced data transfers and better bandwidth utilization. However, not every operation is suitable for in-situ execution, so careful placement and optimization are needed.
In this paper, we present an NDP-aware cost model. It has been implemented in MySQL and evaluated with nKV. We make several observations that underscore the need for optimization.
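To illustrate why placement needs a cost model, here is a minimal sketch of a hypothetical NDP-aware cost comparison. It is not the cost model from the paper: the `Hardware` parameters, the simple additive (non-pipelined) stage costs and the `choose_placement` rule are all assumptions. The host path ships every byte over the interconnect, while the NDP path filters on a slower device CPU but ships only the qualifying fraction.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical device parameters in bytes/second (assumptions, not measurements)."""
    internal_bw: float      # flash -> device-side controller/compute
    interconnect_bw: float  # device -> host link
    host_rate: float        # host CPU filtering throughput
    device_rate: float      # on-device (NDP) filtering throughput


def host_cost(data: float, hw: Hardware) -> float:
    """Data-to-code: every byte crosses the interconnect and is filtered on the host."""
    return data / hw.internal_bw + data / hw.interconnect_bw + data / hw.host_rate


def ndp_cost(data: float, selectivity: float, hw: Hardware) -> float:
    """Code-to-data: filter in-situ, then ship only the qualifying fraction to the host."""
    result = data * selectivity
    return data / hw.internal_bw + data / hw.device_rate + result / hw.interconnect_bw


def choose_placement(data: float, selectivity: float, hw: Hardware) -> str:
    """Pick the cheaper placement under this simplified, additive cost model."""
    return "ndp" if ndp_cost(data, selectivity, hw) < host_cost(data, hw) else "host"


hw = Hardware(internal_bw=4e9, interconnect_bw=1e9, host_rate=4e9, device_rate=1e9)
print(choose_placement(1e9, 0.01, hw))  # highly selective filter -> ndp
print(choose_placement(1e9, 0.9, hw))   # almost everything qualifies -> host
```

With these made-up numbers, a filter keeping 1% of the data is cheaper in-situ, while one keeping 90% is cheaper on the host, because the weaker device CPU no longer pays for itself.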
In this paper, we describe an interactive web-based tool for the visual analysis of Formula 1 data. A calendar-like representation provides an overview of all races on a yearly basis, in either absolute or normalized time. After selecting a specific race, the user can explore it in more detail. Furthermore, up to three races can be compared. Besides visualizing details of individual races, the tool also supports analysing driver and team performance over time. A user study was conducted to gather feedback on the usability of the application and to decide between different visualization options.
Painting galleries typically provide a wealth of data composed of several data types. These multivariate data are too complex for laypeople such as museum visitors to first get an overview of all paintings and then search for specific categories. Ultimately, the goal is to guide visitors to a specific painting they wish to take a closer look at. In this paper, we describe an interactive visualization tool that first provides such an overview and lets people experiment with the more than 41,000 paintings collected in the Web Gallery of Art. To build such an interactive tool, our technique consists of several steps, such as data handling, algorithmic transformations, visualizations, interactions, and the human user working with the tool with the goal of detecting insights in the provided data. We illustrate the usefulness of the visualization tool by applying it to such characteristic data and show how one can get from an overview of all paintings to specific ones.
In a time of digital transformation, the ability to quickly and efficiently adapt software systems to changed business requirements is more important than ever. Measuring the maintainability of software is therefore crucial for the long-term management of such products. Since service-based systems (SBSs) are a very important form of enterprise software and traditional metrics (e.g. object-oriented ones) are not fully applicable to them, we present a holistic overview of maintainability metrics designed specifically for this type of system. The metric candidates selected in our literature review were mapped to four dominant design properties: size, complexity, coupling, and cohesion. Microservice-based systems (μSBSs) are emerging as an agile and fine-grained variant of SBSs. While the majority of the identified metrics are also applicable to this specialization (with some limitations), the large number of services, combined with technological heterogeneity and decentralized control, significantly impacts automatic metric collection in such systems. Our research therefore suggests that specialized tool support is required to guarantee the practical applicability of the presented metrics to μSBSs.
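As a concrete illustration of two of the four design properties, the following sketch computes a size metric and simple fan-in/fan-out coupling metrics from a service call graph. These are generic, illustrative definitions chosen for this example and not necessarily the exact metrics catalogued in the review; the `CallGraph` representation and the sample services are hypothetical.

```python
from typing import Dict, Set

# Hypothetical service call graph: service -> services it invokes.
CallGraph = Dict[str, Set[str]]


def system_size(graph: CallGraph) -> int:
    """Size: number of distinct services appearing as callers or callees."""
    services = set(graph)
    for callees in graph.values():
        services |= callees
    return len(services)


def fan_out(graph: CallGraph, service: str) -> int:
    """Efferent coupling: how many other services this service depends on."""
    return len(graph.get(service, set()) - {service})


def fan_in(graph: CallGraph, service: str) -> int:
    """Afferent coupling: how many other services depend on this service."""
    return sum(1 for caller, callees in graph.items()
               if caller != service and service in callees)


graph: CallGraph = {
    "orders": {"payment", "inventory"},
    "payment": {"accounts"},
    "inventory": set(),
}
print(system_size(graph), fan_out(graph, "orders"), fan_in(graph, "accounts"))  # 4 2 1
```

The hard part in a μSBS is not this arithmetic but obtaining an accurate `graph` across many independently deployed, technologically heterogeneous services, which is where the specialized tool support mentioned above comes in.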
The introduction of smart contracts has expanded the applicability of blockchains to many domains beyond finance and cryptocurrencies. Moreover, different blockchain technologies have evolved that target special requirements. As a result, in practice, a combination of different blockchain systems is often required to achieve an overall goal. However, due to the heterogeneity of blockchain protocols, the execution of distributed business transactions that span several blockchains leads to multiple interoperability and integration challenges. Therefore, in this article, we examine the domain of Cross-Chain Smart Contract Invocations (CCSCIs), which are distributed transactions that involve the invocation of smart contracts hosted on two or more blockchain systems. We conduct a systematic multi-vocal literature review to get an overview of the available CCSCI approaches. We select 20 formal literature studies and 13 high-quality gray literature studies, extract data from them, and analyze these data to derive the CCSCI Classification Framework. With the help of the framework, we group the approaches into two categories and eight subcategories. The approaches differ in multiple characteristics, e.g., the mechanisms they follow, and the capabilities and transaction processing semantics they offer. Our analysis indicates that all approaches suffer from obstacles that complicate real-world adoption, such as limited support for handling heterogeneity and the need for trusted third parties.
Due to increasing market dynamics, rapidly evolving technologies and shifting user expectations, coupled with the adoption of lean and agile practices, companies struggle to provide reliable product roadmaps using traditional approaches. Currently, most companies are seeking opportunities to improve their product roadmapping practices. As a first step, they have to assess their current product roadmapping capabilities in order to understand how to improve their practices and how to switch to a new approach. The aim of this article is to provide an initial maturity model for product roadmapping practices that is especially suited for assessing the roadmapping capabilities of companies operating in dynamic and uncertain market environments. Based on interviews with 15 experts from 13 different companies, the current state of practice regarding product roadmapping was identified. Afterwards, the model was developed in expert workshops with Robert Bosch GmbH and researchers. The study results in the so-called DEEP 1.0 product roadmap maturity model, which allows companies to conduct a self-assessment of their product roadmapping practice.
Deep learning-based EEG detection of mental alertness states from drivers under ethical aspects
(2021)
One of the most critical factors for a successful road trip is a high degree of alertness while driving. Even a split second of inattention or sleepiness at a crucial moment can make the difference between life and death. Several prestigious car manufacturers are currently pursuing automated drowsiness detection to address this problem. The path between neuroscientific research combined with artificial intelligence and the preservation of human dignity and its inviolability is very narrow. The key contribution of this work is a data analysis system for EEGs recorded during a driving session, which draws on previous studies analyzing heart rate (ECG), brain waves (EEG), and eye function (EOG). The gathered data are treated as sensitively as possible, taking ethical regulations into consideration. Obtaining evaluable signs of evolving exhaustion requires techniques that extract sleep-stage frequencies; correlated interference in the signal is a particular problem here. This research focuses on a processing chain for EEG band splitting that involves band-pass filtering, principal component analysis (PCA), independent component analysis (ICA) with automatic artifact removal, and the fast Fourier transform (FFT). The classification is based on a step-by-step adaptive deep learning analysis that detects theta rhythms as a drowsiness predictor in the preprocessed data. The approach achieved an offline detection rate of 89% and an online detection rate of 73%. The method is tied to the simulated driving scenario for which it was developed, which leaves room for further optimization of laboratory methods and of data collection during wakefulness-dependent operations.
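The frequency-domain step of such a chain can be sketched as computing the share of spectral power in the theta band (taken here as 4 to 8 Hz) relative to a broader 1 to 30 Hz range. This is only a hand-crafted feature under stated assumptions, not the paper's pipeline: it omits PCA/ICA artifact removal and the deep learning classifier, and it uses a naive O(n²) DFT for self-containment where a real system would use an FFT (which yields the same coefficients faster). The band limits and function names are assumptions.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(signal: Sequence[float], fs: float) -> List[Tuple[float, float]]:
    """One-sided power spectrum as (frequency_hz, power) pairs via a naive DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        spectrum.append((k * fs / n, abs(coeff) ** 2))
    return spectrum


def band_power_ratio(signal: Sequence[float], fs: float,
                     band: Tuple[float, float] = (4.0, 8.0),
                     total: Tuple[float, float] = (1.0, 30.0)) -> float:
    """Fraction of power in `band` relative to power in the `total` range."""
    spec = power_spectrum(signal, fs)
    band_p = sum(p for f, p in spec if band[0] <= f < band[1])
    total_p = sum(p for f, p in spec if total[0] <= f <= total[1])
    return band_p / total_p if total_p > 0 else 0.0


# Synthetic 2 s "EEG": a strong 6 Hz (theta) rhythm plus a weaker 20 Hz (beta) component.
fs = 128.0
eeg = [math.sin(2 * math.pi * 6 * t / fs) + 0.5 * math.sin(2 * math.pi * 20 * t / fs)
       for t in range(256)]
print(round(band_power_ratio(eeg, fs), 3))  # 0.8
```

Because power scales with the squared amplitude, the 6 Hz component contributes 1.0 and the 20 Hz one 0.25, so the theta share is 1.0 / 1.25 = 0.8. A rising theta share over time is the kind of signal a downstream classifier could use as a drowsiness predictor.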
Selecting a suitable development method for a specific project context is one of the most challenging activities in process design. Every project is unique and, thus, many context factors have to be considered. Recent research has taken initial steps towards statistically constructing hybrid development methods, yet has paid little attention to the peculiarities of the context factors influencing method and practice selection. In this paper, we use exploratory factor analysis and logistic regression analysis to learn such context factors and to identify methods that are correlated with them. Our analysis is based on 829 data points from the HELENA dataset. We provide five base clusters of methods, each consisting of up to 10 methods, that lay the foundation for devising hybrid development methods. The analysis of the five clusters using trained models reveals only a few context factors, e.g., project/product size and target application domain, that seem to significantly influence the selection of methods. An extended descriptive analysis of the practices in the context of the identified method clusters also suggests a consolidation of the relevant practice sets used in specific project contexts.
Blockchains give rise to new workloads in database management systems and K/V-stores. Distributed Ledger Technology (DLT) is a technique for managing transactions in "trustless" distributed systems. Yet the clients of nodes in blockchain networks are backed by "trustworthy" K/V-stores, such as LevelDB or RocksDB in Ethereum, which are based on Log-Structured Merge Trees (LSM Trees). However, LSM Trees do not fully match the properties of blockchains and enterprise workloads.
In this paper, we claim that Partitioned B-Trees (PBTs) fit the properties of DLT: uniformly distributed hash keys, immutability, consensus, invalid blocks, unspent and off-chain transactions, reorganization, and data state/version ordering in a distributed log structure. PBTs can place records of newly inserted key-value pairs, as well as data of unspent transactions, in separate partitions in main memory. Once several blocks reach consensus, PBTs evict a whole partition, which becomes immutable, to secondary storage. This behavior minimizes write amplification and enables a beneficial sequential write pattern on modern hardware. Furthermore, DLT implies some form of log-based versioning. PBTs can serve as an MV-store for the data of logical blocks and for indexing in multi-version concurrency control (MVCC) transaction processing.
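The partitioning idea can be sketched as a single sorted index whose keys carry an artificial leading partition number: new inserts go to the current in-memory partition, lookups search partitions newest-first, and sealing writes the whole current partition out in key order. This toy is an assumption-laden illustration, not the paper's implementation: a sorted Python list stands in for the B+-tree, `storage_log` merely simulates the sequential write to secondary storage, and the class and method names are hypothetical.

```python
import bisect
from typing import Any, Dict, List, Optional, Tuple


class PartitionedBTree:
    """Toy PBT: one sorted index keyed by (partition_no, key)."""

    def __init__(self) -> None:
        self.keys: List[Tuple[int, str]] = []            # sorted (partition_no, key)
        self.values: Dict[Tuple[int, str], Any] = {}
        self.current = 0                                 # in-memory partition for inserts
        self.sealed: List[int] = []                      # evicted, immutable partitions
        self.storage_log: List[Tuple[int, str, Any]] = []  # simulated sequential writes

    def insert(self, key: str, value: Any) -> None:
        # New records (and new versions of old keys) always go to the current partition,
        # so sealed partitions are never modified.
        composite = (self.current, key)
        if composite not in self.values:
            bisect.insort(self.keys, composite)
        self.values[composite] = value

    def lookup(self, key: str) -> Optional[Any]:
        # Search partitions newest-first so the most recent version wins.
        for p in range(self.current, -1, -1):
            i = bisect.bisect_left(self.keys, (p, key))
            if i < len(self.keys) and self.keys[i] == (p, key):
                return self.values[(p, key)]
        return None

    def seal(self) -> int:
        """Evict the whole current partition (e.g. once its blocks reached consensus)."""
        lo = bisect.bisect_left(self.keys, (self.current,))
        hi = bisect.bisect_left(self.keys, (self.current + 1,))
        # Entries are already in key order, so eviction is one sequential write.
        for composite in self.keys[lo:hi]:
            self.storage_log.append((*composite, self.values[composite]))
        # Sealed entries stay in the list here, standing in for the on-disk copy.
        self.sealed.append(self.current)
        self.current += 1
        return hi - lo


pbt = PartitionedBTree()
pbt.insert("tx-a", 1)
pbt.insert("tx-b", 2)
pbt.seal()             # partition 0 becomes immutable
pbt.insert("tx-a", 3)  # newer version lands in partition 1
print(pbt.lookup("tx-a"), pbt.lookup("tx-b"))  # 3 2
```

Because the partition number is the leading key component, every partition is a contiguous key range, which is what lets a whole partition be written out (or dropped) in one sequential pass.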
First International Workshop on Hybrid dEveLopmENt Approaches in Software Systems Development
(2017)
A software process is the game plan for organizing project teams and running projects. Yet it is still a challenge to select the appropriate development approach for the respective context. A multitude of development approaches compete for the users' favor, but there is no silver bullet serving all possible setups. Moreover, recent research as well as experience from practice show that companies combine different development approaches to assemble the best-fitting approach for the respective company: a more traditional process provides the basic framework to serve the organization, while project teams embody this framework with more agile (and/or lean) practices to keep their flexibility. The first HELENA workshop aims to bring together the community to discuss recent findings and to steer future work.
Under update-intensive workloads (TPC, LinkBench), small updates dominate the write behavior; e.g., 70% of all updates change less than 10 bytes across all TPC OLTP workloads. These are typically performed as in-place updates and result in random writes at page granularity, causing major write overhead on Flash storage, a write amplification of several hundred times and lower device longevity.
In this paper, we propose an approach that transforms these small in-place updates into small update deltas that are appended to the original page. We exploit the commonly overlooked fact that modern Flash memories (SLC, MLC, 3D NAND) can handle appends to already programmed physical pages by using various low-level techniques such as ISPP, thereby avoiding expensive erases and page migrations. Furthermore, we extend the traditional NSM page layout with a delta-record area that can absorb these small updates. We propose a scheme to control the write behavior as well as the space allocation and sizing of database pages.
The proposed approach has been implemented in Shore-MT and evaluated on real Flash hardware (OpenSSD) and a Flash emulator. Compared to In-Page Logging, it performs up to 62% fewer reads and writes and up to 74% fewer erases on a range of workloads. The experimental evaluation indicates: (i) a significant reduction of erase operations, resulting in twice the longevity of Flash devices under update-intensive workloads; (ii) 15%-60% lower read/write I/O latencies; (iii) up to 45% higher transactional throughput; (iv) a 2x to 3x reduction in overall write amplification.
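The effect of a delta-record area can be sketched by counting physical bytes written. In this hypothetical model, small updates are appended as delta records (offset, length, payload) while the area has room; once it is full, deltas are merged and the whole page is rewritten. The page size, delta-area size and per-delta header are assumptions, and real Flash programs appends at a coarser granularity than single bytes, so the numbers only illustrate the trend rather than reproduce the paper's scheme.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

PAGE_SIZE = 4096   # assumed page size in bytes
DELTA_AREA = 256   # assumed size of the delta-record area in bytes
HEADER = 3         # assumed per-delta overhead: 2-byte offset + 1-byte length


@dataclass
class DeltaPage:
    data: bytearray = field(default_factory=lambda: bytearray(PAGE_SIZE))
    deltas: List[Tuple[int, bytes]] = field(default_factory=list)
    delta_bytes: int = 0
    bytes_written: int = 0   # physical bytes programmed on Flash
    logical_bytes: int = 0   # bytes actually changed by updates

    def update(self, offset: int, payload: bytes) -> None:
        self.logical_bytes += len(payload)
        size = HEADER + len(payload)
        if self.delta_bytes + size <= DELTA_AREA:
            # Append only the delta record to the already-programmed page.
            self.deltas.append((offset, payload))
            self.delta_bytes += size
            self.bytes_written += size
        else:
            # Delta area full: merge pending deltas, apply this update, rewrite the page.
            self._merge()
            self.data[offset:offset + len(payload)] = payload
            self.bytes_written += PAGE_SIZE

    def _merge(self) -> None:
        for off, payload in self.deltas:
            self.data[off:off + len(payload)] = payload
        self.deltas.clear()
        self.delta_bytes = 0

    def read(self, offset: int, length: int) -> bytes:
        """Current image: base page with pending deltas replayed in order."""
        image = bytearray(self.data)
        for off, payload in self.deltas:
            image[off:off + len(payload)] = payload
        return bytes(image[offset:offset + length])


page = DeltaPage()
for i in range(20):
    page.update(i * 10, bytes([i]) * 10)   # twenty 10-byte updates
in_place = 20 * PAGE_SIZE                  # every update rewrites the full page
print(page.bytes_written, in_place)        # 4343 81920
```

With in-place page writes, each 10-byte update costs a 4096-byte page write, roughly 400x amplification, which matches the "several hundred times" above; with the delta area, 19 updates fit as 13-byte appends and only the 20th triggers a full page rewrite.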
Database Management Systems (DBMS) need to handle large, updatable datasets in on-line transaction processing (OLTP) workloads. Most modern DBMS provide data snapshots through a multi-version concurrency control (MVCC) transaction management scheme. Each transaction operates on a snapshot of the database, which is computed from a set of tuple versions. This enables high parallelism and resource-efficient, append-only data placement on secondary storage. One major issue in indexing tuple versions on modern hardware technologies is the high write amplification of tree indexes.
Partitioned B-Trees (PBTs) [5] are based on the structure of the ubiquitous B+-Tree [8]. They achieve near-optimal write amplification and beneficial sequential writes on secondary storage. Yet they have not been implemented in an MVCC-enabled DBMS to date.
In this paper, we present the implementation of PBTs in PostgreSQL extended with SIAS. Compared to PostgreSQL's B+-Trees, PBTs achieve 50% higher transaction throughput under TPC-C and a 30% improvement over standard PostgreSQL with Heap-Only Tuples.
This is a report on the fourth international one-day workshop on "Information Systems in Distributed Environments" (ISDE), which was organized in conjunction with the OnTheMove Federated Conferences & Workshops (OTM 2014), October 29-30, 2014, in Amantea, Calabria, Italy. The main focus of this event was to provide a venue for discussing challenges related to the development, operation, and maintenance of distributed information systems, and to their creation in the context of global development projects. Further dissemination of the research results will help improve distributed information system development and deployment across the globe.
Under the term Innovation Enabling, we present a concept for the holistic support of interdisciplinary teams in creative and innovative problem solving. This concept supports moderators and participants alike, and a system built on it stays in the background for the user thanks to implicit interaction. A central role is played by the concept of the Awareness Pipeline, presented in this article, which implements implicit interaction on the basis of a sensor-actuator system. In the future, support for the accompanying moderation and administration tasks, such as automated documentation of the session, is intended to offer significant added value compared with a classic brainstorming session.
We introduce IPA-IDX, an approach that handles index modifications on modern storage technologies (NVM, Flash) as physical in-place appends, using simplified physiological log records. IPA-IDX provides similar performance and longevity advantages for indexes as basic IPA [5] does for tables. The selective application of IPA-IDX and basic IPA to certain regions and objects lowers the GC overhead by over 60%, while keeping the total space overhead at 2%. The combined effect of IPA and IPA-IDX increases performance by 28%.
Maintainability assurance techniques are used to control maintainability as a quality attribute and to limit the accumulation of potentially unknown technical debt. Since the industry state of practice, and especially the handling of service- and microservice-based systems in this regard, is not well covered in the scientific literature, we created a survey to gather evidence on a) the processes, tools, and metrics used in industry, b) the maintainability-related treatment of systems based on service orientation, and c) influences on developer satisfaction w.r.t. maintainability. Sixty software professionals responded to our online questionnaire. The results indicate that using explicit and systematic techniques benefits maintainability. The more sophisticated the applied methods, the more satisfied participants were with the maintainability of their software, while no link to reduced productivity could be established. Other important findings were the absence of architecture-level evolvability control mechanisms and a significant neglect of service-oriented particularities in quality assurance. The results suggest that industry has to improve its quality control in these regards to avoid problems with long-living service-based software systems.
The characteristics of modern computing and storage technologies differ fundamentally from those of traditional hardware. There is a need to optimally leverage their performance, endurance and energy consumption characteristics. Therefore, existing architectures and algorithms in modern high-performance database management systems have to be redesigned and advanced. Multi-Version Concurrency Control (MVCC) approaches in database management systems maintain multiple physically independent tuple versions. Snapshot isolation approaches enable high parallelism and concurrency in workloads at an almost serializable consistency level. Modern hardware technologies benefit from multi-version approaches, yet indexing multi-version data on modern hardware is still an open research area. In this paper, we provide a survey of popular multi-version indexing approaches and, in an extended scope, of high-performance single-version approaches. An optimal multi-version index structure balances the lookup efficiency for tuple versions visible to transactions against the index maintenance effort, across different workloads on modern hardware technologies.
An index in a Multi-Version DBMS (MV-DBMS) has to reflect different tuple versions of a single data item. Existing approaches follow the paradigm of logically separating the tuple version data from the data item; e.g., an index may return at most one version of a single data item (while it may return multiple data items that match a search criterion). Hence, to determine the valid (and therefore visible) tuple version of a data item, the MV-DBMS first fetches all tuple versions that match the search criterion and subsequently filters the visible versions using visibility checks. This causes I/O accesses to tuple versions that would not need to be fetched. In this vision paper, we present the Multi-Version Index (MV-IDX) approach, which allows index-only visibility checks that significantly reduce the number of I/O storage accesses as well as the index maintenance overhead. The MV-IDX achieves significantly lower response times and higher transactional throughput on OLTP workloads.
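The difference between the two lookup strategies can be sketched by counting tuple fetches. Here each index entry is assumed to carry the creation and invalidation timestamps of its version, and visibility is a simplified snapshot rule that ignores in-flight transactions. The entry layout, the `heap` dictionary standing in for storage, and the function names are hypothetical; this shows the idea of an index-only visibility check, not the MV-IDX design itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IndexEntry:
    key: int
    tid: int                    # location of the tuple version in storage
    created: int                # commit timestamp of the creating transaction
    invalidated: Optional[int]  # commit timestamp of the superseding version, None if current


def visible(e: IndexEntry, snapshot: int) -> bool:
    """Simplified snapshot rule (ignores in-flight transactions, an assumption)."""
    return e.created <= snapshot and (e.invalidated is None or e.invalidated > snapshot)


def lookup_baseline(index: List[IndexEntry], key: int, snapshot: int,
                    heap: Dict[int, str]) -> Tuple[List[str], int]:
    """Fetch every matching version, then check visibility on the fetched tuple."""
    result, ios = [], 0
    for e in index:
        if e.key == key:
            ios += 1                   # one storage access per version
            row = heap[e.tid]
            if visible(e, snapshot):   # timestamps as stored in the tuple header
                result.append(row)
    return result, ios


def lookup_index_only(index: List[IndexEntry], key: int, snapshot: int,
                      heap: Dict[int, str]) -> Tuple[List[str], int]:
    """Check visibility in the index first; fetch only visible versions."""
    result, ios = [], 0
    for e in index:
        if e.key == key and visible(e, snapshot):
            ios += 1
            result.append(heap[e.tid])
    return result, ios


index = [IndexEntry(1, 10, 1, 5), IndexEntry(1, 11, 5, 9), IndexEntry(1, 12, 9, None)]
heap = {10: "v1", 11: "v2", 12: "v3"}
print(lookup_baseline(index, 1, 6, heap))    # (['v2'], 3)
print(lookup_index_only(index, 1, 6, heap))  # (['v2'], 1)
```

Both strategies return the same visible version, but the index-only check needs one storage access instead of one per version; the gap grows with the length of a data item's version chain.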
nKV in action: accelerating KV-stores on native computational storage with near-data processing
(2020)
Massive data transfers in modern data-intensive systems, resulting from low data locality and data-to-code system design, hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution which, although not new, has yet to see widespread use.
In this paper, we demonstrate various NDP alternatives in nKV, a key/value store utilizing native computational storage and near-data processing. We showcase the in-situ execution of classical operations (GET, SCAN) and complex graph-processing algorithms (Betweenness Centrality), with 1.4x-2.7x better performance due to NDP. nKV runs on real hardware: the COSMOS+ platform.
Massive data transfers in modern key/value stores, resulting from low data locality and data-to-code system design, hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution which, although not new, has yet to see widespread use.
In this paper, we introduce nKV, a key/value store utilizing native computational storage and near-data processing. On the one hand, nKV can directly control data and computation placement on the underlying storage hardware. On the other hand, nKV propagates the data formats and layouts to the storage device, where software and hardware parsers and accessors are implemented. Both allow NDP operations to execute in a host-intervention-free manner, directly on physical addresses, and thus to better utilize the underlying hardware. Our performance evaluation is based on executing traditional KV operations (GET, SCAN) and complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4×-2.7× better performance on real hardware: the COSMOS+ platform.
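What "propagating data formats and layouts to the device" buys can be sketched with a device-side accessor that walks raw bytes using a record layout known to both host and device. The layout (little-endian `uint32` key, `uint16` value length, value bytes), the function names and the byte accounting are all hypothetical; the sketch only shows that a layout-aware device can run a SCAN predicate in-situ and return matches instead of raw pages.

```python
import struct
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical on-flash record layout shared by host and device.
HEADER = struct.Struct("<IH")  # uint32 key, uint16 value length (6 bytes, no padding)

Record = Tuple[int, bytes]


def encode(records: Sequence[Record]) -> bytes:
    """Serialize records in the shared layout."""
    out = bytearray()
    for key, value in records:
        out += HEADER.pack(key, len(value)) + value
    return bytes(out)


def parse(raw: bytes) -> Iterator[Record]:
    """Device-side accessor: decodes raw bytes using the propagated layout."""
    pos = 0
    while pos + HEADER.size <= len(raw):
        key, vlen = HEADER.unpack_from(raw, pos)
        pos += HEADER.size
        yield key, raw[pos:pos + vlen]
        pos += vlen


def host_scan(raw: bytes, pred: Callable[[int, bytes], bool]) -> Tuple[List[Record], int]:
    """Data-to-code: ship the raw data to the host, then filter there."""
    return [(k, v) for k, v in parse(raw) if pred(k, v)], len(raw)


def ndp_scan(raw: bytes, pred: Callable[[int, bytes], bool]) -> Tuple[List[Record], int]:
    """In-situ: filter on the device and ship only the encoded matches."""
    matches = [(k, v) for k, v in parse(raw) if pred(k, v)]
    return matches, len(encode(matches))


raw = encode([(i, b"v" * 10) for i in range(100)])  # 100 records of 16 bytes each
print(host_scan(raw, lambda k, v: k < 5)[1], ndp_scan(raw, lambda k, v: k < 5)[1])  # 1600 80
```

Both scans return the same five records; the in-situ variant moves 80 bytes instead of 1600 across the interconnect, which is the kind of transfer reduction NDP targets.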