Introduction to the special issue on self‑managing and hardware‑optimized database systems 2022
(2023)
Data management systems have evolved in terms of functionality, performance characteristics, complexity, and variety during the last 40 years. In particular, relational database management systems and big data systems (e.g., Key-Value stores, Document stores, Graph stores and Graph Computation Systems, Spark, MapReduce/Hadoop, or Data Stream Processing Systems) have evolved with novel additions and extensions. However, system administration tasks have become highly complex and expensive, especially given the simultaneous and rapid hardware evolution in processors, memory, storage, and networking. These developments present new open problems and challenges to data management systems as well as new opportunities.
The SMDB (International Workshop on Self-Managing Database Systems) and HardBD&Active (Joint International Workshop on Big Data Management on Emerging Hardware and Data Management on Virtualized Active Systems) workshops organized in conjunction with the IEEE ICDE (International Conference on Data Engineering) offered two distinct platforms for examining the above system-related challenges from different perspectives. The SMDB workshop looks into developing autonomic or self-* features in database and data management systems to tackle complex administrative tasks, while the HardBD&Active workshop focuses on harnessing hardware technologies to enhance efficiency and performance of data processing and management tasks. As a result of these workshops, we are delighted to present the third special issue of DAPD titled “Self-Managing and Hardware-Optimized Database Systems 2022,” which showcases the best contributions from the SMDB 2021/2022 and HardBD&Active 2021/2022 workshops.
Database Management Systems (DBMS) need to handle large updatable datasets in on-line transaction processing (OLTP) workloads. Most modern DBMS provide snapshots of data under a multi-version concurrency control (MVCC) transaction management scheme. Each transaction operates on a snapshot of the database, which is calculated from a set of tuple versions. This enables high parallelism and resource-efficient append-only data placement on secondary storage. One major issue in indexing tuple versions on modern hardware technologies is the high write amplification of tree indexes.
Partitioned B-Trees (PBT) [5] are based on the structure of the ubiquitous B+-Tree [8]. They achieve near-optimal write amplification and beneficial sequential writes on secondary storage. Yet they have not been implemented in an MVCC-enabled DBMS to date.
In this paper we present the implementation of PBTs in PostgreSQL extended with SIAS. Compared to PostgreSQL's B+-Trees, PBTs achieve 50% better transaction throughput under TPC-C and a 30% improvement over standard PostgreSQL with Heap-Only Tuples.
In the present paper we demonstrate a novel approach to handling small updates on Flash called In-Place Appends (IPA). It allows the DBMS to revisit the traditional write behavior on Flash. Instead of writing whole database pages upon an update in an out-of-place manner on Flash, we transform those small updates into update deltas and append them to a reserved area on the very same physical Flash page. In doing so we utilize the commonly ignored fact that under certain conditions Flash memories can support in-place updates to Flash pages without a preceding erase operation.
The approach was implemented under Shore-MT and evaluated on real hardware. Under standard update-intensive workloads we observed 67% fewer page invalidations, resulting in 80% lower garbage collection overhead, which yields a 45% increase in transactional throughput while doubling Flash longevity at the same time. IPA outperforms In-Page Logging (IPL) by more than 50%.
We showcase a Shore-MT based prototype of the above approach, operating on real Flash hardware, the OpenSSD Flash research platform. During the demonstration we allow the users to interact with the system and gain hands-on experience of its performance under different demonstration scenarios. These involve various workloads such as TPC-B, TPC-C or TATP.
A transaction is a demarcated sequence of application operations, for which the following properties are guaranteed by the underlying transaction processing system (TPS): atomicity, consistency, isolation, and durability (ACID). Transactions are therefore a general abstraction provided by the TPS, which simplifies application development by relieving transactional applications of the burden of concurrency and failure handling. Apart from the ACID properties, a TPS must guarantee high and robust performance (high transactional throughput and low response times), high reliability (no data loss, ability to recover the last consistent state, fault tolerance), and high availability (infrequent outages, short recovery times).
The architectures and workhorse algorithms of a high-performance TPS are built around the properties of the underlying hardware. The introduction of nonvolatile memories (NVM) as a novel storage technology opens an entirely new problem space, with the need to revise aspects such as the virtual memory hierarchy, storage management and data placement, access paths, and indexing. NVM are also referred to as storage-class memory (SCM).
Under update-intensive workloads (TPC, LinkBench) small updates dominate the write behavior, e.g. 70% of all updates change less than 10 bytes across all TPC OLTP workloads. These are typically performed as in-place updates and result in random writes at page granularity, causing major write overhead on Flash storage, a write amplification of several hundred times and lower device longevity.
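One way to see where an amplification of this magnitude can come from is a back-of-the-envelope calculation. The sketch below is illustrative only: the 4 KiB page size and the helper name `write_amplification` are assumptions, not values or code from the paper, and it ignores additional amplification inside the Flash device (e.g. garbage collection).

```python
def write_amplification(page_size_bytes: int, changed_bytes: int) -> float:
    """Bytes physically written per byte logically changed, for a whole-page write."""
    if changed_bytes <= 0:
        raise ValueError("changed_bytes must be positive")
    return page_size_bytes / changed_bytes


PAGE_SIZE = 4096  # hypothetical database page size in bytes (assumption)

# A 10-byte update that forces a full page write:
print(write_amplification(PAGE_SIZE, 10))
```

Under these assumptions a 10-byte change costs roughly 410 times its own size in written bytes, which is in the "several hundred times" range the abstract mentions.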
In this paper we propose an approach that transforms those small in-place updates into small update deltas that are appended to the original page. We utilize the commonly ignored fact that modern Flash memories (SLC, MLC, 3D NAND) can handle appends to already programmed physical pages by using various low-level techniques such as ISPP to avoid expensive erases and page migrations. Furthermore, we extend the traditional NSM page-layout with a delta-record area that can absorb those small updates. We propose a scheme to control the write behavior as well as the space allocation and sizing of database pages.
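The idea of a page with a delta-record area can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the Shore-MT implementation: the class name `Page`, the page and delta-area sizes, and the 4-byte per-delta header are all hypothetical, and the Flash-level append (e.g. via ISPP) is only modeled as a counter-free list append.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096        # hypothetical page size in bytes
DELTA_AREA_SIZE = 256   # hypothetical size of the reserved delta-record area
DELTA_HEADER = 4        # assumed per-delta overhead: 2 bytes offset + 2 bytes length


@dataclass
class Page:
    """Toy page: a base image plus a reserved area for appended update deltas."""
    base: bytearray = field(
        default_factory=lambda: bytearray(PAGE_SIZE - DELTA_AREA_SIZE))
    deltas: list = field(default_factory=list)  # (offset, new_bytes), append order
    delta_bytes_used: int = 0
    out_of_place_writes: int = 0                # number of whole-page rewrites

    def update(self, offset: int, new_bytes: bytes) -> str:
        """Apply a small update; append it as a delta while the area has room."""
        if offset < 0 or offset + len(new_bytes) > len(self.base):
            raise ValueError("update outside the page body")
        needed = DELTA_HEADER + len(new_bytes)
        if self.delta_bytes_used + needed <= DELTA_AREA_SIZE:
            # Append path: the base image is left untouched.
            self.deltas.append((offset, bytes(new_bytes)))
            self.delta_bytes_used += needed
            return "appended"
        # Delta area exhausted: fold all deltas into the base image and
        # fall back to a traditional whole-page, out-of-place write.
        self.base = self.read()
        self.base[offset:offset + len(new_bytes)] = new_bytes
        self.deltas.clear()
        self.delta_bytes_used = 0
        self.out_of_place_writes += 1
        return "rewritten"

    def read(self) -> bytearray:
        """Reconstruct the current page image by replaying deltas in order."""
        image = bytearray(self.base)
        for offset, data in self.deltas:
            image[offset:offset + len(data)] = data
        return image
```

With these assumed sizes, a page absorbs 32 four-byte updates as appends before the 33rd triggers a whole-page rewrite; how large the delta area should be is exactly the kind of sizing question the proposed scheme addresses.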
The proposed approach has been implemented under Shore-MT and evaluated on real Flash hardware (OpenSSD) and a Flash emulator. Compared to In-Page Logging it performs up to 62% fewer reads and writes and up to 74% fewer erases on a range of workloads. The experimental evaluation indicates: (i) a significant reduction of erase operations, resulting in twice the longevity of Flash devices under update-intensive workloads; (ii) 15%-60% lower read/write I/O latencies; (iii) up to 45% higher transactional throughput; (iv) a 2x to 3x reduction in overall write amplification.
Near-Data Processing is a promising approach to overcome the limitations of slow I/O interfaces in the quest to analyze the ever-growing amount of data stored in database systems. Next to CPUs, FPGAs will play an important role for the realization of functional units operating close to data stored in non-volatile memories such as Flash. It is essential that the NDP-device understands formats and layouts of the persistent data, to perform operations in-situ. To this end, carefully optimized format parsers and layout accessors are needed. However, designing such FPGA-based Near-Data Processing accelerators requires significant effort and expertise. To make FPGA-based Near-Data Processing accessible to non-FPGA experts, we will present a framework for the automatic generation of FPGA-based accelerators capable of data filtering and transformation for key-value stores based on simple data-format specifications. The evaluation shows that our framework is able to generate accelerators that are almost identical in performance compared to the manually optimized designs of prior work, while requiring little to no FPGA-specific knowledge and additionally providing improved flexibility and more powerful functionality.
Blockchains give rise to new workloads in database management systems and K/V-stores. Distributed Ledger Technology (DLT) is a technique for managing transactions in 'trustless' distributed systems. Yet, clients of nodes in blockchain networks are backed by 'trustworthy' K/V-stores, like LevelDB or RocksDB in Ethereum, which are based on Log-Structured Merge Trees (LSM-Trees). However, LSM-Trees do not fully match the properties of blockchains and enterprise workloads.
In this paper, we claim that Partitioned B-Trees (PBT) fit the properties of this DLT: uniformly distributed hash keys, immutability, consensus, invalid blocks, unspent and off-chain transactions, reorganization and data state / version ordering in a distributed log-structure. PBTs can locate records of newly inserted key-value pairs, as well as data of unspent transactions, in separate partitions in main memory. Once several blocks acquire consensus, PBTs evict a whole partition, which becomes immutable, to secondary storage. This behavior minimizes write amplification and enables a beneficial sequential write pattern on modern hardware. Furthermore, DLT implies some type of log-based versioning. PBTs can serve as an MV-store for data storage of logical blocks and indexing in multi-version concurrency control (MVCC) transaction processing.
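The partition life cycle described above (mutable in-memory partition, eviction as an immutable unit, newest-first lookup) can be sketched in a few lines. This is a deliberately simplified model and not the paper's implementation: the class `PartitionedIndexSketch`, its size-based eviction threshold (standing in for "several blocks acquire consensus"), and the representation of evicted partitions as separate sorted arrays rather than as partitions inside one B+-Tree are all assumptions made for illustration.

```python
import bisect


class PartitionedIndexSketch:
    """Toy model: one mutable in-memory partition plus immutable evicted ones."""

    def __init__(self, max_buffer_entries=4):
        self.max_buffer_entries = max_buffer_entries  # hypothetical threshold
        self.buffer = {}             # mutable partition held in main memory
        self.evicted = []            # immutable (keys, values) pairs, oldest first
        self.sequential_writes = 0   # one sequential write per evicted partition

    def put(self, key, value):
        """Insert or overwrite in the in-memory partition only."""
        self.buffer[key] = value
        if len(self.buffer) >= self.max_buffer_entries:
            self.evict()

    def evict(self):
        """Freeze the in-memory partition and hand it over as one sorted run."""
        if not self.buffer:
            return
        items = sorted(self.buffer.items())
        keys = tuple(k for k, _ in items)
        values = tuple(v for _, v in items)
        self.evicted.append((keys, values))  # tuples: never modified afterwards
        self.sequential_writes += 1
        self.buffer = {}

    def get(self, key, default=None):
        """Search the newest data first so later versions shadow older ones."""
        if key in self.buffer:
            return self.buffer[key]
        for keys, values in reversed(self.evicted):
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return default
```

Because every evicted partition is written once and never updated, each eviction corresponds to a single sequential write in this model, and older versions of a key remain available in older partitions, which is what makes the structure usable as a multi-version store.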
In this paper we build on our research in data management on native Flash storage. In particular we demonstrate the advantages of intelligent data placement strategies. To effectively manage physical Flash space and organize the data on it, we utilize novel storage structures such as regions and groups. These are coupled to common DBMS logical structures and thus require no extra overhead for the DBA. The experimental results indicate an improvement of up to 2x, which doubles the longevity of Flash SSDs. During the demonstration the audience can experience the advantages of the proposed approach on real Flash hardware.
In the present tutorial we perform a cross-cut analysis of database systems from the perspective of modern storage technology, namely Flash memory. We argue that the design of modern DBMS and the architecture of flash storage technologies are not aligned with each other. The result is needlessly suboptimal DBMS performance and inefficient flash utilisation as well as low flash storage endurance and reliability. We showcase new DBMS approaches with improved algorithms and leaner architectures, designed to leverage the properties of modern storage technologies. We cover the area of transaction management and multi-versioning, putting a special emphasis on: (i) version organisation models and invalidation mechanisms in multi-versioning DBMS; (ii) Flash storage management, especially append-based storage in tuple granularity; (iii) Flash-friendly buffer management; as well as (iv) improvements in the searching and indexing models. Furthermore, we present our NoFTL approach to native Flash access, which integrates parts of the flash-management functionality into the DBMS, yielding a significant performance increase and a simplification of the I/O stack. In addition, we cover the basics of building large Flash storage for DBMS and revisit some of the RAID techniques and principles.
The use of Wireless Sensor and Actuator Networks (WSAN) as an enabling technology for Cyber-Physical Systems has increased significantly in the recent past. The challenges that arise in different application areas of Cyber-Physical Systems in general, and in WSAN in particular, are getting the attention of both academia and industry. Since the reliability of message delivery in wireless communication is of critical importance for certain safety-related applications, it is one of the areas that has received significant focus in the research community. Additionally, the diverse needs of different applications put different demands on the lower layers of the protocol stack, necessitating mechanisms in those layers that enable them to adapt dynamically. Another major issue in the realization of networked, wirelessly communicating cyber-physical systems in general, and WSAN in particular, is the lack of approaches that tackle the reliability, configurability and application-awareness issues together. One could consider tackling these issues in isolation. However, the interplay between them creates challenges that make application developers spend more time on meeting these challenges, often in suboptimal ways, than on solving the problems of the application being developed. Starting from some fundamental concepts, general issues and problems in cyber-physical systems, this chapter discusses issues such as energy efficiency and application- and channel-awareness for networked, wirelessly communicating cyber-physical systems. Additionally, the chapter describes a middleware approach called CEACH, which is an acronym for Configurable, Energy-efficient, Application- and Channel-aware Clustering based middleware service for cyber-physical systems.
The state of the art in the area of cyber-physical systems, with a special focus on communication reliability, configurability, and application- and channel-awareness, is described in the chapter. The chapter also describes how these features have been considered in the CEACH approach. Important node-level and network-level characteristics and their significance vis-à-vis the design of applications for cyber-physical systems are also discussed, as is the issue of adaptively controlling the impact of these factors vis-à-vis the application demands and network conditions. The chapter also includes a description of Fuzzy-CEACH, an extension of the CEACH middleware service that uses fuzzy logic principles. The fuzzy descriptors used in the different stages of Fuzzy-CEACH are described, and the fuzzy inference engine used in the Fuzzy-CEACH cluster head election process is described in detail. The rule bases used by the fuzzy inference engine in the different stages of Fuzzy-CEACH are also included to give an insightful description of the protocol. The chapter also discusses in detail the experimental results validating the concepts presented in the CEACH approach. The applicability of the CEACH middleware service in different application scenarios in the domain of cyber-physical systems is also discussed. The chapter concludes by shedding light on the Publish-Subscribe mechanisms in distributed event-based systems and showing how they can make use of the CEACH middleware to reliably communicate detected events to the event consumers or the actuators if the WSAN is modeled as a distributed event-based system.