Refine
Document Type
- Conference proceeding (18)
- Journal article (3)
- Book chapter (3)
- Doctoral Thesis (1)
Language
- English (25)
Has full text
- yes (25)
Is part of the Bibliography
- yes (25)
Institute
- Informatik (25)
Publisher
Modern persistent Key/Value stores are designed to meet the demand for high transactional throughput and high data ingestion rates. Still, they rely on backwards-compatible storage stack and abstractions to ease space management, foster seamless proliferation and system integration. Their dependence on the traditional I/O stack has negative impact on performance, causes unacceptably high write-amplification, and limits the storage longevity.
In the present paper we present NoFTL KV, an approach that results in a lean I/O stack, integrating physical storage management natively in the Key/Value store. NoFTL-KV eliminates backwards compatibility, allowing the Key/Value store to directly consume the characteristics of modern storage technologies. NoFTLKV is implemented under RocksDB. The performance evaluation under LinkBench shows that NoFTL-KV improves transactional throughput by 33%, while response times improve up to 2.3x. Furthermore, NoFTL KV reduces write-amplification 19x and improves storage longevity by imately the same factor.
Modern mixed (HTAP)workloads execute fast update-transactions and long running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains as well as in increased and frequent maintenance overhead.
Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause frequent creation of new versions, yielding a surge in index maintenance overhead. Secondly and more importantly, index-scans incur extra I/O overhead to determine, which of the resulting tuple versions are visible to the executing transaction (visibility-check) as current designs only store version/timestamp information in the base table – not in the index. Such index-only visibility-check is critical for HTAP workloads on large datasets.
In this paper we propose the Multi Version Partitioned B-Tree (MV-PBT) as a version-aware index structure, supporting index-only visibility checks and flash-friendly I/O patterns. The experimental evaluation indicates a 2x improvement for analytical queries and 15% higher transactional throughput under HTAP workloads. MV-PBT offers 40% higher tx. throughput compared to WiredTiger’s LSM-Tree implementation under YCSB.
Rapidly growing data volumes push today's analytical systems close to the feasible processing limit. Massive parallelism is one possible solution to reduce the computational time of analytical algorithms. However, data transfer becomes a significant bottleneck since it blocks system resources moving data-to-code. Technological advances allow to economically place compute units close to storage and perform data processing operations close to data, minimizing data transfers and increasing scalability. Hence the principle of Near Data Processing (NDP) and the shift towards code-to-data. In the present paper we claim that the development of NDP-system architectures becomes an inevitable task in the future. Analytical DBMS like HPE Vertica have multiple points of impact with major advantages which are presented within this paper.
Over the last decades, a tremendous change toward using information technology in almost every daily routine of our lives can be perceived in our society, entailing an incredible growth of data collected day-by-day on Web, IoT, and AI applications.
At the same time, magneto-mechanical HDDs are being replaced by semiconductor storage such as SSDs, equipped with modern Non-Volatile Memories, like Flash, which yield significantly faster access latencies and higher levels of parallelism. Likewise, the execution speed of processing units increased considerably as nowadays server architectures comprise up to multiple hundreds of independently working CPU cores along with a variety of specialized computing co-processors such as GPUs or FPGAs.
However, the burden of moving the continuously growing data to the best fitting processing unit is inherently linked to today’s computer architecture that is based on the data-to-code paradigm. In the light of Amdahl's Law, this leads to the conclusion that even with today's powerful processing units, the speedup of systems is limited since the fraction of parallel work is largely I/O-bound.
Therefore, throughout this cumulative dissertation, we investigate the paradigm shift toward code-to-data, formally known as Near-Data Processing (NDP), which relieves the contention on the I/O bus by offloading processing to intelligent computational storage devices, where the data is originally located.
Firstly, we identified Native Storage Management as the essential foundation for NDP due to its direct control of physical storage management within the database. Upon this, the interface is extended to propagate address mapping information and to invoke NDP functionality on the storage device. As the former can become very large, we introduce Physical Page Pointers as one novel NDP abstraction for self-contained immutable database objects.
Secondly, the on-device navigation and interpretation of data are elaborated. Therefore, we introduce cross-layer Parsers and Accessors as another NDP abstraction that can be executed on the heterogeneous processing capabilities of modern computational storage devices. Thereby, the compute placement and resource configuration per NDP request is identified as a major performance criteria. Our experimental evaluation shows an improvement in the execution durations of 1.4x to 2.7x compared to traditional systems. Moreover, we propose a framework for the automatic generation of Parsers and Accessors on FPGAs to ease their application in NDP.
Thirdly, we investigate the interplay of NDP and modern workload characteristics like HTAP. Therefore, we present different offloading models and focus on an intervention-free execution. By propagating the Shared State with the latest modifications of the database to the computational storage device, it is able to process data with transactional guarantees. Thus, we achieve to extend the design space of HTAP with NDP by providing a solution that optimizes for performance isolation, data freshness, and the reduction of data transfers. In contrast to traditional systems, we experience no significant drop in performance when an OLAP query is invoked but a steady and 30% faster throughput.
Lastly, in-situ result-set management and consumption as well as NDP pipelines are proposed to achieve flexibility in processing data on heterogeneous hardware. As those produce final and intermediary results, we continue investigating their management and identified that an on-device materialization comes at a low cost but enables novel consumption modes and reuse semantics. Thereby, we achieve significant performance improvements of up to 400x by reusing once materialized results multiple times.
Even though near-data processing (NDP) can provably reduce data transfers and increase performance, current NDP is solely utilized in read-only settings. Slow or tedious to implement synchronization and invalidation mechanisms between host and smart storage make NDP support for data-intensive update operations difficult. In this paper, we introduce a low-latency cache-coherent shared lock table for update NDP settings in disaggregated memory environments. It utilizes the novel CCIX interconnect technology and is integrated in neoDBMS, a near-data processing DBMS for smart storage. Our evaluation indicates end-to-end lock latencies of ∼80-100ns and robust performance under contention.
Near-data processing in database systems on native computational storage under HTAP workloads
(2022)
Today’s Hybrid Transactional and Analytical Processing (HTAP) systems, tackle the ever-growing data in combination with a mixture of transactional and analytical workloads. While optimizing for aspects such as data freshness and performance isolation, they build on the traditional data-to-code principle and may trigger massive cold data transfers that impair the overall performance and scalability. Firstly, in this paper we show that Near-Data Processing (NDP) naturally fits in the HTAP design space. Secondly, we propose an NDP database architecture, allowing transactionally consistent in-situ executions of analytical operations in HTAP settings. We evaluate the proposed architecture in state-of-the-art key/value-stores and multi-versioned DBMS. In contrast to traditional setups, our approach yields robust, resource- and cost-effcient performance.
Massive data transfers in modern data-intensive systems resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-Data processing (NDP) and a shift to code-to-data designs may represent a viable solution as packaging combinations of storage and compute elements on the same device has become feasible. The shift towards NDP system architectures calls for revision of established principles. Abstractions such as data formats and layouts typically spread multiple layers in traditional DBMS, the way they are processed is encapsulated within these layers of abstraction. The NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure in-situ executions optimally utilizing the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under RocksDB and the COSMOS hardware platform.
Real Time Charging (RTC) applications that reside in the telecommunications domain have the need for extremely fast database transactions. Today´s providers rely mostly on in-memory databases for this kind of information processing. A flexible and modular benchmark suite specifically designed for this domain provides a valuable framework to test the performance of different DB candidates. Besides a data and a load generator, the suite also includes decoupled database connectors and use case components for convenient customization and extension. Such easily produced test results can be used as guidance for choosing a subset of candidates for further tuning/testing and finally evaluating the database most suited to the chosen use cases. This is why our benchmark suite can be of value for choosing databases for RTC use cases.
Near-Data Processing is a promising approach to overcome the limitations of slow I/O interfaces in the quest to analyze the ever-growing amount of data stored in database systems. Next to CPUs, FPGAs will play an important role for the realization of functional units operating close to data stored in non-volatile memories such as Flash.It is essential that the NDP-device understands formats and layouts of the persistent data, to perform operations in-situ. To this end, carefully optimized format parsers and layout accessors are needed. However, designing such FPGA-based Near-Data Processing accelerators requires significant effort and expertise. To make FPGA-based Near-Data Processing accessible to non-FPGA experts, we will present a framework for the automatic generation of FPGA-based accelerators capable of data filtering and transformation for key-value stores based on simple data-format specifications.The evaluation shows that our framework is able to generate accelerators that are almost identical in performance compared to the manually optimized designs of prior work, while requiring little to no FPGA-specific knowledge and additionally providing improved flexibility and more powerful functionality.
Database Management Systems (DBMS) need to handle large updatable datasets in on-line transaction processing (OLTP) workloads. Most modern DBMS provide snapshots of data in multi-version concurrency control (MVCC) transaction management scheme. Each transaction operates on a snapshot of the database, which is calculated from a set of tuple versions. High parallelism and resource-efficient append-only data placement on secondary storage is enabled. One major issue in indexing tuple versions on modern hardware technologies is the high write amplification for tree-indexes.
Partitioned B-Trees (PBT) [5] is based on the structure of the ubiquitous B+ Tree [8]. They achieve a near optimal write amplification and beneficial sequential writes on secondary storage. Yet they have not been implemented in a MVCC enabled DBMS to date.
In this paper we present the implementation of PBTs in PostgreSQL extended with SIAS. Compared to PostgreSQL’s B+–Trees PBTs have 50% better transaction throughput under TPC-C and a 30% improvement to standard PostgreSQL with Heap-Only Tuples.