Near-Data Processing is a promising approach to overcome the limitations of slow I/O interfaces in the quest to analyze the ever-growing amount of data stored in database systems. Next to CPUs, FPGAs will play an important role for the realization of functional units operating close to data stored in non-volatile memories such as Flash. It is essential that the NDP-device understands formats and layouts of the persistent data, to perform operations in-situ. To this end, carefully optimized format parsers and layout accessors are needed. However, designing such FPGA-based Near-Data Processing accelerators requires significant effort and expertise. To make FPGA-based Near-Data Processing accessible to non-FPGA experts, we will present a framework for the automatic generation of FPGA-based accelerators capable of data filtering and transformation for key-value stores based on simple data-format specifications. The evaluation shows that our framework is able to generate accelerators that are almost identical in performance compared to the manually optimized designs of prior work, while requiring little to no FPGA-specific knowledge and additionally providing improved flexibility and more powerful functionality.
The amount of image data has been rising exponentially over the last decades due to numerous trends like social networks, smartphones, automotive, biology, medicine and robotics. Traditionally, file systems are used as storage. Although they are easy to use and can handle large data volumes, they are suboptimal for efficient sequential image processing due to the limitation of data organisation on single images. Database systems and especially column-stores support more structured storage and access methods on the raw data level for entire series.
In this paper we propose definitions of various layouts for an efficient storage of raw image data and metadata in a column store. These schemes are designed to improve the runtime behaviour of image processing operations. We present a tool called column-store Image Processing Toolbox (cIPT) that makes it easy to combine the data layouts and operations for different image processing scenarios.
The experimental evaluation of a classification task on a real world image dataset indicates a performance increase of up to 15x on a column store compared to a traditional row-store (PostgreSQL) while the space consumption is reduced 7x. With these results cIPT provides the basis for a future mature database feature.
Current data-intensive systems suffer from limited scalability as they transfer massive amounts of data to the host DBMS to process it there. Novel near-data processing (NDP) DBMS architectures and smart storage can provably reduce the impact of raw data movement. However, transferring the result-set of an NDP operation may increase the data movement, and thus, the performance overhead. In this paper, we introduce a set of in-situ NDP result-set management techniques, such as spilling, materialization, and reuse. Our evaluation indicates a performance improvement of 1.13× to 400×.
Modern persistent Key/Value stores are designed to meet the demand for high transactional throughput and high data ingestion rates. Still, they rely on a backwards-compatible storage stack and abstractions to ease space management, foster seamless proliferation and system integration. Their dependence on the traditional I/O stack has a negative impact on performance, causes unacceptably high write-amplification, and limits the storage longevity.
In this paper we present NoFTL-KV, an approach that results in a lean I/O stack, integrating physical storage management natively in the Key/Value store. NoFTL-KV eliminates backwards compatibility, allowing the Key/Value store to directly consume the characteristics of modern storage technologies. NoFTL-KV is implemented under RocksDB. The performance evaluation under LinkBench shows that NoFTL-KV improves transactional throughput by 33%, while response times improve up to 2.3x. Furthermore, NoFTL-KV reduces write-amplification 19x and improves storage longevity by approximately the same factor.
Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet, data cleansing, preparation, dataset characterization and statistics or metrics computation steps are frequent. These are mostly performed ad hoc, in an explorative manner and mandate low response times. But, such steps are I/O intensive and typically very slow due to low data locality, inadequate interfaces and abstractions along the stack. These typically result in prohibitively expensive scans of the full dataset and transformations on interface boundaries.
In this paper, we examine R as analytical tool, managing large persistent datasets in Ceph, a wide-spread cluster file-system. We propose nativeNDP – a framework for Near Data Processing that pushes down primitive R tasks and executes them in-situ, directly within the storage device of a cluster-node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.
Massive data transfers in modern data intensive systems resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) and a shift to code-to-data designs may represent a viable solution as packaging combinations of storage and compute elements on the same device has become viable.
The shift towards NDP system architectures calls for revision of established principles. Abstractions such as data formats and layouts typically span multiple layers in traditional DBMS, and the way they are processed is encapsulated within these layers of abstraction. NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure in-situ executions optimally utilizing the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under NoFTL-KV and the COSMOS hardware platform.
Massive data transfers in modern key/value stores resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution, which although not new, have yet to see widespread use.
In this paper we introduce nKV, which is a key/value store utilizing native computational storage and near-data processing. On the one hand, nKV can directly control the data and computation placement on the underlying storage hardware. On the other hand, nKV propagates the data formats and layouts to the storage device, where software and hardware parsers and accessors are implemented. Both allow NDP operations to execute in a host-intervention-free manner, directly on physical addresses and thus better utilize the underlying hardware. Our performance evaluation is based on executing traditional KV operations (GET, SCAN) and on complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4×-2.7× better performance on real hardware – the COSMOS+ platform.
For a long time, most discrete accelerators have been attached to host systems using various generations of the PCI Express interface. However, with its lack of support for coherency between accelerator and host caches, fine-grained interactions require frequent cache-flushes, or even the use of inefficient uncached memory regions. The Cache Coherent Interconnect for Accelerators (CCIX) was the first multi-vendor standard for enabling cache-coherent host-accelerator attachments, and already is indicative of the capabilities of upcoming standards such as Compute Express Link (CXL). In our work, we compare and contrast the use of CCIX with PCIe when interfacing an ARM-based host with two generations of CCIX-enabled FPGAs. We provide both low-level throughput and latency measurements for accesses and address translation, and examine an application-level use-case of using CCIX for fine-grained synchronization in an FPGA-accelerated database system. We show that especially smaller reads from the FPGA to the host can benefit from CCIX by having roughly 33% shorter latency than PCIe. Small writes to the host have a latency roughly 32% higher than PCIe, though, since they carry a higher coherency overhead. For the database use-case, the use of CCIX allowed us to maintain a constant synchronization latency even with heavy host-FPGA parallelism.
Near-Data Processing (NDP) is a key computing paradigm for reducing the ever growing time and energy costs of data transport versus computations. With their flexibility, FPGAs are an especially suitable compute element for NDP scenarios. Even more promising is the exploitation of novel and future non-volatile memory (NVM) technologies for NDP, which aim to achieve DRAM-like latencies and throughputs, while providing large capacity non-volatile storage.
Experimentation in using FPGAs in such NVM-NDP scenarios has been hindered, though, by the fact that the NVM devices/FPGA boards are still very rare and/or expensive. It thus becomes useful to emulate the access characteristics of current and future NVMs using off-the-shelf DRAMs. If such emulation is sufficiently accurate, the resulting FPGA-based NDP computing elements can be used for actual full-stack hardware/software benchmarking, e.g., when employed to accelerate a database.
For this use, we present NVMulator, an open-source easy-to-use hardware emulation module that can be seamlessly inserted between the NDP processing elements on the FPGA and a conventional DRAM-based memory system. We demonstrate that, with suitable parametrization, the emulated NVM can come very close to the performance characteristics of actual NVM technologies, specifically Intel Optane. We achieve 0.62% and 1.7% accuracy for cache line sized accesses for read and write operations, while utilizing only 0.54% of LUT logic resources on a Xilinx/AMD AU280 UltraScale+ FPGA board. We consider both file-system as well as database access patterns, examining the operation of the RocksDB database when running on real or emulated Optane-technology memories.
When forecasting sales figures, not only the sales history but also the future price of a product will influence the sales quantity. At first sight, multivariate time series seem to be the appropriate model for this task. Nonetheless, in real life history is not always repeatable, i.e. in the case of sales history there is only one price for a product at a given time. This complicates the design of a multivariate time series. However, for some seasonal or perishable products the price is rather a function of the expiration date than of the sales history. This additional information can help to design a more accurate and causal time series model. The proposed solution uses a univariate time series model but takes the price of a product as a parameter that systematically influences the prediction. The price influence is computed based on historical sales data using correlation analysis and adjustable price ranges to identify products with comparable history. Compared to other techniques this novel approach is easy to compute and allows the price parameter to be preset for predictions and simulations. Tests with data from the Data Mining Cup 2012 demonstrate better results than established sophisticated time series methods.
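The idea of a univariate forecast that takes the price as a preset parameter can be illustrated with a minimal sketch. It is not the method of the paper: it replaces the correlation analysis with a simple per-price-range sales ratio, and the names `price_factors`, `forecast`, `price_ranges` and `window` are hypothetical.

```python
from statistics import mean


def price_factors(history, price_ranges):
    """Relative sales level per price range (assumption: a plain ratio
    of the mean units sold in the range to the overall mean)."""
    overall = mean(units for _, units in history)
    factors = {}
    for low, high in price_ranges:
        units_in_range = [u for price, u in history if low <= price < high]
        if units_in_range:
            factors[(low, high)] = mean(units_in_range) / overall
    return factors


def forecast(history, price_ranges, planned_price, window=3):
    """Univariate baseline (mean of the last `window` observations),
    scaled by the factor of the range containing the planned price."""
    baseline = mean(units for _, units in history[-window:])
    for (low, high), factor in price_factors(history, price_ranges).items():
        if low <= planned_price < high:
            return baseline * factor
    return baseline  # no comparable price range: fall back to the baseline


# history holds (price, units sold) per period, oldest first
history = [(10, 100), (10, 100), (8, 150), (8, 150)]
ranges = [(0, 9), (9, 20)]
print(forecast(history, ranges, planned_price=10, window=2))
```

Because the price enters only as a parameter, the same history can be replayed with different planned prices to simulate their effect on the predicted quantity.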
Blockchains give rise to new workloads in database management systems and K/V-stores. Distributed Ledger Technology (DLT) is a technique for managing transactions in ’trustless’ distributed systems. Yet, clients of nodes in blockchain networks are backed by ’trustworthy’ K/V-Stores, like LevelDB or RocksDB in Ethereum, which are based on Log-Structured Merge Trees (LSM-Trees). However, LSM-Trees do not fully match the properties of blockchains and enterprise workloads.
In this paper, we claim that Partitioned B-Trees (PBT) fit the properties of this DLT: uniformly distributed hash keys, immutability, consensus, invalid blocks, unspent and off-chain transactions, reorganization and data state / version ordering in a distributed log-structure. PBT can locate records of newly inserted key-value pairs, as well as data of unspent transactions, in separate partitions in main memory. Once several blocks acquire consensus, PBTs evict a whole partition, which becomes immutable, to secondary storage. This behavior minimizes write amplification and enables a beneficial sequential write pattern on modern hardware. Furthermore, DLT implies some type of log-based versioning. PBTs can serve as MV-store for data storage of logical blocks and indexing in multi-version concurrency control (MVCC) transaction processing.
Characteristics of modern computing and storage technologies fundamentally differ from traditional hardware. There is a need to optimally leverage their performance, endurance and energy consumption characteristics. Therefore, existing architectures and algorithms in modern high performance database management systems have to be redesigned and advanced. Multi Version Concurrency Control (MVCC) approaches in database management systems maintain multiple physically independent tuple versions. Snapshot isolation approaches enable high parallelism and concurrency in workloads with almost serializable consistency level. Modern hardware technologies benefit from multi-version approaches. Indexing multi-version data on modern hardware is still an open research area. In this paper, we provide a survey of popular multi-version indexing approaches and an extended scope of high performance single-version approaches. An optimal multi-version index structure balances the look-up efficiency of tuple versions, which are visible to transactions, against the effort of index maintenance for different workloads on modern hardware technologies.
Database management systems (DBMS) are critical performance components in large scale applications under modern update intensive workloads. Additional access paths accelerate look-up performance in DBMS for frequently queried attributes, but the required maintenance slows down update performance. The ubiquitous B+-Tree is a commonly used key-indexed access path that is able to support many required functionalities with logarithmic access time to requested records. Modern processing and storage technologies and their characteristics require reconsideration of matured indexing approaches for today's workloads. Partitioned B-Trees (PBT) leverage characteristics of modern hardware technologies and complex memory hierarchies as well as high update rates and changes in workloads by maintaining partitions within one single B+-Tree. This paper includes an experimental evaluation of PBT's optimized write pattern and performance improvements. With PBT, transactional throughput under TPC-C increases by 30%; PBT results in beneficial sequential write patterns even in the presence of updates and maintenance operations.
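Maintaining partitions within one single ordered structure can be sketched by prefixing every key with an artificial partition number, so that each partition occupies a contiguous key range. The sketch below is a simplification under stated assumptions: a sorted Python list stands in for the B+-Tree, and the class and method names (`PartitionedIndex`, `put`, `seal_partition`, `get`) are hypothetical.

```python
import bisect


class PartitionedIndex:
    """Toy stand-in for a partitioned B-tree: one sorted list whose
    entries are (partition, key, value) tuples."""

    def __init__(self):
        self.entries = []   # kept sorted by (partition, key)
        self.current = 0    # only the newest partition accepts writes

    def _find(self, partition, key):
        # A 2-tuple probe sorts directly before any 3-tuple with the
        # same (partition, key), so values are never compared.
        probe = (partition, key)
        i = bisect.bisect_left(self.entries, probe)
        hit = i < len(self.entries) and self.entries[i][:2] == probe
        return i, hit

    def put(self, key, value):
        i, hit = self._find(self.current, key)
        if hit:
            self.entries[i] = (self.current, key, value)
        else:
            self.entries.insert(i, (self.current, key, value))

    def seal_partition(self):
        """Freeze the current partition; later writes go to a new one,
        so the sealed key range is never modified again."""
        self.current += 1

    def get(self, key):
        # Search newest to oldest so the most recent value wins.
        for partition in range(self.current, -1, -1):
            i, hit = self._find(partition, key)
            if hit:
                return self.entries[i][2]
        return None


index = PartitionedIndex()
index.put("a", 1)
index.seal_partition()
index.put("a", 2)
index.put("b", 3)
print(index.get("a"), index.get("b"), index.get("c"))
```

In this toy version a sealed partition is a contiguous, immutable run of entries, which is the property that would allow it to be written out sequentially.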
Database Management Systems (DBMS) need to handle large updatable datasets in on-line transaction processing (OLTP) workloads. Most modern DBMS provide snapshots of data in multi-version concurrency control (MVCC) transaction management scheme. Each transaction operates on a snapshot of the database, which is calculated from a set of tuple versions. High parallelism and resource-efficient append-only data placement on secondary storage is enabled. One major issue in indexing tuple versions on modern hardware technologies is the high write amplification for tree-indexes.
Partitioned B-Trees (PBT) [5] are based on the structure of the ubiquitous B+-Tree [8]. They achieve a near optimal write amplification and beneficial sequential writes on secondary storage. Yet they have not been implemented in an MVCC-enabled DBMS to date.
In this paper we present the implementation of PBTs in PostgreSQL extended with SIAS. Compared to PostgreSQL’s B+-Trees, PBTs have 50% better transaction throughput under TPC-C and a 30% improvement over standard PostgreSQL with Heap-Only Tuples.
Modern mixed (HTAP) workloads execute fast update-transactions and long running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains as well as in increased and frequent maintenance overhead.
Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause frequent creation of new versions, yielding a surge in index maintenance overhead. Secondly and more importantly, index-scans incur extra I/O overhead to determine which of the resulting tuple versions are visible to the executing transaction (visibility-check) as current designs only store version/timestamp information in the base table – not in the index. Such index-only visibility-check is critical for HTAP workloads on large datasets.
In this paper we propose the Multi Version Partitioned B-Tree (MV-PBT) as a version-aware index structure, supporting index-only visibility checks and flash-friendly I/O patterns. The experimental evaluation indicates a 2x improvement for analytical queries and 15% higher transactional throughput under HTAP workloads. MV-PBT offers 40% higher transactional throughput compared to WiredTiger’s LSM-Tree implementation under YCSB.
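What an index-only visibility check involves can be shown with a small sketch in which every index entry carries its own version timestamps, so a scan never touches the base table. This is an illustration, not MV-PBT itself: the field names (`created_ts`, `invalidated_ts`) and the simplified snapshot rule, which ignores uncommitted and aborted transactions, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexEntry:
    key: int
    row_id: int                           # location of the tuple version
    created_ts: int                       # timestamp of the creating transaction
    invalidated_ts: Optional[int] = None  # None while the version is current


def is_visible(entry: IndexEntry, snapshot_ts: int) -> bool:
    """Simplified snapshot rule: created at or before the snapshot and
    not yet invalidated at that point."""
    if entry.created_ts > snapshot_ts:
        return False
    return entry.invalidated_ts is None or entry.invalidated_ts > snapshot_ts


def index_only_scan(entries, low, high, snapshot_ts):
    """Return row ids of visible versions with low <= key <= high,
    decided from the index entries alone."""
    return [e.row_id for e in entries
            if low <= e.key <= high and is_visible(e, snapshot_ts)]


entries = [
    IndexEntry(key=10, row_id=1, created_ts=5, invalidated_ts=20),  # old version
    IndexEntry(key=10, row_id=2, created_ts=20),                    # its successor
    IndexEntry(key=11, row_id=3, created_ts=30),
]
print(index_only_scan(entries, 10, 11, snapshot_ts=25))
```

A long-running analytical query with an old snapshot and a short update transaction with a recent one can each resolve their own visible versions from the same entries.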
Database management systems and K/V-Stores operate on updatable datasets – massively exceeding the size of available main memory. Tree-based K/V storage management structures became particularly popular in storage engines. B+-Trees [1, 4] allow constant search performance, however write-heavy workloads result in inefficient write patterns to secondary storage devices and poor performance characteristics. LSM-Trees [16, 23] overcome this issue by horizontally partitioning fractions of data – small enough to fully reside in main memory, but require frequent maintenance to sustain search performance.
Firstly, we propose Multi-Version Partitioned B-Trees (MV-PBT) as sole storage and index management structure in key-sorted storage engines like K/V-Stores. Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal partitioning in MV-PBT allows leveraging recent advances in modern B+-Tree techniques in a small transparent and memory resident portion of the structure. Structural properties sustain steady read performance, yielding efficient write patterns and reducing write amplification.
We integrated MV-PBT in the WiredTiger [15] KV storage engine. MV-PBT offers up to 2× higher steady throughput in comparison to LSM-Trees and several orders of magnitude in comparison to B+-Trees in a YCSB [5] workload.
In the present tutorial we perform a cross-cut analysis of database storage management from the perspective of modern storage technologies. We argue that neither the design of modern DBMS, nor the architecture of modern storage technologies are aligned with each other. Moreover, the majority of the systems rely on a complex multi-layer and compatibility oriented storage stack. The result is needlessly suboptimal DBMS performance, inefficient utilization, or significant write amplification due to outdated abstractions and interfaces. In the present tutorial we focus on the concept of native storage, which is storage operated without intermediate abstraction layers over an open native storage interface and is directly controlled by the DBMS.
In the present tutorial we perform a cross-cut analysis of database systems from the perspective of modern storage technology, namely Flash memory. We argue that neither the design of modern DBMS, nor the architecture of flash storage technologies are aligned with each other. The result is needlessly suboptimal DBMS performance and inefficient flash utilisation as well as low flash storage endurance and reliability. We showcase new DBMS approaches with improved algorithms and leaner architectures, designed to leverage the properties of modern storage technologies. We cover the area of transaction management and multi-versioning, putting a special emphasis on: (i) version organisation models and invalidation mechanisms in multi-versioning DBMS; (ii) Flash storage management especially on append-based storage in tuple granularity; (iii) Flash-friendly buffer management; as well as (iv) improvements in the searching and indexing models. Furthermore, we present our NoFTL approach to native Flash access that integrates parts of the flash-management functionality into the DBMS yielding significant performance increase and simplification of the I/O stack. In addition, we cover the basics of building large Flash storage for DBMS and revisit some of the RAID techniques and principles.
We introduce bloomRF as a unified method for approximate membership testing that supports both point- and range-queries. As a first core idea, bloomRF introduces novel prefix hashing to efficiently encode range information in the hash-code of the key itself. As a second key concept, bloomRF proposes novel piecewise-monotone hash-functions that preserve local order and support fast range-lookups with fewer memory accesses. bloomRF has near-optimal space complexity and constant query complexity. Although bloomRF is designed for integer domains, it supports floating-points, and can serve as a multi-attribute filter. The evaluation in RocksDB and in a standalone library shows that it is more efficient and outperforms existing point-range-filters by up to 4× across a range of settings and distributions, while keeping the false-positive rate low.
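How hashing key prefixes lets a Bloom filter answer range queries can be illustrated with a much simpler scheme than bloomRF: insert every binary prefix of a key, then answer a range query by probing the aligned power-of-two blocks that cover the range. The sketch does not use piecewise-monotone hashing and has none of bloomRF's space or query-time properties; the class name `PrefixBloom` and all parameters are hypothetical.

```python
import hashlib


class PrefixBloom:
    """Toy point/range filter: a Bloom filter over all binary prefixes
    of each inserted non-negative integer key (keys < 2**key_bits)."""

    def __init__(self, num_bits=1 << 16, num_hashes=3, key_bits=16):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.key_bits = key_bits
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, level, prefix):
        for i in range(self.num_hashes):
            data = f"{i}:{level}:{prefix}".encode()
            digest = hashlib.blake2b(data, digest_size=8).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def _set(self, level, prefix):
        for pos in self._positions(level, prefix):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def _test(self, level, prefix):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(level, prefix))

    def insert(self, key):
        # Level l stores the prefix that identifies the aligned block
        # of 2**l consecutive keys containing `key`.
        for level in range(self.key_bits + 1):
            self._set(level, key >> level)

    def may_contain(self, key):
        return self._test(0, key)

    def may_contain_range(self, low, high):
        """False means no inserted key lies in [low, high]."""
        while low <= high:
            level = 0
            # Grow the block while it stays aligned and inside the range.
            while (level < self.key_bits
                   and low % (1 << (level + 1)) == 0
                   and low + (1 << (level + 1)) - 1 <= high):
                level += 1
            if self._test(level, low >> level):
                return True
            low += 1 << level
        return False


bloom = PrefixBloom()
bloom.insert(100)
bloom.insert(5000)
print(bloom.may_contain(100), bloom.may_contain_range(90, 110))
```

Like any Bloom filter, this sketch can report false positives but never false negatives; the cost is one set of probes per covering block and one set of insertions per prefix level.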
Many modern DBMS architectures require transferring data from storage to process it afterwards. Given the continuously increasing amounts of data, data transfers quickly become a scalability limiting factor. Near-Data Processing and smart/computational storage emerge as promising trends allowing for decoupled in-situ operation execution, data transfer reduction and better bandwidth utilization. However, not every operation is suitable for an in-situ execution and a careful placement and optimization is needed.
In this paper we present an NDP-aware cost model. It has been implemented in MySQL and evaluated with nKV. We make several observations underscoring the need for optimization.
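The placement decision that such a cost model supports can be sketched with a deliberately small formula. It is not the model from the paper: it assumes the host must first transfer all data over the interconnect, while the device processes data in place and transfers only the selected fraction. All function names, parameters and numbers below are hypothetical.

```python
def host_cost(data_bytes, link_bw, host_rate):
    """Seconds to ship everything to the host and process it there."""
    return data_bytes / link_bw + data_bytes / host_rate


def ndp_cost(data_bytes, selectivity, link_bw, device_rate):
    """Seconds to process in-situ and ship only the result set."""
    return data_bytes / device_rate + selectivity * data_bytes / link_bw


def choose_placement(data_bytes, selectivity, link_bw, host_rate, device_rate):
    """Pick the cheaper execution site under the assumptions above."""
    on_host = host_cost(data_bytes, link_bw, host_rate)
    on_device = ndp_cost(data_bytes, selectivity, link_bw, device_rate)
    return "ndp" if on_device < on_host else "host"


# Illustrative numbers only: 1 GB of data, a 1 GB/s link, a host that
# processes 4 GB/s and a device that processes 2 GB/s.
params = dict(data_bytes=1e9, link_bw=1e9, host_rate=4e9, device_rate=2e9)
print(choose_placement(selectivity=0.01, **params))  # small result set
print(choose_placement(selectivity=1.0, **params))   # nothing filtered out
```

Even this toy version shows why not every operation suits in-situ execution: when the operation filters out little data, the slower device pays its processing cost without saving any transfer.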