Refine
Document Type
- Conference proceeding (38) (remove)
Has full text
- yes (38)
Is part of the Bibliography
- yes (38)
Institute
- Informatik (38)
Publisher
- Association for Computing Machinery (14)
- IEEE (7)
- OpenProceedings (4)
- Springer (4)
- Universität Konstanz (3)
- IARIA (2)
- CIDR (1)
- Gesellschaft für Informatik e.V (1)
- SciTePress (1)
- Universität Trier (1)
Near-Data Processing is a promising approach to overcome the limitations of slow I/O interfaces in the quest to analyze the ever-growing amount of data stored in database systems. Next to CPUs, FPGAs will play an important role for the realization of functional units operating close to data stored in non-volatile memories such as Flash.It is essential that the NDP-device understands formats and layouts of the persistent data, to perform operations in-situ. To this end, carefully optimized format parsers and layout accessors are needed. However, designing such FPGA-based Near-Data Processing accelerators requires significant effort and expertise. To make FPGA-based Near-Data Processing accessible to non-FPGA experts, we will present a framework for the automatic generation of FPGA-based accelerators capable of data filtering and transformation for key-value stores based on simple data-format specifications.The evaluation shows that our framework is able to generate accelerators that are almost identical in performance compared to the manually optimized designs of prior work, while requiring little to no FPGA-specific knowledge and additionally providing improved flexibility and more powerful functionality.
Under update intensive workloads (TPC, LinkBench) small updates dominate the write behavior, e.g. 70% of all updates change less than 10 bytes across all TPC OLTP workloads. These are typically performed as in-place updates and result in random writes in page-granularity, causing major write-overhead on Flash storage, a write amplification of several hundred times and lower device longevity.
In this paper we propose an approach that transforms those small in-place updates into small update deltas that are appended to the original page. We utilize the commonly ignored fact that modern Flash memories (SLC, MLC, 3D NAND) can handle appends to already programmed physical pages by using various low-level techniques such as ISPP to avoid expensive erases and page migrations. Furthermore, we extend the traditional NSM page-layout with a delta-record area that can absorb those small updates. We propose a scheme to control the write behavior as well as the space allocation and sizing of database pages.
The proposed approach has been implemented under Shore- MT and evaluated on real Flash hardware (OpenSSD) and a Flash emulator. Compared to In-Page Logging it performs up to 62% less reads and writes and up to 74% less erases on a range of workloads. The experimental evaluation indicates: (i) significant reduction of erase operations resulting in twice the longevity of Flash devices under update-intensive workloads; (ii) 15%-60% lower read/write I/O latencies; (iii) up to 45% higher transactional throughput; (iv) 2x to 3x reduction in overall write
amplification.
In the present paper we demonstrate a novel approach to handling small updates on Flash called In-Place Appends (IPA). It allows the DBMS to revisit the traditional write behavior on Flash. Instead of writing whole database pages upon an update in an out-of-place manner on Flash, we transform those small updates into update deltas and append them to a reserved area on the very same physical Flash page. In doing so we utilize the commonly ignored fact that under certain conditions Flash memories can support in-place updates to Flash pages without a preceding erase operation.
The approach was implemented under Shore-MT and evaluated on real hardware. Under standard update-intensive workloads we observed 67% less page invalidations resulting in 80% lower garbage collection overhead, which yields a 45% increase in transactional throughput, while doubling Flash longevity at the same time. The IPA outperforms In-Page Logging (IPL) by more than 50%.
We showcase a Shore-MT based prototype of the above approach, operating on real Flash hardware – the OpenSSD Flash research platform. During the demonstration we allow the users to interact with the system and gain hands on experience of its performance under different demonstration scenarios. These involve various workloads such as TPC-B, TPC-C or TATP.
Database Management Systems (DBMS) need to handle large updatable datasets in on-line transaction processing (OLTP) workloads. Most modern DBMS provide snapshots of data in multi-version concurrency control (MVCC) transaction management scheme. Each transaction operates on a snapshot of the database, which is calculated from a set of tuple versions. High parallelism and resource-efficient append-only data placement on secondary storage is enabled. One major issue in indexing tuple versions on modern hardware technologies is the high write amplification for tree-indexes.
Partitioned B-Trees (PBT) [5] is based on the structure of the ubiquitous B+ Tree [8]. They achieve a near optimal write amplification and beneficial sequential writes on secondary storage. Yet they have not been implemented in a MVCC enabled DBMS to date.
In this paper we present the implementation of PBTs in PostgreSQL extended with SIAS. Compared to PostgreSQL’s B+–Trees PBTs have 50% better transaction throughput under TPC-C and a 30% improvement to standard PostgreSQL with Heap-Only Tuples.
We introduce IPA-IDX – an approach to handle index modifications modern storage technologies (NVM, Flash) as physical in-place appends, using simplified physiological log records. IPA-IDX provides similar performance and longevity advantages for indexes as basic IPA [5] does for tables. The selective application of IPA-IDX and basic IPA to certain regions and objects, lowers the GC overhead by over 60%, while keeping the total space overhead to 2%. The combined effect of IPA and IPA-IDX increases performance by 28%.
Many future Services Oriented Architecture (SOA) systems may be pervasive SmartLife applications that provide real-time support for users in everyday tasks and situations. Development of such applications will be challenging, but in this position paper we argue that their ongoing maintenance may be even more so. Ontological modelling of the application may help to ease this burden, but maintainers need to understand a system at many levels, from a broad architectural perspective down to the internals of deployed components. Thus we will need consistent models that span the range of views, from business processes through system architecture to maintainable code. We provide an initial example of such a modelling approach and illustrate its application in a semantic browser to aid in software maintenance tasks.
Characteristics of modern computing and storage technologies fundamentally differ from traditional hardware. There is a need to optimally leverage their performance, endurance and energy consumption characteristics. Therefore, existing architectures and algorithms in modern high performance database management systems have to be redesigned and advanced. Multi Version Concurrency Control (MVCC) approaches in data-base management systems maintain multiple physically independent tuple versions. Snapshot isolation approaches enable high parallelism and concurrency in workloads with almost serializable consistency level. Modern hardware technologies benefit from multi-version approaches. Indexing multi-version data on modern hardware is still an open research area. In this paper, we provide a survey of popular multi-version indexing approaches and an extended scope of high performance single-version approaches. An optimal multi-version index structure brings look-up efficiency of tuple versions, which are visible to transactions, and effort on index maintenance in balance for different workloads on modern hardware technologies.
An index in a Multi-Version DBMS (MV-DBMS) has to reflect different tuple versions of a single data item. Existing approaches follow the paradigm of logically separating the tuple version data from the data item, e.g. an index is only allowed to return at most one version of a single data item (while it may return multiple data items that match a search criteria). Hence to determine the valid (and therefore visible) tuple version of a data item, the MV-DBMS first fetches all tuple versions that match the search criteria and subsequently filters visible versions using visibility checks. This involves I/O storage accesses to tuple versions that do not have to be fetched. In this vision paper we present the Multi Version Index (MV-IDX) approach that allows index-only visibility checks which significantly reduce the amount of I/O storage accesses as well as the index maintenance overhead. The MV-IDX achieves significantly lower response times and higher transactional throughput on OLTP workloads.
Modern mixed (HTAP)workloads execute fast update-transactions and long running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version-chains as well as in increased and frequent maintenance overhead.
Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause frequent creation of new versions, yielding a surge in index maintenance overhead. Secondly and more importantly, index-scans incur extra I/O overhead to determine, which of the resulting tuple versions are visible to the executing transaction (visibility-check) as current designs only store version/timestamp information in the base table – not in the index. Such index-only visibility-check is critical for HTAP workloads on large datasets.
In this paper we propose the Multi Version Partitioned B-Tree (MV-PBT) as a version-aware index structure, supporting index-only visibility checks and flash-friendly I/O patterns. The experimental evaluation indicates a 2x improvement for analytical queries and 15% higher transactional throughput under HTAP workloads. MV-PBT offers 40% higher tx. throughput compared to WiredTiger’s LSM-Tree implementation under YCSB.
In the present tutorial we perform a cross-cut analysis of database storage management from the perspective of modern storage technologies. We argue that neither the design of modern DBMS, nor the architecture of modern storage technologies are aligned with each other. Moreover, the majority of the systems rely on a complex multi-layer and compatibility oriented storage stack. The result is needlessly suboptimal DBMS performance, inefficient utilization, or significant write amplification due to outdated abstractions and interfaces. In the present tutorial we focus on the concept of native storage, which is storage operated without intermediate abstraction layers over an open native storage interface and is directly controlled by the DBMS.