Refine
Document Type
- Book chapter (2)
- Conference proceeding (2)
Language
- English (4)
Has full text
- yes (4)
Is part of the Bibliography
- yes (4) (remove)
Institute
- Informatik (4)
Publisher
- Springer (4) (remove)
Database management systems and K/V-Stores operate on updatable datasets – massively exceeding the size of available main memory. Tree-based K/V storage management structures became particularly popular in storage engines. B+ -Trees [1, 4] allow constant search performance, however write-heavy workloads yield in inefficient write patterns to secondary storage devices and poor performance characteristics. LSM-Trees [16, 23] overcome this issue by horizontal partitioning fractions of data – small enough to fully reside in main memory, but require frequent maintenance to sustain search performance.
Firstly, we propose Multi-Version Partitioned BTrees (MV-PBT) as sole storage and index management structure in key-sorted storage engines like K/V-Stores. Secondly, we compare MV-PBT against LSM-Trees. The logical horizontal partitioning in MV-PBT allows leveraging recent advances in modern B+ -Tree techniques in a small transparent and memory resident portion of the structure. Structural properties sustain steady read performance, yielding efficient write patterns and reducing write amplification.
We integrated MV-PBT in the WiredTiger [15] KV storage engine. MV-PBT offers an up to 2× increased steady throughput in comparison to LSM-Trees and several orders of magnitude in comparison to B+ -Trees in a YCSB [5] workload.
Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet, data cleansing, preparation, dataset characterization and statistics or metrics computation steps are frequent. These are mostly performed ad hoc, in an explorative manner and mandate low response times. But, such steps are I/O intensive and typically very slow due to low data locality, inadequate interfaces and abstractions along the stack. These typically result in prohibitively expensive scans of the full dataset and transformations on interface boundaries.
In this paper, we examine R as analytical tool, managing large persistent datasets in Ceph, a wide-spread cluster file-system. We propose nativeNDP – a framework for Near Data Processing that pushes down primitive R tasks and executes them in-situ, directly within the storage device of a cluster-node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.
A transaction is a demarcated sequence of application operations, for which the following properties are guaranteed by the underlying transaction processing system (TPS): atomicity, consistency, isolation, and durability (ACID). Transactions are therefore a general abstraction, provided by TPS that simplifies application development by relieving transactional applications from the burden of concurrency and failure handling. Apart from the ACID properties, a TPS must guarantee high and robust performance (high transactional throughput and low response times), high reliability (no data loss, ability to recover last consistent state, fault tolerance), and high availability (infrequent outages, short recovery times).
The architectures and workhorse algorithms of a high-performance TPS are built around the properties of the underlying hardware. The introduction of nonvolatile memories (NVM) as novel storage technology opens an entire new problem space, with the need to revise aspects such as the virtual memory hierarchy, storage management and data placement, access paths, and indexing. NVM are also referred to as storage-class memory (SCM).
Active storage
(2018)
In brief, Active Storage refers to an architectural hardware and software paradigm, based on collocation storage and compute units. Ideally, it will allow to execute application-defined data ... within the physical data storage. Thus Active Storage seeks to minimize expensive data movement, improving performance, scalability, and resource efficiency. The effective use of Active Storage mandates new architectures, algorithms, interfaces, and development toolchains.