Current data-intensive systems suffer from poor scalability because they transfer massive amounts of data to the host DBMS in order to process it there. Novel near-data processing (NDP) DBMS architectures and smart storage can provably reduce the impact of raw data movement. However, transferring the result set of an NDP operation may itself increase data movement and thus the performance overhead. In this paper, we introduce a set of in-situ NDP result-set management techniques, such as spilling, materialization, and reuse. Our evaluation indicates a performance improvement of 1.13× to 400×.
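A minimal sketch of how such in-situ result-set management could look on the device side, assuming a simple memory-budget heuristic: results are materialized on the device, spilled when they exceed the budget, and reused for identical operations instead of being recomputed or re-transferred. The names (NdpResultManager, ResultSet) and the keying by operation string are illustrative assumptions, not the paper's implementation.

```cpp
// Hypothetical sketch of in-situ NDP result-set handling: results stay on the
// device, are spilled to device-local storage when they exceed a memory
// budget, and are reused for identical invocations.
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct ResultSet {
    std::vector<std::uint8_t> rows;   // serialized result rows
    bool spilled = false;             // true if moved to device-local storage
};

class NdpResultManager {
public:
    explicit NdpResultManager(std::size_t memoryBudget) : budget_(memoryBudget) {}

    // Returns a cached result for an identical operation, if one exists (reuse).
    const ResultSet* reuse(const std::string& operationKey) const {
        auto it = cache_.find(operationKey);
        return it == cache_.end() ? nullptr : &it->second;
    }

    // Materializes a freshly computed result; marks it as spilled when it
    // does not fit into the on-device memory budget.
    void materialize(const std::string& operationKey, ResultSet result) {
        if (result.rows.size() > budget_) {
            result.spilled = true;    // stand-in for writing to device flash
        }
        cache_[operationKey] = std::move(result);
    }

private:
    std::size_t budget_;
    std::unordered_map<std::string, ResultSet> cache_;
};

int main() {
    NdpResultManager mgr(4096);
    mgr.materialize("scan(orders, status='open')",
                    ResultSet{std::vector<std::uint8_t>(128)});
    const ResultSet* cached = mgr.reuse("scan(orders, status='open')");
    return cached != nullptr && !cached->spilled ? 0 : 1;
}
```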
Many future Service-Oriented Architecture (SOA) systems may be pervasive SmartLife applications that provide real-time support for users in everyday tasks and situations. Developing such applications will be challenging, but in this position paper we argue that their ongoing maintenance may be even more so. Ontological modelling of the application may help to ease this burden, but maintainers need to understand a system at many levels, from a broad architectural perspective down to the internals of deployed components. Thus we will need consistent models that span the whole range of views, from business processes through system architecture to maintainable code. We provide an initial example of such a modelling approach and illustrate its application in a semantic browser that aids software maintenance tasks.
Multi-versioning and Multi-Version Concurrency Control (MVCC) are the foundations of many modern DBMSs. Under mixed workloads and large datasets, creating the transactional snapshot can become very expensive, as long-running analytical transactions may request old versions residing on cold storage for reasons of transactional consistency. Furthermore, analytical queries operate on cold data stored on slow persistent storage. Due to the poor data locality, snapshot creation may cause massive data transfers and thus lower performance. Given the current trend towards computational storage and near-data processing, it has become viable to perform such operations in-storage to reduce data transfers and improve scalability. neoDBMS is a DBMS designed for near-data processing and computational storage. In this paper, we demonstrate how neoDBMS performs snapshot computation in-situ. We showcase different interactive scenarios in which neoDBMS outperforms PostgreSQL 12 by up to 5×.
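For illustration, the following sketch shows the kind of snapshot-visibility test an NDP device could evaluate in-situ over on-storage tuple versions. The VersionHeader layout and Snapshot fields are simplified assumptions loosely modeled on textbook MVCC, not neoDBMS internals; in particular, the sketch treats every finished transaction as committed.

```cpp
// Simplified MVCC visibility check that could run on-device, close to the
// stored tuple versions, instead of shipping cold versions to the host.
#include <algorithm>
#include <cstdint>
#include <vector>

struct VersionHeader {
    std::uint64_t createdBy;      // transaction that created this version
    std::uint64_t invalidatedBy;  // 0 if the version is still current
};

struct Snapshot {
    std::uint64_t xmax;                    // first transaction id not yet started
    std::vector<std::uint64_t> activeTxns; // in-progress txns at snapshot time (sorted)
};

// A transaction counts as committed for this snapshot if it started before
// the snapshot and was not still active when the snapshot was taken.
static bool committedBefore(const Snapshot& s, std::uint64_t txn) {
    if (txn >= s.xmax) return false;
    return !std::binary_search(s.activeTxns.begin(), s.activeTxns.end(), txn);
}

// A version is visible if its creator committed before the snapshot and it
// was not invalidated by a transaction that also committed before it.
bool isVisible(const VersionHeader& v, const Snapshot& s) {
    if (!committedBefore(s, v.createdBy)) return false;
    if (v.invalidatedBy == 0) return true;
    return !committedBefore(s, v.invalidatedBy);
}

int main() {
    Snapshot snap{100, {95}};                 // txn 95 was still running
    VersionHeader oldVersion{80, 95};         // invalidator not yet committed
    return isVisible(oldVersion, snap) ? 0 : 1;  // old version remains visible
}
```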
Real-Time Charging (RTC) applications in the telecommunications domain require extremely fast database transactions. Today's providers rely mostly on in-memory databases for this kind of information processing. A flexible and modular benchmark suite specifically designed for this domain provides a valuable framework for testing the performance of different database candidates. Besides a data generator and a load generator, the suite also includes decoupled database connectors and use-case components for convenient customization and extension. The resulting test results can guide the choice of a subset of candidates for further tuning and testing, and finally the selection of the database best suited to the chosen use cases. This is why our benchmark suite can be of value when choosing databases for RTC use cases.
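The sketch below illustrates what a decoupled connector interface of this kind could look like: use-case components issue abstract charging operations, and each database candidate is plugged in behind a common interface. The names (DbConnector, ChargingEvent, runUseCase) and the in-memory stand-in are assumptions for illustration, not the suite's actual API.

```cpp
// Hedged sketch: a pluggable connector interface separating use-case logic
// (event replay and timing) from the database candidate under test.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

struct ChargingEvent {
    std::string subscriberId;
    std::uint64_t units;   // e.g. consumed seconds or bytes
};

// Each database candidate implements this interface once.
class DbConnector {
public:
    virtual ~DbConnector() = default;
    virtual void applyCharge(const ChargingEvent& e) = 0;
};

// Stand-in for an in-memory candidate; a real connector would talk to the DBMS.
class InMemoryConnector : public DbConnector {
public:
    void applyCharge(const ChargingEvent&) override { ++applied_; }
    std::uint64_t applied() const { return applied_; }
private:
    std::uint64_t applied_ = 0;
};

// A use-case component: replay generated events and measure elapsed time.
template <typename EventSource>
std::chrono::microseconds runUseCase(DbConnector& db, EventSource next, int count) {
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < count; ++i) db.applyCharge(next(i));
    return std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
}

int main() {
    InMemoryConnector db;
    auto gen = [](int i) { return ChargingEvent{"sub-" + std::to_string(i % 100), 60}; };
    auto elapsed = runUseCase(db, gen, 10000);
    std::cout << "10000 charges in " << elapsed.count() << " us\n";
    return db.applied() == 10000 ? 0 : 1;
}
```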
Active storage
(2019)
In brief, Active Storage refers to an architectural hardware and software paradigm based on the collocation of storage and compute units. Ideally, it allows executing application-defined data ... within the physical data storage. Active Storage thus seeks to minimize expensive data movement, improving performance, scalability, and resource efficiency. The effective use of Active Storage mandates new architectures, algorithms, interfaces, and development toolchains.
Rapidly growing data volumes push today's analytical systems close to the feasible processing limit. Massive parallelism is one possible way to reduce the computational time of analytical algorithms. However, data transfer becomes a significant bottleneck, since moving data to code ties up system resources. Technological advances make it economical to place compute units close to storage and to perform data processing operations close to the data, minimizing data transfers and increasing scalability. Hence the principle of Near-Data Processing (NDP) and the shift towards code-to-data. In this paper we claim that developing NDP system architectures will become an inevitable task. Analytical DBMSs such as HPE Vertica offer multiple points of impact with major advantages, which we present in this paper.
Flash SSDs are omnipresent as database storage. Replacing HDDs is seamless because Flash SSDs implement the same legacy hardware and software interfaces to enable backward compatibility. Yet the price paid is high: backward compatibility masks the native behaviour, incurs significant complexity, and decreases I/O performance, making it non-robust and unpredictable. Flash SSDs are black boxes. Although DBMSs have ample mechanisms to control hardware directly and utilize the performance potential of Flash memory, the legacy interfaces and black-box architecture of Flash devices prevent them from doing so.
In this paper we demonstrate NoFTL, an approach that enables native Flash access and integrates parts of the Flash-management functionality into the DBMS, yielding a significant performance increase and a simplification of the I/O stack. NoFTL is implemented on real hardware based on the OpenSSD research platform. The contributions of this paper include: (i) a description of the NoFTL native Flash storage architecture; (ii) its integration into Shore-MT; and (iii) a performance evaluation of NoFTL on a real Flash SSD and on an on-line data-driven Flash emulator under TPC-B, TPC-C, TPC-E, and TPC-H workloads. The performance evaluation indicates an improvement of at least 2.4× on real hardware over conventional Flash storage, as well as better utilisation of native Flash parallelism.
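As a rough illustration of the kind of Flash-management functionality that moves into the DBMS under such a design, the sketch below keeps a DBMS-maintained logical-to-physical page mapping and performs out-of-place page writes with invalidation of superseded flash pages. The structure and names are assumptions for illustration, not the NoFTL or OpenSSD code; garbage collection and erase handling are omitted.

```cpp
// Minimal sketch of DBMS-side native Flash management: out-of-place writes,
// an explicit logical-to-physical mapping, and invalidation of old pages
// (flash cells cannot be overwritten in place).
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr std::uint32_t kInvalid = 0xFFFFFFFFu;

class NativeFlashStore {
public:
    explicit NativeFlashStore(std::uint32_t flashPages)
        : pageValid_(flashPages, false), nextFree_(0) {}

    // Out-of-place update: append to the next free flash page, remap the
    // logical page, and invalidate the previously mapped physical page.
    bool writePage(std::uint32_t logicalPage) {
        if (nextFree_ >= pageValid_.size()) return false;   // would trigger GC/erase
        auto it = mapping_.find(logicalPage);
        if (it != mapping_.end()) pageValid_[it->second] = false;
        mapping_[logicalPage] = nextFree_;
        pageValid_[nextFree_] = true;
        ++nextFree_;
        return true;
    }

    std::uint32_t physicalPage(std::uint32_t logicalPage) const {
        auto it = mapping_.find(logicalPage);
        return it == mapping_.end() ? kInvalid : it->second;
    }

private:
    std::unordered_map<std::uint32_t, std::uint32_t> mapping_;  // logical -> physical
    std::vector<bool> pageValid_;                               // live flash pages
    std::uint32_t nextFree_;                                    // append pointer
};

int main() {
    NativeFlashStore store(8);
    store.writePage(3);          // first write of logical page 3
    store.writePage(3);          // update goes out of place
    return store.physicalPage(3) == 1 ? 0 : 1;
}
```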
Near-data processing in database systems on native computational storage under HTAP workloads
(2022)
Today's Hybrid Transactional and Analytical Processing (HTAP) systems tackle ever-growing data volumes under a mixture of transactional and analytical workloads. While optimizing for aspects such as data freshness and performance isolation, they build on the traditional data-to-code principle and may trigger massive cold-data transfers that impair overall performance and scalability. First, in this paper we show that Near-Data Processing (NDP) naturally fits into the HTAP design space. Second, we propose an NDP database architecture allowing transactionally consistent in-situ execution of analytical operations in HTAP settings. We evaluate the proposed architecture in state-of-the-art key/value stores and multi-versioned DBMSs. In contrast to traditional setups, our approach yields robust, resource- and cost-efficient performance.
Massive data transfers in modern data-intensive systems, resulting from low data locality and data-to-code system designs, hurt their performance and scalability. Near-Data Processing (NDP) and a shift to code-to-data designs may represent a viable solution, as packaging combinations of storage and compute elements on the same device has become feasible. The shift towards NDP system architectures calls for a revision of established principles. Abstractions such as data formats and layouts typically span multiple layers in a traditional DBMS, and the way they are processed is encapsulated within those layers of abstraction. NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure in-situ executions that optimally utilize the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under RocksDB and the COSMOS hardware platform.
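A hedged sketch of what a cross-layer format definition might look like: host and NDP device share the same record-layout descriptor, so an on-device accessor can extract a field from raw bytes and evaluate a predicate in-situ, returning only matching records. The field names and fixed-width layout are illustrative assumptions, not the RocksDB or COSMOS format.

```cpp
// Shared, explicitly defined record layout plus an on-device accessor that
// interprets raw bytes without involving the host DBMS.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct FieldDesc {
    std::size_t offset;
    std::size_t size;
};

struct RecordFormat {
    FieldDesc orderId{0, 8};      // uint64
    FieldDesc amountCents{8, 4};  // uint32
    std::size_t recordSize = 12;
};

// On-device accessor: extract a fixed-width field from a raw record.
std::uint32_t readAmount(const RecordFormat& fmt, const std::uint8_t* record) {
    std::uint32_t amount;
    std::memcpy(&amount, record + fmt.amountCents.offset, fmt.amountCents.size);
    return amount;
}

// In-situ selection: count matching records instead of shipping all raw pages.
std::size_t countAbove(const RecordFormat& fmt,
                       const std::vector<std::uint8_t>& page,
                       std::uint32_t threshold) {
    std::size_t matches = 0;
    for (std::size_t off = 0; off + fmt.recordSize <= page.size(); off += fmt.recordSize)
        if (readAmount(fmt, page.data() + off) > threshold) ++matches;
    return matches;
}

int main() {
    RecordFormat fmt;
    std::vector<std::uint8_t> page(2 * fmt.recordSize, 0);
    std::uint32_t amount = 500;                               // record 0 qualifies
    std::memcpy(page.data() + fmt.amountCents.offset, &amount, sizeof(amount));
    return countAbove(fmt, page, 100) == 1 ? 0 : 1;
}
```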
New storage technologies, such as Flash and Non-Volatile Memories, with fundamentally different properties are appearing. Leveraging their performance and endurance requires a redesign of existing architectures and algorithms in modern high-performance databases. Multi-Version Concurrency Control (MVCC) approaches in database systems maintain multiple timestamped versions of a tuple. Once a transaction reads a tuple, the database system tracks and returns the respective version, eliminating lock requests. Hence, under MVCC reads are never blocked, which leverages well the excellent read performance (high throughput, low latency) of new storage technologies. Upon tuple updates, however, established implementations of MVCC approaches (such as Snapshot Isolation) lead to multiple random writes, caused by (i) the creation of the new version and (ii) the in-place invalidation of the old one, thus generating suboptimal access patterns for the new storage media. The combination of an append-based storage manager operating with tuple granularity and snapshot isolation addresses asymmetry and in-place updates. In this paper, we highlight novel aspects of log-based storage in multi-version database systems on new storage media. We claim that multi-versioning and append-based storage can effectively address asymmetry and endurance. We identify multi-versioning as the approach to address data placement in complex memory hierarchies. We focus on: version handling, (physical) version placement, and compression and collocation of tuple versions on Flash storage and in complex memory hierarchies. We identify possible read- and cache-related optimizations.
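To make the append-based idea concrete, the following is a minimal sketch of a tuple-granular, append-only version store: each update appends a new timestamped version instead of overwriting or invalidating the old one in place, which matches the sequential-write preference of Flash. The structure and names are simplified assumptions for illustration, not the paper's storage manager.

```cpp
// Append-only version store: updates append new versions; snapshot reads walk
// the per-tuple version chain to the newest version visible at the snapshot.
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct Version {
    std::uint64_t txnTimestamp;   // commit timestamp of the writing transaction
    std::string payload;          // tuple contents
    std::size_t previous;         // index of the prior version, or npos
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);
};

class AppendOnlyVersionStore {
public:
    // Sequential append: no in-place invalidation of the old version.
    void put(std::uint64_t key, std::uint64_t ts, std::string payload) {
        auto it = head_.find(key);
        std::size_t prev = it == head_.end() ? Version::npos : it->second;
        log_.push_back(Version{ts, std::move(payload), prev});
        head_[key] = log_.size() - 1;
    }

    // Snapshot read: newest version not younger than the reader's snapshot.
    std::optional<std::string> get(std::uint64_t key, std::uint64_t snapshotTs) const {
        auto it = head_.find(key);
        std::size_t idx = it == head_.end() ? Version::npos : it->second;
        while (idx != Version::npos) {
            const Version& v = log_[idx];
            if (v.txnTimestamp <= snapshotTs) return v.payload;
            idx = v.previous;
        }
        return std::nullopt;
    }

private:
    std::vector<Version> log_;                              // append-only version log
    std::unordered_map<std::uint64_t, std::size_t> head_;   // key -> newest version
};

int main() {
    AppendOnlyVersionStore store;
    store.put(42, 10, "v1");
    store.put(42, 20, "v2");
    return store.get(42, 15).value_or("") == "v1" ? 0 : 1;  // older reader sees v1
}
```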