Refine
Document Type
- Conference proceeding (15)
- Journal article (5)
- Book chapter (2)
Language
- English (22)
Has full text
- yes (22)
Is part of the Bibliography
- yes (22)
Institute
- Informatik (22)
Publisher
nKV in action: accelerating KVstores on native computational storage with NearData processing
(2020)
Massive data transfers in modern data intensive systems resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution, which although not new, has yet to see widespread use.
In this paper we demonstrate various NDP alternatives in nKV, which is a key/value store utilizing native computational storage and near-data processing. We showcase the execution of classical operations (GET, SCAN) and complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4x-2.7x better performance due to NDP. nKV runs on real hardware - the COSMOS+ platform.
Massive data transfers in modern key/value stores resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) designs represent a feasible solution, which although not new, have yet to see widespread use.
In this paper we introduce nKV, which is a key/value store utilizing native computational storage and near-data processing. On the one hand, nKV can directly control the data and computation placement on the underlying storage hardware. On the other hand, nKV propagates the data formats and layouts to the storage device where, software and hardware parsers and accessors are implemented. Both allow NDP operations to execute in host-intervention-free manner, directly on physical addresses and thus better utilize the underlying hardware. Our performance evaluation is based on executing traditional KV operations (GET, SCAN) and on complex graph-processing algorithms (Betweenness Centrality) in-situ, with 1.4×-2.7× better performance on real hardware – the COSMOS+ platform.
Massive data transfers in modern data intensive systems resulting from low data-locality and data-to-code system design hurt their performance and scalability. Near-data processing (NDP) and a shift to code-to-data designs may represent a viable solution as packaging combinations of storage and compute elements on the same device has become viable.
The shift towards NDP system architectures calls for revision of established principles. Abstractions such as data formats and layouts typically spread multiple layers in traditional DBMS, the way they are processed is encapsulated within these layers of abstraction. The NDP-style processing requires an explicit definition of cross-layer data formats and accessors to ensure in-situ executions optimally utilizing the properties of the underlying NDP storage and compute elements. In this paper, we make the case for such data format definitions and investigate the performance benefits under NoFTL-KV and the COSMOS hardware platform.
Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet, data cleansing, preparation, dataset characterization and statistics or metrics computation steps are frequent. These are mostly performed ad hoc, in an explorative manner and mandate low response times. But, such steps are I/O intensive and typically very slow due to low data locality, inadequate interfaces and abstractions along the stack. These typically result in prohibitively expensive scans of the full dataset and transformations on interface boundaries.
In this paper, we examine R as analytical tool, managing large persistent datasets in Ceph, a wide-spread cluster file-system. We propose nativeNDP – a framework for Near Data Processing that pushes down primitive R tasks and executes them in-situ, directly within the storage device of a cluster-node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.
In the present tutorial we perform a cross-cut analysis of database storage management from the perspective of modern storage technologies. We argue that neither the design of modern DBMS, nor the architecture of modern storage technologies are aligned with each other. Moreover, the majority of the systems rely on a complex multi-layer and compatibility oriented storage stack. The result is needlessly suboptimal DBMS performance, inefficient utilization, or significant write amplification due to outdated abstractions and interfaces. In the present tutorial we focus on the concept of native storage, which is storage operated without intermediate abstraction layers over an open native storage interface and is directly controlled by the DBMS.
This work is a report on practical experiences with the issue of interoperability in German practice management systems (PMSs) from an ongoing clinical trial on teledermatology, the TeleDerm project. A proprietary and established web-platform for store-and-forward telemedicine is integrated with the IT in the GPs’ offices for automatic exchange of basic patient data. Most of the 19 different PMSs included in the study sample lack support of modern health data exchange standards, therefore the relatively old but widely available German health data exchange interface “Gerätedatentransfer” (GDT) is used. Due to the lack of enforcement and regulation of the GDT standard, several obstacles to interoperability are encountered. As a partial, but reusable working solution to cope with these issues, we present a custom middleware which is used in conjunction with GDT. We describe the design, technical implementation and observed hindrances with the existing infrastructure. A discussion on health care interfacing standards and the current state of interoperability in German PMS software is given.
Background: Internationally, teledermatology has proven to be a viable alternative to conventional physical referrals. Travel cost and referral times are reduced while patient safety is preserved. Especially patients from rural areas benefit from this healthcare innovation. Despite these established facts and positive experiences from EU neighboring countries like the Netherlands or the United Kingdom, Germany has not yet implemented store-and-forward teledermatology in routine care.
Methods: The TeleDerm study will implement and evaluate store-and-forward teledermatology in 50 general practitioner (GP) practices as an alternative to conventional referrals. TeleDerm aims to confirm that the possibility of store-and-forward teledermatology in GP practices is going to lead to a 15% (n = 260) reduction in referrals in the intervention arm. The study uses a cluster-randomized controlled trial design. Randomization is planned for the cluster “county”. The main observational unit is the GP practice. Poisson distribution of referrals is assumed. The evaluation of secondary outcomes like acceptance, enablers and barriers uses a mixed methods design with questionnaires and interviews.
Discussion: Due to the heterogeneity of GP practice organization, patient management software, information technology service providers, GP personal technical affinity and training, we expect several challenges in implementing teledermatology in German GP routine care. Therefore, we plan to recruit 30% more GPs than required by the power calculation. The implementation design and accompanying evaluation is expected to deliver vital insights into the specifics of implementing telemedicine in German routine care.
Modern persistent Key/Value stores are designed to meet the demand for high transactional throughput and high data ingestion rates. Still, they rely on backwards-compatible storage stack and abstractions to ease space management, foster seamless proliferation and system integration. Their dependence on the traditional I/O stack has negative impact on performance, causes unacceptably high write-amplification, and limits the storage longevity.
In the present paper we present NoFTL KV, an approach that results in a lean I/O stack, integrating physical storage management natively in the Key/Value store. NoFTL-KV eliminates backwards compatibility, allowing the Key/Value store to directly consume the characteristics of modern storage technologies. NoFTLKV is implemented under RocksDB. The performance evaluation under LinkBench shows that NoFTL-KV improves transactional throughput by 33%, while response times improve up to 2.3x. Furthermore, NoFTL KV reduces write-amplification 19x and improves storage longevity by imately the same factor.
We introduce IPA-IDX – an approach to handle index modifications modern storage technologies (NVM, Flash) as physical in-place appends, using simplified physiological log records. IPA-IDX provides similar performance and longevity advantages for indexes as basic IPA [5] does for tables. The selective application of IPA-IDX and basic IPA to certain regions and objects, lowers the GC overhead by over 60%, while keeping the total space overhead to 2%. The combined effect of IPA and IPA-IDX increases performance by 28%.
Many modern DBMS architectures require transferring data from storage to process it afterwards. Given the continuously increasing amounts of data, data transfers quickly become a scalability limiting factor. Near-Data Processing and smart/computational storage emerge as promising trends allowing for decoupled in-situ operation execution, data transfer reduction and better bandwidth utilization. However, not every operation is suitable for an in-situ execution and a careful placement and optimization is needed.
In this paper we present an NDP-aware cost model. It has been implemented in MySQL and evaluated with nKV. We make several observations underscoring the need for optimization.