nativeNDP: processing big data analytics on native storage nodes

  • Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet data cleansing, preparation, dataset characterization, and the computation of statistics or metrics are frequent steps. They are mostly performed ad hoc, in an explorative manner, and mandate low response times. However, such steps are I/O intensive and typically very slow because of low data locality and inadequate interfaces and abstractions along the stack. These shortcomings typically result in prohibitively expensive scans of the full dataset and transformations at interface boundaries. In this paper, we examine R as an analytical tool managing large persistent datasets in Ceph, a widespread cluster file system. We propose nativeNDP, a framework for Near Data Processing that pushes down primitive R tasks and executes them in situ, directly within the storage device of a cluster node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.

Metadata
Author: Vinçon, Tobias; Riegger, Christian; Petrov, Ilia
DOI: https://doi.org/10.1007/978-3-030-28730-6_9
ISBN: 978-3-030-28730-6
Published in: Advances in databases and information systems : 23rd European Conference, ADBIS 2019, Bled, Slovenia, September 8–11, 2019, proceedings. - (Lecture notes in computer science ; 11695)
Publisher: Springer
Place of publication: Cham
Editor: Tatjana Welzer
Document Type: Conference Proceeding
Language: English
Year of Publication: 2019
Tag: cluster; in-storage processing; native storage; near-data processing
Number of Pages: 12
First Page: 139
Last Page: 150
Catalogue entry: View in the Reutlingen University catalogue
Dewey Decimal Classification: 004 Computer science
Open Access: No
Licence (German): Springer licence terms