nativeNDP: processing big data analytics on native storage nodes

  • Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet data cleansing, preparation, dataset characterization, and the computation of statistics or metrics are frequent steps. They are mostly performed ad hoc, in an explorative manner, and mandate low response times. However, such steps are I/O-intensive and typically very slow due to low data locality and inadequate interfaces and abstractions along the stack. These shortcomings typically result in prohibitively expensive scans of the full dataset and transformations at interface boundaries. In this paper, we examine R as an analytical tool managing large persistent datasets in Ceph, a widespread cluster file system. We propose nativeNDP, a framework for Near Data Processing that pushes down primitive R tasks and executes them in situ, directly within the storage device of a cluster node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.

Author of HS Reutlingen: Vinçon, Tobias; Riegger, Christian; Petrov, Ilia
Published in: Advances in databases and information systems : 23rd European Conference, ADBIS 2019, Bled, Slovenia, September 8–11, 2019, proceedings. - (Lecture notes in computer science ; 11695)
Place of publication: Cham
Editor: Tatjana Welzer
Document Type: Conference proceeding
Publication year: 2019
Tag: cluster; in-storage processing; native storage; near-data processing
Page Number: 12
First Page: 139
Last Page: 150
PPN: View in the Reutlingen University catalogue
DDC classes: 004 Computer science
Open access?: No
Licence (German): In Copyright (protected by copyright)