TY - CHAP
U1 - Conference publication
A1 - Vinçon, Tobias
A1 - Hardock, Sergej
A1 - Riegger, Christian
A1 - Koch, Andreas
A1 - Petrov, Ilia
ED - Welzer, Tatjana
T1 - nativeNDP: processing big data analytics on native storage nodes
T2 - Advances in databases and information systems : 23rd European Conference, ADBIS 2019, Bled, Slovenia, September 8–11, 2019, proceedings. - (Lecture notes in computer science ; 11695)
N2 - Data analytics tasks on large datasets are computationally intensive and often demand the compute power of cluster environments. Yet data cleansing, preparation, dataset characterization, and statistics or metrics computation steps are frequent. These are mostly performed ad hoc, in an explorative manner, and mandate low response times. But such steps are I/O intensive and typically very slow due to low data locality and inadequate interfaces and abstractions along the stack. These typically result in prohibitively expensive scans of the full dataset and transformations on interface boundaries. In this paper, we examine R as an analytical tool, managing large persistent datasets in Ceph, a widespread cluster file system. We propose nativeNDP, a framework for Near Data Processing that pushes down primitive R tasks and executes them in situ, directly within the storage device of a cluster node. Across a range of data sizes, we show that nativeNDP is more than an order of magnitude faster than other pushdown alternatives.
KW - near-data processing
KW - in-storage processing
KW - cluster
KW - native storage
Y1 - 2019
SN - 978-3-030-28730-6
SB - 978-3-030-28730-6
U6 - https://doi.org/10.1007/978-3-030-28730-6_9
DO - https://doi.org/10.1007/978-3-030-28730-6_9
SP - 139
EP - 150
S1 - 12
PB - Springer
CY - Cham
ER -