Active Flash: Out-of-core Data Analytics on Flash Storage

Abstract

We explore methods for integrating flash storage into new, efficient architectures. In particular, we propose a novel approach, Active Flash, that migrates data analysis in scientific computing to the location of the data: the flash device itself. Data analysis thus moves from the host CPU to the storage controller, closer to where the data resides. Active Flash has multiple advantages: it reduces dependence on limited bandwidth to a central storage system, allows data analysis to proceed in parallel with the data-generating application, and saves energy by using the more power-efficient controller. We provide a detailed study of the performance-energy and compute-I/O tradeoffs of Active Flash, demonstrate its feasibility on real-world data analysis tasks, and examine potential Active Flash scheduling policies through simulation.