In recent years, the growing amount of available data has led to new
kinds of applications and to a field that is now called data
science (DS). Such applications often require low runtimes while having
to deal with restricted compute resources. So far, in our view, the DS
community lacks tool support for investigating runtime and resource
usage. We therefore present an approach that combines DS and
performance analysis from the High Performance Computing domain. Our
concept integrates the measurement framework Score-P into Jupyter, a
popular editor for the development of DS applications. We designed and
implemented a custom Jupyter kernel that collects runtime data and
applied it to a natural language processing application. The measurement
overhead was 12.55 seconds. The benefit is that the collected data
can then be visualised using established performance analysis tools.
Location:
Andreas-Pfitzmann-Bau (APB-1020)
Nöthnitzer Str. 46
01187 Dresden