Name: Big Data Processing on HPC
Start: 2022-12-08T10:00:00+01:00
End: 2022-12-08T15:00:00+01:00
Location: TU Dresden

Description

Apache Spark and Apache Flink are two widely used Big Data analytics frameworks. Their APIs allow an application to be developed and tested on a local workstation. Later, when the workstation's limited resources are no longer sufficient, the same application can distribute its work across many computers without any changes to its source code.

The course focuses on the step from a local workstation to an HPC environment and shows how a typical Big Data analysis workflow can be organized there. Using Apache Flink and Apache Spark, participants will learn how to run data pipelines and data processing jobs on the HPC system and how to manage the frameworks' configurations in that environment.

Agenda

Introduction and access to the ZIH HPC system
Considerations about the hardware environment
Setup of the required software environment
Distributed computing with Big Data frameworks
Efficient configuration of Big Data frameworks
Hands-on session

Pre-knowledge

Basic knowledge of Big Data frameworks (e.g. Apache Flink, Apache Spark) is recommended (but not required)
Basic HPC knowledge is recommended (but not required)

Course language

English

Organized by

Trainings ScaDS.AI

Contact

trainings.scads.ai@tu-dresden.de

Instructors

Apurv Deepak Kulkarni (ScaDS.AI), Elias Werner (ScaDS.AI), Jan Frenzel (ScaDS.AI)

Format

online
