The terms data analysis and data mining describe the systematic application of statistical methods to identify structures, dependencies, and relationships in data sets, some of them very large, and to gain new knowledge from them. Computer-aided methods are used in the individual process steps. The content and scope of each step depend, among other things, on the problem domain, the analysis goal, and technical aspects such as the available data sources or the representation of the data. One important step is data preprocessing (data preparation), which increases the quality of the data for the subsequent analysis.
This part of the training covers the detection and handling of outliers, the imputation of missing values, and a final comparison of the analysis results obtained with different preprocessing variants.
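To give a first impression of these topics, the following minimal sketch flags outliers and imputes missing values with Pandas. It is an illustration only, not taken from the course material: the interquartile-range (IQR) rule, the median imputation, the helper names `iqr_outlier_mask` and `preprocess`, and the sample values are all assumptions chosen for this example.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = values.quantile(0.25)  # quantile() ignores NaN by default
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def preprocess(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Treat outliers as missing, then impute all gaps with the median."""
    cleaned = values.mask(iqr_outlier_mask(values, k))  # outliers become NaN
    return cleaned.fillna(cleaned.median())


# Made-up sample data: one missing value and one obvious outlier.
raw = pd.Series([9.8, 10.1, np.nan, 10.4, 55.0, 9.9, 10.2], name="measurement")
outliers = iqr_outlier_mask(raw)
prepared = preprocess(raw)

# Compare a simple analysis result (the mean) for both variants.
print("mean without preprocessing:", raw.mean())
print("mean with preprocessing:   ", prepared.mean())
```

Other choices, such as z-score based detection or model-based imputation, may suit a given data set better. Comparing the analysis results across such variants shows how strongly the preprocessing influences the outcome.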
Agenda
Handouts
The following documents (slides, sample applications) will be provided to the participants:
Prerequisites
Learning Outcomes: After the training, participants will be familiar with theoretical considerations and practical approaches to data preparation using Python with NumPy, Pandas and other packages.
Course language: English/German
Target group: Anyone who regularly works with data and is interested in learning about data preprocessing steps that can improve analysis results.
Trainings ScaDS.AI