Workshop: Data Quality Aspects of Administrative Data

Name: Workshop: Data Quality Aspects of Administrative Data
Start: 2025-08-20T09:00:00+02:00
End: 2025-08-20T17:00:00+02:00
Location: ScaDS.AI Dresden/Leipzig

Wednesday Aug 20, 2025, 9:00 AM → 5:00 PM Europe/Berlin

Seminar Room "Zwenkauer See" - A.03.07 (ScaDS.AI Dresden/Leipzig)

Humboldtstr. 25 04105 Leipzig

Description

As researchers and practitioners working with administrative data, we are often given data sets without knowing their full provenance: how the data were captured, what kind of processing has been applied to them, and whether they have been linked or merged with data from other sources.
Complete and up-to-date metadata are not always available. Without a full understanding of a data set's provenance, users may make incorrect assumptions about its content and quality, or hold misconceptions about them. This can result in incorrect processing and/or analysis of the data set, which in turn can lead to poor outcomes and flawed decision making.

This course will provide an introduction to data quality and how it can affect all aspects of working with administrative data. The course will cover data quality dimensions, including technical, social, and legal aspects; discuss frameworks that aim to quantify data quality; and provide examples and case studies showing how poor data quality can lead to bad outcomes in data science projects. The course will not focus on the technical aspects of data cleaning, data processing, or data linkage, but will instead highlight the issues researchers and practitioners need to be aware of when working with administrative data. The course will present and discuss a set of recommendations, and in interactive sessions participants will be able to share their own experiences of how data quality issues have led to unexpected outcomes in projects they have worked on.

Course audience:
This one-day course is aimed at researchers and practitioners working with administrative data, at those involved in managing data-centric systems in organisations that act as data custodians, and at those involved in the capture, processing, and linkage of data that may later be used for administrative data research. The course requires little technical knowledge; all necessary technical background will be introduced during the course.
The course will be a mixture of four hours of interactive presentations (including small practical exercises) and two one-hour sessions of group discussions.

Please note: [As of Aug 13, the preparatory activities for participants have been cancelled; the following text no longer applies but is kept so as not to confuse earlier registrants.]
Prior to the course, participants are expected to view two pre-recorded presentations and complete a short homework document with a few questions intended to guide discussions during the workshop.

Course presenter: Prof Peter Christen, University of Edinburgh
Contact: peter.christen@ed.ac.uk

About the presenter:
Peter Christen is the Research Lead on the Scottish Historic Population Platform (SHiPP) project, run at the Scottish Centre for Administrative Data Research (SCADR) at the University of Edinburgh. He is also a Professor at the School of Computing at the Australian National University in Canberra. Peter is a world-leading expert in record linkage with over 20 years of experience working with administrative data. He has over 200 publications in the area of data science, including the two books "Data Matching" (2012) and "Linking Sensitive Data" (2020, co-authored with Thilina Ranbaduge and Rainer Schnell). As of February 2025, his work has attracted nearly 18,000 citations on Google Scholar.

Maja Schneider

mschneider@informatik.uni-leipzig.de

- 9:00 AM → 10:00 AM
  
  Presentation 1 1h
  
  Welcome, course overview, the data science workflow, short introduction to data wrangling and data analytics / mining, overview of data quality aspects, examples and case studies of what can go wrong in data science.
- 10:00 AM → 10:30 AM
  
  Coffee break 30m
- 10:30 AM → 11:30 AM
  
  Presentation 2 1h
  
  Data quality dimensions, data quality assessments, data quality frameworks; how data capturing, data processing, and data linkage can affect data quality.
- 11:30 AM → 12:30 PM
  
  Practical Session 1 1h
  
  Group discussions on participants’ own experiences of data quality issues they have encountered (30 min), followed by short presentations from each group on the most serious / challenging data quality issues identified in that group (30 min).
- 12:30 PM → 1:30 PM
  
  Lunch break 1h
- 1:30 PM → 2:30 PM
  
  Presentation 3 1h
  
  Detailed discussions on assumptions and misconceptions in data science (from data capturing to processing and linkage), and how to identify and prevent them.
- 2:30 PM → 3:00 PM
  
  Coffee break 30m
- 3:00 PM → 4:00 PM
  
  Presentation 4 1h
  
  Recommendations on how to improve data quality, what data scientists should check with their data sources, how to ensure data are research-ready, and a summary of the course content.
- 4:00 PM → 5:00 PM
  
  Practical Session 2 1h
  
  Group discussions on how to implement recommendations within your own organisation (30 min), short presentations on strategies to improve data quality (30 min), course wrap-up and feedback.
