Development of a Data Cleaning Process for the voestalpine Treasury (Entwicklung eines Data-Cleaning-Prozesses für das voestalpine Treasury)

Author: L. Frieske
Master's thesis: MT1903 (October 2019)
Supervised by: o. Univ.-Prof. Dr. Michael Schrefl
Advised by: Ass.-Prof. Dr. Christoph Schütz
Carried out at: Universität Linz, Institut für Wirtschaftsinformatik - Data & Knowledge Engineering
Resources: Copy

Abstract (translated from the German)

Data is essential to a company. When standard software is used, however, there is a risk that overly generic checking mechanisms allow inconsistencies to arise in the data, which degrade data quality. To prevent this, a data cleaning process was developed for the voestalpine Treasury as part of this master's thesis. The process checks the data at a more detailed level than the standard software in use and thereby ensures the data quality the voestalpine Treasury requires. For the automated checking of the data, a software system was implemented as part of the data cleaning process; it uses business rules to identify inconsistencies in the data. In a first step, the business rules were expressed in natural language according to the specifications of the voestalpine Treasury's domain experts. Because natural language has structural weaknesses such as ambiguity, the business rules were then represented formally using Semantics of Business Vocabulary and Rules (abbr. SBVR). To make them usable by the software system, the SBVR business rules were transformed into SQL queries. The results of the automated checks are made available to the voestalpine Treasury's employees as a report, which forms the basis for cleaning the data. A first application of the data cleaning process described in this master's thesis in the voestalpine Treasury has already led to the identification and correction of a considerable number of inconsistencies, improving data quality in the voestalpine Treasury.

Abstract (English)

Data is one of the most valuable assets of a company. Nevertheless, the use of standard software often leads to inconsistencies in a company's data because its consistency checks are insufficient, which negatively affects data quality. In this master's thesis, a data cleaning process was developed for the voestalpine Treasury to address this problem. The proposed data cleaning process examines the voestalpine Treasury's data at a more detailed level than the built-in checking mechanisms of the standard software and therefore ensures appropriate data quality. A software system that enables the automated checking of the voestalpine Treasury's data was implemented as a component of the data cleaning process. The software system uses business rules to identify inconsistencies in the data. The business rules were derived from the knowledge of the voestalpine Treasury's domain experts and first documented in natural language. Because natural language has structural weaknesses such as ambiguity, the business rules were then formally represented using the Semantics of Business Vocabulary and Rules (abbr. SBVR). To enable the software system to execute the business rules, the SBVR business rules were transformed into corresponding SQL queries. The results of the automated checks are presented in a report, which is accessible to the voestalpine Treasury's employees and serves as guidance for cleaning the data. A first application of the data cleaning process proposed in this master's thesis has already led to the identification and elimination of a substantial number of inconsistencies, thus increasing data quality at the voestalpine Treasury.
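
The chain described above (a rule stated in words, a corresponding SQL query, a report of violations) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the `deal` table, its columns, the two rules (worded only loosely in an SBVR-like style), and the function names are hypothetical, and SQLite stands in for whatever database the Treasury system actually uses. Each rule is paired with a query that returns the rows violating it.

```python
import sqlite3

# Hypothetical rule catalogue (not from the thesis): each rule statement is
# paired with an SQL query that returns the ids of rows VIOLATING the rule.
RULES = {
    "R1: It is obligatory that each deal has a counterparty.":
        "SELECT deal_id FROM deal WHERE counterparty IS NULL ORDER BY deal_id",
    "R2: It is prohibited that the maturity date of a deal precedes its start date.":
        "SELECT deal_id FROM deal WHERE maturity_date < start_date ORDER BY deal_id",
}


def check_rules(conn, rules):
    """Run every rule query and collect the ids of the violating rows."""
    return {rule: [row[0] for row in conn.execute(sql)]
            for rule, sql in rules.items()}


def format_report(findings):
    """Render one plain-text report line per rule."""
    lines = []
    for rule, ids in findings.items():
        status = "OK" if not ids else f"{len(ids)} violation(s): {ids}"
        lines.append(f"{rule} -> {status}")
    return "\n".join(lines)


def demo_connection():
    """Build an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE deal (deal_id INTEGER PRIMARY KEY, counterparty TEXT, "
        "start_date TEXT, maturity_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO deal VALUES (?, ?, ?, ?)",
        [
            (1, "Bank A", "2019-01-01", "2019-06-30"),  # consistent
            (2, None, "2019-02-01", "2019-03-01"),      # missing counterparty
            (3, "Bank B", "2019-05-01", "2019-04-01"),  # matures before start
        ],
    )
    return conn


if __name__ == "__main__":
    print(format_report(check_rules(demo_connection(), RULES)))
```

Writing each query so that it selects violations rather than valid rows means an empty result signals a satisfied rule, and the report can list exactly the records that need cleaning.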