The Data Integration course addresses a variety of problems related to the integration of heterogeneous data sources, ranging from structured data (such as relational databases), through semi-structured data (such as data on the Web, and tree- and graph-structured data), to unstructured (textual) data. It gives an overview of foundational techniques for data integration, such as schema mappings, data and schema matching, and query processing in data integration. It does so for data representations that go beyond the relational model, such as RDF data, linked open data, and knowledge graphs. The course also studies architectures for data integration and data federation, and how they are adopted to build comprehensive data integration solutions. By attending the course, students will learn how to design and build a data integration system, possibly exploiting existing data access and data federation technologies.

Covered topics:
  • Data integration architectures
  • Schema mapping
  • Data matching
  • Heterogeneous and web data
  • Data cleaning
  • Query processing for data integration