Master thesis

Abstracting Data Updates Through Provenance Trails

Keywords: Linked Data, Querying, RDF, SPARQL, Web Development, Decentralization

Promotors: Ruben Verborgh, Ruben Taelman

Students: max 1

Problem

Query engines have made retrieving data from RDF interfaces seamless by abstracting away complexities such as heterogeneous interfaces and access-path dependencies. Techniques like link traversal and interface descriptions allow engines to discover and retrieve data dynamically, making querying more efficient and flexible.

However, while reading data has been greatly simplified, updating data remains a major challenge, precisely because updates depend on the access path to the data. Current query engines require explicit instructions on how updates should be performed for each data-specific interface, which undermines the abstraction that query engines typically provide.

One promising approach to addressing this challenge is leveraging provenance trails. Since updates are typically expressed in relation to existing data, query engines—which excel at locating relevant data—could infer where newly created data should be stored by tracking the sources that contributed to its creation.
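The idea of inferring a storage location from provenance can be sketched as follows. This is a minimal, self-contained TypeScript illustration, not part of the Comunica API: quads are annotated with the source document they came from, and the target for newly created data is chosen as the source that contributed most to the matched data (a simple majority heuristic; the thesis would explore more principled strategies).

```typescript
// Hypothetical sketch of provenance-based target inference.
// All types and names here are illustrative, not Comunica APIs.

interface Quad { subject: string; predicate: string; object: string; }
// A quad annotated with the source document it was retrieved from.
interface SourcedQuad extends Quad { source: string; }

// Infer where a newly created quad should be stored: pick the source
// that contributed the most quads to the update's WHERE clause match.
function inferTargetSource(matched: SourcedQuad[]): string | undefined {
  const counts = new Map<string, number>();
  for (const q of matched) {
    counts.set(q.source, (counts.get(q.source) ?? 0) + 1);
  }
  let best: string | undefined;
  let bestCount = 0;
  for (const [source, count] of counts) {
    if (count > bestCount) { best = source; bestCount = count; }
  }
  return best;
}

// Example: data about ex:alice was found mostly in one document,
// so new data derived from it is routed back to that document.
const matched: SourcedQuad[] = [
  { subject: 'ex:alice', predicate: 'foaf:name', object: '"Alice"',
    source: 'https://alice.example/profile' },
  { subject: 'ex:alice', predicate: 'foaf:knows', object: 'ex:bob',
    source: 'https://alice.example/profile' },
  { subject: 'ex:bob', predicate: 'foaf:name', object: '"Bob"',
    source: 'https://bob.example/profile' },
];
console.log(inferTargetSource(matched)); // → https://alice.example/profile
```

In a real engine, the provenance annotations would be collected during query evaluation (e.g., while traversing links), and the inferred target would then receive the corresponding interface-specific write request.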

Goal

This thesis will focus on designing algorithms to enable updates on RDF data exposed by web interfaces. Specifically, the student will:

  1. Develop algorithms for updating RDF data through provenance-based inference.
  2. Implement these algorithms within the TypeScript-based Comunica query engine.
  3. Evaluate the effectiveness of provenance-based updates in terms of correctness, efficiency, and scalability.

By enabling seamless updates in RDF query engines, this research will bridge the gap between reading and writing linked data, making web-based RDF systems more dynamic and adaptable.