Master theses

Optimization of Daisy-Chained SPARQL Queries

Keywords: Linked Data, Querying, RDF, SPARQL, Web development, Decentralization

Promotors: Ruben Verborgh, Ruben Taelman

Students: max 1

Problem

SPARQL is the de facto standard for querying RDF data, with SELECT and CONSTRUCT queries being commonly used. While SELECT queries return a table of variable bindings, CONSTRUCT queries generate new RDF data, effectively acting as views over the original dataset.

When multiple CONSTRUCT queries are chained together, each step produces an intermediate dataset that the next step consumes. A naïve execution of such a chain (A -- CONSTRUCT 1 -> B -- CONSTRUCT 2 -> C) materializes every intermediate dataset and repeats computations, which reduces performance. Despite the importance of optimizing non-materialized views in SPARQL, there is currently no systematic approach to algebraically rewriting these query chains into a single, more efficient CONSTRUCT query.
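
To make the chain A -> B -> C concrete, the following is a minimal sketch under strong simplifying assumptions, not a proposed solution. It models a CONSTRUCT query as a single triple pattern `?s <match> ?o` with a template `?s <emit> ?o`, so a query only renames a predicate. The names `RenameConstruct`, `applyConstruct`, and `fuse`, as well as the example data, are hypothetical and do not come from Comunica or the SPARQL specification.

```typescript
// Toy model: a triple is [subject, predicate, object].
type Triple = [string, string, string];

// Hypothetical, heavily restricted stand-in for a CONSTRUCT query:
//   CONSTRUCT { ?s <emitPredicate> ?o } WHERE { ?s <matchPredicate> ?o }
interface RenameConstruct {
  matchPredicate: string;
  emitPredicate: string;
}

// Naive step: evaluate one query and materialize its full output dataset.
function applyConstruct(data: Triple[], q: RenameConstruct): Triple[] {
  return data
    .filter(([, p]) => p === q.matchPredicate)
    .map(([s, , o]): Triple => [s, q.emitPredicate, o]);
}

// Algebraic fusion for this toy model: if the second query matches exactly
// what the first one emits, the chain equals a single query from A to C.
// Otherwise the second query can never match, and the chain yields nothing.
function fuse(q1: RenameConstruct, q2: RenameConstruct): RenameConstruct | null {
  if (q1.emitPredicate !== q2.matchPredicate) {
    return null;
  }
  return { matchPredicate: q1.matchPredicate, emitPredicate: q2.emitPredicate };
}

// Dataset A (illustrative data only).
const A: Triple[] = [
  ["ex:alice", "foaf:knows", "ex:bob"],
  ["ex:alice", "foaf:name", "Alice"],
];

const construct1: RenameConstruct = { matchPredicate: "foaf:knows", emitPredicate: "ex:contact" };
const construct2: RenameConstruct = { matchPredicate: "ex:contact", emitPredicate: "schema:knows" };

// Naive chain: B is materialized only to be consumed by the second query.
const B = applyConstruct(A, construct1);
const chained = applyConstruct(B, construct2);

// Fused chain: one query over A, with no intermediate dataset.
const fused = fuse(construct1, construct2);
const direct = fused ? applyConstruct(A, fused) : [];

console.log(JSON.stringify(chained) === JSON.stringify(direct));
```

In this restricted setting the fused query is trivially equivalent to the chain. Real CONSTRUCT queries involve joins, filters, blank nodes, and templates with several triples, where equivalence-preserving rewrites are far less obvious.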

This inefficiency also impacts RDF-based data interfaces, where read interfaces can be described as views over enriched datasets derived from multiple write interfaces. Without optimization, the description and execution of these interfaces remain suboptimal.

Goal

This thesis will focus on designing algebraic optimizations for SPARQL CONSTRUCT queries to transform a daisy chain of queries into a single optimized query. Concretely, the student will:

  1. Develop algebraic transformations that optimize sequences of CONSTRUCT queries.
  2. Implement these optimizations within the TypeScript-based Comunica SPARQL query execution framework.
  3. Evaluate the performance of the optimized queries against naïve execution of the original query chains.

By enabling more efficient query execution and improving non-materialized view optimization, this research will contribute to the performance and scalability of SPARQL-based systems.