Master theses

Web-Scale SPARQL Query Processing: Performance Evaluation of TypeScript and Rust-Based Engines

Promotors: Ruben Taelman

Main contact: Ruben Taelman

Problem

The internet can be considered a massive database, but most of it consists of unstructured text. Initiatives like RDF provide a way to structure information on the web so that automated engines can perform reasoning tasks at web scale. SPARQL is the standard query language for RDF. Hosting SPARQL endpoints on servers to execute SPARQL queries can be expensive, and such endpoints are usually limited to restricted datasets. To perform queries at web scale, it is therefore more practical for users to run their own query engines in the browser.

WebAssembly (WASM) promises significant performance improvements over JavaScript (JS) for compute-intensive tasks through lower-level execution. However, this performance advantage comes with a critical tradeoff: the cost of marshalling data between the WASM and JS environments. When data must frequently cross this boundary, the translation overhead can negate, or even exceed, the computational benefits of WASM execution.
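To make the marshalling cost concrete, the following is a minimal sketch of what crossing the boundary involves for a single string such as an RDF term. No real WASM module is loaded: a plain `Uint8Array` stands in for the module's linear memory (in a real setup this would be a view over `WebAssembly.Memory.buffer`), and the helper names `writeString`, `readString`, and `roundTrip` are hypothetical.

```typescript
// Sketch only: this buffer stands in for the linear memory of a WASM module.
const linearMemory = new Uint8Array(64 * 1024); // one 64 KiB page
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// JS -> WASM: encode the JS string as UTF-8 and copy the bytes into
// linear memory. Returns the number of bytes written.
function writeString(value: string, offset: number): number {
  const bytes = encoder.encode(value);
  linearMemory.set(bytes, offset);
  return bytes.length;
}

// WASM -> JS: decode the bytes back into a newly allocated JS string.
function readString(offset: number, length: number): string {
  return decoder.decode(linearMemory.subarray(offset, offset + length));
}

// Hypothetical round trip for one term: one encode and copy going in,
// one decode and allocation coming out.
function roundTrip(term: string): string {
  const length = writeString(term, 0);
  return readString(0, length);
}

const iri = "http://example.org/alice";
console.log(roundTrip(iri) === iri); // true
```

Every term that crosses the boundary pays for an encode, a copy, and a decode, so a query that returns many results term by term multiplies this cost accordingly.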

SPARQL query engines provide an ideal use case to investigate this tradeoff. At the KNoWS lab, we develop the query engine Comunica in TypeScript (compiled to JavaScript), as JavaScript is the most accessible programming language for developing web frontend applications. With the rise in popularity of WebAssembly, the query engine Oxigraph provides the option to run its Rust-based engine in the browser.

This thesis will compare these two engines based on objective performance metrics.
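As an illustration of what such a measurement could look like, the sketch below times a query against an engine exposed as an async stream of results. The `QueryFn` interface, the `measure` helper, and the chosen metrics (time to first result, total time, result count) are assumptions made for this example, not the actual APIs of Comunica or Oxigraph, and `fakeEngine` is a stand-in that returns dummy results.

```typescript
// Hypothetical engine interface: takes a SPARQL query string and
// yields results one by one.
type QueryFn = (query: string) => AsyncIterable<unknown>;

interface Measurement {
  firstResultMs: number | null; // null when the query yields no results
  totalMs: number;
  resultCount: number;
}

// Consume all results of one query and record simple timings.
async function measure(run: QueryFn, query: string): Promise<Measurement> {
  const start = performance.now();
  let firstResultMs: number | null = null;
  let resultCount = 0;
  for await (const _ of run(query)) {
    if (firstResultMs === null) firstResultMs = performance.now() - start;
    resultCount++;
  }
  return { firstResultMs, totalMs: performance.now() - start, resultCount };
}

// Stand-in engine producing three dummy results.
async function* fakeEngine(_query: string): AsyncIterable<unknown> {
  for (let i = 0; i < 3; i++) yield { s: i };
}

measure(fakeEngine, "SELECT * WHERE { ?s ?p ?o }").then((m) => console.log(m));
```

Wrapping both engines behind the same function type would let the same harness run identical queries against each of them.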