Master theses

Building a W3C specification for bringing Linked Data into the event streaming era

Promotors: Pieter Colpaert

Main contact: Pieter Colpaert

Problem

Linked Data Event Streams (LDES) is a specification we’re building with an international team at the European Commission. It enables clients to replicate an append-only log of representations of entities and then stay synchronized as new members arrive. It is built on top of the TREE W3C hypermedia spec, so a client may need to traverse multiple pages/nodes in a TREE-based publication while emitting each member exactly once. Links: https://w3id.org/tree/specification and https://w3id.org/ldes/specification

In today’s pipelines, a lot of CPU time and I/O is burned on member/message boundary detection and incremental parsing, especially when a client can’t cheaply determine where one “atomic unit” ends and the next begins, or when it must apply heuristics to reconstruct intended groupings. The RDF Messages proposal highlights this explicitly: without message semantics and syntax, reconstructing the intended message becomes slow and relies on sub-optimal heuristics. It therefore proposes explicit message grouping via syntax (e.g., @message delimiters and a messages=rdfm content-type hint).

So the core systems problem is: how do we add lightweight, backwards-compatible syntactic hints to RDF serializations so that LDES replication/synchronization can be implemented with significantly lower parsing cost and higher throughput, without sacrificing correctness or interoperability?
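To make the boundary-detection idea concrete, here is a minimal sketch of how an explicit delimiter could let a client group statements into messages in a single pass, without parsing the RDF itself. It assumes a line-oriented serialization in which a boundary is a line starting with @message. That layout, the function name split_messages and the example.org IRIs are illustrative assumptions, not the grammar defined by the RDF Messages draft.

```python
from typing import Iterable, Iterator, List

# Assumption: a message boundary is a line that starts with "@message".
# The real grammar is defined by the RDF Messages draft and may differ.
DELIMITER = "@message"


def split_messages(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group statement lines into messages in one pass, without parsing RDF."""
    current: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(DELIMITER):
            # Boundary found: hand over the finished message, if any.
            # Empty messages are skipped in this sketch.
            if current:
                yield current
                current = []
            continue
        current.append(line)
    if current:
        yield current  # the last message has no trailing delimiter


if __name__ == "__main__":
    # Hypothetical stream with two messages (IRIs are placeholders).
    stream = [
        "@message",
        '<https://example.org/m1> <https://example.org/p> "a" .',
        "@message",
        '<https://example.org/m2> <https://example.org/p> "b" .',
        '<https://example.org/m2> <https://example.org/q> "c" .',
    ]
    for i, message in enumerate(split_messages(stream), start=1):
        print(f"message {i}: {len(message)} statement line(s)")
```

The point of the sketch is that the consumer never buffers more than one message and never has to inspect subjects or graphs to guess groupings. A real implementation would detect the delimiter in the tokenizer rather than by line prefix, since a multi-line literal containing the delimiter text would be split incorrectly here.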

Goal

You’ll be building the first implementation of RDF Messages (https://w3c-cg.github.io/rsp/spec/messages). By implementing it, you’ll provide feedback on the specification draft and help improve it. You’ll also create a compliance framework so that other implementations can prove their compliance.