Master theses

Declarative Knowledge Graph Generation through LLM Orchestration and MCP Invocation

Promotors: Ben De Meester

Main contact: Ben De Meester

Problem

Large Language Models (LLMs) are remarkably capable at understanding structure in unstructured text — extracting entities, inferring relationships, and mapping natural language to formal schemas. Yet their core weakness is equally well-known: they hallucinate. When tasked with producing structured, interoperable data (think RDF, JSON-LD, or OWL ontologies), LLMs are prone to inventing property names, misusing vocabulary terms, and generating output that looks correct but breaks downstream systems. This is a fundamental reliability problem for any real-world data integration pipeline.

At the same time, demand for Knowledge Graphs (KGs) as a backbone for AI-ready, interoperable data is growing fast, from enterprise data fabrics (Google, LinkedIn, Airbnb) to open government data initiatives, open knowledge bases like Wikidata, and biomedical databases like UniProt. The bottleneck is not the KG technology itself, but the painful, manual effort required to generate KG data from the messy, heterogeneous sources that organisations actually have (CSV files, REST APIs, relational databases, JSON feeds).

The opportunity is clear: use LLMs as intelligent orchestrators that understand user intent and data structure, but constrain their output to declarative, verifiable mapping rules — not raw RDF. This way, an LLM's creative power is harnessed without trusting it blindly with ground truth.
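To make "verifiable" concrete, here is a minimal Python sketch of one check that declarative rules allow before any data is generated: comparing the predicates a candidate mapping uses against the terms of the target vocabulary. The rule format is a simplified stand-in, not RML syntax, and the names `TARGET_TERMS`, `candidate_rules`, and `unknown_predicates` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical target vocabulary: the terms the agent is allowed to use.
TARGET_TERMS: Set[str] = {"schema:Person", "schema:name", "schema:birthDate"}

# Simplified stand-in for a set of mapping rules (NOT real RML syntax).
candidate_rules: List[Dict[str, str]] = [
    {"column": "full_name", "predicate": "schema:name"},
    {"column": "dob", "predicate": "schema:dateOfBirth"},  # invented term
]


def unknown_predicates(rules: List[Dict[str, str]], allowed: Set[str]) -> List[str]:
    """Return the predicates used in the rules that are not in the vocabulary."""
    used = {rule["predicate"] for rule in rules}
    return sorted(used - allowed)


print(unknown_predicates(candidate_rules, TARGET_TERMS))
# → ['schema:dateOfBirth']
```

Because the rules are inspected rather than the generated triples, an invented term such as the one above can be reported back to the LLM before any output reaches downstream systems.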

Goal

In this thesis, you will design and prototype an LLM agent capable of generating and executing declarative Knowledge Graph mappings using the RML mapping language, the de facto standard for heterogeneous KG generation, developed right here at IDLab. The agent will be built using the Model Context Protocol (MCP), an open standard introduced by Anthropic for giving LLMs structured, tool-based access to external systems, enabling clean separation between reasoning (the LLM) and execution (the RML processor).

Concretely, you will wrap an RML processor (e.g., RMLMapper or Morph-KGC) as an MCP server, exposing tools for mapping validation, execution, and error feedback. You will build an LLM orchestration layer (e.g., using Claude, GPT-4o, or an open-source model) that takes a data source and a target ontology as input, generates candidate RML mappings, and iteratively refines them based on execution feedback. You will evaluate your agent against KROWN, the Knowledge Graph Generation benchmark developed at UGent, giving you a rigorous, reproducible metric for correctness and coverage. The end goal is a compelling end-to-end demo: a natural language description goes in, and a validated KG, produced by an executable mapping, comes out.
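The generate, execute, refine cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the intended design: the LLM and the RML processor are replaced by stub functions, the "mapping" is a plain string rather than real RML, and all names (`ExecutionResult`, `refine_mapping`, `stub_generate`, `stub_execute`) are hypothetical. In the thesis setting, the `execute` callable would stand in for a call to an MCP tool that wraps the RML processor.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ExecutionResult:
    """Outcome of running one candidate mapping through the processor."""
    ok: bool
    errors: List[str] = field(default_factory=list)


# Hypothetical signatures (assumptions, not an existing API):
#   generate(source, ontology, feedback) -> mapping text
#   execute(mapping) -> ExecutionResult
GenerateFn = Callable[[str, str, List[str]], str]
ExecuteFn = Callable[[str], ExecutionResult]


def refine_mapping(
    source: str,
    ontology: str,
    generate: GenerateFn,
    execute: ExecuteFn,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate a mapping, execute it, and feed errors back until it passes.

    Returns (mapping, rounds_used), or (None, max_rounds) if no candidate
    passed within the round budget.
    """
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        mapping = generate(source, ontology, feedback)
        result = execute(mapping)
        if result.ok:
            return mapping, round_no
        feedback = result.errors  # errors become input for the next attempt
    return None, max_rounds


def stub_generate(source: str, ontology: str, feedback: List[str]) -> str:
    """Stand-in for the LLM: uses an invented term until it receives feedback."""
    predicate = "schema:name" if feedback else "schema:fullName"
    return f"map column 'name' of {source} to {predicate}"


def stub_execute(mapping: str) -> ExecutionResult:
    """Stand-in for the RML processor: rejects the invented term."""
    if "schema:fullName" in mapping:
        return ExecutionResult(ok=False, errors=["unknown predicate schema:fullName"])
    return ExecutionResult(ok=True)


mapping, rounds = refine_mapping("people.csv", "schema.org", stub_generate, stub_execute)
print(rounds, mapping)
# → 2 map column 'name' of people.csv to schema:name
```

Passing the generator and the executor in as callables keeps the loop independent of any particular model or processor, which mirrors the separation between reasoning and execution that MCP is meant to provide.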

This thesis sits at the intersection of semantic web engineering, LLM agent design, and software systems — a genuinely novel research problem with immediate practical impact. You will work closely with the IDLab team that created RML.io, giving you expert mentorship, a mature codebase to build on, and a direct path to publishing your results.

The expected result is a focused research prototype and evaluation on selected benchmarks, not a full production-grade agent platform.