
Increasing Knowledge Graph Adoption through Excel-Driven Generation of Standardized Artifacts

Problem

Semantic Web technologies are powerful, but adoption is still constrained by tooling complexity. In many organisations, domain experts and business stakeholders are comfortable with spreadsheets, templates, and API documentation, but not with RDF vocabularies, SHACL constraints, or mapping languages. As a result, the barrier to producing standards-compliant Knowledge Graph data is often social and practical rather than purely technical.

At IDLab, the implementation-process-pipeline already demonstrates a strong solution direction: well-structured Excel templates can be transformed automatically into RDF that conforms to predefined SHACL shapes. This substantially improves usability, because it lets non-specialists contribute structured data without writing RDF by hand.
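To make the idea of template-driven RDF generation concrete, the following is a minimal sketch and not the actual implementation-process-pipeline code. It assumes the Excel rows have already been read into dictionaries keyed by column header, and the namespace, the column-to-predicate mapping, and the function names are all hypothetical.

```python
from urllib.parse import quote

# Hypothetical namespace and column-to-predicate mapping. A real pipeline
# would derive these from its templates and SHACL shapes.
BASE = "https://example.org/"
COLUMN_TO_PREDICATE = {
    "name": "http://schema.org/name",
    "email": "http://schema.org/email",
}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(rows, target_class):
    """Turn template rows (dicts keyed by column header) into N-Triples lines."""
    lines = []
    for row in rows:
        # The "id" column is assumed to hold a unique key for each row.
        subject = f"<{BASE}{quote(str(row['id']), safe='')}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{target_class}> .")
        for column, predicate in COLUMN_TO_PREDICATE.items():
            value = row.get(column)
            if value is None or str(value).strip() == "":
                continue  # empty cells produce no triple
            literal = escape_literal(str(value))
            lines.append(f'{subject} <{predicate}> "{literal}" .')
    return lines


# Stand-in for rows read from a spreadsheet template.
rows = [
    {"id": "p1", "name": "Ada Lovelace", "email": "ada@example.org"},
    {"id": "p2", "name": "Alan Turing", "email": ""},
]
triples = rows_to_ntriples(rows, "http://schema.org/Person")
print("\n".join(triples))
```

The sketch emits plain string literals only. Typed literals, IRIs as objects, and validation against the SHACL shapes are left out to keep the example short.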

The next challenge is to turn this into a broader, end-to-end artifact generation workflow that supports both knowledge-graph engineers and mainstream API developers. Today, parts of this chain still require manual glue work, and the expressiveness of templates and validation models can be further improved. To maximize real-world uptake, the pipeline should produce consistent, linked artifacts across spreadsheet templates, SHACL constraints, RML mappings, and API-oriented schema descriptions.

Goal

In this thesis, you will extend the Excel-driven KG generation pipeline into a comprehensive artifact engineering workflow that increases interoperability, usability, and adoption. Your work will focus on making the same conceptual model flow consistently across Excel input templates, SHACL validation, RML transformation rules, and developer-facing API artifacts.

Concretely, you will analyse the current expressiveness limits of the Excel template and SHACL layer, design improvements, and implement them in the pipeline. You will investigate automated generation of JSON Schema and OpenAPI/Swagger-compatible descriptions from Excel and SHACL definitions, and ensure these generated artifacts remain semantically linked to the underlying RML mappings. The objective is that one source-of-truth template can drive multiple standards-compliant outputs without duplication.
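As a rough illustration of what generating JSON Schema from SHACL definitions could look like, the sketch below maps a simplified, dictionary-based stand-in for a SHACL node shape to a JSON Schema object. The input structure, the datatype mapping, and the `x-rdf-path` annotation keyword are assumptions made for this example. They are not part of the existing pipeline or of either standard.

```python
import json

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed mapping from XSD datatypes to JSON Schema types (illustrative subset).
XSD_TO_JSON = {
    XSD + "string": {"type": "string"},
    XSD + "integer": {"type": "integer"},
    XSD + "decimal": {"type": "number"},
    XSD + "boolean": {"type": "boolean"},
    XSD + "date": {"type": "string", "format": "date"},
}


def shape_to_json_schema(shape):
    """Derive a JSON Schema object from a simplified property-shape list."""
    properties = {}
    required = []
    for prop in shape["properties"]:
        item = dict(XSD_TO_JSON.get(prop.get("datatype"), {}))
        if "pattern" in prop:
            # SHACL and JSON Schema use different regex dialects; simple
            # patterns carry over, complex ones may need translation.
            item["pattern"] = prop["pattern"]
        min_count = prop.get("minCount", 0)
        max_count = prop.get("maxCount")
        if max_count == 1:
            schema = item  # at most one value: plain scalar
        else:
            schema = {"type": "array", "items": item}
            if min_count > 0:
                schema["minItems"] = min_count
            if max_count is not None:
                schema["maxItems"] = max_count
        # Hypothetical annotation that keeps the link back to the RDF predicate.
        schema["x-rdf-path"] = prop["path"]
        if min_count >= 1:
            required.append(prop["name"])
        properties[prop["name"]] = schema
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": shape["name"],
        "type": "object",
        "properties": properties,
        "required": required,
    }


# Stand-in for a shape that would normally be parsed from a SHACL file.
person_shape = {
    "name": "Person",
    "properties": [
        {"name": "name", "path": "http://schema.org/name",
         "datatype": XSD + "string", "minCount": 1, "maxCount": 1},
        {"name": "email", "path": "http://schema.org/email",
         "datatype": XSD + "string", "pattern": "^.+@.+$"},
    ],
}
person_schema = shape_to_json_schema(person_shape)
print(json.dumps(person_schema, indent=2))
```

Carrying the predicate IRI along as an annotation is one possible way to keep generated API artifacts traceable to the mappings. Whether an extension keyword, a JSON-LD context, or another mechanism fits best is a design question for the thesis.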

You will evaluate the extended pipeline on three dimensions: modeling expressiveness (which constraints and patterns can be captured), consistency across generated artifacts (do SHACL, JSON Schema, OpenAPI, and RML remain aligned), and developer adoption potential (can non-RDF experts produce high-quality KG-ready data with minimal friction). The expected outcome is a practical, production-oriented approach that lowers the entry barrier for Knowledge Graph technology and strengthens the bridge between semantic standards and mainstream data engineering workflows.
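The consistency dimension can be partly automated. The following minimal sketch compares a simplified stand-in for a SHACL shape with a JSON Schema object and reports where the two disagree on which properties exist and which are mandatory. The data structures and the function name are hypothetical, and a real check would also need to cover datatypes, cardinalities, and the RML mappings.

```python
def find_misalignments(shape, schema):
    """List disagreements between a simplified shape and a JSON Schema."""
    issues = []
    shape_props = {p["name"]: p for p in shape["properties"]}
    schema_props = schema.get("properties", {})
    required = set(schema.get("required", []))

    # Properties present on only one side.
    for name in sorted(shape_props.keys() - schema_props.keys()):
        issues.append(f"{name}: in SHACL shape but missing from JSON Schema")
    for name in sorted(schema_props.keys() - shape_props.keys()):
        issues.append(f"{name}: in JSON Schema but missing from SHACL shape")

    # Properties present on both sides: compare mandatory status.
    for name in sorted(shape_props.keys() & schema_props.keys()):
        mandatory = shape_props[name].get("minCount", 0) >= 1
        if mandatory and name not in required:
            issues.append(f"{name}: sh:minCount >= 1 but not required in JSON Schema")
        if not mandatory and name in required:
            issues.append(f"{name}: required in JSON Schema but optional in SHACL shape")
    return issues


# Deliberately misaligned example data.
shape = {
    "properties": [
        {"name": "name", "minCount": 1},
        {"name": "email"},
        {"name": "birthDate"},
    ]
}
drifted_schema = {
    "properties": {"name": {}, "email": {}, "phone": {}},
    "required": ["email"],
}
aligned_schema = {
    "properties": {"name": {}, "email": {}, "birthDate": {}},
    "required": ["name"],
}

issues = find_misalignments(shape, drifted_schema)
for issue in issues:
    print(issue)
```

A check of this kind could run as part of the validation suite whenever artifacts are regenerated, so that drift between outputs is caught early.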

The thesis emphasizes one coherent source-of-truth pipeline and validation suite, rather than building a full low-code data product.
