Simple Semantic Uplifting - Mixing n-quad awareness into templating (Internship at VLIZ)
Keywords: templating syntax
Promotors: Pieter Colpaert, Julián Andrés Rojas Meléndez
Students: max 1
Problem
Exposing information as Linked Open Data requires producing triples (or, more generally, n-quads). The available solutions span a wide spectrum: from standards-based tooling, over advanced semantic workbench platforms, to low-level handmade code that performs basic string concatenation. At VLIZ, where quite a few critical systems are maintained in-house, under full control and with extensive historically built-up expertise, we tend towards the more “manual” side of this spectrum. This ties in best with what we know, taps into our experience and knowledge of the domain model, and avoids dependencies on new techniques and platforms.
Still, over the years we have grown towards a simple in-house templating approach. It lets us separate the two life cycles and coding experiences that typically emerge in this kind of project: (1) extracting the available data into in-memory structures (typically represented as JSON) and (2) fine-tuning the vocabularies in use and the shapes to construct. A template writer (with vocabulary and RDF knowledge) and a domain-model coder (with system knowledge of the internal API and storage) can thus join forces in tuned ad hoc teams to realise these solutions.
Our current approach is, however, very text-template driven (based on Python Jinja2). The templating layer has no understanding of the triple model it is essentially building, which sometimes leads to undesired side effects, chiefly the production of syntactically incorrect output. This can only be prevented by extensive preformatting of the input data or by guarding sections of the templates with testing code, which makes the resulting templates less readable than they could be.
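The failure mode can be illustrated with a minimal sketch. It uses Python's built-in string formatting as a stand-in for a Jinja2 text template, so it is not our actual templating code; the record fields and the example.org IRIs are hypothetical.

```python
# Stand-in for a text template engine: it substitutes strings and
# knows nothing about triples. All IRIs and field names are hypothetical.
class Blank(dict):
    def __missing__(self, key):
        # Render missing fields as empty strings, as a lenient template would.
        return ""

def render(template, record):
    return template.format_map(Blank(record))

TITLE = '<{id}> <http://purl.org/dc/terms/title> "{title}" .'
DEPTH = '<{id}> <http://example.org/vocab/depth> {depth} .'

# A quote inside the value ends the literal early: invalid Turtle.
quoted = render(TITLE, {"id": "http://example.org/station/3",
                        "title": 'The "Belgica" cruise'})

# A missing value leaves a statement without an object: invalid Turtle.
no_depth = render(DEPTH, {"id": "http://example.org/station/3"})

print(quoted)
print(no_depth)
```

Both outputs look plausible as text but would be rejected by an RDF parser, which is why the templates currently need escaping filters and guard conditions around such lines.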
Goal
The idea is to overhaul our current templating approach and develop a new templating syntax that is fully aware of the model (triples or, more generally, n-quads) it is producing. With this change we hope to address a number of things:
- missing parts no longer need to be checked for → if they lead to incomplete statements in the model (i.e. fewer than 3 parts for a triple, or fewer than 4 for a quad), those statements can simply be ignored (and maybe logged)
- the output serialisation should become a configurable option (it is now tied to the syntax chosen for the template: JSON-LD, TriG or Turtle)
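Both points can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the intended design: the helper names (`iri`, `literal`, `build`, `uplift`), the record fields and the example.org IRIs are all hypothetical, literals are emitted as plain strings without datatypes, and the graph is treated as optional so that a statement without one lands in the default graph.

```python
import logging

log = logging.getLogger("uplift")  # hypothetical logger name

def iri(value):
    """Wrap a value as an IRI term, or return None when it is missing."""
    return ("iri", value) if value else None

def literal(value):
    """Wrap a value as a plain literal term, or return None when it is missing."""
    return ("literal", str(value)) if value not in (None, "") else None

def build(candidates):
    """Keep only complete statements: subject, predicate and object are required."""
    kept = []
    for s, p, o, g in candidates:
        if None in (s, p, o):
            log.info("skipping incomplete statement: %r", (s, p, o, g))
            continue
        kept.append((s, p, o, g))
    return kept

def term(t):
    """Serialise one term; literals get their quotes and backslashes escaped."""
    kind, value = t
    if kind == "iri":
        return "<" + value + ">"
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'

def to_nquads(statements):
    lines = []
    for s, p, o, g in statements:
        parts = [term(s), term(p), term(o)] + ([term(g)] if g else [])
        lines.append(" ".join(parts) + " .")
    return "\n".join(lines)

def to_trig(statements):
    by_graph = {}
    for s, p, o, g in statements:
        by_graph.setdefault(g, []).append("%s %s %s ." % (term(s), term(p), term(o)))
    blocks = []
    for g, triples in by_graph.items():
        body = "\n".join("  " + t for t in triples)
        label = term(g) + " " if g else ""  # no label means the default graph
        blocks.append(label + "{\n" + body + "\n}")
    return "\n".join(blocks)

# The serialisation is chosen by configuration, not by the template syntax.
SERIALISERS = {"nquads": to_nquads, "trig": to_trig}

def uplift(record, fmt="nquads"):
    s, g = iri(record.get("id")), iri(record.get("graph"))
    candidates = [
        (s, iri("http://purl.org/dc/terms/title"), literal(record.get("title")), g),
        (s, iri("http://example.org/vocab/depth"), literal(record.get("depth")), g),
    ]
    return SERIALISERS[fmt](build(candidates))

record = {"id": "http://example.org/station/3", "title": 'The "Belgica" cruise'}
print(uplift(record, "nquads"))
print(uplift(record, "trig"))
```

Because the record has no `depth`, the second statement is dropped without any guard in the mapping itself, and the same set of statements is written out in either format.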
We were inspired by the “pug style” templating used for producing HTML, as it also combines model awareness (of the DOM) with a templating style and uses Python-like indentation for formatting; we would like to pursue this way of working to produce our triples and quads.
Unlike the current approach, where we simply reused the Jinja templating syntax, this will require a basic language design of our own. This includes pinning the rules down in EBNF and building a lexical analyser for it (i.e. using parser generator techniques).
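To give an idea of the kind of work involved, here is a small hand-written lexer for an indentation-based syntax. The syntax itself, the grammar in the comments and the token names are purely hypothetical placeholders, since designing the actual language is part of the assignment, and a real implementation might well be generated by a parser generator instead.

```python
import re

# Hypothetical grammar, written in EBNF:
#   template = { block } ;
#   block    = term , NEWLINE , INDENT , { pair } , DEDENT ;
#   pair     = term , term , NEWLINE ;
#   term     = WORD | VAR ;

# VAR is a {{name}} placeholder; WORD is any other run of non-space characters.
TOKEN = re.compile(r"\{\{\s*(?P<VAR>\w+)\s*\}\}|(?P<WORD>[^\s{}]+)")

def tokenize(text):
    """Yield (kind, value) pairs, deriving INDENT/DEDENT from leading spaces."""
    levels = [0]  # stack of indentation widths currently open
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        width = len(line) - len(line.lstrip(" "))
        if width > levels[-1]:
            levels.append(width)
            yield ("INDENT", "")
        while width < levels[-1]:
            levels.pop()
            yield ("DEDENT", "")
        if width != levels[-1]:
            raise SyntaxError("inconsistent indentation: %r" % line)
        for match in TOKEN.finditer(line):
            yield (match.lastgroup, match.group(match.lastgroup))
        yield ("NEWLINE", "")
    for _ in levels[1:]:
        yield ("DEDENT", "")  # close blocks still open at end of input

sample = "{{id}}\n  dct:title {{title}}\n  ex:depth {{depth}}\n"
tokens = list(tokenize(sample))
for token in tokens:
    print(token)
```

A parser built on top of such a token stream would see each indented `pair` in the context of its enclosing subject, which is what makes it possible to reason about complete and incomplete statements rather than about text.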