Publications

Specification and Implementation of Mapping Rule Visualization and Editing: MapVOWL and the RMLEditor

by Pieter Heyvaert, Anastasia Dimou, Tom Seymoens, Erik Mannens, Dimitri Schuurman, Aron-Levi Herregodts, Ben De Meester, Ruben Verborgh
Published in Journal of Web Semantics.
Keywords: visualization, RML, rules, research, Linked Data

Visual tools are implemented to help users in defining how to generate Linked Data from raw data. This is possible thanks to mapping languages, which enable detaching mapping rules from the implementation that executes them. However, no thorough research has been conducted so far on how to visualize such mapping rules, especially when they become large and require considering multiple heterogeneous raw data sources and transformed data values. In the past, we proposed the RMLEditor, a visual graph-based user interface, which allows users to easily create mapping rules for generating Linked Data from raw data. In this paper, we build on top of our existing work: we (i) specify a visual notation for graph visualizations used to represent mapping rules, (ii) introduce an approach for manipulating rules when large visualizations emerge, and (iii) propose an approach to uniformly visualize data fractions of raw data sources, combined with an interactive interface for uniform data fraction transformations. We perform two additional comparative user studies. The first compares the use of the visual notation to present mapping rules with the use of a mapping language directly, and reveals that the visual notation is preferred. The second compares the use of the graph-based RMLEditor for creating mapping rules with the form-based RMLx Visual Editor, and reveals that graph-based visualizations are preferred for creating mapping rules, thanks to our proposed visual notation and the uniform representation of heterogeneous data sources and data values.


Supporting sustainable publishing and consuming of live Linked Time Series Streams

by Pieter Colpaert, Julián Rojas Meléndez, Gayane Sedrakyan, Ruben Verborgh
Published in Proceedings of the 15th ESWC: Posters and Demos.
Keywords: World Wide Web, Web

The road to publishing public streaming data on the Web is paved with trade-offs that determine its viability. The cost of unrestricted query answering on top of data streams may not be affordable for all data publishers, so public streams need to be funded in a sustainable fashion to remain online. In this paper, we give an overview of possible query answering features for live time series as multidimensional interfaces. For example, from a live parking availability data stream, pre-calculated time-constrained statistical indicators or geographically classified data can be provided to clients on demand. Furthermore, we demonstrate the initial developments of a Linked Time Series server that supports such features through an extensible modular architecture. Benchmarking the costs associated with each of these features allows us to weigh the trade-offs inherent to publishing live time series, and establishes the foundations for a decentralized and sustainable ecosystem for live data streams on the Web.


Declarative Rules for Linked Data Generation at your Fingertips!

by Pieter Heyvaert, Anastasia Dimou, Ben De Meester, Ruben Verborgh
Published in Proceedings of the 15th ESWC: Posters and Demos.
Keywords: R2RML, RML, annotation, rules, Linked Data

Linked Data is often generated based on a set of declarative rules using languages such as R2RML and RML. These languages are built with machine-processability in mind. It is thus not always straightforward for users to define or understand rules written in these languages, preventing them from applying the desired annotations to the data sources. In the past, graphical tools were proposed. However, next to users who prefer a graphical approach, there are users who want to understand and define rules via a text-based approach. For the latter, we introduce an enhancement to their workflow: instead of requiring users to manually write machine-processable rules, we propose writing human-friendly rules, from which machine-processable rules are generated. At its basis lies YARRRML: a human-readable, text-based representation for declarative generation rules. We propose a novel browser-based integrated development environment called “Matey”, showcasing the enhanced workflow. In this work, we describe our demo: users can experience first-hand how to generate triples from data in different formats by using YARRRML’s representation of the rules. The actual machine-processable rules remain completely hidden during editing. Matey shows that writing human-friendly rules enhances the workflow for a broader range of users. As a result, more of the desired annotations will be added to the data sources, which leads to more of the desired Linked Data.
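
To give an impression of the representation, here is a minimal, hypothetical YARRRML document; the file name, prefix, and fields are invented for illustration, but the structure follows YARRRML's syntax:

```yaml
prefixes:
  ex: http://example.com/

mappings:
  person:
    sources:
      - ['people.json~jsonpath', '$.persons[*]']  # hypothetical JSON input
    s: http://example.com/person/$(id)            # subject template
    po:                                           # predicate-object pairs
      - [a, ex:Person]
      - [ex:name, $(name)]
```

Behind the scenes, a document like this is translated into the corresponding machine-processable rules, which remain hidden from the user while editing.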


SeGoFlow: A Semantic Governance Workflow Tool

by Sven Lieber, Anastasia Dimou, Ruben Verborgh
Published in Proceedings of the 15th ESWC: Posters and Demos.
Keywords: reuse, SPARQL, research

Data management increasingly demands transparency with respect to data processing. Various stakeholders need information tailored to their needs, e.g., data management plans (DMPs) for funding agencies or privacy policies for the public. DMPs and privacy policies are just two examples of documents describing aspects of data processing, and dedicated tools to create both already exist. However, creating each of them manually or semi-automatically remains a repetitive and cognitively challenging task. We propose a data-driven approach that semantically represents the data processing itself as workflows and serves as a basis for different kinds of result-sets generated with SPARQL, such as DMPs. Our approach is threefold: (i) users with domain knowledge semantically represent workflow components; (ii) other users reuse these components to describe their data processing via semantically enhanced workflows; and, based on the semantic workflows, (iii) result-sets are automatically generated on demand with SPARQL queries. This paper demonstrates our tool that implements the proposed approach, based on a use case of a researcher who needs to provide a DMP to a funding agency to get a proposed research project approved.
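
As an illustration of step (iii), a result-set such as a DMP section could be derived from the semantic workflow with a query along these lines; the P-Plan vocabulary and the workflow IRI are assumptions for illustration, not necessarily what the tool itself uses:

```sparql
# Hypothetical query: collect the described steps of one workflow
# as input for a data management plan section.
PREFIX p-plan: <http://purl.org/net/p-plan#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?step ?description
WHERE {
  ?step a p-plan:Step ;
        p-plan:isStepOfPlan <http://example.org/workflow/1> ;
        rdfs:comment ?description .
}
```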


RESTdesc—A Functionality-Centered Approach to Semantic Service Description and Composition

by Thomas Steiner, Sam Coppens, Rik Van de Walle, Joaquim Gabarró Vallés, Erik Mannens, Davy Van Deursen, Ruben Verborgh
Published in The Semantic Web: ESWC 2012 Satellite Events.
Keywords: RESTdesc, REST, World Wide Web, Web

If we want automated agents to consume the Web, they need to understand what a certain service does and how it relates to other services and data. The shortcoming of existing service description paradigms is their focus on technical aspects instead of the functional aspect—what task does a service perform, and is this a match for my needs? This paper summarizes our recent work on RESTdesc, a semantic service description approach that centers on functionality. It has a solid foundation in logics, which enables advanced service matching and composition, while providing elegant and concise descriptions, responding to the demands of automated clients on the future Web of Agents.
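
For a flavor of such descriptions, the rule below is modeled on the well-known RESTdesc thumbnail example: it states that a small thumbnail of any image can be obtained with the described GET request. The prefixes and URI pattern are illustrative:

```n3
@prefix ex:   <http://example.org/image#> .
@prefix http: <http://www.w3.org/2011/http#> .

{ ?image ex:smallThumbnail ?thumbnail. }        # the desired functionality
=>
{
  _:request http:methodName "GET" ;             # the HTTP request that fulfills it
            http:requestURI (?image "/thumbnail") ;
            http:resp [ http:body ?thumbnail ] .
}.
```

Because the description is an N3 rule, a reasoner can chain such rules to perform the service matching and composition the paper describes.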


Linked Data and Linked APIs: Similarities, Differences, and Challenges

by Thomas Steiner, Rik Van de Walle, Joaquim Gabarró Vallés, Ruben Verborgh
Published in The Semantic Web: ESWC 2012 Satellite Events.
Keywords: RESTdesc, hypermedia, REST, Linked Data, World Wide Web, Web

In an often-retweeted Twitter post, entrepreneur and software architect Inge Henriksen described the relation of Web 1.0 to Web 3.0 as: “Web 1.0 connected humans with machines. Web 2.0 connected humans with humans. Web 3.0 connects machines with machines.” On the one hand, an incredible amount of valuable data is described by billions of triples, machine-accessible and interconnected thanks to the promises of Linked Data. On the other hand, REST is a scalable, resource-oriented architectural style that, like the Linked Data vision, recognizes the importance of links between resources. Hypermedia APIs are resources, too—albeit dynamic ones—and unfortunately, neither Linked Data principles nor the REST-implied self-descriptiveness of hypermedia APIs sufficiently describe them to allow for long-envisioned realizations like automatic service discovery and composition. We argue that describing inter-resource links—similarly to what the Linked Data movement has done for data—is the key to machine-driven consumption of APIs. In this paper, we explain how the description format RESTdesc captures the functionality of APIs by explaining the effect of dynamic interactions, effectively complementing the Linked Data vision.


PREMIS OWL

by Tom Creighton, Sébastien Peyrard, Sam Coppens, Rik Van de Walle, Rebecca Guenther, Kevin Ford, Erik Mannens, Ruben Verborgh
Published in International Journal on Digital Libraries.
Keywords: archiving, provenance, metadata, World Wide Web, Web

In this article, we present PREMIS OWL, a semantic formalisation of the PREMIS 2.2 data dictionary of the Library of Congress. PREMIS 2.2 consists of metadata implementation guidelines for the long-term archiving of digital information. Nowadays, the need for digital preservation is growing: a lot of the digital information produced merely a decade ago is in danger of getting lost as technologies change and become obsolete, which also threatens much information from heritage institutions. PREMIS OWL is a semantic long-term preservation schema. Preservation metadata are in fact a mixture of provenance information, technical information on the digital objects to be preserved, and rights information. PREMIS OWL is an OWL schema that can be used as a data model supporting digital archives. It can be used for disseminating preservation metadata as Linked Open Data on the Web and, at the same time, for supporting Semantic Web technologies in the preservation processes. The model incorporates 24 preservation vocabularies, published by the LOC as SKOS vocabularies. Via these vocabularies, PREMIS descriptions from different institutions become highly interoperable. The schema is approved and now managed by the Library of Congress, and is published at http://www.loc.gov/premis/rdf/v1.
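
A description in PREMIS OWL could look roughly as follows; this is a sketch written in the schema's style, and the property names and event-type term should be checked against the published schema at the URL above:

```turtle
@prefix premis:  <http://www.loc.gov/premis/rdf/v1#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

# A preserved file and a preservation event that acted on it
<http://archive.example.org/file/42> a premis:File ;
    dcterms:title "scan-0042.tiff" .

<http://archive.example.org/event/7> a premis:Event ;
    premis:hasEventDateTime "2013-05-01T10:00:00Z" ;
    # event type drawn from one of the LOC SKOS vocabularies (term IRI illustrative)
    premis:hasEventType <http://id.loc.gov/vocabulary/preservation/eventType/migration> .
```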


Web-scale Provenance Reconstruction of Implicit Information Diffusion on Social Media

by Sven Lieber, Tom De Nies, Peter Fischer, Io Taxidou, Ruben Verborgh
Published in Distributed and Parallel Databases.
Keywords: information diffusion, publication, provenance, social media, research, Facebook, World Wide Web, Web

Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important, because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists who promptly need to assess particular pieces of information. Social media currently provide insufficient mechanisms for provenance tracking, publication, and generation, while the state of the art in social media research focuses mainly on explicit diffusion mechanisms (like retweets on Twitter or reshares on Facebook). Implicit diffusion mechanisms remain understudied because they are difficult to capture and properly understand. On the technical side, the state of the art in provenance reconstruction evaluates small datasets after the fact, sidestepping the scale and speed requirements of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We show that explicit mechanisms are insufficient to capture influence, and our analysis unravels a significant part of the implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to cover truly Web-scale scenarios like major events. The results show that, on a single machine, we can process datasets consisting of up to several millions of messages at rates that cover bursty behaviour, without compromising result quality. By doing so, we provide online journalists and social media users in general with fine-grained provenance reconstruction that sheds light on implicit interactions not captured by social media providers. These results are provided in an online fashion, which also allows for fast relevance and trustworthiness assessment.


Enabling context-aware multimedia annotation by a novel generic semantic problem-solving platform

by Rik Van de Walle, Erik Mannens, Davy Van Deursen, Chris Poppe, Ruben Verborgh
Published in Multimedia Tools and Applications.
Keywords: annotation, Web services, Web service, metadata, Semantic Web, World Wide Web, Web

Automatic generation of metadata, facilitating the retrieval of multimedia items, potentially saves large amounts of manual work. However, the high specialization degree of feature extraction algorithms makes them unaware of the context they operate in, which contains valuable and often necessary information. In this paper, we show how Semantic Web technologies can provide a context that algorithms can interact with. We propose a generic problem-solving platform that uses Web services and various knowledge sources to find solutions to complex requests. The platform employs a reasoner-based composition algorithm, generating an execution plan that combines several algorithms as services. It then supervises the execution of this plan, intervening in case of errors or unexpected behavior. We illustrate our approach with a use case in which we annotate the names of people depicted in a photograph.


Representing Dockerfiles in RDF

by Pieter Heyvaert, Riccardo Tommasini, Erik Mannens, Emanuele Della Valle, Ben De Meester, Ruben Verborgh
Published in Proceedings of the 16th International Semantic Web Conference: Posters and Demos.
Keywords: interoperability, research, Linked Data, RDF

Containers – lightweight, stand-alone software executables – are everywhere. Industries exploit container managers to orchestrate complex cloud infrastructures, and researchers in academia use them to foster the reproducibility of computational experiments. Among existing solutions, Docker is the de facto standard in the container industry. In this paper, we advocate the value of applying the Linked Data paradigm to the container ecosystem’s building scripts, as it allows adding knowledge, eases decentralized references, and fosters interoperability. In particular, we define Dockeronto, a vocabulary that allows Dockerfiles to be semantically annotated.
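
The abstract does not list Dockeronto's terms, so the namespace and properties below are invented purely to illustrate the idea of lifting a Dockerfile's instructions into RDF:

```turtle
@prefix do: <http://example.org/dockeronto#> .   # hypothetical namespace

# Corresponds to a Dockerfile containing:
#   FROM ubuntu:16.04
#   RUN apt-get install -y python
<http://example.org/images/myapp> a do:Image ;
    do:from [ do:repository "ubuntu" ; do:tag "16.04" ] ;
    do:run  "apt-get install -y python" .
```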


Querying Dynamic Datasources with Continuously Mapped Sensor Data

by Ruben Taelman, Pieter Heyvaert, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 15th International Semantic Web Conference: Posters and Demos.
Keywords: publication, Linked Data, RDF, World Wide Web, Web

The world contains a large number of sensors that produce new data at a high frequency, yet it is currently very hard to find public services that expose these measurements as dynamic Linked Data. We investigate how sensor data can be published continuously on the Web at a low cost. This paper describes how various sensor data sources can be published by continuously mapping raw sensor data to RDF and inserting it into a live, low-cost server. This makes it possible for clients to continuously evaluate dynamic queries using public sensor data. In our demonstration, we illustrate how this pipeline works for the publication of temperature and humidity data originating from a microcontroller, and how these data can be queried.


Semi-Automatic Example-Driven Linked Data Mapping Creation

by Pieter Heyvaert, Anastasia Dimou, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 5th International Workshop on Linked Data for Information Extraction.
Keywords: RML, rules, Linked Data

Linked Data can be generated by applying mapping rules to existing (semi-)structured data. The manual creation of these rules is a costly process for users, so (semi-)automatic approaches have been developed to assist them. Although these approaches provide promising results, in use cases where examples of the desired Linked Data are available, they do not use the knowledge provided by those examples, resulting in Linked Data that might not be as desired, which in turn requires manual updates of the rules. Such examples can in certain cases be easy to create and offer valuable knowledge for the mapping process, such as which data correspond to entities and attributes, how these data are annotated and modeled, and how different entities are linked to each other. In this paper, we introduce a semi-automatic approach to create rules based on examples of both the existing data and the corresponding Linked Data. Furthermore, we made the approach available via the RMLEditor, making it readily accessible to users through a graphical user interface. The proposed approach is a first attempt to generate a complete Linked Dataset based on user-provided examples, by creating an initial set of rules for the users.


Data Analysis of Hierarchical Data for RDF Term Identification

by Pieter Heyvaert, Anastasia Dimou, Erik Mannens, Ruben Verborgh
Published in Proceedings of the Joint International Semantic Technology Conference.
Keywords: Linked Data, RDF

Generating Linked Data based on existing data sources requires the modeling of their information structure. This modeling requires the identification of potential entities, their attributes, and the relationships between and among entities. For databases this identification is not required, because a data schema is always available. However, for other data formats, such as hierarchical data, this is not always the case. Therefore, analysis of the data is required to support RDF term and data type identification. We introduce a tool that performs such an analysis on hierarchical data. It implements the algorithms Daro and S-Daro, which we propose in this paper. Based on our evaluation, we conclude that S-Daro offers a more scalable solution in terms of run time with respect to the dataset size, and provides more complete results.


Linked Sensor Data Generation using Queryable RML Mappings

by Ruben Taelman, Pieter Heyvaert, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 15th International Semantic Web Conference: Posters and Demos.
Keywords: RML, interoperability, communication, reuse, Triple Pattern Fragments, research, Linked Data, RDF

As the amount of generated sensor data increases, semantic interoperability becomes an important aspect of supporting efficient data distribution and communication. The integration and fusion of (sensor) data is therefore important, as this data comes from different data sources and might be in different formats. Furthermore, reusable and extensible methods for this integration and fusion are required to scale with the growing number of applications that generate semantic sensor data. Current research efforts allow mapping sensor data to Linked Data in order to provide semantic interoperability. However, they lack support for multiple data sources, hampering integration and fusion. Furthermore, the methods used are either not available for reuse or not extensible, which hampers the development of applications. In this paper, we describe how the RDF Mapping Language (RML) and a Triple Pattern Fragments (TPF) server are used to address these shortcomings. The demonstration consists of a microcontroller that generates sensor data. The data is captured and mapped to RDF triples using module-specific RML mappings, which are queried from a TPF server.
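
A module-specific RML mapping of this kind could look as follows; the source file, iterator, and target vocabulary are assumptions, but the rr:/rml: constructs are standard:

```turtle
@prefix rr:  <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:  <http://semweb.mmlab.be/ns/ql#> .
@prefix ex:  <http://example.com/ns#> .

<#TemperatureMapping> a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "readings.json" ;                  # hypothetical sensor output
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$.readings[*]"
  ] ;
  rr:subjectMap [ rr:template "http://example.com/reading/{id}" ] ;
  rr:predicateObjectMap [
    rr:predicate ex:temperature ;
    rr:objectMap [ rml:reference "value" ]        # raw value from the JSON record
  ] .
```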


Towards Approaches for Generating RDF Mapping Definitions

by Pieter Heyvaert, Anastasia Dimou, Rik Van de Walle, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 14th International Semantic Web Conference: Posters and Demos.
Keywords: rules, Linked Data, Semantic Web, RDF, World Wide Web, Web

Obtaining Linked Data by modeling domain-level knowledge derived from input data is not straightforward for data publishers, especially if they are not Semantic Web experts. Developing user interfaces that support domain experts in semantically annotating their data became feasible once mapping rules were abstracted from their execution. However, most existing approaches reflect how mappings are typically executed: they offer a single linear workflow, triggered by a particular data source. Alternative approaches have neither been thoroughly investigated yet, nor incorporated in any of the existing user interfaces for mappings. In this paper, we generalize the two prevalent approaches for generating mappings of data in databases, database-driven and ontology-driven, to be applicable to any other data structure, and we introduce two new approaches: model-driven and result-driven.


Towards a Uniform User Interface for Editing Mapping Definitions

by Pieter Heyvaert, Anastasia Dimou, Rik Van de Walle, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 4th Workshop on Intelligent Exploration of Semantic Data.
Keywords: RML, Linked Data, Semantic Web, RDF, World Wide Web, Web

Modeling domain knowledge as Linked Data is not straightforward for data publishers, because they are domain experts and not Semantic Web specialists. Most approaches that map data to its RDF representation still require users to have knowledge of the underlying implementations, as mapping definitions remained, so far, tied to their execution. Defining mapping languages makes it possible to decouple the mapping definitions from the implementation. However, user interfaces that enable domain experts to model knowledge and, thus, intuitively define such mapping definitions based on the available input sources have not been thoroughly investigated yet. This paper introduces a non-exhaustive list of desired features to be supported by such a mapping editor, independently of the underlying mapping language, and presents the RMLEditor as a prototype interface that implements these features with RML as its underlying mapping language.


What Factors Influence the Design of a Linked Data Generation Algorithm?

by Pieter Heyvaert, Anastasia Dimou, Ben De Meester, Ruben Verborgh
Published in Proceedings of the 11th Workshop on Linked Data on the Web.
Keywords: Linked Data generation, Linked Data

Generating Linked Data remains a complicated and intensive engineering process. While different factors determine how a Linked Data generation algorithm is designed, potential alternatives for each factor are currently not considered when designing the tools’ underlying algorithms. Certain design patterns are frequently applied across different tools, covering certain alternatives of a few of these factors, whereas other alternatives are never explored. Consequently, there are no adequate tools for Linked Data generation on certain occasions, or tools with inadequate and inefficient algorithms are chosen. In this position paper, we determine such factors based on our experiences and present a preliminary list. These factors can be considered when a Linked Data generation algorithm is designed or a tool is chosen. We investigated which factors are covered by widely known Linked Data generation tools and concluded that only certain design patterns are frequently encountered. By these means, we aim to point out that Linked Data generation goes above and beyond bare implementations, and that its algorithms need to be thoroughly and systematically studied and exploited.


Towards an Interface for User-Friendly Linked Data Generation Administration

by Pieter Heyvaert, Anastasia Dimou, Wouter Maroy, Laurens De Graeve, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 15th International Semantic Web Conference: Posters and Demos.
Keywords: RML, Linked Data generation, publication, reuse, Linked Data, Semantic Web, World Wide Web, Web

Linked Data generation and publication remain challenging and complicated, in particular for data owners who are not Semantic Web experts or tech-savvy. The situation deteriorates when data from multiple heterogeneous sources, accessed via different interfaces, is integrated, and when Linked Data generation is a long-lasting activity that is repeated periodically, often adjusted, and incrementally enriched with new data. Therefore, we propose the RMLWorkbench, a graphical user interface that supports data owners in administrating their Linked Data generation and publication workflow. The RMLWorkbench’s underlying language is RML, since it allows declaratively describing the complete Linked Data generation workflow. Thus, any Linked Data generation workflow specified by a user can be exported and reused by other tools that interpret RML.


Towards a Uniform User Interface for Editing Data Shapes

by Pieter Heyvaert, Anastasia Dimou, Ben De Meester, Ruben Verborgh
Published in Proceedings of the 4th International Workshop on Visualization and Interaction for Ontologies and Linked Data.
Keywords: Semantic Web, RDF, World Wide Web, Web

Data quality is an important factor for the success of the envisaged Semantic Web. As machines are inherently intolerant of unexpected input, low-quality data produces low-quality results. Recently, constraint languages such as SHACL were proposed to assess the quality of data graphs, decoupled from the use case and the implementation. However, these constraint languages were designed with machine-processability in mind. Defining data shapes requires knowledge of the language’s syntax – usually RDF – and its specification, which is not straightforward for domain experts, as they are not Semantic Web specialists. The notion of constraint languages is very recent: the W3C Recommendation for SHACL was finalized in 2017. Thus, user interfaces that enable domain experts to intuitively define such data shapes have not been thoroughly investigated yet. In this paper, we present a non-exhaustive list of desired features to be supported by a user interface for editing data shapes. These features are applied to unSHACLed, a prototype interface with SHACL as its underlying constraint language. To specify the features, we aligned existing work on ontology editing and Linked Data generation rule editing with data shape editing, and applied it in a drag-and-drop interface that combines data graph and data shape editing. This work can thus serve as a starting point for data shape editing interfaces.
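
For reference, the kind of data shape such an interface edits is exemplified below in SHACL; the target class and property are illustrative:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/ns#> .

ex:PersonShape a sh:NodeShape ;
  sh:targetClass ex:Person ;          # constrains every ex:Person node
  sh:property [
    sh:path ex:name ;
    sh:datatype xsd:string ;
    sh:minCount 1                     # at least one name is required
  ] .
```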


RDF Graph Validation Using Rule-Based Reasoning

by Pieter Heyvaert, Anastasia Dimou, Dörthe Arndt, Ben De Meester, Ruben Verborgh
Published in Semantic Web Journal.
Keywords: validation, proof, reasoning, research, Semantic Web, RDF, World Wide Web, Web

The correct functioning of Semantic Web applications requires that given RDF graphs adhere to an expected shape. This shape depends on the RDF graph and on the application’s supported entailments of that graph. During validation, RDF graphs are assessed against sets of constraints, and found violations help refine the RDF graphs. However, existing validation approaches cannot always explain the root causes of violations (inhibiting refinement), and cannot fully match the entailments supported during validation with those supported by the application. Such approaches either cannot accurately validate RDF graphs, or must combine multiple systems, which deteriorates the validator’s performance. In this paper, we present an alternative validation approach using rule-based reasoning, capable of fully customizing the applied inferencing steps. We compare it to existing approaches, and present a formal grounding and a practical implementation, “Validatrr”, based on N3Logic and the EYE reasoner. Our approach – supporting a number of constraint types equivalent to the state of the art – better explains the root cause of violations thanks to the reasoner’s generated logical proof, and returns an accurate number of violations thanks to the customizable inferencing rule set. The performance evaluation shows that Validatrr is performant for smaller datasets, and scales linearly with respect to the RDF graph size. The detailed root cause explanations can guide future validation report description specifications, and the fine-grained level of configuration can be employed to support different constraint languages. This foundation allows further research into handling recursion, validating RDF graphs based on their generation description, and providing automatic refinement suggestions.
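
A sketch of what a constraint expressed as an inferencing rule can look like in N3; the report vocabulary is hypothetical and the rule is not taken from Validatrr's actual rule set:

```n3
@prefix math: <http://www.w3.org/2000/10/swap/math#> .
@prefix ex:   <http://example.org/ns#> .
@prefix v:    <http://example.org/validation#> .   # hypothetical report vocabulary

# Flag resources with a negative age; the reasoner's derivation of this
# conclusion doubles as a proof explaining the violation's root cause.
{ ?person ex:age ?age. ?age math:lessThan 0. }
=>
{ [] a v:Violation ; v:focusNode ?person ; v:value ?age . }.
```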


Mapping languages: analysis of comparative characteristics

by Pieter Heyvaert, Anastasia Dimou, Ben De Meester, Ruben Verborgh
Published in Proceedings of the First Knowledge Graph Building Workshop.
Keywords: RDF

RDF generation processes are becoming more interoperable, reusable, and maintainable due to the increased usage of mapping languages: languages used to describe how to generate an RDF graph from (semi-)structured data. This has led to a rise of new mapping languages, each with different characteristics. However, it is not clear which mapping language can be used for a given task, so a comparative framework is needed. In this paper, we investigate a set of mapping languages that exhibit complementary characteristics, and present an initial set of comparative characteristics based on the requirements put forward by those mapping languages. Our initial investigation found 9 broad characteristics, classified in 3 categories. To further formalize and complete the set of characteristics, more investigation is needed, requiring a joint effort of the community.


Rule-driven inconsistency resolution for knowledge graph generation

by Pieter Heyvaert, Anastasia Dimou, Ben De Meester, Ruben Verborgh
Published in Semantic Web Journal.
Keywords: annotation, rules

Knowledge graphs, which contain annotated descriptions of entities and their interrelations, are often generated using rules that apply semantic annotations to certain data sources. (Re)using ontology terms without adhering to the axioms defined by their ontologies results in inconsistencies in these graphs, affecting their quality. Methods and tools have been proposed to detect and resolve inconsistencies, whose root causes include both rules and ontologies. However, these either require access to the complete knowledge graph, which is not always available in a time-constrained situation, or assume that only generation rules can be refined but not ontologies. In the past, we proposed a rule-driven method for detecting and resolving inconsistencies without complete knowledge graph access, but it requires a predefined set of refinements to the rules and does not guide users with respect to the order in which the rules should be inspected. We extend our previous work with a rule-driven method, called Resglass, that considers refinements for both generation rules and ontologies. In this article, we describe Resglass, which includes a ranking to determine the order in which rules and ontology elements should be inspected, as well as its implementation. The ranking is evaluated by comparing the manual ranking of experts to our automatic ranking. The evaluation shows that our automatic ranking achieves an overlap of 80% with the experts’ ranking, thus reducing the effort required during the resolution of inconsistencies in both rules and ontologies.


Using EPUB 3 and the Open Web Platform for Enhanced Presentation and Machine-Understandable Metadata for Digital Comics

by Pieter Heyvaert, Wesley De Neve, Tom De Nies, Rik Van de Walle, Joachim Van Herwegen, Erik Mannens, Miel Vander Sande, Ruben Verborgh
Published in Proceedings of the 19th International Conference on Electronic Publishing.
Keywords: JavaScript, metadata, World Wide Web, Web

Various methods are needed to extract information from current (digital) comics. Furthermore, the use of different (proprietary) formats by comic distribution platforms causes an overhead for authors. To overcome these issues, we propose a solution that makes use of the EPUB 3 specification, additionally leveraging the Open Web Platform to support animations, reading assistance, audio, and multiple languages in a single format, by using our JavaScript library comicreader.js. We also provide administrative and descriptive metadata in the same format by introducing a new ontology: Dicera. Our solution is complementary to current extraction methods: on the one hand, they can help with metadata creation, and on the other hand, the machine-understandable metadata eases their use. While reading system support for our solution is currently limited, it can offer all features needed by current comic distribution platforms. When comparing comics generated by our solution to EPUB 3 textbooks, we observed an increase in file size, mainly due to the use of images. In future work, our solution can be further improved by extending the presentation features, investigating different types of comics, studying the use of new EPUB 3 extensions, and incorporating it into digital book authoring environments.


Parallel RDF generation from heterogeneous big data

by Pieter Heyvaert, Anastasia Dimou, Wouter Maroy, Gerald Haesendonck, Ruben Verborgh
Published in Proceedings of the International Workshop on Semantic Big Data.
Keywords: RML, RDF

To unlock the value of increasingly available data in high volumes, we need flexible ways to integrate data across different sources. While semantic integration can be provided through RDF generation, current generators scale insufficiently in terms of volume, as they are limited by memory constraints. Therefore, we developed the RMLStreamer, a generator that parallelizes the ingestion and mapping tasks of RDF generation across multiple instances. In this paper, we analyze which aspects are parallelizable and introduce an approach for parallel RDF generation. We describe how we implemented our proposed approach within the RMLStreamer, and how the resulting scaling behavior compares to that of other RDF generators. Through parallel ingestion, the RMLStreamer ingests data at a 50% faster rate than existing generators.


Modeling, Generating, and Publishing Knowledge as Linked Data

by Ruben Taelman, Pieter Heyvaert, Anastasia Dimou, Ruben Verborgh
Published in Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management.
Keywords: Linked Data publication, XML, RML, Linked Data generation, Linked Data Fragments, Web API, Triple Pattern Fragments, Linked Data, Semantic Web, JSON, World Wide Web, Web

The process of extracting, structuring, and organizing knowledge from one or multiple data sources and preparing it for the Semantic Web requires a dedicated class of systems. They enable processing large and originally heterogeneous data sources and capturing new knowledge. Offering existing data as Linked Data increases its shareability, extensibility, and reusability. However, using Linked Data as a means to represent knowledge can be easier said than done. In this tutorial, we elaborate on the importance of semantically annotating data and how existing technologies facilitate its mapping to Linked Data. We introduce the [R2]RML languages to generate Linked Data from different heterogeneous data sources (databases, XML, JSON, …) accessed via different interfaces (documents, Web APIs, …). Those who are not Semantic Web experts can annotate their data with the RMLEditor, whose user interface hides all underlying Semantic Web technologies from data owners. Last, we show how to easily publish Linked Data on the Web as Triple Pattern Fragments. As a result, participants, independently of their knowledge background, can model, annotate, and publish data on their own.


Graph-Based Editing of Linked Data Mappings using the RMLEditor

by Pieter Heyvaert, Anastasia Dimou, Rik Van de Walle, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 13th Extended Semantic Web Conference: Posters and Demos.
Keywords: RML, Linked Data generation, Linked Data, Semantic Web, World Wide Web, Web

Linked Data is in many cases generated from (semi-)structured data. This generation is supported by several tools, a number of which use a mapping language to facilitate Linked Data generation. However, knowledge of this language and the other technologies used is required to use these tools, limiting their adoption by non-Semantic Web experts. We demonstrate the RMLEditor: a graphical user interface that utilizes graphs to easily visualize the mappings that deliver the RDF representation of the original data. The required amount of knowledge of the underlying mapping language and the used technologies is kept to a minimum. The RMLEditor lowers the barriers to creating Linked Data by also aiming to facilitate the editing of mappings by non-experts.


RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings

by Pieter Heyvaert, Anastasia Dimou, Rik Van de Walle, Erik Mannens, Dimitri Schuurman, Aron-Levi Herregodts, Ruben Verborgh
Published in The Semantic Web: Latest Advances and New Domains (ESWC 2016).
Keywords: RML, research, Linked Data, RDF

Although several tools have been implemented to generate Linked Data from raw data, users still need to be aware of the underlying technologies and Linked Data principles to use them. Mapping languages make it possible to detach the mapping definitions from the implementation that executes them. However, no thorough research has been conducted on how to facilitate the editing of mappings. We propose the RMLEditor, a visual graph-based user interface, which allows users to easily define the mappings that deliver the RDF representation of the corresponding raw data. Neither knowledge of the underlying mapping language nor of the used technologies is required. The RMLEditor aims to facilitate the editing of mappings, and thereby lowers the barriers to creating Linked Data. The RMLEditor is developed for use by data specialists who are partners of (i) a company-driven pilot and (ii) a community group. The current version of the RMLEditor was validated: participants indicated that it is adequate for its purpose and that the graph-based approach enables users to conceive the linked nature of the data.


Merging and Enriching DCAT Feeds to Improve Discoverability of Datasets

by Pieter Heyvaert, Pieter Colpaert, Rik Van de Walle, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 12th Extended Semantic Web Conference: Posters and Demos.
Keywords: DBpedia, Triple Pattern Fragments, SPARQL, World Wide Web, Web

The Data Catalog Vocabulary (DCAT) is a W3C specification for describing datasets published on the Web. However, such catalogs are not easily discoverable based on a user’s needs. In this paper, we introduce the Node.js module "dcat-merger", which allows a user agent to download and semantically merge different DCAT feeds from the Web into one DCAT feed, which can be republished. Merging the input feeds is followed by enriching them: besides determining the subjects of the datasets using DBpedia Spotlight, two extensions were built, one that categorizes the datasets according to a taxonomy and one that adds spatial properties to the datasets. These extensions require information available in DBpedia’s SPARQL endpoint; since public SPARQL endpoints often suffer from low availability, a Triple Pattern Fragments alternative is used. Finally, the need for DCAT Merger sparks the discussion about more high-level functionality to improve a catalog’s discoverability.
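
A merged and enriched catalog entry could look roughly like this in Turtle; the dataset, theme, and spatial IRIs are illustrative:

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://example.org/catalog> a dcat:Catalog ;
  dct:title "Merged catalog"@en ;
  dcat:dataset <http://example.org/dataset/parking> .

<http://example.org/dataset/parking> a dcat:Dataset ;
  dct:title "Parking availability"@en ;
  dcat:theme <http://dbpedia.org/resource/Parking> ;    # added by the taxonomy extension
  dct:spatial <http://dbpedia.org/resource/Ghent> .     # added by the spatial extension
```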


Semantically Annotating CEUR-WS Workshop Proceedings with RML

by Pieter Heyvaert, Anastasia Dimou, Rik Van de Walle, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 12th Extended Semantic Web Conference: Semantic Publishing Challenge.
Keywords: RML, SPARQL, RDF

In this paper, we present our solution for the first task of the second edition of the Semantic Publishing Challenge. The task requires extracting and semantically annotating information regarding CEUR-WS workshops, their chairs and conference affiliations, as well as their papers and authors, from a set of HTML-encoded workshop proceedings volumes. Our solution builds on last year’s submission: we address a number of shortcomings, assess the quality of the generated dataset, and publish the queries as SPARQL query templates. This is accomplished using the RDF Mapping Language (RML) to define the mappings, RMLProcessor to execute them, RDFUnit to both validate the mapping documents and assess the generated dataset’s quality, and The DataTank to publish the SPARQL query templates. This results in an overall improved quality of the generated dataset, which is reflected in the query results.


Linked Data-enabled Gamification in EPUB 3 for Educational Digital Textbooks

by Pieter Heyvaert, Rik Van de Walle, Erik Mannens, Ruben Verborgh
Published in Proceedings of the 10th European Conference on Technology Enhanced Learning.
Keywords: JavaScript, Linked Data

Interest in eLearning environments is constantly increasing, as is interest in digital textbooks and gamification. The advantages of gamification in the context of education have been proven. However, gamified educational material, such as digital textbooks and digital systems, is scarce. As an answer to the need for such material, we developed the framework GEL (Gamification for EPUB using Linked Data). GEL allows incorporating gamification concepts into a digital textbook, using EPUB 3 and Linked Data. As part of GEL, we created the ontology GO (Gamification Ontology), representing the different gamification concepts, and a JavaScript library. Using GO makes it possible to discover other gamified systems, to share gamification concepts between applications, and to separate the processing and representation of the gamification concepts. Our library is interoperable with any JavaScript-based e-reader, which promotes its reusability.


Conformance Test Cases for the RDF Mapping Language (RML)

by Pieter Heyvaert, Anastasia Dimou, Oscar Corcho, Freddy Priyatna, Erik Mannens, David Chaves-Fraga, Ruben Verborgh
Published in Proceedings of the 1st Iberoamerican Knowledge Graphs and Semantic Web Conference.
Keywords: CARML, CSV, XML, R2RML, RMLMapper, RML, annotation, rules, programming, RDF, JSON

Knowledge graphs are often generated using rules that apply semantic annotations to data sources. Software tools then execute these rules and generate or virtualize the corresponding RDF-based knowledge graph. RML is an extension of the W3C-recommended R2RML language, extending support from relational databases to other data sources, such as data in CSV, XML, and JSON format. As part of the R2RML standardization process, a set of test cases was created to assess tool conformance to the specification. In this work, we generated an initial set of reusable test cases to assess RML conformance. These test cases are based on the R2RML test cases and can be used by any tool, regardless of its programming language. We tested the conformance of two RML processors: the RMLMapper and CARML. The results show that the RMLMapper passes all CSV, XML, and JSON test cases, and most test cases for relational databases. CARML passes most CSV, XML, and JSON test cases. Developers can determine the degree of conformance of their tools, and users can rely on the conformance results to determine the most suitable tool for their use cases.