The Web contains a vast amount of information that can be used to answer people's questions. As more data becomes available, more resources are needed to find the right data to answer a given question. Currently, big tech companies such as Google, Amazon, and Facebook address this by centralizing these data, integrating them into knowledge graphs, and offering (paid) services to query them. However, these companies control which data they integrate: Amazon, for example, integrates the UK railway schedule for its virtual assistant Alexa, but not the Belgian railway schedule. You must therefore either limit your application to the data provided by big tech companies, or join the harvesting race to build your own knowledge graph. Yet harvesting, centralizing, and integrating such amounts of data is not sustainable for smaller companies and individuals. Alternative approaches, such as federated or decentralized generation of knowledge graphs, have not been investigated yet.
In this Master thesis, you will investigate:
- which different approaches, besides centralized generation, can be applied for generating knowledge graphs.
- which trade-offs need to be considered when choosing a generation approach for certain use cases such as a public transport schedule.
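To make concrete what "generating a knowledge graph" involves, the sketch below maps a toy train schedule to RDF triples (N-Triples syntax) in plain Python. The vocabulary URIs and the sample rows are invented for illustration; a real pipeline would instead declare such mappings in RML and execute them with an RML.io processor.

```python
# Minimal sketch: turn tabular schedule rows into RDF triples (N-Triples).
# The example.org vocabulary and the sample rows are made up for this example.

EX = "http://example.org/"

def rows_to_ntriples(rows):
    """Map each schedule row to subject-predicate-object triples."""
    triples = []
    for row in rows:
        subj = f"<{EX}departure/{row['id']}>"
        triples.append(f'{subj} <{EX}route> "{row["route"]}" .')
        triples.append(f'{subj} <{EX}departsAt> "{row["time"]}" .')
    return "\n".join(triples)

rows = [
    {"id": "ic-512", "route": "Brussels-Ghent", "time": "08:04"},
    {"id": "ic-513", "route": "Ghent-Brussels", "time": "08:34"},
]
print(rows_to_ntriples(rows))
```

In a federated or decentralized setting, the open question is where and when such a mapping runs: centrally over harvested data, at each data provider, or on demand at query time (e.g. over LDF/TPF interfaces).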
Programming languages: Python
Technologies: RDF, RML.io, LDF/TPF; Solid is a bonus, but willingness to learn it suffices