The Open Science movement has gained strong momentum in Europe and across the rest of the world. Its aim is to make all scientific outputs available for anyone to find and use: not just scientists in the particular domain, but any scientist, and also any teacher, student, journalist, policy-maker, company, … anyone and everyone, including the robots and AI systems that are there to help them out. Much work has gone on in the past decade or so to build a consensus on what this means and how to do it. The current focus is on solving a number of technical problems that limit the practicality of Open Science: it is not enough to make science open, it also has to be findable, understandable, and usable. One focus here is interoperability: that data can be opened and read by anyone, and that data can be found by anyone through accompanying descriptive metadata that is understandable to everyone.
Semantic web technology is an important instrument for overcoming these interoperability challenges of Open Science. Imagine that all science is described, annotated, and formulated so that it can all be “knit together” into a "Science Knowledge Graph", just like web pages are connected by links. Inside the marine domain, this has been happening at various levels, ranging from individual datasets, through individual institute programmes within many EU / EOSC initiatives and projects, all the way up to the UN / UNESCO-level work on the Ocean Info Hub.
The promised end result of a science knowledge graph is the experience of querying a single database, via the web, that contains all the science there is. It is safe to say we are not there yet, but the pathway to this goal is well established and being worked on from all sorts of directions by many organisations. At VLIZ, we have been working on a set of Python services and libraries that together allow a single data scientist to easily build up a local aggregate of what can be found "out there", which they can then use as a basis for data analysis. We already apply this technique in a number of concrete projects and dashboards under development.
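As a flavour of what such a local aggregate can look like in practice, the sketch below uses plain rdflib rather than the K-GAP libraries themselves: it pulls a few LOD sources (the URLs are hypothetical placeholders) into one local graph and runs a SPARQL query over the result.

```python
"""Plain-rdflib illustration of the aggregation idea behind K-GAP:
pull several LOD sources into one local graph, then analyse it with SPARQL.
The source URLs below are hypothetical placeholders."""
from rdflib import Graph

SOURCES = [
    "https://example.org/projects.ttl",    # hypothetical project descriptions
    "https://example.org/people.jsonld",   # hypothetical person/affiliation data
]

def build_local_aggregate(sources=SOURCES) -> Graph:
    g = Graph()
    for url in sources:
        g.parse(url)  # rdflib picks the RDF serialisation from the HTTP response
    return g

if __name__ == "__main__":
    g = build_local_aggregate()
    # example analysis: count the distinct predicates in the aggregated graph
    for row in g.query("SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }"):
        print(row.n)
```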
While we named this platform K-GAP (Knowledge Graph Analysis Platform), we have found that it is very useful in detecting actual knowledge gaps, giving the name a double meaning.
During the internship the ambition is to do practical work (create/modify code and specifications, write documentation, develop presentations) that furthers the development of our own K-GAP platform and specific uses of it, and pragmatically bridges the gaps in the platform.
Given the early stages of this approach, the field and opportunities are quite open and varied:
- Institute identifiers are a useful way to link people, projects, and institutes in the knowledge graph. Unfortunately, ROR.org, one of the registries of institutes, does not yet publish its registry as linked open data, which is an issue for us, as K-GAP ingests LOD formats. A clever use of one of our semantic-uplifting tools on top of the provided ROR.org APIs would allow us to produce a publicly available LOD publication of the ROR registry. Additional use of w3id.org and some .htaccess files could make that conform to classic LOD expectations about de-referenceability and content negotiation.
- Active participation in some of our various K-GAP-based projects will give further practical experience of how to use this tool, as well as potentially disclose some other gaps to close (i.e. we need user testing!).
- Finally, since K-GAP is still under development, there are a number of new features on the books that are waiting for a willing contributor:
- an LDES (Linked Data Event Streams) client (a naive consumption sketch follows this list)
- an aggregator of various LOD resources
- a set of export-import routines
- support for alternative triple stores
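To give a feel for what the LDES client item would involve, here is a naive consumption sketch in plain Python/rdflib: it walks an event stream page by page by following tree:node links and collects the tree:member references it finds. The endpoint URL is a hypothetical placeholder, and a real client would additionally need state keeping, retention-policy handling, and incremental polling.

```python
"""Naive LDES (Linked Data Event Streams) walker, for illustration only:
follows tree:node links between pages and collects tree:member URIs."""
from rdflib import Graph, Namespace

TREE = Namespace("https://w3id.org/tree#")

def harvest_ldes(start_page: str, max_pages: int = 10) -> set:
    members, seen = set(), set()
    to_visit = [start_page]
    while to_visit and len(seen) < max_pages:
        page = to_visit.pop()
        if page in seen:
            continue
        seen.add(page)
        g = Graph()
        g.parse(page)  # rdflib fetches and parses the page (Turtle, JSON-LD, ...)
        # every tree:member object is a member of the event stream
        members.update(str(m) for m in g.objects(None, TREE.member))
        # tree:relation / tree:node pointers lead to further pages of the stream
        to_visit.extend(str(n) for n in g.objects(None, TREE.node))
    return members

if __name__ == "__main__":
    # hypothetical LDES endpoint, for illustration only
    print(len(harvest_ldes("https://example.org/ldes/feed")))
```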
The process will be to first investigate the problem space and gain some experience with our LOD-supporting Python libraries. The main task is to create the ROR.org LOD publication as described above. Depending on how well that goes, we see a vast opportunity for related and very useful follow-up tasks.
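To make that main task a bit more concrete, the sketch below performs a minimal uplift of a single ROR record using plain requests and rdflib rather than our own tooling. It assumes the public ROR v1 REST API at api.ror.org/organizations/&lt;id&gt; and its id, name, and links fields, and maps them onto schema.org terms; the eventual publication would need a fuller mapping plus the w3id.org redirects and content negotiation mentioned above.

```python
"""Minimal semantic uplift of one ROR record to RDF (illustrative sketch,
assuming the ROR v1 REST API and a simple schema.org mapping)."""
import requests
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")

def ror_to_graph(ror_id: str) -> Graph:
    record = requests.get(
        f"https://api.ror.org/organizations/{ror_id}", timeout=30
    ).json()
    g = Graph()
    org = URIRef(record["id"])  # v1 returns the full https://ror.org/... URI here
    g.add((org, RDF.type, SCHEMA.Organization))
    g.add((org, SCHEMA.name, Literal(record["name"])))
    for link in record.get("links", []):  # institute homepage(s), if any
        g.add((org, SCHEMA.url, URIRef(link)))
    return g

if __name__ == "__main__":
    # placeholder: fill in a real ROR identifier (the part after https://ror.org/)
    EXAMPLE_ROR_ID = "..."
    print(ror_to_graph(EXAMPLE_ROR_ID).serialize(format="turtle"))
```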