PhD position on Decentralized Big Data Processing on the Web

Application deadline: 2022-02-31 or until the vacancy is filled
Type of contract: Full-time
Employment: 1 year, extendable to up to 4 years after a positive evaluation at the end of the first year
Starting date: As soon as possible

We especially encourage applications by candidates from diverse groups; all are welcome in our team.

Job description

Traditional Big Data technologies have shown that there are no technological limits to how much data we can store in one place. At the same time, they have uncovered the economic, societal, and ethical limitations inherent to such unprecedented centralization of data. The Solid project (https://solidproject.org/) decentralizes data storage and tackles these issues by giving people back control over their own data, thereby re-enabling data-driven innovation.

In order to make a paradigm shift from centralized Big Data processing to decentralized processing, the Solid project needs to scale up to the current Volume, Velocity and Variety requirements of Big Data processing. The overall goal of this PhD is to make decentralized processing scale to the standard of current centralized Big Data technologies. The Solid project proposes to store data in personal data vaults. People must be able to control their own personal data vault, in which they guard all public and private data they or others create about them. These vaults allow individuals to decide at every moment to which people and organizations they selectively grant read or write access to specific documents. Companies can still use that data, without needing to collect or control it themselves, since they can ask permission to access the pieces of data. Data vaults are not limited to personal data. Sensors and companies can exploit the same storage principle.

Data analytics efforts will thus shift from processing a single, very large centralized data set, as in traditional Big Data, towards processing a very large number of small and individually permissioned data sets. Data Variety in decentralized data storage is high, due to the absence of a centrally imposed schema. To tackle this Variety, Linked Data principles, Semantic Web technologies, Knowledge Graph processing and Reasoning techniques are being applied. These techniques will be further extended to handle decentralized data retrieval. To handle Volume, techniques from traditional Big Data processing cannot be applied directly. The goal is to design novel scalable algorithms that can target huge numbers of small data sets instead of small numbers of huge data sets. To handle Velocity in combination with Variety and Volume, techniques from the Stream Reasoning paradigm will be extended. Stream Reasoning is a novel paradigm that combines Knowledge Graph processing with stream processing in order to target Velocity and Variety simultaneously. These techniques will be further extended to enable decentralized processing.

Are you passionate about Big Data and the Web, and interested in working on decentralized knowledge graph technologies? Join our team to work on the next phase of the Web! You can contribute to the Solid ecosystem by tackling scalability issues while enabling a paradigm shift from centralized Big Data to decentralized processing. You can read more about our work on Solid here:

Your profile

  • Degree: Master’s degree in Computer Science, Engineering, Informatics, ICT, Mathematics or related field
  • Advanced programming skills in at least one major programming language
  • You have a strong interest in data analytics, reasoning and/or the Semantic Web, and are eager to advance the state of the art.
  • Experience with Solid or the Semantic Web is not required, but a strong interest is.
  • Fluent in English, spoken and written
  • Self-directed and able to perform independent work
  • Enthusiastic about working in a research environment
  • Both young graduates and candidates with (industrial) experience are welcome.

KNoWS & KM team - IDLab (Ghent University, Belgium)

IDLab is a core research group of imec, a world-leading research and innovation hub in nanoelectronics and digital technologies, with research activities at Ghent University. IDLab performs fundamental and applied research on data science and internet technology, and is, with over 300 researchers, one of the larger research groups at imec. Our major research areas are machine learning and data mining; semantic intelligence; multimedia processing; distributed intelligence for IoT; cloud and big data infrastructures; wireless and fixed networking; electromagnetics, RF and high-speed circuits and systems.

The Knowledge Management (KM) and Knowledge on Web Scale (KNoWS) teams are embedded in this stimulating environment.

KNoWS covers research on the full (semantic) data management ecosystem: from methodologies to generate Linked Data more easily and efficiently (e.g. through the design of the RML mapping language and ecosystem), over Web APIs to publish this Linked Data scalably and reliably on the Web (e.g. through the design of Linked Data Fragments), to paradigms and techniques to query Linked Data while performing reasoning (e.g. through contributions to the N3 rule reasoning formalism and the design of Comunica, a modular query engine for semantic data).

KM performs research into a) expressive semantic stream and distributed reasoning, b) the incorporation of expert knowledge in data analytics algorithms, c) hybrid AI, fusing semantic models and machine learning, and d) explainable AI by leveraging Knowledge Graphs. This research is mainly applied to the domains of predictive healthcare and industry 4.0 in order to realize context-aware and personalized decision support systems.

Our offer

You receive the opportunity to perform full-time research in a highly international and friendly working environment, with a competitive salary. As a PhD candidate, your work will be grounded in fundamental academic research, and you will also participate in collaborative research with industrial and academic partners, in Flanders and further afield, in new and ongoing projects. You will publish your research results at major international conferences and in journal papers.

Interested?

Send your application, or any questions concerning this vacancy, by email to Femke Ongenae and Pieter Bonte, indicating "Job application: PhD position on Decentralized Big Data Processing on the Web" in the subject line. Applications should include:

  1. an academic/professional resume,
  2. a personal motivation letter, and
  3. transcripts of study results.

After screening, selected candidates will be invited for an online interview as a first contact in a multi-stage selection process.