Concerns are growing about the centralization of personal data that has resulted from over a decade of Big Data thinking, concerns substantiated by data scandals such as Cambridge Analytica. This is driving a paradigm shift towards the decentralization of personal data. People must be able to control their own personal data vault, in which they keep all public and private data that they, or others, create about them. These vaults allow individuals to decide, at any moment, which people and organizations they grant read or write access to specific documents. Companies can still use that data without having to collect or control it themselves, by asking permission to access the specific pieces of data they need. Data vaults are not limited to personal data: sensors and companies can apply the same storage principle.
From a technical perspective, the data analytics effort will thus shift from processing a single, very large centralized data set towards processing a very large number of small, individually permissioned data sets. Moreover, decentralized data is inherently varied, as there is no central agreement on data formats. The key to sustainability is therefore that every data vault provider implements a universally accepted standard. Each service provider can then adopt the same standard and request and process data from any data vault provider, with the person's permission. Solid provides a collection of standards and data formats/vocabularies for setting up data vaults that store data as documents, together with the appropriate access control mechanisms (e.g. identity, authentication and permission lists). In Solid, these documents contain Linked Data. Semantic Web reasoning techniques, exploiting the explicit semantics of Linked Data, can be employed to perform data analysis. Schema/ontology alignment through Semantic Web reasoning is thus essential for decentralized data: because each vault is maintained individually, it is impossible to enforce a single data format across all vaults, as one would in a centralized scenario.
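To make the combination of per-vault permissions and schema alignment more concrete, the sketch below answers one question over several vaults that use different vocabularies. It is only an illustration under simplifying assumptions: the vocabularies (`example.org` IRIs), the WebIDs, and the helpers `readable` and `align` are hypothetical, triples are plain Python tuples rather than a real RDF store, and a set of agent IDs stands in for Solid's actual access control documents.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical vocabularies: vaults describe the same measurement with
# different predicates, since there is no central agreement on formats.
EX_A = "http://example.org/vaultA#"
EX_B = "http://example.org/vaultB#"
SHARED = "http://example.org/shared#"

# Alignment rules in the spirit of owl:equivalentProperty, mapping each
# vault-specific predicate onto a shared predicate.
ALIGNMENT: Dict[str, str] = {
    EX_A + "heartRate": SHARED + "heartRate",
    EX_B + "pulse": SHARED + "heartRate",
}

def readable(vault: Dict, agent: str) -> List[Triple]:
    """Return the vault's triples only if the agent was granted read access.
    (Simplified stand-in for Solid access control: a plain set of agent IDs.)"""
    return vault["triples"] if agent in vault["read_acl"] else []

def align(triples: List[Triple]) -> Set[Triple]:
    """Infer an aligned triple for every triple whose predicate has a mapping."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in ALIGNMENT:
            inferred.add((s, ALIGNMENT[p], o))
    return inferred

agent = "https://service.example/#app"  # hypothetical requesting service
vault_a = {"read_acl": {agent},
           "triples": [("https://alice.example/#me", EX_A + "heartRate", "72")]}
vault_b = {"read_acl": {agent},
           "triples": [("https://bob.example/#me", EX_B + "pulse", "65")]}
vault_c = {"read_acl": set(),  # Carol has not granted this service access
           "triples": [("https://carol.example/#me", EX_A + "heartRate", "80")]}

# Collect only permitted data, align it, then query the shared vocabulary.
collected = [t for v in (vault_a, vault_b, vault_c) for t in readable(v, agent)]
merged = align(collected)
answers = sorted((s, o) for s, p, o in merged if p == SHARED + "heartRate")
print(answers)
```

Alice's and Bob's values are both returned under the shared predicate even though their vaults use different terms, while Carol's data is never read. A real system would express the alignment as ontology axioms or rules evaluated by a reasoner rather than as a Python dictionary.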
While the community is gaining momentum and rapidly delivering decentralized solutions for storing and querying data in personal vaults, fundamental research questions remain from a service provisioning viewpoint concerning the scalable and performant processing of all this decentralized data. We will tackle the problem from a querying perspective: a service provider has an information need, expressed as a query, that it wants answered. This requires solving a federated query and reasoning problem over a large number of independent and heterogeneous data sources, which calls for more complex algorithms while less computational power is available per node than in centralized systems. Moreover, companies are increasingly dealing with high-velocity streaming data, e.g. from sensors or social media. Research is therefore needed on exploiting data locality, i.e. processing data close to where it is produced, to greatly reduce the amount of data that must be transmitted and thereby achieve low-latency processing of high-velocity data.
In this PhD, we therefore want to address this problem using a network of query and reasoning agents, each of which can be deployed on any network node and contributes partial results to a query. Processing nodes are heterogeneous: they can have different processing resources and unpredictable network bandwidth. The solution will need to fully exploit data locality, processing the data close to its source.
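The idea of agents that process data close to the source and contribute partial results can be illustrated with a decomposable aggregate. The following is a minimal sketch, not the method this PhD will develop: the sensor readings, the threshold filter and the names `Partial`, `local_agent` and `coordinator` are hypothetical, and each node is simulated in a single process instead of running on a separate network node.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Partial:
    """Partial result an agent sends back: a count and a sum, not the raw data."""
    count: int
    total: float

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.count + other.count, self.total + other.total)

def local_agent(readings: Iterable[float], threshold: float) -> Partial:
    """Runs on the node that holds the data: filter and aggregate locally."""
    selected = [r for r in readings if r >= threshold]
    return Partial(len(selected), sum(selected))

def coordinator(partials: List[Partial]) -> float:
    """Combines the partial results into the final answer (a mean)."""
    combined = Partial(0, 0.0)
    for p in partials:
        combined = combined.merge(p)
    return combined.total / combined.count if combined.count else float("nan")

# Hypothetical sensor readings held on three separate nodes.
nodes = [[20.5, 21.0, 35.2], [36.1, 19.8], [37.0, 38.4, 22.3, 36.5]]
partials = [local_agent(readings, threshold=30.0) for readings in nodes]
mean = coordinator(partials)
print(partials)
print(mean)
```

Each node transmits two numbers regardless of how many readings it stores, which is the kind of reduction in transmitted data that data locality aims for. The design only works because a mean can be rebuilt from (count, sum) pairs; aggregates such as a median, or reasoning steps that need joins across vaults, cannot be split this simply, which is part of what makes the decentralized setting challenging.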
The overall aim of the PhD is to design performant, scalable and transparent Big Data semantic reasoning techniques across decentralized storage and processing nodes, in order to pave the way towards realizing the decentralization vision in which every user, company or device stores its data in its own data vault. Decentralized Big Data can solve the problems of its centralized counterpart; however, the following fundamental research objectives need to be addressed in the PhD:
- Design of decentralized analytics that autonomously distribute the analytics workload across the data stored on the distributed processing nodes, while hiding the complexity of the network.
- Investigation of methodologies that exploit the heterogeneity of the decentralized network to achieve scalable and performant decentralized analytics, while maintaining correctness.
We offer a competitive salary with attractive social benefits and a challenging, stimulating and pleasant research environment, where you can contribute to worldwide research on AI in healthcare. During your research, the following activities will be part of your work:
- Analyze existing semantic reasoning and Big Data analytics solutions for decentralized data processing, and design new algorithms and methodologies for realizing performant, scalable and transparent semantic reasoning across personal data vaults.
- Build up hands-on experience by implementing the designed algorithms and evaluating them on data collected from patients.
- Thoroughly evaluate the designed algorithms, both through simulations and on real-life data offered by companies/organizations involved in the Solid community.
- Participate in European and national research projects, in collaboration with industry and governmental organizations.
- Publish and present the research results at international conferences and in scientific journals.
- Work towards obtaining a PhD in about 4 years.
- Build towards a future research career (in academia or industry) through project experience and high-profile scientific publications, or towards a promising industry career in data analytics through collaborations with several high-impact industry partners.
KNoWS & KM team - IDLab (Ghent University, Belgium)
IDLab is a core research group of imec, a world-leading research and innovation hub in nanoelectronics and digital technologies, with research activities at Ghent University. IDLab performs fundamental and applied research on data science and internet technology, and is, with over 300 researchers, one of the larger research groups at imec. Our major research areas are machine learning and data mining; semantic intelligence; multimedia processing; distributed intelligence for IoT; cloud and big data infrastructures; wireless and fixed networking; electromagnetics, RF and high-speed circuits and systems.
The Knowledge Management (KM) and Knowledge on Web Scale (KNoWS) teams are embedded in this stimulating environment.
KNoWS covers research on the full (semantic) data management ecosystem: from methodologies to generate Linked Data more easily and performantly (e.g. the design of the RML mapping language and ecosystem), through Web APIs to publish this Linked Data scalably and reliably on the Web (e.g. the design of Linked Data Fragments), to paradigms and techniques to query Linked Data while performing reasoning (e.g. contributions to the N3 rule reasoning formalism and the design of Comunica, a modular query engine for semantic data).
KM performs research into a) expressive semantic stream and distributed reasoning, b) the incorporation of expert knowledge in data analytics algorithms, c) hybrid AI, fusing semantic models and machine learning, and d) explainable AI by leveraging Knowledge Graphs. This research is mainly applied to the domains of predictive healthcare and industry 4.0 in order to realize context-aware and personalized decision support systems.
We offer the opportunity to do full-time research in an international and friendly working environment (IDLab, part of imec and Ghent University, brings together over 17 nationalities), with a competitive salary at Ghent University. While grounded in fundamental academic research, as a PhD candidate you will also participate in collaborative research with industrial and/or academic partners in Flanders and/or on a wider geographic scale (e.g., EU H2020 projects), in the framework of new or ongoing projects. Furthermore, you will publish your research results at major international conferences and in journals, as part of meeting the requirements for your PhD. The PhD position is available starting fall 2021.