Large language models (LLMs), found in chatbots and model families such as ChatGPT, Claude and Llama, have emerged as powerful tools in natural language processing and artificial intelligence, with strong capabilities and broad applicability. Yet LLMs remain limited in how reliably they can access and incorporate factual knowledge. This is where Knowledge Graphs (KGs) come in.
A KG is a structured representation of knowledge in graph form, designed to store and organize factual information about entities and their relationships. Entities are represented as nodes, and relationships between entities are represented as edges connecting these nodes. Each node typically corresponds to a real-world entity, such as a person, place, concept, or event, while each edge represents a specific type of relationship between two entities. KGs capture complex knowledge in a way that is both human-readable and machine-interpretable, enabling efficient querying, reasoning, and inference over large volumes of interconnected data. Large KGs already exist, e.g. DBpedia (a Knowledge Graph built from Wikipedia), and offer a structured reservoir of factual information explicitly designed for interpretation and inference.
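To make the node-and-edge description concrete, the following is a minimal sketch of a KG stored as subject-predicate-object triples, with a pattern-matching query and a simple two-hop inference. The `Triple` type, the `match` and `born_in_country` helpers, and the handful of toy facts are all assumptions chosen for illustration; they are not taken from DBpedia or from any particular KG library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One directed, labelled edge: subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str


# Toy facts for illustration only (hypothetical identifiers, not DBpedia URIs).
TRIPLES = [
    Triple("Marie_Curie", "bornIn", "Warsaw"),
    Triple("Marie_Curie", "field", "Physics"),
    Triple("Warsaw", "capitalOf", "Poland"),
    Triple("Pierre_Curie", "spouse", "Marie_Curie"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples that agree with every argument that is not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def born_in_country(triples, person):
    """Two-hop inference: person --bornIn--> city --capitalOf--> country."""
    return [
        hop2.obj
        for hop1 in match(triples, subject=person, predicate="bornIn")
        for hop2 in match(triples, subject=hop1.obj, predicate="capitalOf")
    ]


if __name__ == "__main__":
    # All edges leaving the node "Marie_Curie".
    for t in match(TRIPLES, subject="Marie_Curie"):
        print(t.subject, t.predicate, t.obj)
    # A fact that is not stored directly but follows from two edges.
    print(born_in_country(TRIPLES, "Marie_Curie"))
```

Real KGs typically use a dedicated data model and query language rather than an in-memory list, but the same idea applies: a query is a pattern over edges, and inference chains several edges together.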
Integrating KGs with LLMs benefits both sides. KGs enrich LLMs with external knowledge, improving both inference and interpretability. However, constructing KGs is no small feat: they are complex and constantly evolving, which makes it difficult to generate new facts and to represent previously unseen knowledge. Here LLMs can help in turn. The combination of LLMs and KGs is therefore a promising avenue, in which the strengths of each can be used to compensate for the weaknesses of the other.
The overall goal of this master's thesis is to investigate one or more techniques to unify LLMs and KGs. To this end, the student will start by compiling a comprehensive overview of the ways in which LLMs and KGs can enrich each other, and of the use cases for which this is useful. Next, the student will, in mutual agreement with the promotor, select one methodology to develop further. The research investigates various aspects of this integration, including model architectures, training methodologies, and applications across different domains.