Recent advancements in Large Language Models (LLMs) have substantially improved question answering (QA) in knowledge-intensive domains. However, fundamental challenges remain unsolved: ensuring factual grounding, producing traceable reasoning, and performing robust multi-hop inference. Existing Retrieval-Augmented Generation (RAG) approaches, even when enriched with Knowledge Graphs (KGs), often rely on free-text representations or unconstrained reasoning, resulting in hallucinated facts, incomplete evidence chains, and opaque reasoning steps.
Therefore, at IDLab-KNoWS we designed an agentic AI approach, called GraphWalker, that tightly couples LLM reasoning with explicit graph traversal. By constraining each step of the reasoning chain to follow a deterministic KG edge, GraphWalker substantially improves traceability and reduces hallucinations. It uses two cooperating agents: a Walker Agent for KG traversal and a Rephraser Agent for subquestion generation, and it achieves significant gains over state-of-the-art baselines on the MetaQA benchmark.
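To make the two-agent idea concrete, the loop below is a minimal sketch of this style of edge-constrained traversal. It is not the GraphWalker implementation: the toy KG, the function names, and the hard-coded subquestion plan are all illustrative assumptions, and the LLM-driven agents are replaced by simple lookups so the example stays self-contained.

```python
# Toy MetaQA-style KG: (head entity, relation) -> tail entities.
# Hypothetical data, for illustration only.
KG = {
    ("Inception", "directed_by"): ["Christopher Nolan"],
    ("Christopher Nolan", "directed"): ["Inception", "Interstellar"],
    ("Interstellar", "starred_actors"): ["Matthew McConaughey"],
}

def walker_step(entity, relation):
    """Walker Agent stand-in: deterministic traversal of one KG edge.
    Returns [] if the edge does not exist, so no fact can be hallucinated."""
    return KG.get((entity, relation), [])

def rephraser_plan(question):
    """Rephraser Agent stand-in: decompose a multi-hop question into a
    chain of subquestions (here, relations). In GraphWalker this step is
    LLM-driven; a hard-coded plan is used here for illustration."""
    plans = {
        "Who starred in films directed by the director of Inception?":
            ["directed_by", "directed", "starred_actors"],
    }
    return plans[question]

def graph_walk(question, start_entity):
    """Answer a multi-hop question by chaining deterministic edge hops.
    The trace of frontiers forms an explicit, inspectable evidence chain."""
    frontier = {start_entity}
    trace = [frontier]
    for relation in rephraser_plan(question):
        frontier = {t for e in frontier for t in walker_step(e, relation)}
        trace.append(frontier)
    return frontier, trace

answers, trace = graph_walk(
    "Who starred in films directed by the director of Inception?",
    "Inception",
)
# answers == {"Matthew McConaughey"}; trace holds all intermediate entities
```

Because every intermediate answer is a set of real KG nodes reached via real edges, the final answer comes with a complete, verifiable evidence chain rather than free-text justification.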
Despite its strong performance, several open research gaps remain:
- Scalability and efficiency issues emerge in deep multi-hop reasoning (e.g., 3+ hops, aggregation-heavy questions).
- Exploration is sometimes redundant, revisiting identical regions of the KG.
- Subquestion sequencing is strictly linear, limiting efficiency for queries requiring parallel information gathering.
- Starting-point selection relies on simplifying assumptions; better entity-selection strategies may improve performance.
- Generalization to larger, real-world KGs (e.g., Wikidata) or domain-specific KGs (e.g., biomedical, industry datasets) has not been explored.
Addressing these gaps is crucial for advancing trustworthy, interpretable, KG-grounded QA systems that can scale to real-world knowledge bases.
The goal of this thesis is to extend, optimize, and rigorously evaluate the GraphWalker framework for improved multi-hop question answering over KGs. The student will conduct research spanning both algorithmic design and empirical evaluation.