
RAGulating Compliance With GenAI: A Multi-Agent Knowledge Graph Approach for Regulatory QA


Image of a life science professional using MasterControl’s ThinkJSON LLM on a tablet.

This second installment in the MasterControl product team's artificial intelligence (AI) blog series focuses on the team's efforts to build our AI-powered solutions around Retrieval-Augmented Generation (RAG) and Knowledge Graph (KG) as primary paradigms. For further insight into MasterControl's AI innovations, you can read the first blog post in the series here.

INTRODUCTION

Compliance with regulatory bodies is critical for life sciences manufacturers to safely and effectively deliver products to patients. Regulations globally cover a broad range of requirements, including drug approval processes, labeling requirements, clinical trial standards, and more. Production must strictly adhere to these standards, and often to multiple sets of standards at once when products are distributed globally. Yet few people know, let alone deeply understand, these regulatory requirements well enough to ensure compliance.

When faced with those critical regulatory questions, professionals need answers that are both precise and trustworthy. Generic AI chatbots may offer quick responses, but in regulatory matters, they present a dangerous proposition: confident-sounding answers that might be subtly incorrect. These AI systems weren't built for specialized domains where the difference between "usually required" and "always required" could mean the difference between compliance and costly violations.

Our generative AI solution built upon Retrieval-Augmented Generation (RAG) and Knowledge Graph (KG) addresses this fundamental challenge by creating a system that never guesses. Our approach meticulously retrieves, verifies, and presents regulatory facts with direct links to their authoritative sources, ensuring human-in-the-loop review and approval of generated information.

Ultimately, for regulatory affairs professionals, quality teams, and compliance officers, this transforms how they work—replacing hours of document searching with instant, verified knowledge access. It's not about replacing human expertise but rather amplifying it—providing a trusted advisor that helps navigate regulatory complexity with unprecedented speed and accuracy.

The Problems

Most of the issues that arise when applying commonly used AI tools in regulatory settings fall into two categories:

  • Mountains of data: Regulatory requirements and proprietary documentation both have to be searched and reconciled.
  • Generic chatbots fall short: Large language models (LLMs) can rephrase rules but often "hallucinate" details, miss subtle exceptions, or generate answers aimed at the wrong audience. In the realm of regulatory compliance, "almost right" is wrong.

The Big Idea: Triplets → Graph → Smart Answers

To overcome the problems that plague most LLMs, MasterControl takes a three-step approach:

  1. Break every sentence into a tiny fact, specifically, a subject–predicate–object (SPO) triplet.
    Example: the sentence "FDA requires manufacturers to submit reports within 15 days" becomes the triplet (FDA, requires, manufacturers to submit reports within 15 days).
  2. Store those triplets (plus the original text) as vectors in a database you can search by meaning.
  3. Use a swarm of agents to keep everything clean, searchable, and up to date. When you ask a question, the system grabs the right triplets, shows you the evidence, and an LLM stitches them into a fluent, traceable answer.

Think of it as Google Maps for regulations: the graph is the map, the agents are the GPS satellites, and the LLM is the friendly voice telling you where to turn.
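To make steps 1 and 2 concrete, here is a minimal sketch of a triplet record and a meaning-based lookup. The field names, the toy embedding function, and the in-memory store are illustrative assumptions, not our implementation; a production system would use a real embedding model and a vector database.

```python
from dataclasses import dataclass
import hashlib
import numpy as np

@dataclass
class Triplet:
    subject: str
    predicate: str
    obj: str
    source_text: str   # original sentence, kept for traceability
    source_id: str     # identifier of the source document/section

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic toy stand-in for a real sentence-embedding model."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class TripletStore:
    """Minimal in-memory stand-in for a vector database of triplets."""
    def __init__(self):
        self.items = []  # list of (Triplet, embedding) pairs

    def add(self, t: Triplet) -> None:
        self.items.append((t, embed(f"{t.subject} {t.predicate} {t.obj}")))

    def search(self, query: str, k: int = 3):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: -float(q @ item[1]))
        return [t for t, _ in ranked[:k]]

store = TripletStore()
store.add(Triplet("FDA", "requires",
                  "manufacturers to submit reports within 15 days",
                  "FDA requires manufacturers to submit reports within 15 days.",
                  "example-section-id"))
print(store.search("reporting deadline"))
```

Because every triplet keeps a pointer back to its source text and section, any answer built from it can be traced to the authoritative document.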

Who Does What? Meet the Agents.

The following is a list of the agents MasterControl employs, their specific purposes, and the reasons they matter.

Agent | Job | Why It Matters
Ingestion | Splits raw documents into logical chunks, tags metadata. | Keeps source text organized.
Extraction | Finds SPO triplets with an LLM. | Turns prose into structured data.
Cleaning | Deduplicates, resolves synonyms. | Prevents graph spaghetti.
Indexer | Embeds triplets and text, pushes them to a vector database. | Makes semantic search fast.
Retrieval | Pulls the most relevant triplets for a query. | Guarantees factual grounding.
Story Builder and Generator | Assembles evidence, lets the LLM draft the answer. | Provides fluent, verifiable responses.
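To make the division of labor concrete, here is a sketch of the ingestion side of the pipeline as plain functions, one per agent from the table; the bodies are simplified placeholders rather than our implementation (the query-time side is sketched under "How Does It Work?" below).

```python
# Illustrative ingestion-side pipeline; one function per agent from the table.
# Bodies are simplified placeholders, not production logic.

def ingestion(raw_documents):
    """Split raw documents into logical chunks and tag metadata."""
    return [{"text": d, "meta": {"doc": i}} for i, d in enumerate(raw_documents)]

def extraction(chunks):
    """Ask an LLM for (subject, predicate, object) triplets per chunk."""
    # Placeholder: a real implementation would prompt an LLM here.
    return [{"chunk": c, "triplets": []} for c in chunks]

def cleaning(extracted):
    """Deduplicate triplets and merge synonymous entity names."""
    return extracted

def indexer(cleaned):
    """Embed triplets and source text, then upsert into a vector database."""
    return cleaned  # placeholder for the vector-DB write

corpus = ["FDA requires manufacturers to submit reports within 15 days."]
index = indexer(cleaning(extraction(ingestion(corpus))))
print(index)
```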

Why Skip a Formal Ontology (for Now)?

Traditional knowledge graphs start with a rigid schema. We go ontology-free, meaning we let patterns emerge organically and add structure only when needed. This keeps ingestion fast and maintenance cheap yet still allows future "power-user" reasoning once clusters form.

How Does It Work?
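At query time, the system follows the retrieve, assemble, and generate flow described above: the Retrieval agent pulls relevant triplets, the Story Builder gathers them as cited evidence, and the Generator drafts an answer grounded only in that evidence. The sketch below is an illustrative outline of that flow; the retriever, the LLM call, and the citation format are placeholder assumptions, not the actual implementation.

```python
def answer_question(question, retriever, llm):
    """Query-time flow (illustrative): retrieve triplets, assemble evidence,
    and have the LLM draft an answer that cites every fact it uses."""
    triplets = retriever(question)                      # Retrieval agent
    evidence = [
        {"fact": f"{s} {p} {o}", "source": src}         # Story Builder input
        for (s, p, o, src) in triplets
    ]
    prompt = (
        "Answer the question using only the numbered facts below and cite them.\n"
        + "\n".join(f"[{i + 1}] {e['fact']} (source: {e['source']})"
                    for i, e in enumerate(evidence))
        + f"\n\nQuestion: {question}"
    )
    draft = llm(prompt)                                 # Generator agent
    return {"answer": draft, "evidence": evidence}      # reviewed by a human before use

# Example with stub retriever/LLM functions standing in for the real agents.
stub_retriever = lambda q: [("FDA", "requires",
                             "manufacturers to submit reports within 15 days",
                             "example-section-id")]
stub_llm = lambda prompt: "Reports must be submitted within 15 days [1]."
print(answer_question("When must reports be submitted?", stub_retriever, stub_llm))
```

Returning the evidence alongside the draft is what enables the human-in-the-loop review and the source links mentioned earlier.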

Does It Work?

We evaluate the effectiveness of our AI tools along three fundamental dimensions:

1. Section Overlap: Did the system fetch the right source?

Similarity Threshold | Without Triplets | With Triplets
0.5 | 0.081184 | 0.074507
0.6 | 0.269990 | 0.214310
0.75 | 0.168447 | 0.288833

The similarity threshold controls how "strict" the system is when counting a fetched source as correct: at a lower threshold the check is more permissive, while at a higher threshold a fetched source only counts as correct if its similarity score is strong. The highest rate of fetching the right source, 0.288833, is achieved by our triplet-based system at the stricter similarity threshold of 0.75.
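As an illustration of how such a metric can be computed, the sketch below counts a question as a hit when at least one retrieved source matches the ground-truth section embedding above the threshold. This is one plausible formulation under assumed data shapes and random stand-in embeddings, not the exact scoring used for the table above.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def right_source_rate(questions, threshold):
    """Fraction of questions where at least one retrieved source matches the
    ground-truth section with similarity >= threshold (illustrative metric)."""
    hits = 0
    for retrieved_vecs, truth_vec in questions:
        if any(cosine(v, truth_vec) >= threshold for v in retrieved_vecs):
            hits += 1
    return hits / len(questions)

# Toy data: random vectors stand in for real section embeddings.
rng = np.random.default_rng(0)
questions = [([rng.standard_normal(16) for _ in range(3)], rng.standard_normal(16))
             for _ in range(20)]
for t in (0.5, 0.6, 0.75):
    print(t, right_source_rate(questions, t))
```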

2. Answer Accuracy: Are triplets helping the LLM to fact-check the outputs?

On a random sample of questions with ground-truth answers from the corpus, answers produced with the triplet system scored a factual accuracy of 4.7, rated on a scale from 1 to 5:

1 - Strongly Disagree (mostly incorrect or hallucinated).
2 - Disagree (contains major factual errors).
3 - Neutral (partially correct but misses or misstates some points).
4 - Agree (mostly correct, with minor issues).
5 - Strongly Agree (factually accurate and complete).

3. Navigation Metric: Can users jump across related sources via shared triplets?

The regulatory corpus already contains an "original" citation network: for example, section XX may be mentioned or cited in section YY. However, that network cannot surface the semantic network around a topic such as "time of appeal." That is only possible with the system of triplets.

The average degree of the citation network (total number of connections divided by the number of nodes) is 1.2939, while the average degree of the triplet network is 1.6080, signifying a denser, more interconnected system. The triplet network also fills gaps in the citation network: of the 5,014 sections that are not connected through citations, 5,011 are connected through triplets. And finally, the average shortest path (how many hops between sections) is 2.0167 in the citation network versus 1.33 in the triplet network, which facilitates faster flow of information.
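These are standard graph measures; a minimal sketch of how they can be computed with networkx is shown below. The toy edge lists are invented for illustration and do not reflect the actual corpus.

```python
import networkx as nx

def network_stats(edges):
    """Average degree and average shortest path length of an undirected graph."""
    g = nx.Graph()
    g.add_edges_from(edges)
    avg_degree = sum(dict(g.degree()).values()) / g.number_of_nodes()
    # Average shortest path is only defined on a connected graph, so measure
    # the largest connected component if the graph is fragmented.
    largest = g.subgraph(max(nx.connected_components(g), key=len))
    avg_path = nx.average_shortest_path_length(largest)
    return avg_degree, avg_path

# Toy edge lists standing in for the real citation and triplet networks.
citation_edges = [("sec1", "sec2"), ("sec2", "sec3")]
triplet_edges = [("sec1", "sec2"), ("sec1", "sec3"), ("sec2", "sec3"), ("sec3", "sec4")]
print("citation:", network_stats(citation_edges))
print("triplets:", network_stats(triplet_edges))
```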

Cool Extra: Interactive Subgraph Views

When the system answers, it also displays a minigraph of the rules it used. Stakeholders can click through to verify each source, so there is no more black-box trust.
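A minimal sketch of such a view, assuming the retrieved triplets are available as (subject, predicate, object) tuples, could use networkx and matplotlib. The plotting choices here are illustrative and are not the product's actual interface.

```python
import networkx as nx
import matplotlib.pyplot as plt

def draw_evidence_subgraph(triplets):
    """Render the triplets behind an answer as a small labeled graph."""
    g = nx.DiGraph()
    for subject, predicate, obj in triplets:
        g.add_edge(subject, obj, label=predicate)
    pos = nx.spring_layout(g, seed=42)
    nx.draw(g, pos, with_labels=True, node_color="#cfe3ff",
            node_size=2200, font_size=8)
    nx.draw_networkx_edge_labels(
        g, pos, edge_labels=nx.get_edge_attributes(g, "label"), font_size=7)
    plt.show()

draw_evidence_subgraph([
    ("FDA", "requires", "manufacturers to submit reports within 15 days"),
])
```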

Challenges and Next Steps

To further refine our AI tools and make them even more powerful, our team continues to work on the following three challenges:

  • Vocabulary drift: Multiple ways to spell the same requirement need constant merging.
  • Deep reasoning: Multistep logic (e.g., timebound exceptions) still needs smarter rules.
  • Scaling updates: Incremental re-embedding keeps costs low when rules change hourly.

Looking ahead, we see multiple avenues for enhancing and extending the system. Our current pipeline supports factual lookups, but more complex regulatory questions demand deeper logical reasoning or chaining of evidence; integrating advanced reasoning LLMs can address these multistep analysis and domain-specific inference needs.

Viktoria Rojkova

Dr. Rojkova has been building and operating revenue-generating machine learning services and helping companies integrate AI for more than 15 years.

Prior to MasterControl, she led the team of ML and ML Ops engineers at Deloitte to build and support multimodal applications, such as computer vision and predictive maintenance for power and utilities, medical image segmentation, spoken task-oriented language-agnostic dialogue assistants, knowledge graphs, and policy learning for healthcare and life sciences. She also carries ML and NLP experience from Apple, LifeLock/IDAnalytics, and Kernel.

Dr. Rojkova completed her undergraduate degree in neuroscience at Moscow State University before completing a master's degree in psychology and cognitive neuroscience at the University of Illinois Urbana-Champaign and a PhD in computer science at the University of Louisville. She has authored and co-authored papers and patents in the field of applied AI and ML.


[ { "key": "fid#1", "value": ["GxP Lifeline Blog"] } ]