Beyond Retrieval: Why AI Agents Need a Context Graph, Not a Vector Store

When RAG isn't enough — why agents need to remember decisions, not just documents.

Feb 18, 2026

The Problem Nobody Talks About

Imagine you are building an AI agent for a sales team. Its job is to evaluate renewal discounts. It needs to consider the customer's history, your company's discount policies, what exceptions have been made before, and whether those exceptions worked out.

You build a RAG pipeline. You feed it policy documents, past deals, CRM notes. It retrieves relevant chunks and makes a recommendation.

It works. Until it doesn't.

Six months in, the agent approves an unusually large discount for a customer. A human reviews it and flags it — this exact same situation happened four months ago with a different customer, the exception was denied, and the account churned anyway. The agent had no idea. It had the documents, but it had no memory of the decision, no knowledge of the outcome, and no way to connect the two customers as belonging to the same risk pattern.

This is the problem nobody talks about when discussing RAG. RAG is great at retrieving information. It is not designed to remember decisions, track outcomes, or reason about institutional experience.

That gap is what a Context Graph is built to fill.

Traditional RAG — What It Is and What It Gets Right

Retrieval-Augmented Generation is one of the most important ideas in applied AI. Before it, language models could only reason about information baked into their weights at training time. RAG changed that by giving models access to external knowledge at inference time.

The core loop is elegantly simple:

The RAG Pipeline

  Knowledge Corpus
      Documents, PDFs, web pages — your external knowledge source
          ↓ chunk + embed
  Vector Store
      Embeddings + metadata stored for similarity lookup
          ↓ user query embeds → top-k similarity
  Retrieved Chunks
      Most relevant passages, selected by cosine similarity
          ↓ + original query
  LLM Generation
      “Based on these documents...” → grounded response

What RAG gets right:

  • It grounds LLM responses in actual source material, reducing hallucination
  • It allows models to reason over private, domain-specific knowledge
  • It separates the knowledge store from the model weights, making updates easy
  • It is simple to implement and scales well

For a huge class of problems — customer support bots, document Q&A, research assistants, internal knowledge bases — RAG is the right tool. Do not overcomplicate it.
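The core loop above can be sketched end to end in a few lines. This toy version uses a bag-of-words "embedding" and cosine similarity in place of a learned embedding model and a vector database; the chunks and query are illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" -- a real system uses a learned model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list, k: int = 2) -> list:
    # Rank every chunk by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

chunks = [
    "Standard renewal discounts are capped at 10 percent.",
    "Acme Corp opened three support tickets last quarter.",
    "Office plants should be watered weekly.",
]
top = retrieve("what is the renewal discount cap", chunks, k=1)
```

The retrieved passages would then be prepended to the prompt for the generation step — the whole pipeline is this retrieval loop plus a template.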

The Five Gaps in Traditional RAG

When you move from “answer questions about documents” to “make consistent decisions over time,” RAG starts to show fundamental limitations.

Gap 1: It retrieves documents, not decisions

RAG's unit of knowledge is a passage — a chunk of text from a document. But decisions are not passages. A decision has a reasoning chain, a policy it was evaluated against, a record of who approved it, a record of what happened as a result, and connections to specific entities and events. None of that structure survives chunking.

When you ask a RAG system “what did we decide last time this happened?”, it finds you the document that describes the situation. It cannot find you the decision record with its full context.

Gap 2: Every query is stateless

A RAG system has no memory between queries. Each question is answered in isolation. This is fine for Q&A. It is fatal for agents that need to act consistently. The same agent asked the same question on different days may give different answers, with no audit trail to compare them.

Gap 3: Entity identity is invisible

In the real world, “Acme Corp” in your CRM, “acme-corporation” in your ticketing system, “Acme” in your Slack, and “ACME” in your billing system are all the same entity. Traditional RAG treats them as four different strings. When you retrieve context, you get fragmented, siloed pictures of the same company because the system has no concept of entity identity across sources.

Gap 4: No feedback loop

Did the recommendation work? Did the decision have good outcomes? Did the human override the agent's suggestion? RAG systems have no mechanism to capture any of this. The knowledge store is static — it grows when you add documents, but it never learns from what happened after retrieval.

Gap 5: Time is flat

Documents have a created date, but RAG treats all knowledge as equally current. A pricing policy from eighteen months ago and a pricing policy from last week look the same to the retriever. There is no native concept of “this information expired” or “this decision was later overridden.”

Graph Databases — A Different Lens

Graph databases (Neo4j, Amazon Neptune, and others) model the world differently from both relational databases and vector stores. Instead of tables or vectors, the fundamental primitives are nodes (things) and edges (relationships between things).

Relationship-First Data Model

  Customer  —— involves ——→  Opportunity
     ↓ has_ticket               ↓ has_incident
  Ticket   ←—— caused_by ——  Incident

Graph databases shine at relationship traversal — questions like:

  • Which customers are affected by this infrastructure failure?
  • What is the shortest path between these two entities?
  • Find all entities within two hops of this node that share this property
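A question like the first bullet above is a short breadth-first walk over an adjacency list. This is only a sketch of the traversal idea — a real graph database adds indexes, a query language, and transactional storage; the edge data here is made up:

```python
from collections import deque

# Directed edges as (source, relation, target) triples -- an illustrative mini-schema.
edges = [
    ("incident-7", "caused", "ticket-42"),
    ("ticket-42", "filed_by", "acme"),
    ("incident-7", "caused", "ticket-43"),
    ("ticket-43", "filed_by", "beta-inc"),
]

def neighbors(node: str) -> list:
    return [dst for src, _, dst in edges if src == node]

def within_hops(start: str, max_hops: int) -> set:
    # Breadth-first traversal, collecting everything up to max_hops edges away.
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}

# "Which customers are affected by this incident?" = a two-hop traversal.
affected = within_hops("incident-7", 2)
```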

What graph databases get right:

  • Relationship-first data modeling that matches how domains actually work
  • Efficient multi-hop traversal without expensive joins
  • Explicit, auditable schema for entity types and relationship types
  • Strong consistency and transactional guarantees

Where they fall short for AI agents:

  • They do not natively support semantic similarity search. You cannot ask “find me nodes conceptually similar to this situation” — you can only traverse edges.
  • Populating them requires explicit, structured input. They do not ingest unstructured text from Slack, emails, or incident notes.
  • They have no concept of embedding-based ranking or relevance scoring — every result is equally “found” or “not found.”
  • They are not designed to store reasoning chains or model outputs as first-class objects.

A graph database tells you what is connected. It does not tell you what is relevant.

GraphRAG — The Bridge That Almost Works

Microsoft Research's GraphRAG (2024) was a genuine advance. The insight was: if you extract entities and relationships from your document corpus and build a knowledge graph from them, you can answer questions that require multi-hop reasoning — the kind of questions that pure vector similarity search fails at.

The GraphRAG Pipeline

  Source Documents
      Raw document corpus — PDFs, articles, reports
          ↓ LLM-powered NLP pipeline
  Entity + Relation Extraction  (inferred, not ground truth)
      LLM identifies named entities, relationships, and community clusters
          ↓ build graph
  Knowledge Graph
      Extracted entities + relationships, batch snapshot
          ↓ query + graph traversal + community summarization + vector similarity
  LLM Answer
      Synthesized answer with multi-hop reasoning

What GraphRAG gets right:

  • Answers questions that require connecting information across many documents
  • Community detection surfaces themes and clusters that pure retrieval misses
  • The graph structure makes reasoning chains more explainable

Where GraphRAG still falls short:

The fundamental issue is that the graph is derived from documents by an LLM. That means:

  • Entity extraction is imperfect — the same real-world entity may appear as multiple nodes
  • Relationships are inferred, not ground truth — they carry the hallucination risk of the extraction model
  • It is a batch process — run once over your corpus, not updated in real time from operational systems
  • It still has no memory of decisions, no outcome tracking, no feedback loop
  • It answers questions about what is in your documents, not what your organization has decided and learned

GraphRAG is excellent for knowledge synthesis over a large document corpus. It is not designed for operational decision memory.

The Context Graph — A New Mental Model

The Context Graph starts from a different question: not “what information is relevant?” but “what has my organization decided, in similar situations, and what happened?”

This reframes the problem entirely. The unit of knowledge is not a document or a passage. It is a decision — a structured record of what situation prompted it, what context was gathered, what policy was applied, what was decided and by whom, what precedents were cited, what actually happened afterward, and whether a human later overrode it.

Surrounding each decision is a graph of relationships to the entities involved, the context fragments that informed it, and the precedent decisions it built on. The result is an organizational memory that gets smarter with use.

How a Context Graph Handles a Decision

“What should we do about Acme Corp's renewal request?”

  1. Context Fragments — gather signals from live systems: CRM note, PD alert, Slack thread, ticket
  2. Entity Graph — resolve all mentions to canonical entities; “Acme Corp” is one node across all systems
  3. Decision Traces — search for precedent: “Similar to Beta Inc 6 months ago — approved at 15%, churned anyway”
  4. New Decision Trace — agent reasons and decides; a new trace is written with edges to entities, fragments, and precedents
  5. Outcome Tracking — did it work? The outcome feeds back into the graph as a quality signal for future searches

Core Concepts of a Context Graph

Context Fragments

A context fragment is a structured capture of a signal from an operational system — a support ticket, a Slack message, an incident alert, a CRM note. Unlike a RAG chunk, a fragment retains its provenance: which system it came from, when, what entities it mentions, and how severe it is.

Fragments are the raw evidence layer. They are never directly shown to the LLM as retrieved passages. Instead, they flow through entity resolution and then attach to decisions as supporting evidence.
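As a sketch, a fragment might be modeled like this — the field names are illustrative, not a fixed schema:

```python
from dataclasses import dataclass, field

@dataclass
class ContextFragment:
    # Provenance-preserving capture of one operational signal.
    fragment_id: str
    source_system: str      # e.g. "salesforce", "pagerduty", "slack"
    captured_at: int        # unix timestamp of the original event
    severity: str           # e.g. "info", "warning", "critical"
    body: str
    entity_mentions: list = field(default_factory=list)  # raw mentions, pre-resolution

frag = ContextFragment(
    fragment_id="frag-001",
    source_system="pagerduty",
    captured_at=1739836800,
    severity="critical",
    body="SLA breach on Acme Corp production cluster",
    entity_mentions=["Acme Corp"],
)
```

The key contrast with a RAG chunk is that every field beyond `body` is provenance — and it survives all the way through to the decision record.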

Entity Resolution

The cross-system identity problem is one of the hardest and most underappreciated problems in enterprise AI. “Acme Corp” in Salesforce, “acme-corp” in Jira, and “Acme” in PagerDuty are the same company. An AI agent that does not know this will give incomplete, inconsistent context.

Entity resolution maps raw mentions to canonical entity records, with a confidence score. High-confidence matches are auto-resolved. Low-confidence matches are surfaced for human review.

Entity Resolution in Practice

  "Acme Corp" (CRM)    —— high confidence ——→    ent-acme
  "acme-corp" (Jira)   —— medium confidence ——→  ent-acme
  "Acme" (Slack)       —— low confidence ——→     ? human review

Resolution edges carry metadata: confidence score, match method, source system, and expiry time. When a human corrects a resolution, the old edge is expired and a new one is created — the full audit trail is preserved.
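A minimal resolution step might look like the following sketch. Plain string similarity stands in for the real matching logic, which would blend aliases, embeddings, and source-system hints; the thresholds, aliases, and entity IDs are assumptions:

```python
import difflib

# Canonical entity registry: entity id -> known aliases (illustrative).
CANONICAL = {"ent-acme": ["acme corp", "acme corporation"]}

def resolve(mention: str, auto_threshold: float = 0.85,
            review_threshold: float = 0.5):
    """Map a raw mention to (entity_id, confidence, action)."""
    norm = mention.lower().strip()
    best_id, best_score = None, 0.0
    for ent_id, aliases in CANONICAL.items():
        for alias in aliases:
            score = difflib.SequenceMatcher(None, norm, alias).ratio()
            if score > best_score:
                best_id, best_score = ent_id, score
    if best_score >= auto_threshold:
        return best_id, best_score, "auto_resolve"   # high confidence
    if best_score >= review_threshold:
        return best_id, best_score, "human_review"   # surfaced to a person
    return None, best_score, "unresolved"
```

With these thresholds, "Acme Corp" resolves automatically, "acme-corp" scores high enough to auto-resolve too, and the bare "Acme" lands in the human-review queue.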

Decision Traces

A decision trace is the central record in the Context Graph. It captures not just the output of a decision (approved / denied / escalated) but the full reasoning structure:

  • Who decided — agent ID, actor type, approval level
  • What policy applied — policy ID and version at time of decision
  • What evidence was used — links to specific context fragments
  • What precedents were cited — links to prior decision traces
  • What actually happened — outcome status, tracked after the fact
  • Was it overridden — human correction with reason and timestamp

This is meaningfully different from storing a log entry. A decision trace is a queryable, searchable, embeddable record of organizational reasoning.
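As a sketch, a trace could be modeled as a plain record — the field names are illustrative:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionTrace:
    # A queryable record of one decision and its reasoning structure.
    trace_id: str
    decided_by: str                 # agent or human actor id
    policy_id: str                  # policy + version at time of decision
    decision: str                   # e.g. "approved", "denied", "escalated"
    reasoning: str
    evidence: list = field(default_factory=list)    # context fragment ids
    precedents: list = field(default_factory=list)  # prior trace ids cited
    outcome: str = "pending"        # later: "successful" / "failed" / "partial"
    overridden_by: Optional[str] = None  # human correction, if any

trace = DecisionTrace(
    trace_id="dt-301",
    decided_by="agent:renewals-v2",
    policy_id="discount-policy@v3",
    decision="denied",
    reasoning="Requested 18% exceeds cap; precedent dt-117 churned despite approval.",
    evidence=["frag-001"],
    precedents=["dt-117"],
)
```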

Temporal Edges

Edges in the Context Graph are time-aware. Every edge has a valid_from and valid_until timestamp. An edge with valid_until = 0 never expires. An edge with a future valid_until will be automatically excluded from queries after that time.

This matters for entity resolution corrections, policy changes (a decision made under policy v1 should not pretend to have been made under policy v2), and any relationship that has a natural end date.
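The filtering rule is simple to state in code. This sketch shows an entity-resolution correction: the old edge is expired at the moment the new one becomes valid (all IDs and timestamps are made up):

```python
from dataclasses import dataclass

@dataclass
class Edge:
    src: str
    rel: str
    dst: str
    valid_from: int
    valid_until: int  # 0 means "never expires"

def active_edges(edges, at_time: int):
    # An edge counts only if it had started and has not yet expired.
    return [e for e in edges
            if e.valid_from <= at_time
            and (e.valid_until == 0 or at_time < e.valid_until)]

edges = [
    # Original (wrong) resolution, expired at t=500 when a human corrected it.
    Edge("mention-9", "resolved_to", "ent-acme", valid_from=100, valid_until=500),
    # Corrected resolution, valid from t=500 onward, never expires.
    Edge("mention-9", "resolved_to", "ent-acme-health", valid_from=500, valid_until=0),
]

before = active_edges(edges, at_time=200)  # old resolution is live
after = active_edges(edges, at_time=600)   # only the correction is live
```

Both edges stay in the graph, so the full audit trail is preserved while queries see only the resolution that was valid at the time they ask about.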

Hybrid Search for Precedent

Finding relevant precedent is not a pure keyword problem or a pure semantic problem — it requires both. A query like “healthcare customer with multiple SLA breaches requesting discount above standard cap” needs:

  • Semantic similarity to surface decisions that are conceptually related
  • Keyword matching to catch specific terms (SLA, discount, healthcare)
  • Recency weighting to prefer recent decisions over stale ones
  • Quality weighting to prefer decisions with confirmed good outcomes

A Context Graph uses hybrid ranking — typically Reciprocal Rank Fusion combining multiple signals — to produce a single relevance score. The weights are tunable per query: dial up recency during a crisis, dial up semantic similarity when searching for policy precedent.
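Reciprocal Rank Fusion itself is only a few lines: each signal contributes 1/(k + rank) per document, scaled by a tunable per-signal weight. A sketch, with illustrative trace IDs:

```python
def rrf(rankings, weights=None, k=60):
    """Fuse several ranked lists: score(d) = sum over signals of w_s / (k + rank_s(d)).

    `rankings` maps signal name -> list of doc ids, best first.
    `weights` lets callers dial individual signals up or down per query.
    """
    weights = weights or {}
    scores = {}
    for signal, ranked in rankings.items():
        w = weights.get(signal, 1.0)
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

fused = rrf(
    {
        "semantic": ["dt-117", "dt-205", "dt-088"],
        "keyword":  ["dt-205", "dt-117"],
        "recency":  ["dt-205", "dt-088", "dt-117"],
    },
    weights={"recency": 2.0},  # e.g. dial up recency during a crisis
)
```

With recency weighted up, dt-205 (ranked first by keyword and recency) beats dt-117 (ranked first only by semantics) — exactly the kind of per-query tuning described above.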

Outcome Tracking

Every decision trace has an outcome status that starts as pending and is updated after the fact: successful, failed, partial. This creates a feedback signal that can weight precedent search — decisions that worked out well are more valuable precedent than decisions that didn't.

This is the mechanism by which the system learns from experience. Not automated ML retraining — structured human feedback flowing back into the knowledge graph as queryable, searchable metadata.
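One simple way to turn outcomes into a search signal is to multiply each result's relevance score by an outcome weight. The weight table below is an illustrative assumption, not a fixed part of the design:

```python
# Illustrative outcome weights -- a real deployment would tune these per domain.
OUTCOME_WEIGHT = {"successful": 1.0, "partial": 0.6, "pending": 0.5, "failed": 0.2}

def quality_adjusted(results, outcomes):
    """Re-weight precedent search results by how each decision turned out.

    `results` is a list of (trace_id, relevance_score);
    `outcomes` maps trace_id -> outcome status.
    """
    rescored = [
        (doc, score * OUTCOME_WEIGHT.get(outcomes.get(doc, "pending"), 0.5))
        for doc, score in results
    ]
    return sorted(rescored, key=lambda kv: kv[1], reverse=True)

results = [("dt-117", 0.9), ("dt-205", 0.8)]
outcomes = {"dt-117": "failed", "dt-205": "successful"}
ranked = quality_adjusted(results, outcomes)
```

A slightly more relevant precedent that failed (dt-117) now ranks below a precedent that worked (dt-205) — the feedback loop in its smallest possible form.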

High-Level Architecture

Context Graph System Architecture

  Operational Systems
      CRM · Ticketing · Monitoring · Slack · Any HTTP source
          ↓ webhooks / event streams
  Ingestion Layer
      Parse & normalize → Extract entity mentions → Resolve to canonical IDs
      Output: context_fragment + resolution edges
          ↓
  Context Graph Store
      Entities · Fragments · Decision Traces
      Edges: involves · informed_by · preceded_by · resolved_to · caused_by · supersedes
      (all temporal: valid_from / valid_until)
          ↓ unified search + graph + vector
  Agent Layer
      1. Gather context → 2. Search precedent → 3. Evaluate policy
      → 4. Reason + decide → 5. Record trace → 6. Write edges
          ↕
  Human Review UI
      Resolution corrections · Decision overrides · Outcome confirmation · Precedent browsing · Graph visualization
          ↓
  Outcome Tracking
      Did the decision work? Update trace status → feed quality signal back into graph

The store layer is the heart of the system. It needs to do three things simultaneously that most databases handle separately:

  1. Vector similarity search — semantic matching for precedent and entity resolution
  2. Structured attribute filtering — filter by entity type, decision type, policy, date ranges
  3. Graph traversal — multi-hop traversal from any node to its neighbors

This combination is why a specialist engine (like Vespa) is used rather than a pure vector database or a pure graph database. Each alone solves part of the problem.
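A toy in-memory version makes the three-capability combination concrete: filter by attribute, constrain by a graph link, then rank by vector similarity. All data and field names are made up, and a real engine does this at scale with indexes:

```python
import math

# Toy unified store: each record has a type, an embedding, and graph links.
docs = {
    "dt-117": {"type": "decision", "vec": [1.0, 0.0], "links": ["ent-acme"]},
    "dt-205": {"type": "decision", "vec": [0.6, 0.8], "links": ["ent-beta"]},
    "frag-01": {"type": "fragment", "vec": [0.9, 0.1], "links": ["ent-acme"]},
}

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def unified_query(query_vec, doc_type, linked_to=None):
    # 1) structured attribute filter
    hits = {k: v for k, v in docs.items() if v["type"] == doc_type}
    # 2) graph constraint: keep only records linked to the given entity
    if linked_to is not None:
        hits = {k: v for k, v in hits.items() if linked_to in v["links"]}
    # 3) vector ranking over what remains
    return sorted(hits, key=lambda k: cos(query_vec, hits[k]["vec"]), reverse=True)

best = unified_query([1.0, 0.1], doc_type="decision", linked_to="ent-acme")
```

Each stage alone is easy; the architectural point is that all three run in one query path over one store.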

Side-by-Side Comparison

What each system stores

|                   | Traditional RAG    | Graph Database      | GraphRAG                    | Context Graph                        |
|-------------------|--------------------|---------------------|-----------------------------|--------------------------------------|
| Unit of knowledge | Text chunk         | Node / Edge         | Extracted entity + relation | Decision trace + evidence + entity   |
| Knowledge source  | Documents          | Manual / ETL        | Documents (LLM extraction)  | Operational events + agent decisions |
| Knowledge update  | Re-index documents | Manual write / ETL  | Re-run extraction pipeline  | Real-time event ingestion            |
| Temporal model    | Created date only  | Static or versioned | Batch snapshot              | Edges with valid_from / valid_until  |

How each system retrieves

|                      | Traditional RAG   | Graph Database  | GraphRAG                       | Context Graph                                  |
|----------------------|-------------------|-----------------|--------------------------------|------------------------------------------------|
| Retrieval primitive  | Vector similarity | Graph traversal | Community + vector + traversal | Hybrid: semantic + keyword + recency + quality |
| Multi-hop reasoning  | Poor              | Excellent       | Good                           | Good                                           |
| Semantic similarity  | Excellent         | None            | Good                           | Excellent                                      |
| Structured filtering | Limited           | Excellent       | Limited                        | Good                                           |
| Relevance ranking    | Cosine similarity | Not applicable  | LLM-reranked                   | Reciprocal Rank Fusion (tunable)               |

What each system knows about decisions

|                       | Traditional RAG | Graph Database      | GraphRAG | Context Graph                     |
|-----------------------|-----------------|---------------------|----------|-----------------------------------|
| Decision memory       | None            | If manually written | None     | First-class                       |
| Decision provenance   | None            | Partial             | None     | Full (policy, evidence, approver) |
| Outcome tracking      | None            | If manually written | None     | Built-in                          |
| Human override record | None            | If modeled          | None     | Built-in with audit trail         |
| Precedent chain       | None            | If modeled          | None     | Queryable graph                   |

Operational properties

|                   | Traditional RAG   | Graph Database | GraphRAG                            | Context Graph                              |
|-------------------|-------------------|----------------|-------------------------------------|--------------------------------------------|
| Setup complexity  | Low               | Medium         | High                                | High                                       |
| Domain generality | High              | Medium         | High                                | Low — schema-first                         |
| Entity resolution | None              | Manual         | Imperfect (LLM)                     | Explicit + human-correctable               |
| Feedback loop     | None              | Manual         | None                                | Outcome tracking                           |
| Explainability    | Passage citations | Traversal path | Community summaries                 | Full trace: evidence → reasoning → outcome |
| Infrastructure    | Vector DB         | Graph DB       | Vector DB + Graph DB + LLM pipeline | Unified search + graph + vector engine     |

The Decision Flywheel

The most important architectural property of a Context Graph is one that is easy to miss: it gets better the more it is used.

This is not just a feature. It is a fundamentally different relationship between the system and time.

Traditional RAG

Knowledge = documents in corpus
Quality improves by adding more documents
Day 1 → Day 365: same mechanism, no accumulated wisdom

Context Graph

Knowledge = decisions + outcomes + precedent chains
Quality improves with every decision made + every outcome recorded
Day 365: N decisions, ~N outcomes, deep precedent chains — all searchable

Each decision that gets recorded becomes:

  1. Searchable precedent for the next similar situation
  2. A quality signal — did it work? — that weights future searches
  3. An audit record that answers “why did we do this?”
  4. A node in the precedent chain that future decisions can explicitly cite

The flywheel is slow to start. The first hundred decisions do not look dramatically different from a good RAG setup. But at scale — thousands of decisions across hundreds of entity types — the system's ability to surface “we have seen exactly this before, here is what happened” becomes genuinely powerful.

An organization running a Context Graph for a year has meaningfully different institutional memory than one running traditional RAG for a year — even starting from the same document corpus. The difference is that the Context Graph has been actively absorbing the organization's decision-making, not just passively storing its documents.

When to Use Each Approach

Use Traditional RAG when:

  • Your problem is “find me relevant information from my documents”
  • You have a static or slowly-changing knowledge corpus
  • Your users ask questions rather than make decisions
  • You need something working quickly
  • The domain is too general for schema design upfront

Examples: Internal documentation search, customer support knowledge base, research assistant, code documentation Q&A.

Use a Graph Database when:

  • Your domain is fundamentally relationship-centric (fraud, supply chain, identity)
  • You need guaranteed multi-hop traversal with ACID properties
  • Your data is highly structured and the schema is well-understood
  • Fine-grained access control on relationships is a requirement
  • Semantic similarity is not part of your query patterns

Examples: Fraud detection networks, organizational hierarchies, product catalog relationships, knowledge ontologies.

Use GraphRAG when:

  • You have a large corpus of documents with rich entity relationships buried in text
  • Users ask questions that require synthesizing information across many documents
  • You want better answers on “who, what, when, how are these things connected?” questions
  • You can afford the cost of the entity extraction pipeline and its periodic re-runs

Examples: Scientific literature analysis, legal discovery, intelligence analysis, enterprise knowledge synthesis.

Use a Context Graph when:

  • Your agent makes recurring decisions in a domain with defined policies
  • Consistency across decisions matters — the same situation should be handled similarly
  • Audit and explainability are requirements (regulated industries, high-stakes decisions)
  • Context comes from multiple operational systems with overlapping entity references
  • You want the system to improve as it accumulates decisions
  • A human needs to review, correct, or override agent decisions

Examples: Deal desk and pricing decisions, incident response triage, resource allocation, compliance reviews, hiring decisions, loan underwriting, sprint planning.

The combination in practice:

These approaches are not mutually exclusive. A production system might use:

  • RAG layer — policy docs, general knowledge
  • Graph DB layer — canonical entity schema
  • Context Graph — decision memory, precedent search

The Context Graph sits at the agent reasoning layer, above both the document retrieval layer and the entity model layer. It does not replace the others — it makes them useful for decision-making over time.

What This Is Not

It is worth being explicit about what a Context Graph is not, to avoid overpromising.

It is not a general-purpose knowledge base. If you want to answer “what is the capital of France,” use RAG or just ask the base model. A Context Graph is domain-specific by design — it requires committing to a schema for entity types, decision types, and relationships upfront.

It is not automated machine learning. The system does not retrain models based on outcomes. It makes past decisions and their outcomes searchable and surfaceable to future decision-making. The learning is structured and explicit, not gradient-based.

It is not a replacement for human judgment. The system is designed to augment human decision-making, not replace it. Human review interfaces — resolution correction, decision override, outcome tracking — are first-class parts of the architecture. The goal is to make human experts faster and more consistent, not to remove them.

It is not plug-and-play. Unlike traditional RAG, where you can feed in any corpus and get reasonable results, a Context Graph requires upfront domain modeling. You need to define your entity types, your decision types, your relationship vocabulary, and your source systems. This is an investment. The payoff is a system that actually understands your domain's structure rather than treating it as a bag of text.

Conclusion

Traditional RAG solved an important problem: how do you give a language model access to domain knowledge it was not trained on? That solution works well for a large class of problems, and it will continue to be the right choice for many use cases.

But as organizations move from “AI assistants that answer questions” to “AI agents that make decisions,” a new class of problem emerges. Decisions need to be consistent across time. They need to be explainable. They need to learn from outcomes. They need to respect entity identity across systems. They need an audit trail.

RAG was not designed for any of that. It retrieves passages. It does not remember decisions.

Graph databases model relationships with precision and enforce schema with rigour — but they require structured input, have no concept of semantic similarity, and are not designed to store reasoning chains as first-class objects.

GraphRAG bridges documents and relationships — but the graph is extracted by an LLM, which inherits all the imprecision of that process, and it still has no memory of what was decided or what happened next.

The Context Graph is one answer to the question: what kind of memory does an AI agent need to act like a trustworthy, experienced member of an organization?

The answer is not more documents. It is structured decision memory — the ability to look at a new situation and say:

We have seen something like this before. Here is what we decided. Here is why. Here is what happened.

That is the difference between a system that retrieves information and a system that accumulates institutional wisdom.