The Problem Nobody Talks About
Imagine you are building an AI agent for a sales team. Its job is to evaluate renewal discounts. It needs to consider the customer's history, your company's discount policies, what exceptions have been made before, and whether those exceptions worked out.
You build a RAG pipeline. You feed it policy documents, past deals, CRM notes. It retrieves relevant chunks and makes a recommendation.
It works. Until it doesn't.
Six months in, the agent approves an unusually large discount for a customer. A human reviews it and flags it — this exact same situation happened four months ago with a different customer, the exception was denied, and the account churned anyway. The agent had no idea. It had the documents, but it had no memory of the decision, no knowledge of the outcome, and no way to connect the two customers as belonging to the same risk pattern.
This is the problem nobody talks about when discussing RAG. RAG is great at retrieving information. It is not designed to remember decisions, track outcomes, or reason about institutional experience.
That gap is what a Context Graph is built to fill.
Traditional RAG — What It Is and What It Gets Right
Retrieval-Augmented Generation is one of the most important ideas in applied AI. Before it, language models could only reason about information baked into their weights at training time. RAG changed that by giving models access to external knowledge at inference time.
The core loop is elegantly simple:
The RAG Pipeline
- Knowledge Corpus — documents, PDFs, web pages: your external knowledge source
- Vector Store — embeddings + metadata stored for similarity lookup
- Retrieved Chunks — the most relevant passages, selected by cosine similarity
- LLM Generation — "Based on these documents..." → grounded response
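The retrieval half of that loop can be sketched in a few lines. This is a toy illustration, not a production pipeline: the three-dimensional "embeddings" and the chunk texts are made up, and a real system would call an embedding model instead.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, top_k=2):
    """Rank stored chunks by similarity to the query embedding."""
    scored = [(cosine(query_vec, vec), chunk) for chunk, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]

# Toy store: (chunk text, pretend embedding) pairs.
store = [
    ("Discount policy caps renewals at 10%.",  [0.9, 0.1, 0.0]),
    ("Office moved to Building 7 in May.",     [0.0, 0.2, 0.9]),
    ("Exceptions above 10% need VP sign-off.", [0.8, 0.3, 0.1]),
]

top = retrieve([1.0, 0.2, 0.0], store, top_k=2)
```

The retrieved chunks are then pasted into the LLM prompt as grounding context. Everything downstream of this step is generation, not retrieval.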
What RAG gets right:
- It grounds LLM responses in actual source material, reducing hallucination
- It allows models to reason over private, domain-specific knowledge
- It separates the knowledge store from the model weights, making updates easy
- It is simple to implement and scales well
For a huge class of problems — customer support bots, document Q&A, research assistants, internal knowledge bases — RAG is the right tool. Do not overcomplicate it.
The Five Gaps in Traditional RAG
When you move from “answer questions about documents” to “make consistent decisions over time,” RAG starts to show fundamental limitations.
Gap 1: It retrieves documents, not decisions
RAG's unit of knowledge is a passage — a chunk of text from a document. But decisions are not passages. A decision has a reasoning chain, a policy it was evaluated against, someone who approved it, an outcome that followed, and connections to specific entities and events. None of that structure survives chunking.
When you ask a RAG system “what did we decide last time this happened?”, it finds you the document that describes the situation. It cannot find you the decision record with its full context.
Gap 2: Every query is stateless
A RAG system has no memory between queries. Each question is answered in isolation. This is fine for Q&A. It is fatal for agents that need to act consistently. The same agent asked the same question on different days may give different answers, with no audit trail to compare them.
Gap 3: Entity identity is invisible
In the real world, “Acme Corp” in your CRM, “acme-corporation” in your ticketing system, “Acme” in your Slack, and “ACME” in your billing system are all the same entity. Traditional RAG treats them as four different strings. When you retrieve context, you get fragmented, siloed pictures of the same company because the system has no concept of entity identity across sources.
Gap 4: No feedback loop
Did the recommendation work? Did the decision have good outcomes? Did the human override the agent's suggestion? RAG systems have no mechanism to capture any of this. The knowledge store is static — it grows when you add documents, but it never learns from what happened after retrieval.
Gap 5: Time is flat
Documents have a created date, but RAG treats all knowledge as equally current. A pricing policy from eighteen months ago and a pricing policy from last week look the same to the retriever. There is no native concept of “this information expired” or “this decision was later overridden.”
Graph Databases — A Different Lens
Graph databases (Neo4j, Amazon Neptune, and others) model the world differently from both relational databases and vector stores. Instead of tables or vectors, the fundamental primitives are nodes (things) and edges (relationships between things).
Relationship-First Data Model
Graph databases shine at relationship traversal — questions like:
- Which customers are affected by this infrastructure failure?
- What is the shortest path between these two entities?
- Find all entities within two hops of this node that share this property
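The last of those questions is a plain breadth-first search. A minimal sketch over an in-memory adjacency list (the node names are invented for illustration):

```python
from collections import deque

def within_hops(graph, start, max_hops):
    """All nodes reachable from `start` in at most `max_hops` edge traversals."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    seen.pop(start)
    return seen  # node -> hop distance

# Toy adjacency list: an infrastructure failure -> services -> customers.
graph = {
    "db-cluster-1": ["billing-svc", "auth-svc"],
    "billing-svc": ["AcmeCorp", "BetaInc"],
    "auth-svc": ["AcmeCorp"],
}

affected = within_hops(graph, "db-cluster-1", 2)
```

A graph database does exactly this kind of traversal, but with indexes, persistence, and a query language (e.g. Cypher's `MATCH (n)-[*1..2]-(m)`) instead of a hand-rolled BFS.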
What graph databases get right:
- Relationship-first data modeling that matches how domains actually work
- Efficient multi-hop traversal without expensive joins
- Explicit, auditable schema for entity types and relationship types
- Strong consistency and transactional guarantees
Where they fall short for AI agents:
- They do not natively support semantic similarity search. You cannot ask “find me nodes conceptually similar to this situation” — you can only traverse edges.
- Populating them requires explicit, structured input. They do not ingest unstructured text from Slack, emails, or incident notes.
- They have no concept of embedding-based ranking or relevance scoring — every result is equally “found” or “not found.”
- They are not designed to store reasoning chains or model outputs as first-class objects.
A graph database tells you what is connected. It does not tell you what is relevant.
GraphRAG — The Bridge That Almost Works
Microsoft Research's GraphRAG (2024) was a genuine advance. The insight was: if you extract entities and relationships from your document corpus and build a knowledge graph from them, you can answer questions that require multi-hop reasoning — the kind of questions that pure vector similarity search fails at.
The GraphRAG Pipeline
- Source Documents — raw document corpus: PDFs, articles, reports
- Entity + Relation Extraction — an LLM identifies named entities, relationships, and community clusters (inferred, not ground truth)
- Knowledge Graph — extracted entities + relationships, built as a batch snapshot
- LLM Answer — synthesized answer with multi-hop reasoning
What GraphRAG gets right:
- Answers questions that require connecting information across many documents
- Community detection surfaces themes and clusters that pure retrieval misses
- The graph structure makes reasoning chains more explainable
Where GraphRAG still falls short:
The fundamental issue is that the graph is derived from documents by an LLM. That means:
- Entity extraction is imperfect — the same real-world entity may appear as multiple nodes
- Relationships are inferred, not ground truth — they carry the hallucination risk of the extraction model
- It is a batch process — run once over your corpus, not updated in real time from operational systems
- It still has no memory of decisions, no outcome tracking, no feedback loop
- It answers questions about what is in your documents, not what your organization has decided and learned
GraphRAG is excellent for knowledge synthesis over a large document corpus. It is not designed for operational decision memory.
The Context Graph — A New Mental Model
The Context Graph starts from a different question: not “what information is relevant?” but “what has my organization decided, in similar situations, and what happened?”
This reframes the problem entirely. The unit of knowledge is not a document or a passage. It is a decision — a structured record of what situation prompted it, what context was gathered, what policy was applied, what was decided and by whom, what precedents were cited, what actually happened afterward, and whether a human later overrode it.
Surrounding each decision is a graph of relationships to the entities involved, the context fragments that informed it, and the precedent decisions it built on. The result is an organizational memory that gets smarter with use.
How a Context Graph Handles a Decision
1. Context Fragments — gather signals from live systems: CRM note, PD alert, Slack thread, ticket
2. Entity Graph — resolve all mentions to canonical entities: "Acme Corp" is one node across all systems
3. Decision Traces — search for precedent: "Similar to Beta Inc 6 months ago — approved at 15%, churned anyway"
4. New Decision Trace — the agent reasons and decides; a new trace is written with edges to entities, fragments, and precedents
5. Outcome Tracking — did it work? The outcome feeds back into the graph as a quality signal for future searches
Core Concepts of a Context Graph
Context Fragments
A context fragment is a structured capture of a signal from an operational system — a support ticket, a Slack message, an incident alert, a CRM note. Unlike a RAG chunk, a fragment retains its provenance: which system it came from, when, what entities it mentions, and how severe it is.
Fragments are the raw evidence layer. They are never directly shown to the LLM as retrieved passages. Instead, they flow through entity resolution and then attach to decisions as supporting evidence.
Entity Resolution
The cross-system identity problem is one of the hardest and most underappreciated problems in enterprise AI. “Acme Corp” in Salesforce, “acme-corp” in Jira, and “Acme” in PagerDuty are the same company. An AI agent that does not know this will give incomplete, inconsistent context.
Entity resolution maps raw mentions to canonical entity records, with a confidence score. High-confidence matches are auto-resolved. Low-confidence matches are surfaced for human review.
Entity Resolution in Practice
Resolution edges carry metadata: confidence score, match method, source system, and expiry time. When a human corrects a resolution, the old edge is expired and a new one is created — the full audit trail is preserved.
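A sketch of the routing logic — auto-resolve high-confidence matches, queue borderline ones for review. The thresholds, canonical table, and the use of `difflib` as the scorer are all illustrative assumptions; a production resolver would combine fuzzy matching with embeddings and source-system keys.

```python
from difflib import SequenceMatcher

# Hypothetical canonical entity table and cutoffs.
CANONICAL = {"acme corp": "ent-001", "beta inc": "ent-002"}
AUTO_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.5

def normalize(mention):
    """Cheap normalization before matching."""
    return mention.lower().replace("-", " ").replace("_", " ").strip()

def resolve(mention):
    """Map a raw mention to (canonical_id, confidence, action)."""
    norm = normalize(mention)
    best_id, best_score = None, 0.0
    for name, entity_id in CANONICAL.items():
        score = SequenceMatcher(None, norm, name).ratio()
        if score > best_score:
            best_id, best_score = entity_id, score
    if best_score >= AUTO_THRESHOLD:
        action = "auto_resolve"
    elif best_score >= REVIEW_THRESHOLD:
        action = "human_review"   # surfaced to a person
    else:
        action = "new_entity"
    return best_id, best_score, action
```

Whatever the action, the result is written as a resolution edge carrying the confidence score and match method, so later corrections have something concrete to expire and replace.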
Decision Traces
A decision trace is the central record in the Context Graph. It captures not just the output of a decision (approved / denied / escalated) but the full reasoning structure:
- Who decided — agent ID, actor type, approval level
- What policy applied — policy ID and version at time of decision
- What evidence was used — links to specific context fragments
- What precedents were cited — links to prior decision traces
- What actually happened — outcome status, tracked after the fact
- Was it overridden — human correction with reason and timestamp
This is meaningfully different from storing a log entry. A decision trace is a queryable, searchable, embeddable record of organizational reasoning.
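The shape of such a record can be sketched as a dataclass. The field names here are illustrative, not a prescribed schema — the point is that every element of the list above is a first-class, queryable field rather than a line in a log.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionTrace:
    """One decision as a structured, queryable record (illustrative fields)."""
    trace_id: str
    decided_by: str                  # agent or human actor ID
    policy_id: str                   # policy in force at decision time
    policy_version: int              # pinned version, not "latest"
    decision: str                    # e.g. "approved", "denied", "escalated"
    evidence_ids: list = field(default_factory=list)   # context fragments used
    precedent_ids: list = field(default_factory=list)  # prior traces cited
    outcome: str = "pending"         # later: "successful", "failed", "partial"
    overridden_by: Optional[str] = None
    override_reason: Optional[str] = None

trace = DecisionTrace(
    trace_id="dt-1042",
    decided_by="agent:deal-desk",
    policy_id="discount-policy",
    policy_version=3,
    decision="denied",
    evidence_ids=["frag-88", "frag-91"],
    precedent_ids=["dt-0877"],
)
```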
Temporal Edges
Edges in the Context Graph are time-aware. Every edge has a valid_from and valid_until timestamp. An edge with valid_until = 0 never expires. An edge with a future valid_until will be automatically excluded from queries after that time.
This matters for entity resolution corrections, policy changes (a decision made under policy v1 should not pretend to have been made under policy v2), and any relationship that has a natural end date.
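The validity check itself is small. A minimal sketch, following the convention above that `valid_until = 0` means "never expires" (timestamps here are illustrative epoch seconds):

```python
import time

def edge_active(edge, at=None):
    """True if the edge is valid at the given epoch-seconds instant."""
    at = time.time() if at is None else at
    if at < edge["valid_from"]:
        return False
    # valid_until == 0 means the edge is open-ended.
    return edge["valid_until"] == 0 or at < edge["valid_until"]

# A corrected entity resolution: the old edge is expired, not deleted,
# so the full audit trail survives.
old_edge = {"valid_from": 1_000, "valid_until": 2_000}
new_edge = {"valid_from": 2_000, "valid_until": 0}
```

Every query over the graph applies this filter, so a decision made under policy v1 keeps its v1 edges even after v2 supersedes it.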
Hybrid Search for Precedent
Finding relevant precedent is not a pure keyword problem or a pure semantic problem — it requires both. A query like “healthcare customer with multiple SLA breaches requesting discount above standard cap” needs:
- Semantic similarity to surface decisions that are conceptually related
- Keyword matching to catch specific terms (SLA, discount, healthcare)
- Recency weighting to prefer recent decisions over stale ones
- Quality weighting to prefer decisions with confirmed good outcomes
A Context Graph uses hybrid ranking — typically Reciprocal Rank Fusion combining multiple signals — to produce a single relevance score. The weights are tunable per query: dial up recency during a crisis, dial up semantic similarity when searching for policy precedent.
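Reciprocal Rank Fusion itself is compact: each signal contributes `weight / (k + rank)` for every item it ranks, and the sums are sorted. The constant `k = 60` comes from the original RRF formulation; the trace IDs and weights below are invented for illustration.

```python
def rrf(rankings, weights=None, k=60):
    """Fuse several ranked lists into one ordering via Reciprocal Rank Fusion.

    score(d) = sum over lists of weight_i / (k + rank_i(d))
    """
    weights = weights or {name: 1.0 for name in rankings}
    scores = {}
    for name, ranked in rankings.items():
        w = weights.get(name, 1.0)
        for rank, item in enumerate(ranked, start=1):
            scores[item] = scores.get(item, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical precedent search: each signal returns its own ranking of traces.
rankings = {
    "semantic": ["dt-0877", "dt-0912", "dt-0303"],
    "keyword":  ["dt-0912", "dt-0877"],
    "recency":  ["dt-0912", "dt-0303", "dt-0877"],
}
fused = rrf(rankings, weights={"semantic": 1.0, "keyword": 1.0, "recency": 0.5})
```

Because only ranks matter, RRF never has to reconcile incomparable raw scores (cosine similarity vs. BM25 vs. a recency decay), which is exactly why it suits fusing heterogeneous signals.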
Outcome Tracking
Every decision trace has an outcome status that starts as pending and is updated after the fact: successful, failed, partial. This creates a feedback signal that can weight precedent search — decisions that worked out well are more valuable precedent than decisions that didn't.
This is the mechanism by which the system learns from experience. Not automated ML retraining — structured human feedback flowing back into the knowledge graph as queryable, searchable metadata.
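One simple way to apply that feedback is a per-outcome multiplier on the relevance score before the final sort. The multiplier values below are illustrative assumptions, not calibrated constants:

```python
# Hypothetical multipliers: confirmed-good precedent outranks unknown,
# which outranks confirmed-bad.
QUALITY = {"successful": 1.0, "partial": 0.7, "pending": 0.5, "failed": 0.2}

def quality_weighted(scored_traces, outcomes):
    """Re-weight base relevance scores by each trace's recorded outcome."""
    reranked = [
        (score * QUALITY.get(outcomes.get(tid, "pending"), 0.5), tid)
        for tid, score in scored_traces
    ]
    reranked.sort(reverse=True)
    return [tid for _, tid in reranked]

outcomes = {"dt-0877": "failed", "dt-0912": "successful"}
order = quality_weighted([("dt-0877", 0.9), ("dt-0912", 0.8)], outcomes)
```

Note the effect: `dt-0877` was the better raw match, but its failed outcome demotes it below a slightly weaker match that is known to have worked.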
High-Level Architecture
Context Graph System Architecture
- Operational Systems — CRM · Ticketing · Monitoring · Slack · any HTTP source
- Ingestion Layer — parse & normalize → extract entity mentions → resolve to canonical IDs; output: context_fragment records + resolution edges
- Context Graph Store — unified search across graph + vector; edges: involves · informed_by · preceded_by · resolved_to · caused_by · supersedes (all temporal: valid_from / valid_until)
- Agent Layer — 1. gather context, 2. search precedent, 3. evaluate policy, 4. reason + decide, 5. record trace, 6. write edges
- Human Review UI — resolution corrections, decision overrides, outcome confirmation, precedent browsing, graph visualization
- Outcome Tracking — did the decision work? Update the trace status and feed the quality signal back into the graph
The store layer is the heart of the system. It needs to do three things simultaneously that most databases handle separately:
- Vector similarity search — semantic matching for precedent and entity resolution
- Structured attribute filtering — filter by entity type, decision type, policy, date ranges
- Graph traversal — multi-hop traversal from any node to its neighbors
This combination is why a specialist engine (like Vespa) is used rather than a pure vector database or a pure graph database. Each alone solves part of the problem.
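To make the "three things simultaneously" claim concrete, here is a toy in-memory store that answers one query with all three operations: attribute filtering, vector ranking, and a one-hop expansion. All node IDs, fields, and vectors are invented; a real engine does this at scale with indexes and a query language.

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy store: nodes carry attributes + an embedding; edges are (src, label, dst).
nodes = {
    "dt-1": {"type": "decision", "policy": "discount", "vec": [0.9, 0.1]},
    "dt-2": {"type": "decision", "policy": "refund",   "vec": [0.8, 0.2]},
    "ent-acme": {"type": "entity", "vec": [0.1, 0.9]},
}
edges = [("dt-1", "involves", "ent-acme")]

def query(query_vec, node_type, **attrs):
    """Attribute filter + vector ranking + one-hop expansion, in one call."""
    hits = [
        (cosine(query_vec, n["vec"]), nid)
        for nid, n in nodes.items()
        if n["type"] == node_type
        and all(n.get(k) == v for k, v in attrs.items())
    ]
    hits.sort(reverse=True)
    top = [nid for _, nid in hits]
    neighbors = {dst for src, _, dst in edges if src in top}
    return top, neighbors

top, linked = query([1.0, 0.0], "decision", policy="discount")
```

A pure vector database handles only the ranking step well; a pure graph database handles only the expansion step well. The filter-rank-expand sequence in one round trip is the capability the store layer has to provide.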
Side-by-Side Comparison
What each system stores
| | Traditional RAG | Graph Database | GraphRAG | Context Graph |
|---|---|---|---|---|
| Unit of knowledge | Text chunk | Node / Edge | Extracted entity + relation | Decision trace + evidence + entity |
| Knowledge source | Documents | Manual / ETL | Documents (LLM extraction) | Operational events + agent decisions |
| Knowledge update | Re-index documents | Manual write / ETL | Re-run extraction pipeline | Real-time event ingestion |
| Temporal model | Created date only | Static or versioned | Batch snapshot | Edges with valid_from / valid_until |
How each system retrieves
| | Traditional RAG | Graph Database | GraphRAG | Context Graph |
|---|---|---|---|---|
| Retrieval primitive | Vector similarity | Graph traversal | Community + vector + traversal | Hybrid: semantic + keyword + recency + quality |
| Multi-hop reasoning | Poor | Excellent | Good | Good |
| Semantic similarity | Excellent | None | Good | Excellent |
| Structured filtering | Limited | Excellent | Limited | Good |
| Relevance ranking | Cosine similarity | Not applicable | LLM-reranked | Reciprocal Rank Fusion (tunable) |
What each system knows about decisions
| | Traditional RAG | Graph Database | GraphRAG | Context Graph |
|---|---|---|---|---|
| Decision memory | None | If manually written | None | First-class |
| Decision provenance | None | Partial | None | Full (policy, evidence, approver) |
| Outcome tracking | None | If manually written | None | Built-in |
| Human override record | None | If modeled | None | Built-in with audit trail |
| Precedent chain | None | If modeled | None | Queryable graph |
Operational properties
| | Traditional RAG | Graph Database | GraphRAG | Context Graph |
|---|---|---|---|---|
| Setup complexity | Low | Medium | High | High |
| Domain generality | High | Medium | High | Low — schema-first |
| Entity resolution | None | Manual | Imperfect (LLM) | Explicit + human-correctable |
| Feedback loop | None | Manual | None | Outcome tracking |
| Explainability | Passage citations | Traversal path | Community summaries | Full trace: evidence → reasoning → outcome |
| Infrastructure | Vector DB | Graph DB | Vector DB + Graph DB + LLM pipeline | Unified search + graph + vector engine |
The Decision Flywheel
The most important architectural property of a Context Graph is one that is easy to miss: it gets better the more it is used.
This is not just a feature. It is a fundamentally different relationship between the system and time.
Each decision that gets recorded becomes:
- Searchable precedent for the next similar situation
- A quality signal — did it work? — that weights future searches
- An audit record that answers “why did we do this?”
- A node in the precedent chain that future decisions can explicitly cite
The flywheel is slow to start. The first hundred decisions do not look dramatically different from a good RAG setup. But at scale — thousands of decisions across hundreds of entity types — the system's ability to surface “we have seen exactly this before, here is what happened” becomes genuinely powerful.
An organization running a Context Graph for a year has meaningfully different institutional memory than one running traditional RAG for a year — even starting from the same document corpus. The difference is that the Context Graph has been actively absorbing the organization's decision-making, not just passively storing its documents.
When to Use Each Approach
Use Traditional RAG when:
- Your problem is “find me relevant information from my documents”
- You have a static or slowly-changing knowledge corpus
- Your users ask questions rather than make decisions
- You need something working quickly
- The domain is too general for schema design upfront
Examples: Internal documentation search, customer support knowledge base, research assistant, code documentation Q&A.
Use a Graph Database when:
- Your domain is fundamentally relationship-centric (fraud, supply chain, identity)
- You need guaranteed multi-hop traversal with ACID properties
- Your data is highly structured and the schema is well-understood
- Fine-grained access control on relationships is a requirement
- Semantic similarity is not part of your query patterns
Examples: Fraud detection networks, organizational hierarchies, product catalog relationships, knowledge ontologies.
Use GraphRAG when:
- You have a large corpus of documents with rich entity relationships buried in text
- Users ask questions that require synthesizing information across many documents
- You want better answers on “who, what, when, how are these things connected?” questions
- You can afford the cost of the entity extraction pipeline and its periodic re-runs
Examples: Scientific literature analysis, legal discovery, intelligence analysis, enterprise knowledge synthesis.
Use a Context Graph when:
- Your agent makes recurring decisions in a domain with defined policies
- Consistency across decisions matters — the same situation should be handled similarly
- Audit and explainability are requirements (regulated industries, high-stakes decisions)
- Context comes from multiple operational systems with overlapping entity references
- You want the system to improve as it accumulates decisions
- A human needs to review, correct, or override agent decisions
Examples: Deal desk and pricing decisions, incident response triage, resource allocation, compliance reviews, hiring decisions, loan underwriting, sprint planning.
The combination in practice:
These approaches are not mutually exclusive. A production system might combine all three: traditional RAG for document retrieval, a graph model for canonical entities and relationships, and a Context Graph for decision memory.
The Context Graph sits at the agent reasoning layer, above both the document retrieval layer and the entity model layer. It does not replace the others — it makes them useful for decision-making over time.
What This Is Not
It is worth being explicit about what a Context Graph is not, to avoid overpromising.
It is not a general-purpose knowledge base. If you want to answer “what is the capital of France,” use RAG or just ask the base model. A Context Graph is domain-specific by design — it requires committing to a schema for entity types, decision types, and relationships upfront.
It is not automated machine learning. The system does not retrain models based on outcomes. It makes past decisions and their outcomes searchable and surfaceable to future decision-making. The learning is structured and explicit, not gradient-based.
It is not a replacement for human judgment. The system is designed to augment human decision-making, not replace it. Human review interfaces — resolution correction, decision override, outcome tracking — are first-class parts of the architecture. The goal is to make human experts faster and more consistent, not to remove them.
It is not plug-and-play. Unlike traditional RAG, where you can feed in any corpus and get reasonable results, a Context Graph requires upfront domain modeling. You need to define your entity types, your decision types, your relationship vocabulary, and your source systems. This is an investment. The payoff is a system that actually understands your domain's structure rather than treating it as a bag of text.
Conclusion
Traditional RAG solved an important problem: how do you give a language model access to domain knowledge it was not trained on? That solution works well for a large class of problems, and it will continue to be the right choice for many use cases.
But as organizations move from “AI assistants that answer questions” to “AI agents that make decisions,” a new class of problem emerges. Decisions need to be consistent across time. They need to be explainable. They need to learn from outcomes. They need to respect entity identity across systems. They need an audit trail.
RAG was not designed for any of that. It retrieves passages. It does not remember decisions.
Graph databases model relationships with precision and enforce schema with rigour — but they require structured input, have no concept of semantic similarity, and are not designed to store reasoning chains as first-class objects.
GraphRAG bridges documents and relationships — but the graph is extracted by an LLM, which inherits all the imprecision of that process, and it still has no memory of what was decided or what happened next.
The Context Graph is one answer to the question: what kind of memory does an AI agent need to act like a trustworthy, experienced member of an organization?
The answer is not more documents. It is structured decision memory — the ability to look at a new situation and say:
We have seen something like this before. Here is what we decided. Here is why. Here is what happened.
That is the difference between a system that retrieves information and a system that accumulates institutional wisdom.