RAG Architecture for Security
Security RAG systems require specialized components to handle the unique characteristics of security data:

| Component | Security Consideration | Implementation |
|---|---|---|
| Document Ingestion | Sensitive data handling, access control | Encrypted storage, RBAC on indices |
| Embedding Model | Domain-specific understanding | Security-tuned or fine-tuned models |
| Vector Store | Query performance, data isolation | Tenant-separated indices |
| Retrieval Strategy | Precision for security decisions | Hybrid search, reranking |
| Generation | Hallucination prevention | Grounded generation, citation |
Retrieval Strategies
Effective RAG for security requires retrieval strategies that handle the specialized vocabulary, temporal sensitivity, and precision requirements of security data.

Hybrid Search
Hybrid search combines semantic (vector) search with keyword (lexical) search to capture both conceptual similarity and exact technical matches. Security terminology—CVE IDs, IP addresses, hash values—requires exact matching that pure semantic search may miss.

| Search Component | Strength | Security Use Case | Implementation |
|---|---|---|---|
| Semantic search | Conceptual similarity | Finding related threats, similar incidents | Embedding models + vector DB |
| Keyword search (BM25) | Exact matching | IOC lookup, CVE search | Full-text index |
| Hybrid fusion | Combined strengths | General security queries | Reciprocal rank fusion |
| Filtered search | Constrained results | Tenant isolation, date ranges | Metadata filters |
- Query-dependent weighting — Adjust semantic vs. keyword weight based on query type (technical IOC vs. conceptual question)
- Reciprocal Rank Fusion (RRF) — Combine rankings using the 1/(k + rank) formula, as in Weaviate hybrid search (see the sketch after this list)
- Learned weighting — Train a model to optimize weights for security queries
- Fallback strategy — Use keyword search when semantic results have low confidence
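As a concrete illustration of the RRF approach above, the following is a minimal Python sketch that fuses the ranked output of a semantic retriever and a BM25 retriever using the 1/(k + rank) formula. The document IDs and the k = 60 constant are illustrative assumptions, not tied to any particular vector database.

```python
# Minimal reciprocal rank fusion (RRF) sketch: combines ranked document ID
# lists from a semantic retriever and a BM25 retriever. The retriever outputs
# and the k=60 constant are illustrative assumptions, not a product API.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse multiple ranked lists of doc IDs using score = sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse semantic and keyword results for an IOC-style query.
semantic_hits = ["doc-17", "doc-42", "doc-03"]   # from a vector index
keyword_hits  = ["doc-42", "doc-88", "doc-17"]   # from a BM25 index
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)  # doc-42 and doc-17 rise to the top because both retrievers agree
```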
Multi-Index Retrieval
Security RAG systems typically query multiple specialized indices to gather comprehensive context:

| Index Type | Content | Update Frequency | Access Pattern |
|---|---|---|---|
| Threat Intelligence | IOCs, TTPs, threat actors | Real-time to hourly | High-precision lookup |
| Documentation | Runbooks, policies, procedures | Weekly to monthly | Conceptual search |
| Incident History | Past incidents, investigations | Per-incident | Similar case retrieval |
| Vulnerability Data | CVEs, exploits, patches | Daily | CVE ID lookup |
| Asset Inventory | Systems, owners, criticality | Continuous | Asset context enrichment |
| Configuration | Security configs, baselines | On change | Policy verification |
- Parallel query — Query all relevant indices simultaneously
- Cascading query — Query primary index first, expand to secondary if insufficient
- Query routing — Route queries to appropriate indices based on intent classification
- Result fusion — Merge results from multiple indices with source attribution
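Below is a minimal sketch of the parallel-query pattern with source attribution, assuming each index client exposes a `search(query, top_k=...)` method; the client objects and index names are hypothetical placeholders rather than a specific product API.

```python
# Sketch of parallel multi-index retrieval with source attribution. The
# `indices` mapping and `search` callables are hypothetical stand-ins for
# real vector/keyword index clients.
from concurrent.futures import ThreadPoolExecutor

def query_all_indices(query, indices, top_k=5):
    """Query every index in parallel and tag each hit with its source index."""
    def run(name_and_index):
        name, index = name_and_index
        return [(name, hit) for hit in index.search(query, top_k=top_k)]

    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(run, indices.items()))

    # Flatten while preserving which index each result came from.
    return [hit for batch in batches for hit in batch]

# Usage (assuming each client exposes .search(query, top_k=...)):
# hits = query_all_indices("CVE-2024-3094 exposure", {
#     "threat_intel": ti_index, "vulns": vuln_index, "assets": asset_index,
# })
```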
Contextual Retrieval
Contextual retrieval incorporates current operational context to improve relevance. Anthropic’s contextual retrieval research demonstrates significant accuracy improvements.

| Context Type | Information Included | Retrieval Impact |
|---|---|---|
| User context | Role, expertise, access level | Filter by permission, adjust complexity |
| Alert context | Current alerts, investigation state | Prioritize related documents |
| Environment context | Network topology, asset inventory | Include relevant infrastructure docs |
| Temporal context | Current date, time zone, business hours | Time-decay weighting |
| Session context | Recent queries, viewed documents | Conversation continuity |
- Context extraction — Capture relevant context from user session and environment
- Query augmentation — Append context to query or use as filter
- Dynamic reranking — Boost results matching current context
- Context windowing — Include recent conversation for multi-turn queries
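A small sketch of the query-augmentation step, assuming session context arrives as a dictionary; the field names (`tenant_id`, `access_level`, `active_alert`) are illustrative, not a standard schema.

```python
# Sketch of contextual retrieval: user/alert context is turned into metadata
# filters and a query prefix before hitting the index. Field names are
# illustrative assumptions.
def build_contextual_query(query, context):
    """Return an augmented query string plus hard metadata filters."""
    filters = {
        "classification_max": context.get("access_level", "internal"),
        "tenant": context["tenant_id"],
    }
    prefix_parts = []
    if context.get("active_alert"):
        prefix_parts.append(f"Related alert: {context['active_alert']}")
    if context.get("role"):
        prefix_parts.append(f"Audience: {context['role']}")
    augmented = " | ".join(prefix_parts + [query])
    return augmented, filters

augmented, filters = build_contextual_query(
    "How do we contain this?",
    {"tenant_id": "acme", "role": "tier-1 analyst",
     "active_alert": "T1059 PowerShell execution"},
)
```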
Temporal-Aware Retrieval
Security data has strong temporal characteristics—recent threat intelligence is more relevant than historical, but incident patterns may repeat.

| Temporal Strategy | Use Case | Time Decay Function | Implementation |
|---|---|---|---|
| Recency boost | Threat intelligence | Exponential decay | Boost recent IOCs |
| Periodic patterns | Attack timing | Cyclical weighting | Detect recurring threats |
| Event windows | Incident investigation | Hard cutoff | Focus on incident timeframe |
| Version awareness | Documentation | Latest version priority | Prefer current procedures |
- Exponential decay — weight = e^(-λ × age), where λ controls the decay rate (see the sketch after this list)
- Step function — Full weight for recent, reduced for older
- Logarithmic decay — Gradual reduction: weight = 1 / log(1 + age)
- Custom windows — Different decay rates for different content types
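The exponential-decay strategy can be implemented in a few lines. This sketch derives λ from a half-life and reranks (document, score, age) tuples; the 30-day half-life and the tuple shape are assumptions to be tuned per content type.

```python
# Sketch of exponential time-decay reranking: weight = exp(-lambda * age_days).
# The half-life-based lambda and the (doc, score, age) tuple shape are
# illustrative assumptions.
import math

def decayed_score(base_score, age_days, half_life_days=30.0):
    """Multiply a relevance score by e^(-lambda * age), lambda from a half-life."""
    lam = math.log(2) / half_life_days
    return base_score * math.exp(-lam * age_days)

hits = [("ioc-feed-2021", 0.91, 900), ("ioc-feed-today", 0.84, 0.5)]
reranked = sorted(
    ((doc, decayed_score(score, age)) for doc, score, age in hits),
    key=lambda pair: pair[1],
    reverse=True,
)
# The fresh IOC feed now outranks the older, slightly higher-scored document.
```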
Security Knowledge Sources
Threat Intelligence Integration
Integrating threat intelligence feeds into RAG systems enables real-time enrichment of security queries. Key sources include MISP, STIX/TAXII, and commercial feeds.

| TI Source Type | Integration Method | Update Frequency | Use Case |
|---|---|---|---|
| STIX/TAXII feeds | Scheduled polling, TAXII client | Hourly to daily | IOC enrichment |
| MISP communities | API integration | Real-time | Collaborative TI sharing |
| Commercial feeds | API integration | Real-time to hourly | Premium IOC data |
| OSINT feeds | Web scraping, RSS | Varies | Broad coverage |
| Internal TI | Direct ingestion | As produced | Organizational IOCs |
- Ingestion pipeline — Parse, normalize, and deduplicate TI data
- Confidence scoring — Weight by source reliability and age
- Relationship mapping — Link related IOCs, TTPs, and actors
- Expiration handling — Remove or demote stale intelligence
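A simplified sketch of the ingestion steps above (normalization, deduplication, and expiration); the indicator dictionary shape and the confidence threshold are illustrative assumptions rather than a STIX or MISP schema.

```python
# Sketch of a TI ingestion step: normalize, deduplicate, and expire indicators
# before indexing. The indicator dict shape and confidence fields are
# illustrative, not a specific STIX/MISP schema.
from datetime import datetime, timedelta, timezone

def normalize_and_dedupe(indicators, max_age_days=90):
    """Keep one record per (type, value), dropping stale or low-confidence entries."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    best = {}
    for ioc in indicators:
        if ioc["last_seen"] < cutoff or ioc.get("confidence", 0) < 30:
            continue  # expiration handling: drop stale or low-confidence intel
        key = (ioc["type"], ioc["value"].strip().lower())
        current = best.get(key)
        if current is None or ioc["confidence"] > current["confidence"]:
            best[key] = ioc  # keep the highest-confidence record per indicator
    return list(best.values())
```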
Runbook and Playbook Retrieval
Security runbooks and playbooks provide procedural guidance for incident response. Effective retrieval matches operational needs to documented procedures.

| Document Type | Retrieval Strategy | Matching Criteria |
|---|---|---|
| Incident playbooks | Alert type + severity matching | Detection rule, attack type |
| Remediation runbooks | Vulnerability + system matching | CVE, platform, component |
| Escalation procedures | Severity + SLA matching | Incident classification |
| Communication templates | Incident type + audience | Stakeholder type, notification need |
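A minimal sketch of playbook matching by alert type and severity; the playbook records, field names, and severity scale are hypothetical.

```python
# Sketch of playbook selection by alert type and severity using simple
# metadata matching; the records and severity scale are illustrative.
PLAYBOOKS = [
    {"name": "ransomware-response", "attack_types": {"ransomware"}, "min_severity": 3},
    {"name": "phishing-triage", "attack_types": {"phishing"}, "min_severity": 1},
]

def match_playbooks(alert):
    """Return playbooks whose attack type matches and whose severity bar is met."""
    return [
        pb for pb in PLAYBOOKS
        if alert["attack_type"] in pb["attack_types"]
        and alert["severity"] >= pb["min_severity"]
    ]

print(match_playbooks({"attack_type": "phishing", "severity": 2}))
```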
Vulnerability Database Access
Integrating vulnerability databases such as NVD, CVE, and OSV provides contextual vulnerability information for queries and enrichment.

| Database | Content | Access Method | Best For |
|---|---|---|---|
| NVD | CVSS scores, references, CPE | API, bulk download | Comprehensive CVE data |
| CVE Program | CVE IDs, descriptions | API, GitHub | Authoritative CVE info |
| OSV | Open source vulnerabilities | API | Dependency scanning |
| Exploit-DB | PoC exploits | API, search | Exploitability assessment |
| VulDB | Vendor-specific vulns | API | Specialized vulns |
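As an example of programmatic access, the sketch below queries the public NVD REST API (v2.0) for a single CVE. The endpoint and response fields reflect the documented API at the time of writing; verify against current NVD documentation and supply an API key header for higher rate limits.

```python
# Sketch of a CVE lookup against the NVD REST API (v2.0). Response fields may
# change; check the NVD API documentation before relying on this shape.
import requests

def fetch_cve(cve_id):
    resp = requests.get(
        "https://services.nvd.nist.gov/rest/json/cves/2.0",
        params={"cveId": cve_id},
        timeout=30,
    )
    resp.raise_for_status()
    vulns = resp.json().get("vulnerabilities", [])
    return vulns[0]["cve"] if vulns else None

cve = fetch_cve("CVE-2021-44228")
if cve:
    print(cve["id"], cve["descriptions"][0]["value"][:120])
```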
Incident History Search
Historical incident data enables pattern recognition and similar case retrieval for investigation support.

| Search Capability | Value | Implementation |
|---|---|---|
| Similar incident retrieval | Learn from past responses | Semantic similarity on incident summaries |
| IOC history | Track indicator recurrence | Exact match on IOC values |
| Attack pattern matching | Identify campaign activity | TTP vector similarity |
| Resolution search | Find effective remediations | Outcome-based filtering |
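A small sketch of exact-match IOC history tracking; the in-memory index and data shapes are illustrative, and similar-incident retrieval would layer semantic search over incident summaries on top of this.

```python
# Sketch of IOC recurrence tracking: exact-match lookup of indicators across
# past incidents. Data shapes are illustrative.
from collections import defaultdict

class IncidentHistory:
    def __init__(self):
        self.ioc_index = defaultdict(list)  # ioc value -> incident IDs

    def add_incident(self, incident_id, iocs):
        for ioc in iocs:
            self.ioc_index[ioc.strip().lower()].append(incident_id)

    def ioc_recurrence(self, ioc):
        """Return past incidents where this exact indicator appeared."""
        return self.ioc_index.get(ioc.strip().lower(), [])

history = IncidentHistory()
history.add_incident("INC-101", ["198.51.100.7", "bad-domain.example"])
print(history.ioc_recurrence("198.51.100.7"))  # ['INC-101']
```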
Advanced Techniques
Query Transformation
Query transformation improves retrieval by reformulating user queries for better matching.

| Transformation Type | Technique | Use Case |
|---|---|---|
| Query expansion | Add synonyms, related terms | Broaden recall for vague queries |
| Query decomposition | Split complex queries | Multi-part questions |
| Hypothetical document | Generate ideal answer, search for similar | HyDE technique |
| Step-back prompting | Generalize then specify | Abstract concept queries |
| Query rewriting | LLM-based reformulation | Clarify ambiguous queries |
- Intent classification — Determine query type and complexity
- Transformation selection — Choose appropriate technique
- Multiple query generation — Create transformed variants
- Parallel retrieval — Search with all variants
- Result fusion — Combine and deduplicate results
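A sketch of the variant-generation and fusion flow described above; the `generate_variants` function is a stand-in for an LLM-based expansion, decomposition, or HyDE step, and `search_fn` is an assumed retriever callable.

```python
# Sketch of the query-transformation flow: generate variants, retrieve with
# each, then deduplicate. The variant generator is a placeholder where an LLM
# rewrite or expansion step would normally go.
def generate_variants(query):
    """Placeholder for expansion/decomposition/HyDE; returns query variants."""
    return [query, f"{query} remediation steps", f"{query} detection guidance"]

def retrieve_with_variants(query, search_fn, top_k=5):
    """Run every variant through the retriever and merge results without duplicates."""
    seen, merged = set(), []
    for variant in generate_variants(query):
        for doc_id in search_fn(variant, top_k):  # search_fn is an assumed retriever
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```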
Reranking and Filtering
Reranking improves result quality after initial retrieval using cross-encoder models or LLM-based scoring.

| Reranking Approach | Accuracy | Latency | Best For |
|---|---|---|---|
| Cross-encoder reranking | High | Medium | Quality-critical applications |
| LLM-based reranking | Very High | High | Complex relevance judgments |
| Metadata filtering | N/A | Very Low | Hard constraints (date, access) |
| Diversity reranking | Medium | Low | Broad coverage needs |
| Reciprocal rank fusion | Medium | Low | Combining multiple retrievers |
- Source authority — Prioritize authoritative sources (NIST, vendor advisories)
- Temporal relevance — Boost recent for evolving threats
- Specificity — Prefer specific over general documents
- Actionability — Prioritize documents with concrete guidance
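A short cross-encoder reranking sketch using the sentence-transformers library; the `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoint is a common public baseline, and a security-tuned model could be substituted.

```python
# Sketch of cross-encoder reranking with sentence-transformers: score each
# (query, passage) pair and keep the top results.
from sentence_transformers import CrossEncoder

def rerank(query, passages, top_k=5):
    model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = model.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

# Usage: rerank("log4j mitigation", candidate_passages_from_hybrid_search)
```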
Multi-Hop Reasoning
Multi-hop RAG chains multiple retrieval steps to answer complex security questions that require connecting information across documents.

| Hop Pattern | Example Query | Retrieval Chain |
|---|---|---|
| Entity linking | "What vulnerabilities affect our web servers?" | Asset lookup → Vulnerability matching |
| Causal chain | "How did the attacker gain access?" | Initial access → Persistence → Lateral movement |
| Comparative | "Compare this IOC to past incidents" | IOC lookup → Similar incident retrieval |
| Aggregation | "Summarize all critical findings" | Retrieve all → Filter critical → Aggregate |
- Iterative retrieval — Use each answer to formulate next query
- Graph traversal — Follow relationships in knowledge graph
- Query planning — LLM generates retrieval plan upfront
- Result accumulation — Combine evidence across hops
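A sketch of the iterative-retrieval pattern, where each hop's findings seed the next query; `search_fn` and `extract_entities` are assumed components (a retriever and an entity extractor), not specific library APIs.

```python
# Sketch of iterative multi-hop retrieval: each hop's findings seed the next
# query until no new evidence appears or a hop budget is exhausted.
def multi_hop_retrieve(question, search_fn, extract_entities, max_hops=3):
    evidence, query = [], question
    for _ in range(max_hops):
        hits = search_fn(query)
        new_hits = [h for h in hits if h not in evidence]
        if not new_hits:
            break
        evidence.extend(new_hits)
        # Formulate the next query from entities found so far (e.g. hosts, CVEs).
        entities = extract_entities(new_hits)
        if not entities:
            break
        query = f"{question} context: {', '.join(entities)}"
    return evidence
```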
Source Attribution
Source attribution ensures every claim can be traced to authoritative sources, critical for security decision-making.

| Attribution Level | Implementation | Verification |
|---|---|---|
| Document-level | Cite source document | Link to original |
| Passage-level | Quote specific passage | Inline citation |
| Fact-level | Attribute each claim | Statement-source mapping |
| Confidence-level | Source reliability score | Weighted attribution |
- Inline citations — Include source reference with each factual claim
- Source diversity — Note when multiple sources confirm information
- Recency indication — Include publication/last-update dates
- Authority markers — Distinguish official vs. community sources
Implementation Patterns
Chunking Strategies for Security Documents
Effective chunking preserves semantic coherence while optimizing for retrieval. Security documents have unique structural characteristics that inform chunking strategy.

| Document Type | Chunking Approach | Chunk Size | Overlap | Rationale |
|---|---|---|---|---|
| Incident reports | Section-based | 500-1000 tokens | 100 tokens | Preserve incident phase coherence |
| Runbooks | Step-based | 200-400 tokens | 50 tokens | Keep procedural steps intact |
| Threat intel | Object-based | Variable | None | Preserve IOC/TTP boundaries |
| Policies | Paragraph-based | 300-500 tokens | 75 tokens | Maintain policy clause integrity |
| Log documentation | Field-based | 100-200 tokens | 20 tokens | Preserve field definitions |
| CVE entries | Entry-based | Variable | None | Keep CVE as atomic unit |
- Semantic boundaries — Split at natural document boundaries (headings, sections, paragraphs)
- Metadata preservation — Attach source document, section, and hierarchy information
- Overlap strategy — Include context from preceding chunk to maintain continuity
- Size optimization — Balance between context completeness and retrieval precision
- Special handling — Preserve tables, code blocks, and structured content as units
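A minimal sketch of section-based chunking with overlap, assuming markdown-style headings and using word counts as a rough proxy for tokens.

```python
# Sketch of section-based chunking: split at headings, then window each
# section with word-count overlap. Word counts approximate tokens.
import re

def chunk_by_section(text, max_words=400, overlap_words=50):
    """Split at markdown headings, then window each section with overlap."""
    sections = re.split(r"\n(?=#+ )", text)   # keep each heading with its body
    chunks = []
    step = max_words - overlap_words
    for section in sections:
        words = section.split()
        for start in range(0, len(words), step):
            chunk = " ".join(words[start:start + max_words])
            if chunk:
                chunks.append(chunk)
    return chunks
```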
Embedding Model Selection
Embedding model choice significantly impacts retrieval quality. Consider domain-specific models for security applications.

| Model Category | Examples | Strengths | Considerations |
|---|---|---|---|
| General-purpose | OpenAI text-embedding-3-large, Cohere embed-v3 | Broad coverage, easy deployment | May miss security nuances |
| Security-tuned | Fine-tuned on security corpora | Security terminology understanding | Requires training data |
| Multilingual | Cohere multilingual, E5-multilingual | International threat intel | Increased dimensionality |
| Sentence transformers | all-MiniLM-L6-v2, all-mpnet-base-v2 | Open source, customizable | May need fine-tuning |
| Instruction-tuned | GTE, BGE, E5-instruct | Query-document optimization | Task-specific prompts |
- Domain coverage — Test on security-specific queries and documents
- Dimensionality — Balance between expressiveness and storage/compute cost
- Latency — Consider real-time retrieval requirements
- Privacy — Evaluate data handling for sensitive security content
- Fine-tuning options — Ability to improve on security-specific data
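A quick way to sanity-check domain coverage is to encode security queries and passages and compare similarities. The sketch below uses the sentence-transformers library with a general-purpose public checkpoint; a security-tuned model would slot in the same way.

```python
# Sketch of a domain-coverage spot check: encode a security query and candidate
# passages, then compare cosine similarities.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
query = "lateral movement via SMB admin shares"
passages = [
    "Adversaries may use SMB to move laterally between hosts.",
    "Quarterly budget review meeting notes.",
]
scores = util.cos_sim(model.encode(query), model.encode(passages))
print(scores)  # the security passage should score clearly higher
```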
Index Architecture
Index architecture determines retrieval performance, scalability, and operational characteristics.

| Index Pattern | Use Case | Advantages | Trade-offs |
|---|---|---|---|
| Single index | Small knowledge base | Simple, low latency | Limited scalability |
| Partitioned index | Multi-tenant deployment | Tenant isolation | Management overhead |
| Hierarchical index | Large document sets | Efficient coarse-to-fine search | Complex implementation |
| Federated index | Multiple data sources | Source-specific optimization | Query coordination |
| Hybrid index | Mixed search requirements | Semantic + keyword | Dual infrastructure |
- Tenant isolation — Separate indices or row-level security for multi-tenant
- Access control — Metadata filters for permission enforcement
- Update strategy — Real-time vs. batch indexing based on freshness needs
- Backup and recovery — Index rebuilding procedures and RTO
- Scaling approach — Horizontal partitioning for large indices
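A minimal sketch of tenant-partitioned routing, where each tenant and data type maps to its own index name; the naming scheme and the `client.search` call are assumptions, not a specific vector-database API.

```python
# Sketch of tenant-partitioned index routing: each tenant/data type pair maps
# to its own index, and unknown tenants are refused before any query runs.
def tenant_index_name(tenant_id, data_type):
    return f"sec-{data_type}-{tenant_id}"

def tenant_search(client, tenant_id, data_type, query, top_k=10):
    allowed = {"acme", "globex"}                     # provisioned tenants
    if tenant_id not in allowed:
        raise PermissionError(f"unknown tenant: {tenant_id}")
    return client.search(index=tenant_index_name(tenant_id, data_type),
                         query=query, top_k=top_k)
```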
Quality and Evaluation
Systematic evaluation ensures RAG system quality and enables continuous improvement. Use evaluation frameworks such as Ragas and the LlamaIndex evaluation modules.

| Metric | Description | Target | Measurement Method |
|---|---|---|---|
| Retrieval Precision | Relevant documents in top-k results | > 80% | Human annotation, ground truth |
| Retrieval Recall | Relevant documents found | > 90% | Known-item search evaluation |
| Answer Accuracy | Correctness of generated responses | > 95% | Expert evaluation |
| Answer Faithfulness | Response grounded in retrieved context | > 95% | NLI-based verification |
| Source Attribution | Responses properly cite sources | 100% | Automatic citation verification |
| Latency P50/P95 | End-to-end response time | < 3s / < 5s | Performance monitoring |
| Hallucination Rate | Unsupported claims in responses | < 1% | Source verification |
| Context Relevance | Retrieved context usefulness | > 85% | LLM-based scoring |
- Ground truth creation — Build evaluation dataset with queries and expected answers
- Retrieval evaluation — Measure precision, recall, and ranking quality
- Generation evaluation — Assess answer accuracy, faithfulness, and completeness
- End-to-end evaluation — Measure full system performance on security tasks
- Regression testing — Detect quality degradation with system changes
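A sketch of retrieval precision/recall@k against a hand-built ground-truth set; the ground-truth format is an assumption, and generation-side metrics (faithfulness, answer relevance) would come from a framework such as Ragas.

```python
# Sketch of retrieval precision/recall@k against a ground-truth mapping of
# {query: set(relevant_doc_ids)}. The ground-truth format is an assumption.
def precision_recall_at_k(retrieved, relevant, k=5):
    top = retrieved[:k]
    hits = len(set(top) & relevant)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

ground_truth = {"How do we patch CVE-2021-44228?": {"runbook-log4j", "advisory-apache"}}
retrieved = ["runbook-log4j", "blog-post-9", "advisory-apache", "doc-x", "doc-y"]
print(precision_recall_at_k(retrieved, ground_truth["How do we patch CVE-2021-44228?"]))
# -> (0.4, 1.0): both relevant documents found, 2 of the top 5 are relevant
```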
Security Considerations
Security RAG systems require careful attention to data protection and access control.

| Security Concern | Risk | Mitigation | Monitoring |
|---|---|---|---|
| Data isolation | Cross-tenant data leakage | Separate indices, access filters | Query audit logging |
| Access control | Unauthorized document access | Document-level permissions | Permission violation alerts |
| Prompt injection | Malicious retrieved content | Content sanitization | Injection pattern detection |
| Sensitive data exposure | PII/secrets in responses | Output filtering, redaction | Sensitive data detection |
| Index poisoning | Malicious document ingestion | Content validation, source verification | Anomaly detection |
| Model extraction | Embedding theft | Rate limiting, monitoring | Query pattern analysis |
- Document-level ACLs — Enforce permissions at retrieval time
- Query sanitization — Clean user queries before processing
- Content filtering — Remove or redact sensitive data from retrieved content
- Audit logging — Log all queries and retrieved documents
- Encryption — Encrypt indices at rest and in transit
- Rate limiting — Prevent abuse and extraction attacks
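A sketch of document-level ACL enforcement applied to retrieved hits before they reach the prompt; the metadata fields (`tenant`, `classification_level`) and numeric clearance levels are illustrative.

```python
# Sketch of document-level ACL enforcement at retrieval time: retrieved hits
# are filtered against the caller's permissions before they ever reach the
# prompt. Metadata fields and clearance levels are illustrative.
def enforce_acl(hits, user):
    """Drop any retrieved document the user is not cleared to see."""
    visible = []
    for doc in hits:
        if doc["tenant"] != user["tenant"]:
            continue
        if doc["classification_level"] > user["clearance_level"]:
            continue
        visible.append(doc)
    return visible

user = {"tenant": "acme", "clearance_level": 2}
hits = [
    {"id": "d1", "tenant": "acme", "classification_level": 1},
    {"id": "d2", "tenant": "acme", "classification_level": 3},   # filtered out
]
print([d["id"] for d in enforce_acl(hits, user)])  # ['d1']
```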
Anti-Patterns to Avoid
- Stale indices — Threat intelligence must be continuously updated. Implement real-time or near-real-time indexing for dynamic content.
- Over-reliance on semantic search — Security terms like CVE IDs, IP addresses, and hashes require exact matching. Always implement hybrid search.
- Ignoring source authority — Not all sources are equally trustworthy. Implement source weighting and clearly distinguish authoritative vs. community sources.
- Unbounded retrieval — Too much context degrades generation quality and increases latency. Implement appropriate top-k limits and relevance thresholds.
- Missing metadata — Documents without proper metadata (date, source, classification) cannot be properly filtered or attributed. Enforce metadata requirements.
- Monolithic indices — Single large indices create scaling and isolation challenges. Partition by tenant, data type, or update frequency.
- Ignoring evaluation — Without systematic evaluation, quality degradation goes unnoticed. Implement continuous evaluation pipelines.
Tools and Frameworks
| Tool | Purpose | Integration |
|---|---|---|
| LlamaIndex | RAG framework, indexing, retrieval | Python SDK |
| LangChain | RAG chains, retrievers | Python SDK |
| Pinecone | Managed vector database | Cloud API |
| Weaviate | Open source vector database | Self-hosted, Cloud |
| Chroma | Lightweight vector store | Python embedded |
| Qdrant | High-performance vector database | Self-hosted, Cloud |
| Ragas | RAG evaluation framework | Python SDK |
| Cohere Rerank | Neural reranking | API |
References
- LlamaIndex RAG Documentation
- LangChain RAG Tutorial
- Anthropic RAG Best Practices
- Anthropic Contextual Retrieval
- Pinecone Vector Database
- Weaviate Hybrid Search
- MITRE ATT&CK Knowledge Base
- NIST National Vulnerability Database
- CVE Program
- OSV - Open Source Vulnerabilities
- MISP Threat Intelligence Platform
- STIX/TAXII Cyber Threat Intelligence
- Ragas RAG Evaluation
- Sentence Transformers
- OpenAI Embeddings Guide
- Cohere Embed Models

