Security incidents are inevitable in modern organizations. The question is not whether incidents will occur but how effectively organizations respond when they do. Security engineers design incident response as a practiced discipline with clear roles, documented procedures, and measurable outcomes rather than as ad-hoc crisis management. Effective incident response minimizes damage, preserves evidence, and enables learning that prevents future incidents. According to IBM’s Cost of a Data Breach Report, organizations with incident response teams and regularly tested IR plans save an average of $2.66 million per breach compared to those without.

Incident response excellence requires preparation through planning, training, and tooling before incidents occur. Organizations that rehearse incident response through regular exercises and continuous improvement respond more effectively to real incidents and stay composed under pressure.

Incident Response Lifecycle

The six-phase lifecycle below follows the widely used SANS model. NIST SP 800-61 Rev. 2 groups the same activities into four phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. Together the phases guide response efforts from preparation through post-incident improvement.

| Phase | Primary Objective | Key Activities | Success Criteria |
| --- | --- | --- | --- |
| Preparation | Build response capability | Planning, tooling, training, exercises | Tested playbooks, trained team |
| Identification | Detect and validate incidents | Monitoring, triage, classification | Accurate detection, low false positives |
| Containment | Limit incident impact | Isolation, blocking, evidence preservation | Damage limited, evidence preserved |
| Eradication | Remove attacker presence | Malware removal, patching, credential rotation | No attacker persistence |
| Recovery | Restore normal operations | System restoration, validation, monitoring | Verified clean systems |
| Lessons Learned | Improve future response | Post-incident review, action tracking | Documented improvements |

Lifecycle Phase Details

  • Preparation establishes capabilities before incidents occur through planning, tooling, training, and exercises. This phase determines response effectiveness for all subsequent phases.
  • Identification detects and validates potential security incidents, distinguishing genuine incidents from false positives. Effective identification relies on comprehensive monitoring, trained analysts, and clear escalation criteria.
  • Containment limits incident impact and prevents further damage while preserving evidence. Containment strategies balance speed with evidence preservation and business continuity.
  • Eradication removes attacker presence and closes vulnerabilities that enabled the incident. Thorough eradication prevents attackers from maintaining persistence or quickly recompromising systems.
  • Recovery restores normal operations with validation that systems are clean and secure. Phased recovery with enhanced monitoring detects any recompromise attempts.
  • Lessons Learned captures insights that improve future incident response and prevent similar incidents. This phase closes the loop, feeding improvements back into preparation.

Severity Classification

Incident severity determines response urgency, resource allocation, escalation paths, and communication requirements. Define severity through business-aligned impact criteria rather than purely technical metrics.

| Severity | Business Impact | Response Time | Escalation | Example Scenarios |
| --- | --- | --- | --- | --- |
| Critical (P1) | Existential threat, major data breach, complete outage | Immediate (< 15 min) | Executive, Legal, Board | Ransomware, major data exfiltration, critical infrastructure compromise |
| High (P2) | Significant data exposure, major service degradation | < 1 hour | Director, Security Lead | Credential compromise, targeted attack, significant malware |
| Medium (P3) | Limited data exposure, partial service impact | < 4 hours | Manager, On-call | Phishing success, limited malware, policy violations |
| Low (P4) | Minimal impact, no data exposure | < 24 hours | Team Lead | Failed attacks, minor policy violations, suspicious activity |

Organizations should err toward declaring incidents early and at a higher severity: downgrading is easier than explaining a delayed response to a severe incident.
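The severity table can be encoded so that tooling applies the same response targets and escalation paths every time. The following is a minimal sketch, not a prescribed implementation: the `classify` function and its four-word impact scale ("none", "limited", "significant", "major") are hypothetical, while the time limits and escalation contacts are copied from the table.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import IntEnum


class Severity(IntEnum):
    """Lower number means more severe, mirroring P1..P4."""
    P1 = 1
    P2 = 2
    P3 = 3
    P4 = 4


@dataclass(frozen=True)
class SeverityPolicy:
    response_within: timedelta
    escalate_to: tuple[str, ...]


# Values copied from the severity table; adjust them to your own SLAs.
POLICIES = {
    Severity.P1: SeverityPolicy(timedelta(minutes=15), ("Executive", "Legal", "Board")),
    Severity.P2: SeverityPolicy(timedelta(hours=1), ("Director", "Security Lead")),
    Severity.P3: SeverityPolicy(timedelta(hours=4), ("Manager", "On-call")),
    Severity.P4: SeverityPolicy(timedelta(hours=24), ("Team Lead",)),
}

# Hypothetical impact scale, used only for this illustration.
_SCALE = {
    "major": Severity.P1,
    "significant": Severity.P2,
    "limited": Severity.P3,
    "none": Severity.P4,
}


def classify(data_exposure: str, service_impact: str) -> Severity:
    """Return the more severe of the two impact ratings."""
    return min(_SCALE[data_exposure], _SCALE[service_impact])
```

Taking the more severe of the two ratings reflects the advice to err toward higher severity; a responder can always downgrade later.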

Preparation

Preparation is the foundation of effective incident response. Organizations that invest in preparation respond faster, more effectively, and with less stress during actual incidents.

Incident Response Team Structure

Incident response requires clear roles with defined responsibilities and decision authority:

| Role | Responsibilities | Decision Authority |
| --- | --- | --- |
| Incident Commander | Overall coordination, resource allocation, status reporting | Escalation, resource requests, incident closure |
| Technical Lead | Investigation direction, technical analysis, remediation oversight | Technical containment actions, tool deployment |
| Communications Lead | Internal/external messaging, stakeholder updates, media coordination | Communication timing and content (with legal review) |
| Scribe/Documenter | Timeline maintenance, action tracking, evidence logging | Documentation format and completeness |
| Subject Matter Experts | System-specific knowledge, threat intelligence, forensic analysis | Technical recommendations within expertise |

On-call and escalation requirements:
  • 24/7 coverage: Maintain on-call rotations with clear escalation paths and current contact information
  • Decision rights documentation: Clarify who can authorize containment actions, customer communications, or law enforcement engagement
  • Backup personnel: Ensure backup responders for all critical roles to handle extended incidents or unavailability
  • Cross-training: Train team members on multiple roles to ensure flexibility during incidents

Pre-Staged Access and Tooling

Incident responders need pre-staged access to systems, logs, and tools before incidents occur. Waiting to provision access during an active incident wastes critical time.

Access requirements:
  • Break-glass accounts: Emergency access procedures that enable rapid access without compromising security, with comprehensive audit logging
  • Pre-authorized access: Standing access to critical systems, log aggregation platforms, and security tools
  • Cloud API access: Pre-configured credentials for cloud provider APIs enabling snapshot creation, log retrieval, and resource isolation
  • Third-party access: Pre-established relationships with forensic vendors, legal counsel, and law enforcement contacts
Essential IR tooling:

| Tool Category | Purpose | Example Tools |
| --- | --- | --- |
| SIEM/Log Analysis | Centralized log search and correlation | Splunk, Elastic Security, Microsoft Sentinel |
| EDR/XDR | Endpoint detection, investigation, response | CrowdStrike Falcon, Microsoft Defender, SentinelOne |
| Memory Forensics | Volatile data capture and analysis | Volatility, Rekall, AVML |
| Disk Forensics | Disk imaging and analysis | FTK Imager, Autopsy, EnCase |
| Network Analysis | Traffic capture and analysis | Wireshark, Zeek, NetworkMiner |
| Threat Intelligence | IOC enrichment and context | MISP, VirusTotal, AlienVault OTX |

Incident Response Playbooks

Playbooks document procedures for common incident types, providing checklists that ensure consistent response while reducing cognitive load during high-stress incidents.

Essential playbooks to develop:
  • Ransomware: Isolation procedures, backup verification, decryption assessment, law enforcement notification
  • Data breach: Scope determination, regulatory notification requirements, customer communication templates
  • Account compromise: Credential reset procedures, session termination, access review
  • Phishing: Email quarantine, recipient notification, credential reset if clicked
  • DDoS: Traffic analysis, mitigation activation, upstream provider coordination
  • Insider threat: Evidence preservation, HR coordination, legal consultation
Playbooks should reference the MITRE ATT&CK framework for threat context and detection opportunities.
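One way to keep a playbook usable under pressure is to store it as a checklist with its ATT&CK references attached. The sketch below is illustrative only: the `Playbook` class and its field names are hypothetical, the steps paraphrase the ransomware entry in the list above, and T1486 (Data Encrypted for Impact) is used as an example technique ID.

```python
from dataclasses import dataclass, field


@dataclass
class Playbook:
    name: str
    attack_techniques: list[str]   # MITRE ATT&CK technique IDs
    steps: list[str]
    done: set[int] = field(default_factory=set)

    def complete(self, index: int) -> None:
        """Mark the step at the given zero-based index as finished."""
        if not 0 <= index < len(self.steps):
            raise IndexError(f"no step {index} in playbook {self.name!r}")
        self.done.add(index)

    def remaining(self) -> list[str]:
        """Return unfinished steps in their documented order."""
        return [s for i, s in enumerate(self.steps) if i not in self.done]


ransomware = Playbook(
    name="Ransomware",
    attack_techniques=["T1486"],  # Data Encrypted for Impact
    steps=[
        "Isolate affected hosts",
        "Verify backups",
        "Assess decryption options",
        "Notify law enforcement",
    ],
)
ransomware.complete(0)
```

Calling `ransomware.remaining()` at any point gives the scribe a current view of what is still open.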

Exercises and Training

Regular exercises validate plans, identify gaps, and build team muscle memory. The NIST Cybersecurity Framework likewise calls for response and recovery plans to be tested.

| Exercise Type | Description | Frequency | Participants |
| --- | --- | --- | --- |
| Tabletop | Discussion-based scenario walkthrough | Quarterly | IR team, leadership |
| Functional | Test specific capabilities (evidence collection, communications) | Semi-annually | IR team, relevant SMEs |
| Full-scale | Realistic simulation with time pressure | Annually | Full organization |
| Red/Purple Team | Adversary simulation with detection/response | Annually | Security team, red team |

Exercise best practices:
  • Include realistic injects, time compression, and surprise elements that test adaptability
  • Involve leadership and communications teams, not just technical responders
  • Document lessons learned and track improvement actions to completion
  • Vary scenarios to cover different incident types and attack vectors

Detection and Identification

Effective detection combines automated monitoring with human analysis to identify security incidents quickly and accurately.

Detection Sources

| Detection Source | What It Detects | Response Time | False Positive Rate |
| --- | --- | --- | --- |
| EDR/XDR | Endpoint malware, suspicious behavior, lateral movement | Real-time | Low-Medium |
| SIEM | Log anomalies, correlation patterns, policy violations | Near real-time | Medium-High |
| Network Detection (NDR) | C2 traffic, data exfiltration, network anomalies | Real-time | Medium |
| User Reports | Phishing, suspicious emails, unusual behavior | Variable | Low |
| Threat Intelligence | Known IOCs, emerging threats | Varies | Low |
| Cloud Security (CSPM/CWPP) | Misconfigurations, suspicious API calls | Near real-time | Medium |

Triage and Validation

Not every alert represents a genuine incident. Effective triage distinguishes true positives from false positives:
  • Initial assessment: Gather context from multiple sources before escalating
  • IOC enrichment: Use threat intelligence platforms to assess indicator reputation
  • Scope determination: Identify affected systems, users, and data before classification
  • False positive documentation: Document false positive patterns to improve detection rules
Use the MITRE ATT&CK framework to map observed behaviors to known attack techniques, providing context for investigation prioritization.

Evidence and Forensics

Digital forensics preserves and analyzes evidence to understand incident scope, identify attackers, and support potential legal proceedings. Follow established standards like ISO/IEC 27037 for evidence handling.

Chain of Custody

Evidence handling requires documented chain of custody that tracks who collected, transferred, and analyzed evidence:

| Chain of Custody Element | Purpose | Documentation Required |
| --- | --- | --- |
| Collection | Capture evidence in forensically sound manner | Who, when, where, how, hash values |
| Transfer | Move evidence between parties | Transfer log, signatures, timestamps |
| Storage | Secure evidence against tampering | Access logs, integrity verification |
| Analysis | Examine evidence without modification | Analysis logs, working copies used |
| Disposition | Retain or destroy per policy | Retention decision, destruction certificate |

Forensic collection best practices:
  • Use write-blockers to prevent accidental modification during collection
  • Create bit-for-bit forensic images rather than file copies
  • Calculate and document cryptographic hashes (SHA-256) immediately after collection
  • Maintain detailed notes of all collection activities
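Hashing at collection time and re-hashing later is what makes the integrity claim checkable. The sketch below uses only Python's standard `hashlib`; the record layout and the function names `custody_record` and `verify` are hypothetical and would need to match whatever custody form your organization actually uses.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path: Path, collector: str, method: str) -> dict:
    """Capture who, when, how, and the hash value at collection time."""
    return {
        "item": str(path),
        "collected_by": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "method": method,
        "sha256": sha256_file(path),
    }


def verify(path: Path, record: dict) -> bool:
    """Return True if the item still matches the hash taken at collection."""
    return sha256_file(path) == record["sha256"]
```

Running `verify` before each analysis session, and logging the result, gives the storage and analysis stages of the custody chain a concrete integrity check.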

Evidence Collection Order

Collect evidence in order of volatility, since the most volatile data disappears first:

| Priority | Evidence Type | Volatility | Collection Method |
| --- | --- | --- | --- |
| 1 | Memory (RAM) | Seconds-minutes | Memory dump tools (AVML, WinPmem) |
| 2 | Running processes | Minutes | Process listing, handle enumeration |
| 3 | Network connections | Minutes | Netstat, connection logs |
| 4 | Disk (live) | Hours | Forensic imaging, cloud snapshots |
| 5 | Logs | Days-weeks | Log export, SIEM queries |
| 6 | Backups | Weeks-months | Backup retrieval |
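When several hosts are involved, the priority column can drive the order of a collection queue. This is a small sketch under assumed names: the evidence-type keys and the `CollectionTask` class are hypothetical, and the priorities are taken from the table.

```python
from dataclasses import dataclass

# Priorities follow the table above; a lower number is collected first.
VOLATILITY_PRIORITY = {
    "memory": 1,
    "processes": 2,
    "network_connections": 3,
    "disk": 4,
    "logs": 5,
    "backups": 6,
}


@dataclass(frozen=True)
class CollectionTask:
    host: str
    evidence_type: str


def collection_order(tasks: list[CollectionTask]) -> list[CollectionTask]:
    """Sort tasks most-volatile first, keeping input order within a priority."""
    unknown = {t.evidence_type for t in tasks} - set(VOLATILITY_PRIORITY)
    if unknown:
        raise ValueError(f"unknown evidence types: {sorted(unknown)}")
    return sorted(tasks, key=lambda t: VOLATILITY_PRIORITY[t.evidence_type])
```

Because Python's sort is stable, tasks with equal priority stay in the order they were queued, so a responder can still list the most important host first.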

Evidence Preservation

  • Snapshot before changes: Create cloud snapshots or disk images before any remediation
  • Isolate, don’t power off: Host isolation preserves volatile data while preventing lateral movement
  • Preserve logs: Ensure log retention policies don’t delete evidence during investigation
  • Document everything: Maintain detailed timeline of all investigative actions

Containment and Eradication

Containment limits incident impact while eradication removes attacker presence. Both phases require careful coordination to avoid alerting attackers or destroying evidence.

Containment Strategies

| Containment Action | Use Case | Impact | Reversibility |
| --- | --- | --- | --- |
| Network isolation | Prevent lateral movement, block C2 | High - system offline | Easy |
| Account disablement | Stop credential abuse | Medium - user disruption | Easy |
| Endpoint quarantine | Isolate while preserving access for investigation | Medium | Easy |
| Service shutdown | Stop compromised application | High - service outage | Easy |
| Credential rotation | Invalidate stolen credentials | Low-Medium | Permanent |
| DNS/IP blocking | Block malicious infrastructure | Low | Easy |
| Application allowlisting | Prevent unauthorized execution | High - operational impact | Medium |

Containment decision factors:
  • Evidence preservation: Will this action destroy evidence? Collect first if so
  • Attacker awareness: Will the attacker detect containment and accelerate damage?
  • Business impact: What’s the operational cost of this containment action?
  • Scope: Is containment scoped appropriately, or will it cause unnecessary disruption?
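The decision factors above can be read as a short pre-flight check run before each containment action. The sketch below is one possible encoding, not an established procedure: the `ContainmentOption` fields, the 1 to 3 impact scale, and the wording of the returned steps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainmentOption:
    action: str
    destroys_evidence: bool
    business_impact: int       # 1 (low) to 3 (high), hypothetical scale
    tips_off_attacker: bool


def plan(option: ContainmentOption, evidence_collected: bool,
         max_impact: int) -> list[str]:
    """Return the ordered steps needed before and including the action."""
    steps = []
    # Evidence preservation: collect first if the action would destroy it.
    if option.destroys_evidence and not evidence_collected:
        steps.append("collect evidence")
    # Business impact: anything above the pre-agreed ceiling needs sign-off.
    if option.business_impact > max_impact:
        steps.append("obtain approval for business impact")
    # Attacker awareness: visible actions should be timed with the others.
    if option.tips_off_attacker:
        steps.append("coordinate timing with other containment actions")
    steps.append(f"execute: {option.action}")
    return steps
```

For example, a service shutdown that would wipe volatile state and exceeds the impact ceiling yields three steps, with evidence collection first.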

Eradication Activities

Eradication removes attacker presence and closes vulnerabilities that enabled the incident:

| Eradication Activity | Purpose | Validation Method |
| --- | --- | --- |
| Malware removal | Remove malicious software | AV/EDR scan, behavioral monitoring |
| Vulnerability patching | Close exploitation vectors | Vulnerability scan |
| Backdoor removal | Eliminate persistence mechanisms | Configuration review, integrity monitoring |
| Credential rotation | Invalidate compromised credentials | Access log review |
| System reimaging | Ensure clean system state | Image verification, baseline comparison |
| Configuration hardening | Prevent similar attacks | Configuration audit |

Eradication best practices:
  • Reimage from known-good images rather than attempting to clean compromised systems
  • Rotate all credentials that may have been exposed, not just confirmed compromised ones
  • Patch vulnerabilities before restoring systems to prevent immediate recompromise
  • Validate eradication through detection rules, scanning, and behavioral monitoring
  • Document all eradication actions for post-incident review

Recovery and Communications

Recovery restores normal operations while communications keep stakeholders informed. Both require careful planning and execution.

Phased Recovery

Recovery should be phased with enhanced monitoring to detect recompromise attempts:

| Recovery Phase | Activities | Monitoring Focus | Success Criteria |
| --- | --- | --- | --- |
| Validation | Verify eradication complete | Original IOCs, persistence mechanisms | Clean scans, no suspicious activity |
| Limited restoration | Restore critical systems with restrictions | Behavioral anomalies, authentication patterns | Stable operation, no alerts |
| Expanded restoration | Restore additional systems and users | Lateral movement indicators | Continued stability |
| Full restoration | Remove restrictions, normal operations | Standard monitoring | Normal operations confirmed |

Recovery best practices:
  • Canary deployment: Restore canary systems or users first to provide early warning of recompromise
  • Enhanced monitoring: Focus monitoring on IOCs from the original incident during recovery
  • Validation testing: Perform functional testing, security scanning, and monitoring review before declaring recovery complete
  • Documentation: Document all recovery steps for post-incident review
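Phased recovery behaves like a gated state machine: a phase advances only when its success criteria are met. The sketch below illustrates that idea. The rule that any hit on an original IOC sends recovery back to validation is an assumption of this example rather than something the phases themselves prescribe, and the function name is hypothetical.

```python
PHASES = [
    "validation",
    "limited restoration",
    "expanded restoration",
    "full restoration",
]


def next_phase(current: str, criteria_met: bool, ioc_hits: int) -> str:
    """Advance one phase only when the current success criteria hold.

    Assumption: any hit on indicators from the original incident suggests
    incomplete eradication, so recovery returns to validation.
    """
    if current not in PHASES:
        raise ValueError(f"unknown phase: {current!r}")
    if ioc_hits > 0:
        return PHASES[0]
    if not criteria_met:
        return current
    index = PHASES.index(current)
    return PHASES[min(index + 1, len(PHASES) - 1)]
```

Keeping the gate explicit makes it harder to skip ahead under business pressure, because each transition requires a recorded decision that the criteria were met.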

Stakeholder Communications

Effective incident communication requires different approaches for different audiences:

| Audience | Communication Focus | Frequency | Approval Required |
| --- | --- | --- | --- |
| Executive leadership | Business impact, response status, resource needs | Per severity SLA | Incident Commander |
| Technical teams | Technical details, action items, coordination | As needed | Technical Lead |
| Employees | What happened, what to do, status updates | Major milestones | Communications Lead |
| Customers | Impact, remediation, protective actions | As required | Legal, Communications |
| Regulators | Compliance notifications, formal reports | Per regulation | Legal, Compliance |
| Media | Prepared statements, factual updates | As needed | Communications, Legal |

Regulatory Notification Requirements

Many regulations mandate breach notification within specific timeframes. Consult legal counsel for jurisdiction-specific requirements:

| Regulation | Notification Timeline | Key Requirements |
| --- | --- | --- |
| GDPR (EU) | 72 hours to supervisory authority | Nature of breach, categories of data, mitigation measures |
| CCPA/CPRA (California) | “Most expedient time possible” | Categories of information, what happened |
| HIPAA (US Healthcare) | 60 days to individuals, HHS | PHI involved, mitigation steps |
| PCI DSS | Immediately to card brands | Cardholder data exposure |
| SEC Rules (US Public Companies) | 4 business days (material incidents) | Material impact determination |
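The fixed timelines in the table can be turned into concrete deadlines as soon as an incident is declared. The sketch below is a rough aid only. It assumes the GDPR and HIPAA clocks start when the organization becomes aware of the breach and that the SEC clock starts when materiality is determined. It also ignores public holidays. Confirm the actual trigger points with legal counsel.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def add_business_days(start: datetime, days: int) -> datetime:
    """Add weekdays only; public holidays are ignored in this sketch."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:   # Monday=0 .. Friday=4
            remaining -= 1
    return current


def notification_deadlines(
    aware_at: datetime,
    material_at: Optional[datetime] = None,
) -> dict[str, datetime]:
    """Compute deadlines for the regulations with fixed timelines."""
    deadlines = {
        "GDPR": aware_at + timedelta(hours=72),
        "HIPAA": aware_at + timedelta(days=60),
    }
    # The SEC deadline only applies once materiality has been determined.
    if material_at is not None:
        deadlines["SEC"] = add_business_days(material_at, 4)
    return deadlines
```

For a breach discovered on a Friday, the 72-hour GDPR window runs through the weekend, while the SEC window skips it, which is why the two are computed differently.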

Metrics and Continuous Improvement

Metrics enable data-driven improvement of incident response capabilities. Track these key indicators to measure and improve IR effectiveness.

Key Incident Response Metrics

| Metric | Definition | Target Trend | Alert Threshold |
| --- | --- | --- | --- |
| Mean Time to Detect (MTTD) | Time from compromise to detection | Decreasing | Above industry benchmark |
| Mean Time to Respond (MTTR) | Time from detection to containment | Decreasing | Exceeds severity SLA |
| Mean Time to Recover | Time from containment to normal operations | Decreasing | Exceeds business tolerance |
| Containment Effectiveness | % of incidents contained before significant damage | Increasing | Below 80% |
| Repeat Incident Rate | % of incidents similar to previous incidents | Decreasing | Increasing trend |
| Exercise Coverage | % of critical systems/scenarios tested | Increasing | Below 70% |
| Playbook Coverage | % of incident types with documented playbooks | Increasing | Below 80% |
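The three time-based metrics are simple averages over per-incident timestamps, using the definitions in the table. The sketch below assumes each incident record carries four timestamps; the `IncidentTimeline` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class IncidentTimeline:
    compromised_at: datetime
    detected_at: datetime
    contained_at: datetime
    recovered_at: datetime


def _mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a list of durations."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def ir_metrics(incidents: list[IncidentTimeline]) -> dict[str, timedelta]:
    """Compute MTTD, MTTR, and mean time to recover as defined in the table."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return {
        "mttd": _mean_delta([i.detected_at - i.compromised_at for i in incidents]),
        "mttr": _mean_delta([i.contained_at - i.detected_at for i in incidents]),
        "mean_time_to_recover": _mean_delta(
            [i.recovered_at - i.contained_at for i in incidents]
        ),
    }
```

In practice the compromise time is often an estimate established during forensics, so MTTD figures are usually revised after the post-incident review.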

Lessons Learned Process

Post-incident reviews (also called retrospectives or post-mortems) are essential for continuous improvement. Follow a blameless approach focused on systemic improvements.

Post-incident review structure:
  1. Timeline reconstruction: What happened, when, and in what sequence?
  2. Detection analysis: How was the incident detected? Could it have been detected earlier?
  3. Response evaluation: What went well? What could be improved?
  4. Root cause analysis: What enabled the incident? What systemic factors contributed?
  5. Action items: What specific improvements will prevent recurrence?
Action item tracking:

| Action Category | Examples | Owner | Timeline |
| --- | --- | --- | --- |
| Detection improvements | New detection rules, alert tuning | Security Operations | 2-4 weeks |
| Playbook updates | New procedures, clarified steps | IR Team | 1-2 weeks |
| Access changes | Permission adjustments, access reviews | IT/Security | 1-2 weeks |
| Architecture changes | Segmentation, hardening | Engineering | 1-3 months |
| Training | Team training, awareness updates | Security | 2-4 weeks |

Incident Trend Analysis

Analyze incident trends to identify systemic issues requiring strategic investment:
  • Incident categorization: Classify incidents by type, attack vector, and root cause
  • Pattern identification: Look for repeated incident types indicating fundamental gaps
  • Investment prioritization: Direct security investments toward highest-impact improvements
  • Benchmark comparison: Compare metrics against industry benchmarks and peer organizations
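Categorization and pattern identification can start as a simple count over closed incident records. The sketch below assumes each record is a dictionary with `type` and `root_cause` keys (hypothetical field names) and treats two incidents as similar when both values match. That is only one possible definition of a repeat incident.

```python
from collections import Counter


def top_root_causes(incidents: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent root causes with their counts."""
    return Counter(i["root_cause"] for i in incidents).most_common(n)


def repeat_incident_rate(incidents: list[dict]) -> float:
    """Share of incidents whose (type, root_cause) pair was seen earlier."""
    seen = set()
    repeats = 0
    for incident in incidents:
        key = (incident["type"], incident["root_cause"])
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats / len(incidents) if incidents else 0.0
```

A root cause that dominates the count across several incident types is usually a better investment target than the most frequent incident type alone.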

Conclusion

Incident response excellence requires preparation, practice, and continuous improvement. Security engineers design incident response programs that treat security incidents as learning opportunities while minimizing damage through rapid, effective response.

Key success factors:
  • Clear roles, responsibilities, and decision authority documented and practiced
  • Pre-staged access, tooling, and playbooks ready before incidents occur
  • Regular exercises that test and improve response capabilities
  • Structured evidence handling that preserves forensic integrity
  • Phased containment, eradication, and recovery with validation at each stage
  • Stakeholder communications tailored to audience needs and regulatory requirements
  • Metrics-driven continuous improvement through blameless post-incident reviews
Organizations that invest in incident response fundamentals respond more effectively to incidents while building organizational resilience. The goal is not to prevent all incidents—that’s impossible—but to detect quickly, respond effectively, and learn continuously.
