Documentation Index
Fetch the complete documentation index at: https://threatbasis.io/llms.txt
Use this file to discover all available pages before exploring further.
Input validation and output encoding form the foundation of injection attack prevention, treating all external data as untrusted until proven safe. Security engineers ensure that inputs are validated at every boundary and outputs are encoded for their specific context, preventing dangerous strings from being interpreted as code. Effective boundary hardening requires understanding the difference between validation (ensuring inputs meet expectations) and encoding (ensuring outputs are safe in their destination context).
Injection vulnerabilities consistently rank among the most critical security flaws. According to the CWE Top 25 Most Dangerous Software Weaknesses, injection-related weaknesses including CWE-79 (XSS), CWE-89 (SQL Injection), and CWE-78 (OS Command Injection) remain prevalent attack vectors. Proper input validation and output encoding eliminate entire classes of these vulnerabilities.
Understanding the distinction between validation and encoding is essential for effective security:
| Concept | Purpose | When Applied | Example |
|---|
| Input Validation | Ensure data meets expected format, type, and constraints | At trust boundaries when receiving data | Reject email without @ symbol |
| Output Encoding | Ensure data is safe in destination context | Before rendering or executing data | Convert < to < in HTML |
| Sanitization | Remove or neutralize dangerous content | After validation, before storage | Strip HTML tags from plain text field |
| Canonicalization | Convert to standard form before validation | Before validation | Normalize Unicode, resolve paths |
Both validation and encoding are required—validation alone cannot prevent all injection attacks, and encoding alone cannot ensure data integrity.
The following table summarizes key validation strategies:
| Validation Type | Description | Prevents | Example |
|---|
| Type validation | Ensure correct data type | Type confusion attacks | Reject string where integer expected |
| Length validation | Enforce maximum/minimum length | Buffer overflow, DoS | Limit username to 64 characters |
| Range validation | Ensure numeric bounds | Integer overflow, logic errors | Age between 0-150 |
| Format validation | Match expected pattern | Malformed input attacks | Email regex validation |
| Allow-list validation | Accept only known-good values | Unknown attack patterns | Enum for status values |
Validation at Every Boundary
Input validation should occur at every trust boundary—HTTP requests, message queue consumers, file parsers, API endpoints, and database inputs. Validation at boundaries prevents malicious data from propagating through the system.
- Schema-based validation: Use JSON Schema, Protocol Buffers, or Apache Avro for structured validation with clear contracts
- Centralized validation: Implement validation in framework middleware or shared libraries to ensure consistent application across all endpoints
- Defense in depth: Validate at multiple layers (client, API gateway, application, database) rather than relying on a single validation point
// Example: Zod schema validation in TypeScript
import { z } from "zod";
const UserSchema = z.object({
email: z.string().email().max(254),
age: z.number().int().min(0).max(150),
role: z.enum(["user", "admin", "moderator"]),
});
// Validation throws on invalid input
const user = UserSchema.parse(untrustedInput);
Comprehensive validation requires checking multiple dimensions of input data:
- Type validation: Ensure inputs match expected data types—string inputs should not be accepted where integers are expected
- Length validation: Enforce maximum lengths based on business requirements and storage constraints to prevent buffer overflows and resource exhaustion
- Range validation: Ensure numeric inputs fall within acceptable bounds to prevent integer overflow and business logic errors
- Format validation: Use regular expressions or parsing libraries (see OWASP Validation Regex Repository) to ensure inputs match expected patterns
Canonicalization
Canonicalization transforms inputs into standard form before validation, preventing bypass through encoding variations:
- Unicode normalization: Convert different representations of the same character into single form using Unicode Normalization Forms (NFC recommended)
- Path canonicalization: Resolve relative paths (
../) and symbolic links to prevent directory traversal attacks (CWE-22)
- URL canonicalization: Parse and validate URLs in canonical form, handling encoding variations and protocol differences
Allow-Lists Over Deny-Lists
| Approach | Security | Maintenance | Use Case |
|---|
| Allow-list | Strong—rejects unknown patterns | Low—stable over time | Enumerated values, known formats |
| Deny-list | Weak—misses new patterns | High—requires constant updates | Legacy systems, broad input types |
Allow-list validation explicitly defines acceptable inputs, rejecting everything else. Deny-lists are incomplete by nature, as new attack patterns emerge continuously. Enumerated types and strict format validation implement allow-list approaches—only known-good values should be accepted.
Rejection Over Coercion
Invalid inputs should be rejected with clear error messages rather than silently truncated or coerced:
- No silent truncation: Truncation can bypass validation by removing dangerous characters after validation—reject inputs exceeding length limits
- No type coercion: Implicit type conversion can lead to unexpected behavior—enforce strict type checking
- Clear error messages: Provide actionable feedback without revealing system internals
Output Encoding
Output encoding ensures that data is rendered safely in its destination context. The OWASP XSS Prevention Cheat Sheet provides comprehensive guidance on context-aware encoding.
Context-Aware Encoding
Output encoding must match the destination context. Each context has different encoding requirements:
| Context | Characters to Encode | Encoding Method | Example |
|---|
| HTML Body | < > & " ' | HTML entity encoding | < → < |
| HTML Attribute | < > & " ' ` | Attribute encoding | " → " |
| JavaScript | ' " \ / + control chars | JavaScript string escaping | ' → \' |
| URL Parameter | Non-alphanumeric | Percent encoding | → %20 |
| CSS | Non-alphanumeric | CSS hex escaping | ( → \28 |
| JSON | " \ / + control chars | JSON.stringify | Automatic |
HTML Context
Modern frameworks provide automatic context-aware encoding:
- Auto-escaping frameworks: React, Vue, and Angular provide context-aware HTML encoding by default—enable auto-escaping globally
- Template engines: Use Jinja2 (Python), Thymeleaf (Java), or Razor (.NET) with auto-escaping enabled
- Avoid dangerous sinks: Never use
innerHTML, document.write(), or outerHTML with untrusted data—use textContent and setAttribute instead
// UNSAFE: XSS vulnerability
element.innerHTML = userInput;
// SAFE: Automatic encoding
element.textContent = userInput;
// SAFE: React auto-escapes by default
return <div>{userInput}</div>;
JavaScript Context
JavaScript string encoding prevents script injection:
- Escape special characters: Encode quotes, backslashes, and control characters in JavaScript strings
- Avoid inline handlers: Never use
onclick, onerror, or other inline event handlers with user data—attach event handlers programmatically
- Never use eval(): The
eval() function and Function constructor should never process user input—they enable arbitrary code execution (CWE-95)
CSS, URL, and JSON Contexts
- CSS encoding: Escape non-alphanumeric characters in style attributes and blocks to prevent CSS injection
- URL encoding: Use
encodeURIComponent() for query parameters and encodeURI() for full URLs—encode path, query, and fragment separately
- JSON encoding: Always use
JSON.stringify() or equivalent library functions—manual JSON construction is error-prone
- Avoid context mixing: When output spans multiple contexts, encode for each context separately
SQL and Command Injection Prevention
SQL injection (CWE-89) and command injection (CWE-78) remain critical vulnerabilities. The OWASP SQL Injection Prevention Cheat Sheet provides comprehensive defense strategies.
Parameterized Queries
Parameterized queries (prepared statements) separate SQL code from data, preventing SQL injection:
| Language/Framework | Parameterized Query Method | Example |
|---|
| Java JDBC | PreparedStatement | stmt.setString(1, userInput) |
| Python | DB-API parameters | cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,)) |
| Node.js | Placeholder parameters | db.query("SELECT * FROM users WHERE id = $1", [userId]) |
| .NET | SqlParameter | cmd.Parameters.AddWithValue("@id", userId) |
Implementation guidance:
- Use ORMs: Hibernate (Java), Entity Framework (.NET), SQLAlchemy (Python), and Prisma (Node.js) provide parameterized queries by default
- Prohibit concatenation: Use static analysis tools like Semgrep or SonarQube to detect string concatenation in SQL queries
- Stored procedures: Use parameterized queries within stored procedures—stored procedures alone do not prevent SQL injection if they use dynamic SQL
# UNSAFE: SQL injection vulnerability
cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
# SAFE: Parameterized query
cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))
Command Injection Prevention
Avoid shell command execution entirely where possible:
- Use native libraries: Replace shell commands with language-native APIs (e.g., use Python’s
os.makedirs() instead of os.system("mkdir -p ..."))
- Use argument arrays: When shell execution is unavoidable, pass arguments as arrays to prevent metacharacter interpretation
- Never use shell=True: In Python’s
subprocess, always use shell=False with argument lists
- Sandbox execution: Use containers, seccomp, or restricted shells to limit blast radius
# UNSAFE: Command injection vulnerability
os.system(f"convert {user_filename} output.png")
# SAFE: Argument array prevents injection
subprocess.run(["convert", user_filename, "output.png"], shell=False)
File Upload Security
File uploads present unique security challenges. The OWASP File Upload Cheat Sheet provides comprehensive guidance on secure file handling.
Content Validation
| Validation Layer | What It Checks | Limitations | Recommendation |
|---|
| MIME type | Content-Type header | Attacker-controlled | Validate but don’t trust |
| File extension | Filename suffix | Can be spoofed | Allow-list only |
| Magic numbers | File header bytes | Can be forged | Use with other checks |
| Content parsing | Full file structure | Resource intensive | Required for high-risk types |
Validation strategies:
- Multi-layer validation: Combine extension, magic number, and content validation for defense in depth
- Image re-encoding: Use ImageMagick, Sharp, or Pillow to parse and re-encode images, removing embedded malicious content
- Document sanitization: Use Apache PDFBox or similar libraries to remove JavaScript from PDFs
Storage Security
- Store outside web root: Uploaded files should never be directly accessible—serve through download handlers with explicit
Content-Type and Content-Disposition headers
- Virus scanning: Integrate ClamAV or cloud-based scanning before files are made available
- Separate storage domain: Serve user content from a separate domain to prevent cookie theft via XSS
Filename Sanitization
- Generate new filenames: Use UUIDs or hashes to eliminate filename-based attacks—store original filenames as metadata
- Remove path characters: Strip
../, ..\\, and null bytes to prevent path traversal (CWE-22)
- Allow-list characters: Accept only alphanumeric characters, hyphens, and underscores in filenames
Testing and Validation
Comprehensive testing ensures validation and encoding controls function correctly.
Security Testing Approaches
| Testing Type | Purpose | Tools | Integration Point |
|---|
| Property-based testing | Test parser robustness with random inputs | Hypothesis, fast-check | Unit tests |
| Fuzzing | Find parsing vulnerabilities | AFL++, libFuzzer, OSS-Fuzz | CI/CD |
| Negative testing | Verify invalid inputs are rejected | Custom test suites | Unit/integration tests |
| DAST | Test running application | OWASP ZAP, Burp Suite | Staging environment |
Static Analysis
Static analysis tools detect dangerous patterns before code reaches production:
- SQL injection detection: Semgrep, SonarQube, and CodeQL detect string concatenation in SQL queries
- XSS detection: Identify use of dangerous sinks like
innerHTML and document.write()
- Taint tracking: Advanced tools track untrusted data flow from sources to sinks
Negative Testing Checklist
Verify that your application correctly handles:
Implementation Checklist
Use this checklist to verify comprehensive input validation and output encoding:
| Category | Control | Status |
|---|
| Input Validation | Schema validation at all API endpoints | ☐ |
| Type, length, range, and format checks | ☐ |
| Allow-list validation for enumerated values | ☐ |
| Canonicalization before validation | ☐ |
| Output Encoding | Context-aware encoding for all outputs | ☐ |
| Auto-escaping enabled in templates | ☐ |
| No use of dangerous sinks (innerHTML, eval) | ☐ |
| SQL/Command | Parameterized queries for all database access | ☐ |
| No shell command execution with user input | ☐ |
| File Upload | Multi-layer content validation | ☐ |
| Files stored outside web root | ☐ |
| Generated filenames (UUIDs) | ☐ |
| Testing | Static analysis in CI/CD pipeline | ☐ |
| Negative tests for all input handlers | ☐ |
| Regular DAST scanning | ☐ |
Conclusion
Input validation and output encoding prevent injection attacks by treating all external data as untrusted and ensuring that data is safe in its destination context. Security engineers implement validation at every trust boundary and context-aware encoding for all outputs.
Key success factors:
- Validate inputs at every boundary using schema-based validation with type, length, range, and format checks
- Apply context-aware output encoding using framework-provided functions
- Use parameterized queries exclusively for database access
- Avoid shell command execution; when unavoidable, use argument arrays
- Implement multi-layer file upload validation with content verification
- Enforce controls through static analysis and comprehensive testing
Organizations that invest in input validation and output encoding fundamentals eliminate entire classes of injection vulnerabilities, significantly reducing their attack surface.
References