The Alert Fatigue Crisis
VXDF (Validated Exploitable Data Flow) represents a paradigm shift from theoretical vulnerability detection to evidence-based security analysis. This inaugural post dissects the technical architecture of the VXDF engine—a multi-stage analysis pipeline that systematically validates exploitability before alerting security teams.
Unlike traditional scanners that output raw findings, the VXDF engine operates as an automated security investigator: it ingests signals from multiple sources, correlates them through static analysis, dynamically validates potential exploits in sandboxed environments, and produces verified findings with concrete proof-of-concept demonstrations.
┌─────────────────────────────────────────────────────────────────┐ │ VXDF ENGINE PIPELINE │ ├─────────────────────────────────────────────────────────────────┤ │ Stage 1: Universal Fact Ingestion & Modeling │ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │ │ SARIF │ │ CycloneDX │ │ API Disc. │ │ IaC Scans │ │ │ │ (SAST) │ │ (SCA/SBOM) │ │ (Custom) │ │ (Checkov) │ │ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │ │ │ │ │ │ │ └───────────────┼───────────────┼───────────────┘ │ │ ▼ ▼ │ │ ┌─────────────────────────────────────────┐ │ │ │ BaseFact Objects │ │ │ │ • VulnerabilityFact • EndpointFact │ │ │ │ • PackageFact • ConfigFact │ │ │ └─────────────────┬───────────────────────┘ │ ├─────────────────────────────────────────────────────────────────┤ │ Stage 2: Static Correlation & Hypothesis Generation │ │ ┌─────────────────▼───────────────────────┐ │ │ │ Correlation Graph Builder │ │ │ │ │ │ │ │ Layer 1: Heuristic Correlation (0.1-0.4)│ │ │ │ Layer 2: Import Graph Analysis (0.5-0.6)│ │ │ │ Layer 3: Taint Analysis (0.8-0.9)│ │ │ └─────────────────┬───────────────────────┘ │ │ ▼ │ │ ┌─────────────────────────────────────────┐ │ │ │ Hypothesis Graph (Facts + Edges) │ │ │ │ Nodes: Facts | Edges: Relationships │ │ │ └─────────────────┬───────────────────────┘ │ ├─────────────────────────────────────────────────────────────────┤ │ Stage 3: Dynamic Validation & Hypothesis Testing │ │ ┌─────────────────▼───────────────────────┐ │ │ │ Validation Orchestrator │ │ │ │ │ │ │ │ • Payload Generation (context-aware) │ │ │ │ • Sandboxed Request Execution │ │ │ │ • Multi-Oracle Evidence Collection: │ │ │ │ - Content Oracle (response analysis) │ │ │ │ - Timing Oracle (blind injection) │ │ │ │ - OAST Oracle (out-of-band callbacks) │ │ │ └─────────────────┬───────────────────────┘ │ ├─────────────────────────────────────────────────────────────────┤ │ Stage 4: Reporting & Enrichment │ │ ┌─────────────────▼───────────────────────┐ │ │ │ Validated Exploitable Findings │ │ │ │ │ │ │ │ • OWASP VXDF JSON Output │ │ │ │ • Evidence Artifacts (req/resp logs) │ │ │ │ • CWE/CVSS Enrichment │ │ │ │ • Remediation Guidance │ │ │ └─────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘
Stage 1: Universal Fact Ingestion & Modeling
The foundation of evidence-based security analysis is structured data normalization. Stage 1 transforms heterogeneous security tool outputs into a unified fact model, preserving provenance and enabling cross-tool correlation.
Input Sources & Parsing
Static Analysis (SARIF)
- • Semgrep, CodeQL, Snyk Code
- • Bandit (Python), ESLint (JS)
- • Custom SAST rule outputs
Composition Analysis
- • CycloneDX SBOM (Trivy, Grype)
- • SPDX JSON (Snyk Open Source)
- • Package vulnerability databases
API Discovery
- • Flask route extraction
- • Express.js endpoint mapping
- • OpenAPI specification parsing
Infrastructure Scans
- • Terraform (tfsec, Checkov)
- • Kubernetes YAML (kube-score)
- • CloudFormation templates
BaseFact Data Model
// Core BaseFact interface interface BaseFact { id: string; // Unique identifier type: FactType; // VULNERABILITY | ENDPOINT | PACKAGE | CONFIG location: SourceLocation; // File path, line numbers provenance: ProvenanceInfo; // Source tool, version, timestamp confidence: number; // Initial confidence [0.0, 1.0] metadata: Record<string, any>; // Tool-specific data } // Example: SQL Injection from Semgrep SARIF const vulnFact: VulnerabilityFact = { id: "vuln_001", type: "VULNERABILITY", location: { filePath: "src/models/user.py", startLine: 45, endLine: 47, codeSnippet: "cursor.execute(f\"SELECT * FROM users WHERE id = {user_id}\")" }, provenance: { tool: "semgrep", version: "1.45.0", ruleId: "python.lang.security.audit.sqli.python-sqli", timestamp: "2024-01-15T10:30:00Z" }, confidence: 0.8, vulnerabilityData: { cweId: "CWE-89", severity: "HIGH", category: "SQL_INJECTION", sinkFunction: "cursor.execute", taintedParameter: "user_id" } };
Key Innovation: Every fact retains complete audit trail. We never lose track of which tool produced what finding, enabling confidence calibration and false positive feedback loops.
Stage 2: Static Correlation & Hypothesis Generation
Stage 2 transforms isolated facts into a Correlation Graph—a knowledge graph where nodes represent security facts and edges represent potential attack paths. This stage employs a three-layer analysis approach with increasing fidelity.
Layer 1: Heuristic Correlation (Confidence: 0.1-0.4)
// Heuristic correlation example function correlateByProximity(facts: BaseFact[]): CorrelationEdge[] { const edges: CorrelationEdge[] = []; for (const vulnFact of facts.filter(f => f.type === 'VULNERABILITY')) { for (const endpointFact of facts.filter(f => f.type === 'ENDPOINT')) { const pathSimilarity = calculatePathSimilarity( vulnFact.location.filePath, endpointFact.location.filePath ); if (pathSimilarity > 0.7) { // Same directory or similar naming edges.push({ source: endpointFact.id, target: vulnFact.id, relationship: 'heuristic_proximity', confidence: 0.2 + (pathSimilarity * 0.2), evidence: { pathSimilarity, heuristic: 'file_proximity' } }); } } } return edges; }
Layer 2: Import Graph Analysis (Confidence: 0.5-0.6)
Analyze module dependencies to establish structural relationships between endpoints and vulnerable code.
// Import graph traversal for Python/JavaScript class ImportGraphAnalyzer { buildImportGraph(codebase: string): ImportGraph { // Parse import statements across the codebase // Build directed graph of module dependencies } findImportPath(sourceFile: string, targetFile: string): ImportPath | null { // BFS/DFS to find if sourceFile can reach targetFile // Return path: [file1 -> file2 -> ... -> targetFile] } correlateViaImports(endpointFact: EndpointFact, vulnFact: VulnerabilityFact): CorrelationEdge | null { const importPath = this.findImportPath( endpointFact.location.filePath, vulnFact.location.filePath ); if (importPath) { return { source: endpointFact.id, target: vulnFact.id, relationship: 'has_import_path_to', confidence: 0.6, evidence: { importPath: importPath.pathSteps, transitiveDepth: importPath.length } }; } return null; } }
Layer 3: High-Fidelity Taint Analysis (Confidence: 0.8-0.9)
The most sophisticated layer: orchestrate existing static analyzers to perform targeted data flow analysis between specific source-sink pairs.
// Dynamic Semgrep rule generation for taint analysis class TaintAnalysisOrchestrator { async validateDataFlow(sourceNode: EndpointFact, sinkNode: VulnerabilityFact): Promise<TaintResult> { // Generate custom Semgrep rule for this specific source-sink pair const customRule = this.generateTaintRule(sourceNode, sinkNode); // Execute Semgrep with the generated rule const semgrepResult = await this.executeSemgrep(customRule); if (semgrepResult.findings.length > 0) { return { hasDataFlow: true, confidence: 0.9, staticTrace: semgrepResult.findings[0].dataFlowTrace, evidence: { semgrepRule: customRule, traceSteps: semgrepResult.findings[0].dataFlowTrace } }; } return { hasDataFlow: false, confidence: 0.0 }; } private generateTaintRule(endpoint: EndpointFact, vuln: VulnerabilityFact): SemgrepRule { return { id: `custom-taint-${endpoint.id}-to-${vuln.id}`, pattern: ` rules: - id: custom-data-flow-check mode: taint pattern-sources: - patterns: - pattern: ${endpoint.parameterPattern} # e.g., request.args.get("user_id") pattern-sinks: - patterns: - pattern: ${vuln.sinkPattern} # e.g., cursor.execute(...) message: "Data flows from endpoint parameter to vulnerable sink" ` }; } }
Correlation Graph Output
At the end of Stage 2, we have a rich graph structure:
- • Nodes: All security facts (vulns, endpoints, packages, configs)
- • Edges: Hypothesized relationships with confidence scores
- • Evidence: Static analysis traces, import paths, heuristic metadata
- • Prioritization: High-confidence edges (>0.7) become validation candidates
Stage 3: Dynamic Validation & Hypothesis Testing
Stage 3 implements an automated penetration testing loop. For each high-confidence hypothesis from Stage 2, the Validation Orchestrator attempts to craft and execute real exploits in sandboxed environments.
Multi-Oracle Evidence Collection
The engine employs three complementary oracles to detect successful exploits, including blind and asynchronous vulnerabilities.
1. Content Oracle
class ContentOracle { analyzeResponse(response: HttpResponse, payload: Payload): EvidenceResult { const indicators = this.getVulnIndicators(payload.type); for (const indicator of indicators) { if (response.body.includes(indicator.signature)) { return { success: true, evidence: { type: 'content_match', signature: indicator.signature, location: response.body.indexOf(indicator.signature), context: this.extractContext(response.body, indicator.signature) } }; } } return { success: false }; } }
2. Timing Oracle
class TimingOracle { async analyzeTimingSignal(endpoint: EndpointFact, payload: Payload): Promise<EvidenceResult> { // Establish baseline response time const baselineRequests = await Promise.all([ this.sendRequest(endpoint, { normalPayload: 'test' }), this.sendRequest(endpoint, { normalPayload: 'test' }), this.sendRequest(endpoint, { normalPayload: 'test' }) ]); const baselineTime = this.calculateMedianTime(baselineRequests); // Send timing-attack payload (e.g., pg_sleep, WAITFOR DELAY) const timingRequest = await this.sendRequest(endpoint, payload); const payloadTime = timingRequest.responseTime; const timeDifference = payloadTime - baselineTime; if (timeDifference > payload.expectedDelay * 0.8) { // 80% threshold return { success: true, evidence: { type: 'timing_differential', baselineTime, payloadTime, timeDifference, expectedDelay: payload.expectedDelay, confidence: Math.min(timeDifference / payload.expectedDelay, 1.0) } }; } return { success: false }; } }
3. Out-of-Band Oracle (OAST)
class OASTOracle { private oastDomain = 'vxdf-oast.com'; private callbackServer: CallbackListener; async generateOASTPayload(vulnType: string): Promise<OASTPayload> { const uniqueId = this.generateUniqueId(); const subdomain = `${uniqueId}.${this.oastDomain}`; // Register expectation for callback this.callbackServer.expectCallback(uniqueId, { timeout: 30000, // 30 second timeout expectedProtocols: ['DNS', 'HTTP', 'HTTPS'] }); switch (vulnType) { case 'SSRF': return { payload: `http://${subdomain}/ssrf-test`, expectedCallback: { type: 'HTTP', subdomain } }; case 'XXE': return { payload: `<!ENTITY xxe SYSTEM "http://${subdomain}/xxe-test">&xxe;`, expectedCallback: { type: 'HTTP', subdomain } }; } } }
Evidence Fusion: The orchestrator combines results from all oracles to make the final determination. Even if one oracle fails, others might provide confirmation. For maximum confidence, multiple oracles should agree.
Stage 4: Reporting & Enrichment
Stage 4 transforms validated findings into actionable intelligence. The engine produces machine-readable OWASP VXDF JSON reports enriched with CWE mappings, CVSS scores, and concrete remediation guidance.
OWASP VXDF JSON Output Format
{ "vxdf_version": "1.0", "scan_metadata": { "scan_id": "scan_20240115_103000", "timestamp": "2024-01-15T10:30:00Z", "target": { "repository": "https://github.com/example/vulnerable-app", "commit_sha": "a1b2c3d4e5f6", "branch": "main" }, "engine_version": "0.9.2" }, "validated_findings": [ { "finding_id": "vxdf_001", "title": "SQL Injection in User Profile Endpoint", "severity": "CRITICAL", "cvss_score": 9.8, "cwe_id": "CWE-89", "vulnerability_details": { "category": "SQL_INJECTION", "location": { "file_path": "src/models/user.py", "line_range": [45, 47], "function": "get_user_profile", "code_snippet": "cursor.execute(f\"SELECT * FROM users WHERE id = {user_id}\")" }, "entry_point": { "endpoint": "/api/users/{user_id}", "method": "GET", "parameter": "user_id", "parameter_type": "path_parameter" } }, "exploit_evidence": { "validation_method": "dynamic_exploitation", "proof_of_concept": { "http_request": { "method": "GET", "url": "/api/users/1' OR '1'='1", "headers": { "Authorization": "Bearer test_token" } }, "http_response": { "status_code": 200, "response_time_ms": 1250, "body_snippet": "{\"users\": [{\"id\": 1, \"username\": \"admin\"}...]" } }, "oracle_confirmations": [ { "oracle_type": "content", "result": "success", "evidence": "Response contained multiple user records" }, { "oracle_type": "timing", "result": "success", "evidence": "pg_sleep(5) caused 5.2s delay vs 0.1s baseline" } ], "confidence_score": 0.98 } } ] }
Why This Architecture is Revolutionary
Evidence-Driven Analysis
Unlike probabilistic ML approaches, every VXDF finding is backed by concrete exploitation evidence. No black-box scoring—every confidence level has transparent justification through static analysis traces and dynamic validation results.
Best-in-Class Tool Orchestration
Rather than reinventing static analysis, VXDF leverages Semgrep, CodeQL, and other mature tools as specialized components. The innovation lies in intelligent orchestration and correlation across tool boundaries.
Multi-Factor Exploit Validation
Professional penetration testers use content analysis, timing attacks, and out-of-band techniques. VXDF automates these methodologies, detecting even blind and asynchronous vulnerabilities that simple DAST tools miss.
Auditable Confidence Scoring
Confidence scores aren't arbitrary—they reflect cumulative evidence strength. A 0.9 confidence means "static taint analysis confirmed + successful dynamic exploit." Security engineers can inspect the reasoning behind every score.
Performance & Scalability Characteristics
Stage 1-2 (Static)
- • Linear complexity O(n) in codebase size
- • Parallelizable across files/modules
- • ~10-30 seconds for typical web app
Stage 3 (Dynamic)
- • O(k) in high-confidence hypotheses
- • Sandboxed execution required
- • ~2-5 minutes per hypothesis
Overall Pipeline
- • 95%+ false positive reduction
- • ~10-15 min end-to-end for most apps
- • Scales horizontally with container orchestration
Join the Evidence-Based Security Revolution
The VXDF engine represents a fundamental shift from theoretical vulnerability detection to provable exploit validation. This approach has the potential to eliminate alert fatigue, accelerate remediation cycles, and restore developer confidence in security tooling.
Open Source Contribution Opportunities
Parser Development
- • New SAST tool integrations (SonarQube, Veracode)
- • Language-specific API discovery engines
- • Container security scan parsers (Trivy, Clair)
Validation Modules
- • Framework-specific payload generators
- • Advanced oracle implementations
- • Exploit technique libraries
Core Engine
- • Graph correlation algorithms
- • Performance optimizations
- • Distributed processing architecture
Integration & UX
- • CI/CD pipeline integrations
- • Dashboard and visualization tools
- • IDE plugins and developer workflows
Ready to move beyond false positives? Join us in building the future of evidence-based application security.
Next posts in this series: Deep-dive implementation guides, performance benchmarks, and real-world case studies from production deployments.
Join the Conversation
Have thoughts on this architecture? Questions about implementation details? We'd love to hear from the security engineering community.