Speed
Measured runtime path from request entry to a sponsoring decision being ready, excluding the AI model’s own wait time. p99 means 99% of requests were this fast or faster.
Evidence
We test wavebird the way we would want our infrastructure tested. Every claim on this page is backed by reproducible evidence from controlled benchmark runs and a comprehensive pre-pilot validation campaign with fault injection.
Last updated: 2026-03-30
Reliability
Over 8 hours (263,534 total slots processed) with fault injection active, we observed 0 missing proofs across the 260,321 terminal slots that were expected to produce a proof.
Resilience
We simulated seven exchange failure modes (plus three PostgreSQL failures). Result: 0 crashes, 0 unrecoverable states, correct circuit breaker activation and recovery.
In March 2026, we ran our pre-pilot validation campaign: a set of automated tests designed to find problems before the first real partner connects. We did not test under ideal conditions. We deliberately broke things.
Our Mock-SSP chaos mode randomly injected network delays, server errors, malformed responses, dropped connections, and traffic spikes into the test runs. The goal is simple: prove correct behavior under failure before we connect a live partner.
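To make the idea of random fault injection concrete, here is a minimal sketch. It is not the Mock-SSP implementation: the function name `chaos_response`, the fault names, the 20% fault rate, and the response shape are all assumptions chosen for illustration, and traffic spikes are left out because they are a property of the load generator rather than of a single response.

```python
import random

# Hypothetical fault kinds, loosely mirroring the list in the text above.
FAULTS = ["delay", "server_error", "malformed", "dropped"]

def chaos_response(rng: random.Random, fault_rate: float = 0.2) -> dict:
    """Return a simulated exchange response, sometimes with an injected fault."""
    if rng.random() >= fault_rate:
        return {"status": 200, "body": '{"bid": 1200}', "fault": None}
    fault = rng.choice(FAULTS)
    if fault == "delay":
        # Valid response, but the caller is told to wait before receiving it.
        return {"status": 200, "body": '{"bid": 1200}', "fault": fault,
                "delay_ms": rng.randint(100, 2000)}
    if fault == "server_error":
        return {"status": 500, "body": "", "fault": fault}
    if fault == "malformed":
        # Truncated JSON that a client must reject without crashing.
        return {"status": 200, "body": '{"bid": ', "fault": fault}
    # Dropped connection: no status and no body at all.
    return {"status": None, "body": None, "fault": fault}

rng = random.Random(42)  # fixed seed so a failing run can be replayed
responses = [chaos_response(rng) for _ in range(1000)]
injected = sum(1 for r in responses if r["fault"] is not None)
```

Seeding the generator is the main design choice here: a chaos run that cannot be replayed is hard to debug.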
Mock-SSP
Mock-SSP simulates an ad exchange response inside the benchmark harness and inside the pre-pilot chaos campaign so we can measure the internal ad path without public network noise.
We processed 10,000 sponsoring slots at 100 concurrent connections with fault injection active. Result: 0 missing proofs, 0 invalid signatures, 0 orphaned beacons.
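The three counters in that result can be read as three independent checks. The sketch below shows one way such an audit could look. It assumes an HMAC-SHA256 signature over the slot ID and treats a beacon as "orphaned" when it references no known slot; both are illustrative assumptions, as are the names `sign` and `audit`. The actual proof format is not described on this page.

```python
import hashlib
import hmac

SECRET = b"demo-key"  # assumption: illustrative key only

def sign(slot_id: str) -> str:
    """Hypothetical proof signature: HMAC-SHA256 over the slot ID."""
    return hmac.new(SECRET, slot_id.encode(), hashlib.sha256).hexdigest()

def audit(terminal_slots, proofs, beacons):
    """Count missing proofs, invalid signatures, and orphaned beacons."""
    missing = [s for s in terminal_slots if s not in proofs]
    invalid = [s for s, sig in proofs.items()
               if not hmac.compare_digest(sig, sign(s))]
    known = set(terminal_slots)
    orphaned = [b for b in beacons if b not in known]
    return {
        "missing_proofs": len(missing),
        "invalid_signatures": len(invalid),
        "orphaned_beacons": len(orphaned),
    }

slots = [f"slot-{i}" for i in range(5)]
proofs = {s: sign(s) for s in slots}
clean = audit(slots, proofs, beacons=slots)

# Break one thing per counter to show that each check fires on its own.
proofs_bad = dict(proofs)
del proofs_bad["slot-0"]            # missing proof
proofs_bad["slot-1"] = "00" * 32    # wrong signature
dirty = audit(slots, proofs_bad, beacons=slots + ["slot-99"])
```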
We ran 5,000 slots through 6 billing scenarios — including micro-unit price boundaries, duplicate detection, and multi-SSP fallback attribution. Result: exact reconciliation in every scenario (0 billing errors).
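As an illustration of why micro-unit boundaries and duplicate detection matter for exact reconciliation, here is a small sketch. The event shape, the `settle` function, and the convention that one currency unit equals 1,000,000 micro-units are assumptions made for the example, not a description of the wavebird ledger.

```python
from collections import Counter

MICROS_PER_UNIT = 1_000_000  # assumption: 1 currency unit = 1,000,000 micro-units

def settle(events):
    """Sum billable micro-units per SSP, counting each slot ID only once."""
    seen = set()
    totals = Counter()
    duplicates = 0
    for slot_id, ssp, price_micros in events:
        if slot_id in seen:
            duplicates += 1  # a repeated delivery must not be billed twice
            continue
        seen.add(slot_id)
        totals[ssp] += price_micros  # integer arithmetic, so no rounding drift
    return dict(totals), duplicates

events = [
    ("s1", "ssp-a", 1),            # smallest non-zero price
    ("s2", "ssp-a", 999_999),      # just below one full unit
    ("s2", "ssp-a", 999_999),      # duplicate delivery of the same slot
    ("s3", "ssp-b", 1_000_000),    # exactly one unit
]
totals, duplicates = settle(events)
expected = {"ssp-a": MICROS_PER_UNIT, "ssp-b": MICROS_PER_UNIT}
```

Keeping prices as integers is what makes "exact" a meaningful word: the totals either match the expected ledger to the micro-unit or they do not.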
We tested 7 SSP failure scenarios plus 3 PostgreSQL failure scenarios. Result: 0 crashes and correct circuit breaker activation and recovery in all scenarios.
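For readers unfamiliar with the pattern, the sketch below shows a generic circuit breaker with activation (closed to open) and recovery (open to half-open to closed). It is a textbook version under assumed parameters, a threshold of 3 consecutive failures and a 5-second reset window, and does not claim to match the wavebird runtime. Time is passed in explicitly so the behavior is deterministic.

```python
class CircuitBreaker:
    """Generic circuit breaker; thresholds are illustrative assumptions."""

    def __init__(self, failure_threshold=3, reset_after=5.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # time at which the breaker last opened

    def state(self, now):
        if self.opened_at is None:
            return "closed"
        if now - self.opened_at >= self.reset_after:
            return "half_open"  # allow a trial request through
        return "open"

    def allow(self, now):
        """Requests are blocked only while the breaker is fully open."""
        return self.state(now) != "open"

    def record(self, success, now):
        if success:
            # Any success closes the breaker and clears the failure count.
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now  # open, or re-open after a failed trial

breaker = CircuitBreaker()
for t in (0.0, 1.0, 2.0):
    breaker.record(success=False, now=t)  # third failure opens the breaker
```

After the loop, requests are rejected until five seconds have passed; one successful trial request then closes the breaker again.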
Found and fixed during the campaign
Settlement attribution bug in multi-SSP fallback: slots were incorrectly attributed to the timed-out primary SSP.
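The effect of this class of bug can be shown in a few lines. The sketch below assumes a hypothetical list of `(ssp_name, outcome)` attempts per slot; how the original bug arose in the runtime is not described here, so the "buggy" line only reproduces the symptom.

```python
def attribute(attempts):
    """Return the SSP that actually served the slot, or None if none did.

    `attempts` is an ordered list of (ssp_name, outcome) tuples, where
    outcome is "served", "timeout", or "error" (hypothetical labels).
    """
    for ssp, outcome in attempts:
        if outcome == "served":
            return ssp
    return None

attempts = [("primary-ssp", "timeout"), ("fallback-ssp", "served")]

buggy = attempts[0][0]       # symptom: credit whichever SSP was tried first
fixed = attribute(attempts)  # credit the SSP that delivered the response
```

Here `buggy` is `"primary-ssp"` and `fixed` is `"fallback-ssp"`, which is the difference a settlement test for fallback attribution needs to catch.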
In March 2026, we ran a comprehensive pre-pilot validation campaign with chaos fault injection active. The campaign tested proof integrity, settlement accuracy, resilience, concurrency limits, and sustained stability.
Open finding: in-memory accumulation causes memory growth over extended runs. Slot eviction and ledger compaction are implemented and active. This is under continued optimization.
We pushed the system from 10 to 200 concurrent connections to find where it starts to struggle. In these runs it never crashed. Response times rose and throughput fell as concurrency increased, but the system kept working.
“c100” means 100 concurrent connections.
| Concurrent connections | Response time (p99) | Throughput | Errors |
|---|---|---|---|
| 10 | 64 ms | 333 ops/s | 0 |
| 25 | 293 ms | 126 ops/s | 0 |
| 50 | 695 ms | 92 ops/s | 0 |
| 75 | 1,203 ms | 73 ops/s | 0 |
| 100 | 1,764 ms | 64 ops/s | 0 |
| 150 | 3,267 ms | 52 ops/s | 0 |
| 200 | 3,590 ms | 33 ops/s | 0 |
At 200 concurrent connections, p99 response time increases to 3.6 seconds but every response is still valid (2xx). Under that extreme load we see decision poll timeouts; when load drops back to 25 connections, the system recovers within 30 seconds.
The “Errors” column counts HTTP-level errors only: every response was 2xx at every concurrency level. Decision poll timeouts are tracked separately, and we observed 2 at c100, 130 at c150, and 1,871 at c200. The system degrades gracefully rather than failing hard. Spike recovery from c200 to c25 completes within 30 seconds.
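For reference, a p99 figure like those in the table above can be derived from raw latency samples as follows. This sketch uses the nearest-rank method; other tools interpolate between samples and can report slightly different values, and which method the benchmark harness uses is not stated here. The sample data is made up.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

latencies_ms = list(range(1, 101))  # hypothetical samples: 1 ms to 100 ms
p99 = percentile(latencies_ms, 99)  # 99 ms: 99 of 100 requests were this fast or faster
```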
We ran the system continuously for 8 hours with fault injection active, processing 263,534 sponsoring slots. All 8 hourly quality gates passed. We observed 0 missing proofs across the 260,321 terminal slots that were expected to produce a proof, and 0 handle leaks.
Open finding
What we found: memory usage grows over extended runs because in-memory state accumulates faster than it is cleaned up. Slot eviction and ledger compaction are implemented and active. This is under continued optimization.
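As background on what slot eviction means, the sketch below caps an in-memory store at a fixed number of slots and drops the oldest entry when the cap is exceeded. The class name `SlotStore`, the cap of 1,000, and the oldest-first policy are assumptions for illustration; the eviction policy used in the runtime is not described on this page.

```python
from collections import OrderedDict

class SlotStore:
    """Bounded in-memory store that evicts the oldest slot when full."""

    def __init__(self, max_slots):
        self.max_slots = max_slots
        self._slots = OrderedDict()

    def put(self, slot_id, state):
        self._slots[slot_id] = state
        self._slots.move_to_end(slot_id)      # mark as most recently written
        while len(self._slots) > self.max_slots:
            self._slots.popitem(last=False)   # evict the oldest entry

    def __len__(self):
        return len(self._slots)

    def __contains__(self, slot_id):
        return slot_id in self._slots

store = SlotStore(max_slots=1000)
for i in range(10_000):
    store.put(f"slot-{i}", {"status": "terminal"})
```

With a hard cap like this, memory stays flat regardless of run length. Growth of the kind described above typically means some state lives outside the capped structure or is produced faster than cleanup runs.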
The benchmark suite and the pre-pilot campaign were both run under controlled conditions. The goal was to measure the wavebird runtime itself, not the public internet or live model providers.
Per-run variation data is currently available internally only and will be published once the sanitized artifact bundle is ready. The original benchmark methodology remains unchanged, and the March 23 results remain valid.
March 23, 2026
Firewall p99 latency
Filtering step before any ad request leaves the runtime.
Mock-SSP round-trip p99 latency
Internal ad path against a controlled exchange substitute.
End-to-end p99 latency
Measured runtime path with external model wait time excluded.
Settlement max runtime
Longest measured settlement run in the current evidence pack.
Mock-SSP request throughput
Controlled request throughput inside the benchmark harness.
March 30, 2026
Proof integrity
10,000 slots processed at c100 with 0 missing proofs.
Settlement accuracy
6 scenarios with exact reconciliation.
SSP resilience
0 crashes across SSP failure scenarios.
Concurrency tested
Graceful degradation under spike load.
Sustained load
263,534 slots processed over 8 hours with 0 proof gaps.
What this does not claim
We are transparent about what this evidence does and does not prove:
What is still open
Two things are not where we want them yet: beacon processing slows down above 50 concurrent connections, and the 8-hour sustained run shows more memory growth than our target allows. Both are under active optimization.
Artifacts
Downloadable artifacts will be published once the sanitized bundle is ready for public release. Pre-pilot campaign reports are available internally as machine-readable JSON artifacts.
Next step
If the runtime evidence is what you needed, the next step is the integration path.