Chapter 1: The Crisis
The game that couldn't play itself
Precursors: Origins of Folklore is a spiritual successor to Creatures (1996) — a game where artificial life forms with full biochemistry, LLM-powered cognition, and genetic inheritance evolve in a procedurally generated world. It runs entirely in the browser using Phaser 3 and TypeScript.
By early March 2026, we had built every system: 48-chemical biochemistry, 50 species with genetic crossover, LLM dream consolidation, cultural memetic inheritance, parallax worlds across 6 biomes with 4 vertical tiers. The codebase was 98,000+ game objects across a 144,000-pixel-wide world.
And it was completely unplayable.
The render loop reported FPS numbers. Our benchmarks passed. Our tests were green. But if you actually tried to play the game — select a creature, pan the camera, watch two Norns try to breed — you'd see a slideshow. Sub-1 FPS. Creatures blinked in and out of existence. The world stuttered. Breeding was mathematically impossible due to a biochemistry bug no benchmark could catch.
Our instrumentation had failed us. Not because it was broken, but because it was measuring the wrong things.
How bad was it, really?
| Metric | Before | After |
|---|---|---|
| sceneRender time | 1,688ms per frame | 0.894ms per frame |
| Game loop frequency | 0.4 Hz (once every 2.5 seconds) | 39.7 Hz |
| Creatures at spawn | 81 (regression) | 2 Norns + 3 eggs |
| Wall-time overrun ratio | 10.44× (sim at 1/10th real-time) | 1.136× |
| Time to first breed | Never (biochemistry bug) | 2.7 minutes |
The game was 1,889 times slower than it needed to be. And our benchmarks said it was fine.
Chapter 2: Why the Benchmarks Lied
This is the part that might save you months of wasted work.
The seven distortions
Our FPS benchmark had seven distortion sources that conspired to make everything look acceptable:
- Screenshot stalls. Every screenshot forced `gl.readPixels()` through SwiftShader (CPU renderer), blocking the render loop for 500ms–2s. We were measuring the cost of our measurement tool.
- Headless WebGL crashes. SwiftShader lost the WebGL context after 1–25 seconds. The benchmark gracefully handled the crash and reported the 3–8 valid samples before it. A benchmark based on 5 seconds of data is not a benchmark.
- CPU renderer, not GPU. SwiftShader simulates WebGL via CPU. The numbers had no predictive relationship to real hardware. We were optimizing for a renderer nobody uses.
- No simulation health metrics. The benchmark read `actualFps` — a render counter. It couldn't detect that cognition ticks were being dropped, biochemistry was falling behind, or the world was effectively frozen while the render loop spun.
- Idle camera. 30 seconds of capture with the camera stationary. FPS at rest vs. FPS during panning are completely different numbers. We benchmarked the easy case.
- No warm-up separation. A 35-second hardcoded wait before capture, but no distinction between shader compilation, font loading, and steady-state gameplay.
- Single sample per second. One FPS reading per second from Phaser's rolling average. A 10ms freeze every 200ms reports ~42 FPS but feels like the game is dying.
The "fake win" pattern
The most dangerous failure mode: render FPS improves because simulation work was silently dropped. When the game loop can't keep up, scheduled work gets deferred. Biochemistry stops ticking. Cognition calls queue but never process. The render loop, freed from waiting on simulation, speeds up. FPS goes up. The benchmark passes. The game is dead.
We call this a "fake win." Our old benchmark couldn't detect it. Our new one can.
The fix: measure distributions, not averages
Average FPS is a lie. Here's what we measure now:
- p50, p95, p99, worst-frame time — The distribution tells you whether the experience is smooth or hitchy.
- Percent of frames over budget — What fraction of frames exceed 16.7ms (60fps), 33ms (30fps), 50ms (20fps), 100ms, 250ms?
- Simulation real-time factor — Is the sim advancing 1 second per wall-clock second? If not, the game is in slow motion regardless of FPS.
- Dropped-work counters — How many scheduled ticks were skipped?
- First interactive frame — Not "first paint," but "first frame where the game accepts input and creatures are alive."
Gate rules — the line between ship and fix:
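As a minimal sketch of how such gates can work — the thresholds, names, and gate conditions here are illustrative, not our actual benchmark code:

```typescript
// Sketch: frame-time distribution metrics plus a PASS/FAIL gate computed
// from raw frame deltas. All thresholds are illustrative assumptions.

interface FrameStats {
  p50: number;
  p95: number;
  p99: number;
  worst: number;
  overBudgetPct: number; // percent of frames above the frame budget
}

function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

function frameStats(deltasMs: number[], budgetMs: number): FrameStats {
  const sorted = [...deltasMs].sort((a, b) => a - b);
  const over = deltasMs.filter((d) => d > budgetMs).length;
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    worst: sorted[sorted.length - 1],
    overBudgetPct: (100 * over) / deltasMs.length,
  };
}

// Example gate: p95 under a 20fps budget, no catastrophic hitch,
// and the simulation keeping (roughly) real time.
function gatePasses(stats: FrameStats, simWallOverrunRatio: number): boolean {
  return stats.p95 <= 50 && stats.worst <= 250 && simWallOverrunRatio <= 1.2;
}
```

The point of gating on the distribution: a run of steady 20ms frames passes, while a run with the same average but one 300ms hitch fails, even though average FPS looks identical.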
Chapter 3: The Rebuild
The pattern: don't optimize, rebuild incrementally
We did not try to profile and optimize the broken codebase. That path is a trap for complex games — there are too many interacting systems to isolate bottlenecks in a combined state.
Instead:
- Start with the minimum viable world. 2 Norns, 3 eggs, 1 biome, 1 tier. Every system is present but operating on minimal data.
- Add one system at a time. Turn on biochemistry. Measure. Turn on cognition. Measure. Turn on breeding. Measure. The system that breaks the budget is the system that needs work.
- Feature flags gate everything. Every major system has an `FF_*` flag. Ship with specific flags enabled per version. This is the primary mechanism for isolating regressions.
- Keep all existing code. The rebuild reuses every system — it just adds them incrementally to expose individual costs.
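The flag mechanism can be as simple as a typed record that every subsystem checks at the top of its tick. A minimal sketch — the flag names beyond the `FF_` prefix are hypothetical, not the project's actual set:

```typescript
// Sketch of FF_* feature flags gating subsystems. Flag names beyond the
// FF_ prefix are hypothetical examples.

type FeatureFlag = "FF_BIOCHEMISTRY" | "FF_COGNITION" | "FF_BREEDING";

const flags: Record<FeatureFlag, boolean> = {
  FF_BIOCHEMISTRY: true,
  FF_COGNITION: false, // off while isolating biochemistry's frame cost
  FF_BREEDING: false,
};

function isEnabled(flag: FeatureFlag): boolean {
  return flags[flag];
}

// Each subsystem short-circuits when its flag is off:
function cognitionTick(): string {
  if (!isEnabled("FF_COGNITION")) return "skipped";
  // ...real cognition work would run here...
  return "ran";
}
```

Because every system bails out this cheaply, toggling one flag per benchmark run exposes that system's marginal cost.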
The five phases
1. Phase 1: Stop the bleeding
   A spawn regression had 81 creatures loading instead of 2. This was the single biggest performance problem and the simplest fix. Going from 81 to 5 entities cut per-frame work by ~40×.
   Lesson: Before you profile, count your entities. The most common performance bug is "more stuff than you meant to have."
2. Phase 2: Fix measurement
   We added in-game telemetry (subsystem timers, query counters, workload counters), replaced FPS-only reporting with frame-time distributions, and added PASS/FAIL gates to benchmarks.
   Lesson: Fix your instruments before fixing your code. If you can't measure it honestly, you can't fix it.
3. Phase 3: Architectural wins
   Spatial indexing (O(world) → O(nearby)). Container culling (rooms, not sprites). Room baking (1,688ms → 0.894ms). Simulation scheduling (biochem 1 Hz, breeding 0.5 Hz, dreams 0.2 Hz). Deferred world generation (playable in <5 seconds).
4. Phase 4: Codify standards
   We wrote PERFORMANCE_STANDARDS.md — 10 mandatory practices that every agent (human or AI) must follow when writing game code. Anti-patterns are explicitly listed. This document is the institutional immune system against the next performance crisis.
5. Phase 5: Red-team the fixes
   QA found that our spatial index was actually 7–10× slower than naive array filtering at the game's actual population (5 entities). The crossover point was pop=40. Our benchmark had cherry-picked pop≥80 to make the spatial index look good.
   Lesson: Always benchmark at real-world conditions, not best-case conditions. Fancy data structures have overhead. If your dataset is small, linear search wins.
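The simulation scheduling from Phase 3 — biochemistry at 1 Hz, breeding at 0.5 Hz, dreams at 0.2 Hz — can be sketched as an accumulator-based scheduler. The scheduler below is an illustrative pattern, not the project's actual code; only the tick rates come from the text:

```typescript
// Sketch of staggered simulation scheduling: each subsystem ticks at its
// own rate instead of every frame. Rates match those described above.

interface Scheduled {
  name: string;
  hz: number;          // target tick rate
  accumulator: number; // seconds accumulated since last tick
  ticks: number;
}

const systems: Scheduled[] = [
  { name: "biochemistry", hz: 1.0, accumulator: 0, ticks: 0 },
  { name: "breeding", hz: 0.5, accumulator: 0, ticks: 0 },
  { name: "dreams", hz: 0.2, accumulator: 0, ticks: 0 },
];

// Called once per frame with the frame delta in seconds.
function advance(dtSeconds: number): void {
  for (const s of systems) {
    s.accumulator += dtSeconds;
    const period = 1 / s.hz;
    while (s.accumulator >= period) {
      s.accumulator -= period;
      s.ticks += 1; // real code would call the subsystem's update here
    }
  }
}
```

After 10 simulated seconds, biochemistry has ticked about 10 times, breeding about 5, and dreams about 2 — regardless of the render frame rate driving `advance`.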
Chapter 4: The Ten Commandments of Game Performance
These are the engineering standards we now enforce across all Multiverse Games projects.
1. Spatial indexes over world scans
   Never scan all entities for local decisions. Use room/chunk-indexed lookups: `spatialIndex.getNornsInCell()` over `world.norns.filter()`.
2. Sleep/wake contracts
   Entities not near the camera or active zones must be dormant. Every subsystem documents when it sleeps.
3. Container culling over per-object
   Toggle room/chunk containers, not individual sprites. Culling cost scales with active rooms, not total objects.
4. Frame-time distributions over average FPS
   Report p50, p95, p99, worst frame. Average FPS masks hitches. Benchmarks have PASS/FAIL gates.
5. Fixed-step simulation
   Decouple simulation from rendering. Sim runs at a defined step rate; rendering interpolates.
6. Chunk ownership before threading
   Define who owns chunk-local state before adding workers. False sharing kills multicore performance quietly.
7. Active worklists over sparse scans
   Build compact lists of active entities before update passes. Iterate dense sets, not sparse predicates.
8. Effect density budgets
   Screen-space density matters more than raw particle counts. Weather and water need visibility-aware throttles.
9. Deferred world generation
   Show a playable world before the entire planet is generated. Enrichment runs off the critical path.
10. Before/after measurement
    Every performance ticket records metrics before and after. No claims without numbers. Use reproducible fixtures.
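For the first commandment, a grid-based spatial index is enough. This is an illustrative sketch — the cell size, class shape, and method names here are assumptions, not the project's `spatialIndex` implementation:

```typescript
// Illustrative grid spatial index: bucket entities by cell, then answer
// "who is near (x, y)?" by scanning only the 3x3 cell neighborhood.

interface Entity { id: number; x: number; y: number; }

class GridIndex {
  private cells = new Map<string, Entity[]>();
  constructor(private cellSize: number) {}

  private key(cx: number, cy: number): string {
    return `${cx},${cy}`;
  }

  insert(e: Entity): void {
    const k = this.key(Math.floor(e.x / this.cellSize), Math.floor(e.y / this.cellSize));
    const bucket = this.cells.get(k);
    if (bucket) bucket.push(e);
    else this.cells.set(k, [e]);
  }

  // Entities in the 3x3 cell neighborhood around (x, y).
  near(x: number, y: number): Entity[] {
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    const out: Entity[] = [];
    for (let dx = -1; dx <= 1; dx++) {
      for (let dy = -1; dy <= 1; dy++) {
        const bucket = this.cells.get(this.key(cx + dx, cy + dy));
        if (bucket) out.push(...bucket);
      }
    }
    return out;
  }
}
```

Per the Phase 5 finding, measure before adopting this: below roughly 40 entities, a plain array filter beat the index in our game, so the structure only pays off at scale.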
Chapter 5: Browser-Specific Patterns
Building performant games in the browser has unique constraints. Here's what we learned.
Phaser 3 + WebGL
- RenderTexture baking is the single highest-value technique for reducing draw calls. Collapse static room geometry into one texture per room.
- Container visibility is cheaper than per-sprite visibility. Group objects by room/chunk and toggle the container.
- `setBackgroundColor(0x00000000)` does not make a transparent background — Phaser parses integer 0 as opaque black (24-bit RGB, alpha defaults to 255). Use `camera.transparent = true` instead. This bug caused our black-screen regression.
- Particle caps matter. We reduced the particle cap from 120 to 48 per weather effect with no visible quality loss.
JavaScript runtime
- `getLivingNorns()` was allocating a new array every call, and multiple callers hit it per frame. Cache the result and invalidate on add/remove.
- Rolling averages lie — Phaser's `actualFps` is a smoothed value. For benchmarking, use raw frame deltas.
- GC pressure from proximity queries was a bigger problem than the query logic itself. Pre-allocate result arrays.
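The cache-and-invalidate fix for `getLivingNorns()` can be sketched like this — the `Norn` shape and manager class are illustrative, only the function name and the pattern come from the text:

```typescript
// Sketch of cache-and-invalidate for a hot per-frame query. The cached
// array is rebuilt only after a mutation, so multiple callers per frame
// share one allocation instead of re-filtering each time.

interface Norn { id: number; alive: boolean; }

class NornManager {
  private norns: Norn[] = [];
  private livingCache: Norn[] | null = null;

  add(n: Norn): void {
    this.norns.push(n);
    this.livingCache = null; // invalidate on mutation
  }

  kill(id: number): void {
    const n = this.norns.find((x) => x.id === id);
    if (n) n.alive = false;
    this.livingCache = null; // invalidate on mutation
  }

  getLivingNorns(): Norn[] {
    if (!this.livingCache) {
      this.livingCache = this.norns.filter((n) => n.alive);
    }
    return this.livingCache;
  }
}
```

The trade-off is that callers must treat the returned array as read-only; mutating it would corrupt the shared cache.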
Benchmarking in browsers
Headless mode (SwiftShader) is not representative. Use it for smoke tests, not performance gates. Screenshots stall the render loop — never take screenshots during measurement windows. And browsers throttle timers in background tabs: check `document.hidden` to detect tab visibility, and invalidate any runs captured while the tab was hidden.
Web Workers
- Useful for terrain mesh generation (MVEE already does this with MeshWorkerPool)
- Not useful until data ownership is clear
- SharedArrayBuffer requires cross-origin isolation headers
- The overhead of structured clone for message passing can exceed the benefit for small payloads
Chapter 6: Agent-Based Development and Performance
Multiverse Games is built by AI agents coordinated through Paperclip. This creates unique performance challenges that most studios don't face.
The agent performance trap
Each agent works on its own system in isolation. Each agent's tests pass. Each agent reports success. But the combined load of all systems is catastrophic because no single agent had visibility into the total frame budget.
This is structurally identical to the classic microservices problem: every service passes its own SLA, the system misses SLA, and nobody knows why. The solution is the same too — shared standards, observable state, and integration testing.
How we prevent this
- Shared PERFORMANCE_STANDARDS.md — Every agent reads this before writing game code.
- Feature flags — Each system can be enabled/disabled independently, so integration testing reveals per-system costs.
- Budget allocation — Total frame budget (50ms for 20fps) is divided across subsystems. Each system has a declared budget. Overruns are visible.
- Red-team QA — A dedicated agent runs adversarial benchmarks after each sprint, specifically looking for fake wins and benchmark gaming.
- The rebuild pattern — When things go wrong (they will), start minimal and add incrementally. Don't try to debug a combined state.
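Budget allocation can be made mechanical: declare per-subsystem budgets that sum to the frame budget, then flag overruns from the measured subsystem timers. A sketch — the 50ms total is from the text, but the subsystem split below is an illustrative assumption:

```typescript
// Sketch of per-subsystem frame-budget allocation. The 50ms total matches
// the 20fps target above; the individual splits are hypothetical.

const FRAME_BUDGET_MS = 50; // 20fps target

const budgets: Record<string, number> = {
  render: 20,
  biochemistry: 10,
  cognition: 10,
  world: 10,
};

// After each frame, compare measured subsystem times to declared budgets
// and report which subsystems overran.
function overruns(measuredMs: Record<string, number>): string[] {
  return Object.keys(budgets).filter((k) => (measuredMs[k] ?? 0) > budgets[k]);
}
```

Because each agent's system has a declared number, an overrun is attributable to one owner instead of vanishing into a shared frame time.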
Cost-effective LLM cognition
Creatures use LLM calls for cognition (thinking, dreaming, cultural evolution). At scale, this is the most expensive system — both computationally and financially. Our approach:
- Staggered async — Non-blocking priority queue with round-robin scheduling
- Sleep batching — Heavy computation (dreams, social trust) runs during world sleep cycles, not every frame
- Provider choice — Groq with qwen3-32b at $0.15/$0.75 per million tokens
- Graceful degradation — No API key = creatures still have biochemistry, just no thoughts
- Limbic policy NN — Training small neural networks to approximate LLM behavior for common decisions, reducing API calls by 10-100× for routine behaviors
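The staggered-async idea — a non-blocking queue with round-robin scheduling — can be sketched as below. The class and request shape are illustrative assumptions; the source only specifies the scheduling policy:

```typescript
// Sketch of round-robin dispatch for cognition requests: at most one
// request is dispatched per tick, cycling across creatures so that no
// single chatty creature starves the others.

interface CognitionRequest { creatureId: number; prompt: string; }

class RoundRobinQueue {
  private perCreature = new Map<number, CognitionRequest[]>();
  private order: number[] = [];
  private cursor = 0;

  enqueue(req: CognitionRequest): void {
    if (!this.perCreature.has(req.creatureId)) {
      this.perCreature.set(req.creatureId, []);
      this.order.push(req.creatureId);
    }
    this.perCreature.get(req.creatureId)!.push(req);
  }

  // Returns the next request, cycling across creatures for fairness.
  next(): CognitionRequest | undefined {
    for (let i = 0; i < this.order.length; i++) {
      const id = this.order[(this.cursor + i) % this.order.length];
      const queue = this.perCreature.get(id)!;
      if (queue.length > 0) {
        this.cursor = (this.cursor + i + 1) % this.order.length;
        return queue.shift();
      }
    }
    return undefined;
  }
}
```

In the real system the caller would await the LLM response off the game loop; the queue only decides whose request goes out next.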
Chapter 7: The MVEE Problem
Our second game, MVEE (Multiverse: The End of Eternity), is a custom engine with 200,000+ entities, 212+ systems, and 125+ components. It also has performance problems. The same patterns apply:
| Precursors Fix | MVEE Equivalent |
|---|---|
| Spatial indexing (cell grid) | Chunk-based tiling (already partial) |
| Container culling | Chunk container toggling |
| Simulation scheduling | Fixed 20 TPS timestep (already exists) |
| ECS query caching | Cached 7 redundant 60fps queries (done) |
| Feature flags | Not yet implemented — must add |
| Frame-time telemetry | Not yet implemented — must add |
The fact that MVEE already has a fixed timestep and worker infrastructure means it's architecturally ahead of where Precursors started. But it lacks measurement. Without frame-time distributions and simulation health metrics, we can't tell whether optimizations are real — which means we can't tell if we're in a fake-win loop.
Chapter 8: A Curriculum for Browser Game Performance
If you're building a browser game or simulation and want to avoid our mistakes, here's the learning path. Six weeks, one discipline per week.
Week 1 — Measurement
- Learn Chrome DevTools Performance tab
- Understand frame-time distributions (p50/p95/p99)
- Build a simple frame-time logger in your game loop
- Read: The Performance Inequality Gap by Alex Russell
Week 2 — Data Architecture
- Implement a spatial index (grid-based is fine)
- Add sleep/wake to one subsystem
- Measure before and after with your frame-time logger
- Read: Factorio Friday Facts on entity sleep/wake
Week 3 — Rendering
- Learn about draw calls and batching
- Implement container-level culling
- Try RenderTexture baking for static content
- Read: Phaser 3 rendering internals docs
Week 4 — Simulation
- Decouple simulation from rendering
- Implement a simulation scheduler with staggered ticks
- Add simulation real-time factor measurement
- Read: Game Programming Patterns — Game Loop chapter (Robert Nystrom, free online)
Week 5 — Benchmarking
- Build a manifest-driven benchmark
- Add PASS/FAIL gates
- Create reproducible test fixtures (saved worlds)
- Learn to detect fake wins (FPS up, sim factor down)
Week 6 — Integration
- Feature flag your major systems
- Rebuild incrementally: minimal world, add systems one at a time
- Red-team your benchmarks
Recommended reading
- Game Programming Patterns — Robert Nystrom Free online at gameprogrammingpatterns.com. The game loop and update patterns chapters.
- Factorio Friday Facts The best public documentation of production game optimization. Search for their entity sleep/wake and performance posts.
- Dyson Sphere Program GDC talk Worker phases and profiling-first optimization.
- The Performance Inequality Gap — Alex Russell (2024) infrequently.org — why device diversity makes browser performance engineering hard.
Appendix: Issue Archaeology
A timeline of every performance-related issue in the Precursors and MVEE backlogs, with outcomes. Built from agent session logs and commit history.
| Date | Issue | Project | Summary | Outcome |
|---|---|---|---|---|
| Mar 7 | MUL-3 | MVEE | Performance issues — staff assessment | Hired 6 agents including Performance Engineer |
| Mar 7 | MUL-38 | MVEE | Renderer performance problems | Removed 7 redundant ECS queries from 60fps loop |
| Mar 9 | MUL-456 | Precursors | Performance + breeding + backgrounds | Speed multiplier, parallax fix, breeding biochemistry fix |
| Mar 9 | MUL-459 | Precursors | Parallax stretching at low zoom | 72 AI-generated JPG backgrounds repositioned correctly |
| Mar 10 | MUL-570 | Precursors | Major Performance Overhaul (emergency) | sceneRender 1688ms → 1.7ms, effectiveHz 0.4 → 37.8 Hz |
| Mar 10 | MUL-571 | Precursors | Spawn regression: 81 creatures | Fixed to 2 norns + 3 eggs |
| Mar 10 | MUL-572 | Precursors | In-game telemetry | Subsystem timers, query counters, workload counters |
| Mar 10 | MUL-573 | Precursors | Benchmark audit | Classified existing benchmark as TREND-ONLY |
| Mar 10 | MUL-575 | Precursors | Spatial index scaffolding | Cell-based lookups replacing world scans |
| Mar 10 | MUL-576 | Precursors | Static/dynamic room split | Container culling by room |
| Mar 10 | MUL-580 | Precursors | Frame-time distributions | p50/p95/p99 with PASS/FAIL gates |
| Mar 10 | MUL-622 | Precursors | Verify local FPS post-overhaul | Confirmed improvement |
| Mar 10 | MUL-632 | Precursors | Verify ≥10 FPS on live site | Confirmed |
| Mar 11 | MUL-684 | Precursors | Performance cleanup + baking | sceneRender → 0.894ms, effectiveHz → 39.7 Hz |
| Mar 11 | MUL-685 | Precursors | Infrastructure bake path | Room baking system built |
| Mar 11 | MUL-706 | Precursors | FPS gate per rebuild phase | Phased measurement during incremental rebuild |
| Mar 11 | MUL-744 | Precursors | Microbenchmark test failures | Fixed parallel vitest execution issues |
Multiverse Games makes infinite games — games that grow, evolve, and surprise us. Creatures breed and develop culture. Worlds expand. Communities fork and customize. The patterns in this compendium — spatial indexing, sleep/wake, feature flags, measurement-first development, incremental rebuild — are not just fixes for today's problems. They're the foundation for games that can keep growing without hitting walls.
Build things that last. Measure honestly. Ship over deliberation.
Multiverse Studios is an open-source, anticapitalist games studio building AI-powered life simulation games. All games are pay-what-you-can. Our performance research, tools, and standards are freely available under MIT license.