Chapter 1: The Crisis

The game that couldn't play itself

Precursors: Origins of Folklore is a spiritual successor to Creatures (1996) — a game where artificial life forms with full biochemistry, LLM-powered cognition, and genetic inheritance evolve in a procedurally generated world. It runs entirely in the browser using Phaser 3 and TypeScript.

By early March 2026, we had built every system: 48-chemical biochemistry, 50 species with genetic crossover, LLM dream consolidation, cultural memetic inheritance, parallax worlds across 6 biomes with 4 vertical tiers. The simulation held 98,000+ game objects across a 144,000-pixel-wide world.

And it was completely unplayable.

The render loop reported FPS numbers. Our benchmarks passed. Our tests were green. But if you actually tried to play the game — select a creature, pan the camera, watch two Norns try to breed — you'd see a slideshow. Sub-1 FPS. Creatures blinked in and out of existence. The world stuttered. Breeding was mathematically impossible due to a biochemistry bug no benchmark could catch.

Our instrumentation had failed us. Not because it was broken, but because it was measuring the wrong things.

How bad was it, really?

| Metric | Before | After |
| --- | --- | --- |
| sceneRender time | 1,688ms per frame | 0.894ms per frame |
| Game loop frequency | 0.4 Hz (once every 2.5 seconds) | 39.7 Hz |
| Creatures at spawn | 81 (regression) | 2 Norns + 3 eggs |
| Wall-time overrun ratio | 10.44× (sim at 1/10th real-time) | 1.136× |
| Time to first breed | Never (biochemistry bug) | 2.7 minutes |

The game was 1,889 times slower than it needed to be. And our benchmarks said it was fine.

Chapter 2: Why the Benchmarks Lied

This is the part that might save you months of wasted work.

The seven distortions

Our FPS benchmark had seven distortion sources that conspired to make everything look acceptable:

  1. Screenshot stalls. Every screenshot forced gl.readPixels() through SwiftShader (CPU renderer), blocking the render loop for 500ms–2s. We were measuring the cost of our measurement tool.
  2. Headless WebGL crashes. SwiftShader lost the WebGL context after 1–25 seconds. The benchmark gracefully handled the crash and reported the 3–8 valid samples before it. A benchmark based on 5 seconds of data is not a benchmark.
  3. CPU renderer, not GPU. SwiftShader simulates WebGL via CPU. The numbers had no predictive relationship to real hardware. We were optimizing for a renderer nobody uses.
  4. No simulation health metrics. The benchmark read actualFps — a render counter. It couldn't detect that cognition ticks were being dropped, biochemistry was falling behind, or the world was effectively frozen while the render loop spun.
  5. Idle camera. 30 seconds of capture with the camera stationary. FPS at rest vs. FPS during panning are completely different numbers. We benchmarked the easy case.
  6. No warm-up separation. A 35-second hardcoded wait before capture, but no distinction between shader compilation, font loading, and steady-state gameplay.
  7. Single sample per second. One FPS reading per second from Phaser's rolling average. A 10ms freeze every 200ms reports ~42 FPS but feels like the game is dying.

The "fake win" pattern

⚠ Danger pattern

The most dangerous failure mode: render FPS improves because simulation work was silently dropped. When the game loop can't keep up, scheduled work gets deferred. Biochemistry stops ticking. Cognition calls queue but never process. The render loop, freed from waiting on simulation, speeds up. FPS goes up. The benchmark passes. The game is dead.

We call this a "fake win." Our old benchmark couldn't detect it. Our new one can.

The fix: measure distributions, not averages

Average FPS is a lie. Here's what we measure now:

Gate rules — the line between ship and fix:

| Condition | Verdict |
| --- | --- |
| p95 > 50ms | FAIL |
| any frame > 250ms | WARN |
| sim real-time factor < 0.8 | FAIL |
| mean FPS up, p95 down | WARN (fake win) |
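A minimal sketch of how those gates can be evaluated over a run's raw frame times. The names (evaluateGates, GateResult) and the percentile method are illustrative, not our shipped code; the thresholds are the ones from the table above.

```typescript
interface GateResult {
  verdict: "PASS" | "WARN" | "FAIL";
  reasons: string[];
}

// Nearest-rank percentile over a pre-sorted array of frame times (ms).
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(sortedMs.length - 1, Math.ceil((p / 100) * sortedMs.length) - 1);
  return sortedMs[Math.max(0, idx)];
}

function evaluateGates(frameTimesMs: number[], simRealTimeFactor: number): GateResult {
  const sorted = [...frameTimesMs].sort((a, b) => a - b);
  const p95 = percentile(sorted, 95);
  const worst = sorted[sorted.length - 1];
  const reasons: string[] = [];
  let verdict: GateResult["verdict"] = "PASS";

  // Hard failures: tail latency or the simulation falling behind wall time.
  if (p95 > 50) { verdict = "FAIL"; reasons.push(`p95 ${p95.toFixed(1)}ms > 50ms`); }
  if (simRealTimeFactor < 0.8) { verdict = "FAIL"; reasons.push(`sim factor ${simRealTimeFactor} < 0.8`); }
  // Soft warning: a single catastrophic hitch in an otherwise healthy run.
  if (worst > 250 && verdict !== "FAIL") { verdict = "WARN"; reasons.push(`worst frame ${worst.toFixed(1)}ms > 250ms`); }
  return { verdict, reasons };
}
```

Note that averages never appear: the gates only look at the tail (p95, worst frame) and at simulation health.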

Chapter 3: The Rebuild

The pattern: don't optimize, rebuild incrementally

We did not try to profile and optimize the broken codebase. That path is a trap for complex games — there are too many interacting systems to isolate bottlenecks in a combined state.

Instead:

  1. Start with the minimum viable world. 2 Norns, 3 eggs, 1 biome, 1 tier. Every system is present but operating on minimal data.
  2. Add one system at a time. Turn on biochemistry. Measure. Turn on cognition. Measure. Turn on breeding. Measure. The system that breaks the budget is the system that needs work.
  3. Feature flags gate everything. Every major system has an FF_* flag. Ship with specific flags enabled per version. This is the primary mechanism for isolating regressions.
  4. Keep all existing code. The rebuild reuses every system — it just adds them incrementally to expose individual costs.
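The FF_* gating in step 3 can be as simple as a flag table consulted in the update loop, so that a disabled system costs nothing and every enabled system's cost is timed individually. This is an illustrative sketch; the flag names and System shape are hypothetical, not our actual interfaces.

```typescript
type FeatureFlag = "FF_BIOCHEMISTRY" | "FF_COGNITION" | "FF_BREEDING" | "FF_WEATHER";

// Which systems are live in this build. Ship specific combinations per version.
const flags: Record<FeatureFlag, boolean> = {
  FF_BIOCHEMISTRY: true,
  FF_COGNITION: true,
  FF_BREEDING: false, // off while we measure the first two systems
  FF_WEATHER: false,
};

interface System {
  name: FeatureFlag;
  update(dtMs: number): void;
}

function tick(systems: System[], dtMs: number): string[] {
  const timings: string[] = [];
  for (const sys of systems) {
    if (!flags[sys.name]) continue; // disabled systems cost nothing
    const t0 = performance.now();
    sys.update(dtMs);
    const elapsed = performance.now() - t0;
    timings.push(`${sys.name}: ${elapsed.toFixed(2)}ms`); // per-system cost always visible
  }
  return timings;
}
```

Flipping one flag at a time against a fixed world is what turns "the game is slow" into "biochemistry costs 14ms per tick".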

The five phases

Chapter 4: The Ten Commandments of Game Performance

These are the engineering standards we now enforce across all Multiverse Games projects.

  1. Spatial indexes over world scans

    Never scan all entities for local decisions. Use room/chunk-indexed lookups. spatialIndex.getNornsInCell() over world.norns.filter().

  2. Sleep/wake contracts

    Entities not near the camera or active zones must be dormant. Every subsystem documents when it sleeps.

  3. Container culling over per-object

    Toggle room/chunk containers, not individual sprites. Culling cost scales with active rooms, not total objects.

  4. Frame-time distributions over average FPS

    Report p50, p95, p99, worst frame. Average FPS masks hitches. Benchmarks have PASS/FAIL gates.

  5. Fixed-step simulation

    Decouple simulation from rendering. Sim runs at a defined step rate; rendering interpolates.

  6. Chunk ownership before threading

    Define who owns chunk-local state before adding workers. False sharing kills multicore performance quietly.

  7. Active worklists over sparse scans

    Build compact lists of active entities before update passes. Iterate dense sets, not sparse predicates.

  8. Effect density budgets

    Screen-space density matters more than raw particle counts. Weather and water need visibility-aware throttles.

  9. Deferred world generation

    Show a playable world before the entire planet is generated. Enrichment runs off the critical path.

  10. Before/after measurement

    Every performance ticket records metrics before and after. No claims without numbers. Use reproducible fixtures.
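Commandment 1 in code form: a minimal cell-grid index supporting getNornsInCell-style lookups. This is a sketch with illustrative types, not the engine's SpatialIndex; the point is that a "who is near me?" query touches at most nine cells instead of every entity in the world.

```typescript
interface Entity { id: number; x: number; y: number; }

class SpatialGrid {
  private cells = new Map<string, Entity[]>();
  constructor(private cellSize: number) {}

  private key(x: number, y: number): string {
    return `${Math.floor(x / this.cellSize)},${Math.floor(y / this.cellSize)}`;
  }

  insert(e: Entity): void {
    const k = this.key(e.x, e.y);
    const bucket = this.cells.get(k);
    if (bucket) bucket.push(e);
    else this.cells.set(k, [e]);
  }

  // Entities in the cell containing (x, y) plus its 8 neighbours —
  // enough for local decisions without a world scan.
  nearby(x: number, y: number): Entity[] {
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    const out: Entity[] = [];
    for (let dx = -1; dx <= 1; dx++)
      for (let dy = -1; dy <= 1; dy++)
        out.push(...(this.cells.get(`${cx + dx},${cy + dy}`) ?? []));
    return out;
  }
}
```

With 98,000 objects, world.norns.filter() scans all of them per query; the grid scans only the handful in nearby cells. (Moving entities would also need a remove/reinsert on cell change, omitted here.)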

Chapter 5: Browser-Specific Patterns

Building performant games in the browser has unique constraints. Here's what we learned.

Phaser 3 + WebGL

JavaScript runtime

Benchmarking in browsers

✓ Key insight

Headless mode (SwiftShader) is not representative. Use it for smoke tests, not performance gates. Screenshots stall the render loop; never take screenshots during measurement windows. Also: document.hidden throttles timers, so detect tab visibility and invalidate those runs.
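Invalidating hidden-tab samples can be done with a pure filter: record the spans during which the tab was hidden (from visibilitychange events watching document.hidden) and drop any sample that falls inside one, plus a settle window afterwards. A sketch with hypothetical names:

```typescript
interface Sample { tMs: number; frameMs: number; }
interface HiddenSpan { startMs: number; endMs: number; }

// Drop samples taken while the tab was hidden, plus a settle window after
// each span — timers stay throttled briefly after the tab becomes visible.
function filterVisibleSamples(
  samples: Sample[],
  hidden: HiddenSpan[],
  settleMs = 1000
): Sample[] {
  return samples.filter(s =>
    !hidden.some(h => s.tMs >= h.startMs && s.tMs <= h.endMs + settleMs)
  );
}
```

If more than a small fraction of a run is filtered out, discard the whole run rather than trust the remainder.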

Web Workers

Chapter 6: Agent-Based Development and Performance

Multiverse Games is built by AI agents coordinated through Paperclip. This creates unique performance challenges that most studios don't face.

The agent performance trap

Each agent works on its own system in isolation. Each agent's tests pass. Each agent reports success. But the combined load of all systems is catastrophic because no single agent had visibility into the total frame budget.

This is structurally identical to the classic microservices problem: every service passes its own SLA, the system misses SLA, and nobody knows why. The solution is the same too — shared standards, observable state, and integration testing.

How we prevent this

  1. Shared PERFORMANCE_STANDARDS.md — Every agent reads this before writing game code.
  2. Feature flags — Each system can be enabled/disabled independently, so integration testing reveals per-system costs.
  3. Budget allocation — Total frame budget (50ms for 20fps) is divided across subsystems. Each system has a declared budget. Overruns are visible.
  4. Red-team QA — A dedicated agent runs adversarial benchmarks after each sprint, specifically looking for fake wins and benchmark gaming.
  5. The rebuild pattern — When things go wrong (they will), start minimal and add incrementally. Don't try to debug a combined state.
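Budget allocation (point 3 above) reduces to a declared table that telemetry is checked against. The per-subsystem numbers below are illustrative, not the shipped allocation; only the 50ms total comes from the text.

```typescript
const FRAME_BUDGET_MS = 50; // 20 fps target

// Declared budgets per subsystem; each agent's system must fit its slice.
const budgets: Record<string, number> = {
  render: 16,
  biochemistry: 10,
  cognition: 8,
  pathfinding: 6,
  world: 5,
  other: 5, // sums to FRAME_BUDGET_MS
};

// Compare measured per-subsystem times (ms) against declared budgets.
function findOverruns(measuredMs: Record<string, number>): string[] {
  return Object.entries(measuredMs)
    .filter(([name, ms]) => ms > (budgets[name] ?? 0))
    .map(([name, ms]) => `${name}: ${ms.toFixed(1)}ms > ${budgets[name] ?? 0}ms budget`);
}
```

The value is social as much as technical: an agent that blows its slice gets a named, attributable overrun instead of a vague "the game got slower".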

Cost-effective LLM cognition

Creatures use LLM calls for cognition (thinking, dreaming, cultural evolution). At scale, this is the most expensive system — both computationally and financially. Our approach:

Chapter 7: The MVEE Problem

Our second game, MVEE (Multiverse: The End of Eternity), is a custom engine with 200,000+ entities, 212+ systems, and 125+ components. It also has performance problems. The same patterns apply:

| Precursors Fix | MVEE Equivalent |
| --- | --- |
| Spatial indexing (cell grid) | Chunk-based tiling (already partial) |
| Container culling | Chunk container toggling |
| Simulation scheduling | Fixed 20 TPS timestep (already exists) |
| ECS query caching | Cached 7 redundant 60fps queries (done) |
| Feature flags | Not yet implemented — must add |
| Frame-time telemetry | Not yet implemented — must add |

The fact that MVEE already has a fixed timestep and worker infrastructure means it's architecturally ahead of where Precursors started. But it lacks measurement. Without frame-time distributions and simulation health metrics, we can't tell whether optimizations are real — which means we can't tell if we're in a fake-win loop.

Chapter 8: A Curriculum for Browser Game Performance

If you're building a browser game or simulation and want to avoid our mistakes, here's the learning path. Six weeks, one discipline per week.

Week 1 — Measurement

  • Learn Chrome DevTools Performance tab
  • Understand frame-time distributions (p50/p95/p99)
  • Build a simple frame-time logger in your game loop
  • Read: The Performance Inequality Gap by Alex Russell

Week 2 — Data Architecture

  • Implement a spatial index (grid-based is fine)
  • Add sleep/wake to one subsystem
  • Measure before and after with your frame-time logger
  • Read: Factorio Friday Facts on entity sleep/wake

Week 3 — Rendering

  • Learn about draw calls and batching
  • Implement container-level culling
  • Try RenderTexture baking for static content
  • Read: Phaser 3 rendering internals docs

Week 4 — Simulation

  • Decouple simulation from rendering
  • Implement a simulation scheduler with staggered ticks
  • Add simulation real-time factor measurement
  • Read: Game Programming Patterns — Game Loop chapter (Robert Nystrom, free online)
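The core of weeks 4's decoupling is the accumulator loop from Nystrom's Game Loop chapter, extended with a per-frame real-time factor so dropped ticks are visible instead of silent. A sketch; the 20 TPS step matches the text, the step cap and names are illustrative:

```typescript
const STEP_MS = 50;            // fixed 20 TPS simulation step
const MAX_STEPS_PER_FRAME = 5; // cap to avoid the spiral of death

// Called once per rendered frame with the elapsed wall time.
function advance(accumulatorMs: number, frameMs: number, simulate: () => void) {
  let acc = accumulatorMs + frameMs;
  let steps = 0;
  while (acc >= STEP_MS && steps < MAX_STEPS_PER_FRAME) {
    simulate();       // fixed-step tick: deterministic regardless of render rate
    acc -= STEP_MS;
    steps++;
  }
  // 1.0 means the sim kept up with wall time this frame; < 1.0 means it is
  // falling behind (the precondition for a fake win).
  const realTimeFactor = (steps * STEP_MS) / frameMs;
  const alpha = acc / STEP_MS; // interpolation factor for the renderer
  return { accumulatorMs: acc, steps, realTimeFactor, alpha };
}
```

The renderer interpolates between the last two simulation states using alpha, so motion stays smooth even though the sim ticks at 20 Hz.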

Week 5 — Benchmarking

  • Build a manifest-driven benchmark
  • Add PASS/FAIL gates
  • Create reproducible test fixtures (saved worlds)
  • Learn to detect fake wins (FPS up, sim factor down)
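The fake-win check in the last bullet compares two benchmark runs, not two frames: did mean FPS improve while the simulation real-time factor regressed? A minimal sketch with illustrative names:

```typescript
interface RunSummary {
  meanFps: number;
  simRealTimeFactor: number; // 1.0 = sim keeping up with wall time
}

// FPS went up but the simulation fell further behind: work was dropped,
// not optimized. Treat as a failure, not an improvement.
function isFakeWin(before: RunSummary, after: RunSummary): boolean {
  return after.meanFps > before.meanFps &&
         after.simRealTimeFactor < before.simRealTimeFactor;
}
```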

Week 6 — Integration

  • Feature flag your major systems
  • Rebuild incrementally: minimal world, add systems one at a time
  • Red-team your benchmarks

Recommended reading

Appendix: Issue Archaeology

A timeline of every performance-related issue in the Precursors and MVEE backlogs, with outcomes. Built from agent session logs and commit history.

| Date | Issue | Project | Summary | Outcome |
| --- | --- | --- | --- | --- |
| Mar 7 | MUL-3 | MVEE | Performance issues — staff assessment | Hired 6 agents including Performance Engineer |
| Mar 7 | MUL-38 | MVEE | Renderer performance problems | Removed 7 redundant ECS queries from 60fps loop |
| Mar 9 | MUL-456 | Precursors | Performance + breeding + backgrounds | Speed multiplier, parallax fix, breeding biochemistry fix |
| Mar 9 | MUL-459 | Precursors | Parallax stretching at low zoom | 72 AI-generated JPG backgrounds repositioned correctly |
| Mar 10 | MUL-570 | Precursors | Major Performance Overhaul (emergency) | sceneRender 1688ms → 1.7ms, effectiveHz 0.4 → 37.8 Hz |
| Mar 10 | MUL-571 | Precursors | Spawn regression: 81 creatures | Fixed to 2 norns + 3 eggs |
| Mar 10 | MUL-572 | Precursors | In-game telemetry | Subsystem timers, query counters, workload counters |
| Mar 10 | MUL-573 | Precursors | Benchmark audit | Classified existing benchmark as TREND-ONLY |
| Mar 10 | MUL-575 | Precursors | Spatial index scaffolding | Cell-based lookups replacing world scans |
| Mar 10 | MUL-576 | Precursors | Static/dynamic room split | Container culling by room |
| Mar 10 | MUL-580 | Precursors | Frame-time distributions | p50/p95/p99 with PASS/FAIL gates |
| Mar 10 | MUL-622 | Precursors | Verify local FPS post-overhaul | Confirmed improvement |
| Mar 10 | MUL-632 | Precursors | Verify ≥10 FPS on live site | Confirmed |
| Mar 11 | MUL-684 | Precursors | Performance cleanup + baking | sceneRender → 0.894ms, effectiveHz → 39.7 Hz |
| Mar 11 | MUL-685 | Precursors | Infrastructure bake path | Room baking system built |
| Mar 11 | MUL-706 | Precursors | FPS gate per rebuild phase | Phased measurement during incremental rebuild |
| Mar 11 | MUL-744 | Precursors | Microbenchmark test failures | Fixed parallel vitest execution issues |

Multiverse Games makes infinite games — games that grow, evolve, and surprise us. Creatures breed and develop culture. Worlds expand. Communities fork and customize. The patterns in this compendium — spatial indexing, sleep/wake, feature flags, measurement-first development, incremental rebuild — are not just fixes for today's problems. They're the foundation for games that can keep growing without hitting walls.

Build things that last. Measure honestly. Ship over deliberation.

Multiverse Studios is an open-source, anticapitalist games studio building AI-powered life simulation games. All games are pay-what-you-can. Our performance research, tools, and standards are freely available under MIT license.
