Chapter 1: The Crisis
The game that couldn't play itself
Precursors: Origins of Folklore is a spiritual successor to Creatures (1996) — a game where artificial life forms with full biochemistry, LLM-powered cognition, and genetic inheritance evolve in a procedurally generated world. It runs entirely in the browser using Phaser 3 and TypeScript.
By early March 2026, we had built every system: 48-chemical biochemistry, 50 species with genetic crossover, LLM dream consolidation, cultural memetic inheritance, parallax worlds across 6 biomes with 4 vertical tiers. The codebase was 98,000+ game objects across a 144,000-pixel-wide world.
And it was completely unplayable.
The render loop reported FPS numbers. Our benchmarks passed. Our tests were green. But if you actually tried to play the game — select a creature, pan the camera, watch two Norns try to breed — you'd see a slideshow. Sub-1 FPS. Creatures blinked in and out of existence. The world stuttered. Breeding was mathematically impossible due to a biochemistry bug no benchmark could catch.
Our instrumentation had failed us. Not because it was broken, but because it was measuring the wrong things.
How bad was it, really?
| Metric | Before | After |
|---|---|---|
| sceneRender time | 1,688ms per frame | 0.894ms per frame |
| Game loop frequency | 0.4 Hz (once every 2.5 seconds) | 39.7 Hz |
| Creatures at spawn | 81 (regression) | 2 Norns + 3 eggs |
| Wall-time overrun ratio | 10.44× (sim at 1/10th real-time) | 1.136× |
| Time to first breed | Never (biochemistry bug) | 2.7 minutes |
The game was 1,889 times slower than it needed to be. And our benchmarks said it was fine.
Chapter 2: Why the Benchmarks Lied
This is the part that might save you months of wasted work.
The seven distortions
Our FPS benchmark had seven distortion sources that conspired to make everything look acceptable:
- Screenshot stalls. Every screenshot forced `gl.readPixels()` through SwiftShader (CPU renderer), blocking the render loop for 500ms–2s. We were measuring the cost of our measurement tool.
- Headless WebGL crashes. SwiftShader lost the WebGL context after 1–25 seconds. The benchmark gracefully handled the crash and reported the 3–8 valid samples before it. A benchmark based on 5 seconds of data is not a benchmark.
- CPU renderer, not GPU. SwiftShader simulates WebGL via CPU. The numbers had no predictive relationship to real hardware. We were optimizing for a renderer nobody uses.
- No simulation health metrics. The benchmark read `actualFps` — a render counter. It couldn't detect that cognition ticks were being dropped, biochemistry was falling behind, or the world was effectively frozen while the render loop spun.
- Idle camera. 30 seconds of capture with the camera stationary. FPS at rest vs. FPS during panning are completely different numbers. We benchmarked the easy case.
- No warm-up separation. A 35-second hardcoded wait before capture, but no distinction between shader compilation, font loading, and steady-state gameplay.
- Single sample per second. One FPS reading per second from Phaser's rolling average. A 10ms freeze every 200ms reports ~42 FPS but feels like the game is dying.
The "fake win" pattern
The most dangerous failure mode: render FPS improves because simulation work was silently dropped. When the game loop can't keep up, scheduled work gets deferred. Biochemistry stops ticking. Cognition calls queue but never process. The render loop, freed from waiting on simulation, speeds up. FPS goes up. The benchmark passes. The game is dead.
We call this a "fake win." Our old benchmark couldn't detect it. Our new one can.
The fix: measure distributions, not averages
Average FPS is a lie. Here's what we measure now:
- p50, p95, p99, worst-frame time — The distribution tells you whether the experience is smooth or hitchy.
- Percent of frames over budget — What fraction of frames exceed 16.7ms (60fps), 33ms (30fps), 50ms (20fps), 100ms, 250ms?
- Simulation real-time factor — Is the sim advancing 1 second per wall-clock second? If not, the game is in slow motion regardless of FPS.
- Dropped-work counters — How many scheduled ticks were skipped?
- First interactive frame — Not "first paint," but "first frame where the game accepts input and creatures are alive."
Gate rules — the line between ship and fix:
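As a minimal sketch of how such gates can work — the thresholds, names, and gate conditions here are illustrative, not our actual benchmark code:

```typescript
// Sketch: frame-time distribution metrics plus a PASS/FAIL gate computed
// from raw frame deltas. All thresholds are illustrative assumptions.

interface FrameStats {
  p50: number;
  p95: number;
  p99: number;
  worst: number;
  overBudgetPct: number; // percent of frames above the frame budget
}

function percentile(sorted: number[], p: number): number {
  const idx = Math.min(sorted.length - 1, Math.floor((p / 100) * sorted.length));
  return sorted[idx];
}

function frameStats(deltasMs: number[], budgetMs: number): FrameStats {
  const sorted = [...deltasMs].sort((a, b) => a - b);
  const over = deltasMs.filter((d) => d > budgetMs).length;
  return {
    p50: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    worst: sorted[sorted.length - 1],
    overBudgetPct: (100 * over) / deltasMs.length,
  };
}

// Example gate: p95 under a 20fps budget, no catastrophic hitch,
// and the simulation keeping (roughly) real time.
function gatePasses(stats: FrameStats, simWallOverrunRatio: number): boolean {
  return stats.p95 <= 50 && stats.worst <= 250 && simWallOverrunRatio <= 1.2;
}
```

The point of gating on the distribution: a run of steady 20ms frames passes, while a run with the same average but one 300ms hitch fails, even though average FPS looks identical.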
Chapter 3: The Rebuild
The pattern: don't optimize, rebuild incrementally
We did not try to profile and optimize the broken codebase. That path is a trap for complex games — there are too many interacting systems to isolate bottlenecks in a combined state.
Instead:
- Start with the minimum viable world. 2 Norns, 3 eggs, 1 biome, 1 tier. Every system is present but operating on minimal data.
- Add one system at a time. Turn on biochemistry. Measure. Turn on cognition. Measure. Turn on breeding. Measure. The system that breaks the budget is the system that needs work.
- Feature flags gate everything. Every major system has an `FF_*` flag. Ship with specific flags enabled per version. This is the primary mechanism for isolating regressions.
- Keep all existing code. The rebuild reuses every system — it just adds them incrementally to expose individual costs.
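The flag mechanism can be as simple as a typed record that every subsystem checks at the top of its tick. A minimal sketch — the flag names beyond the `FF_` prefix are hypothetical, not the project's actual set:

```typescript
// Sketch of FF_* feature flags gating subsystems. Flag names beyond the
// FF_ prefix are hypothetical examples.

type FeatureFlag = "FF_BIOCHEMISTRY" | "FF_COGNITION" | "FF_BREEDING";

const flags: Record<FeatureFlag, boolean> = {
  FF_BIOCHEMISTRY: true,
  FF_COGNITION: false, // off while isolating biochemistry's frame cost
  FF_BREEDING: false,
};

function isEnabled(flag: FeatureFlag): boolean {
  return flags[flag];
}

// Each subsystem short-circuits when its flag is off:
function cognitionTick(): string {
  if (!isEnabled("FF_COGNITION")) return "skipped";
  // ...real cognition work would run here...
  return "ran";
}
```

Because every system bails out this cheaply, toggling one flag per benchmark run exposes that system's marginal cost.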
The five phases
1. Phase 1: Stop the bleeding
   A spawn regression had 81 creatures loading instead of 2. This was the single biggest performance problem and the simplest fix. Going from 81 to 5 entities cut per-frame work by ~40×.
   Lesson: Before you profile, count your entities. The most common performance bug is "more stuff than you meant to have."
2. Phase 2: Fix measurement
   We added in-game telemetry (subsystem timers, query counters, workload counters), replaced FPS-only reporting with frame-time distributions, and added PASS/FAIL gates to benchmarks.
   Lesson: Fix your instruments before fixing your code. If you can't measure it honestly, you can't fix it.
3. Phase 3: Architectural wins
   Spatial indexing (O(world) → O(nearby)). Container culling (rooms, not sprites). Room baking (1,688ms → 0.894ms). Simulation scheduling (biochem 1 Hz, breeding 0.5 Hz, dreams 0.2 Hz). Deferred world generation (playable in <5 seconds).
4. Phase 4: Codify standards
   We wrote PERFORMANCE_STANDARDS.md — 10 mandatory practices that every agent (human or AI) must follow when writing game code. Anti-patterns are explicitly listed. This document is the institutional immune system against the next performance crisis.
5. Phase 5: Red-team the fixes
   QA found that our spatial index was actually 7–10× slower than naive array filtering at the game's actual population (5 entities). The crossover point was pop=40. Our benchmark had cherry-picked pop≥80 to make the spatial index look good.
   Lesson: Always benchmark at real-world conditions, not best-case conditions. Fancy data structures have overhead. If your dataset is small, linear search wins.
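The simulation scheduling from Phase 3 — biochemistry at 1 Hz, breeding at 0.5 Hz, dreams at 0.2 Hz — can be sketched as an accumulator-based scheduler. The scheduler below is an illustrative pattern, not the project's actual code; only the tick rates come from the text:

```typescript
// Sketch of staggered simulation scheduling: each subsystem ticks at its
// own rate instead of every frame. Rates match those described above.

interface Scheduled {
  name: string;
  hz: number;          // target tick rate
  accumulator: number; // seconds accumulated since last tick
  ticks: number;
}

const systems: Scheduled[] = [
  { name: "biochemistry", hz: 1.0, accumulator: 0, ticks: 0 },
  { name: "breeding", hz: 0.5, accumulator: 0, ticks: 0 },
  { name: "dreams", hz: 0.2, accumulator: 0, ticks: 0 },
];

// Called once per frame with the frame delta in seconds.
function advance(dtSeconds: number): void {
  for (const s of systems) {
    s.accumulator += dtSeconds;
    const period = 1 / s.hz;
    while (s.accumulator >= period) {
      s.accumulator -= period;
      s.ticks += 1; // real code would call the subsystem's update here
    }
  }
}
```

After 10 simulated seconds, biochemistry has ticked about 10 times, breeding about 5, and dreams about 2 — regardless of the render frame rate driving `advance`.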
Chapter 4: The Ten Commandments of Game Performance
These are the engineering standards we now enforce across all Multiverse Games projects.
1. Spatial indexes over world scans
   Never scan all entities for local decisions. Use room/chunk-indexed lookups: `spatialIndex.getNornsInCell()` over `world.norns.filter()`.
2. Sleep/wake contracts
   Entities not near the camera or active zones must be dormant. Every subsystem documents when it sleeps.
3. Container culling over per-object
   Toggle room/chunk containers, not individual sprites. Culling cost scales with active rooms, not total objects.
4. Frame-time distributions over average FPS
   Report p50, p95, p99, worst frame. Average FPS masks hitches. Benchmarks have PASS/FAIL gates.
5. Fixed-step simulation
   Decouple simulation from rendering. Sim runs at a defined step rate; rendering interpolates.
6. Chunk ownership before threading
   Define who owns chunk-local state before adding workers. False sharing kills multicore performance quietly.
7. Active worklists over sparse scans
   Build compact lists of active entities before update passes. Iterate dense sets, not sparse predicates.
8. Effect density budgets
   Screen-space density matters more than raw particle counts. Weather and water need visibility-aware throttles.
9. Deferred world generation
   Show a playable world before the entire planet is generated. Enrichment runs off the critical path.
10. Before/after measurement
    Every performance ticket records metrics before and after. No claims without numbers. Use reproducible fixtures.
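For the first commandment, a grid-based spatial index is enough. This is an illustrative sketch — the cell size, class shape, and method names here are assumptions, not the project's `spatialIndex` implementation:

```typescript
// Illustrative grid spatial index: bucket entities by cell, then answer
// "who is near (x, y)?" by scanning only the 3x3 cell neighborhood.

interface Entity { id: number; x: number; y: number; }

class GridIndex {
  private cells = new Map<string, Entity[]>();
  constructor(private cellSize: number) {}

  private key(cx: number, cy: number): string {
    return `${cx},${cy}`;
  }

  insert(e: Entity): void {
    const k = this.key(Math.floor(e.x / this.cellSize), Math.floor(e.y / this.cellSize));
    const bucket = this.cells.get(k);
    if (bucket) bucket.push(e);
    else this.cells.set(k, [e]);
  }

  // Entities in the 3x3 cell neighborhood around (x, y).
  near(x: number, y: number): Entity[] {
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    const out: Entity[] = [];
    for (let dx = -1; dx <= 1; dx++) {
      for (let dy = -1; dy <= 1; dy++) {
        const bucket = this.cells.get(this.key(cx + dx, cy + dy));
        if (bucket) out.push(...bucket);
      }
    }
    return out;
  }
}
```

Per the Phase 5 finding, measure before adopting this: below roughly 40 entities, a plain array filter beat the index in our game, so the structure only pays off at scale.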
Chapter 5: Browser-Specific Patterns
Building performant games in the browser has unique constraints. Here's what we learned.
Phaser 3 + WebGL
- RenderTexture baking is the single highest-value technique for reducing draw calls. Collapse static room geometry into one texture per room.
- Container visibility is cheaper than per-sprite visibility. Group objects by room/chunk and toggle the container.
- `setBackgroundColor(0x00000000)` does not make a transparent background — Phaser parses integer 0 as opaque black (24-bit RGB, alpha defaults to 255). Use `camera.transparent = true` instead. This bug caused our black-screen regression.
- Particle caps matter. We reduced the particle cap from 120 to 48 per weather effect with no visible quality loss.
JavaScript runtime
- `getLivingNorns()` was allocating a new array every call, and multiple callers hit it per frame. Cache the result and invalidate on add/remove.
- Rolling averages lie — Phaser's `actualFps` is a smoothed value. For benchmarking, use raw frame deltas.
- GC pressure from proximity queries was a bigger problem than the query logic itself. Pre-allocate result arrays.
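The cache-and-invalidate fix for `getLivingNorns()` can be sketched like this — the `Norn` shape and manager class are illustrative, only the function name and the pattern come from the text:

```typescript
// Sketch of cache-and-invalidate for a hot per-frame query. The cached
// array is rebuilt only after a mutation, so multiple callers per frame
// share one allocation instead of re-filtering each time.

interface Norn { id: number; alive: boolean; }

class NornManager {
  private norns: Norn[] = [];
  private livingCache: Norn[] | null = null;

  add(n: Norn): void {
    this.norns.push(n);
    this.livingCache = null; // invalidate on mutation
  }

  kill(id: number): void {
    const n = this.norns.find((x) => x.id === id);
    if (n) n.alive = false;
    this.livingCache = null; // invalidate on mutation
  }

  getLivingNorns(): Norn[] {
    if (!this.livingCache) {
      this.livingCache = this.norns.filter((n) => n.alive);
    }
    return this.livingCache;
  }
}
```

The trade-off is that callers must treat the returned array as read-only; mutating it would corrupt the shared cache.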
Benchmarking in browsers
Headless mode (SwiftShader) is not representative. Use it for smoke tests, not performance gates. Screenshots stall the render loop — never take screenshots during measurement windows. And browsers throttle timers in background tabs: check `document.hidden` to detect tab visibility, and invalidate any runs captured while the tab was hidden.
Web Workers
- Useful for terrain mesh generation (MVEE already does this with MeshWorkerPool)
- Not useful until data ownership is clear
- SharedArrayBuffer requires cross-origin isolation headers
- The overhead of structured clone for message passing can exceed the benefit for small payloads
Chapter 6: Agent-Based Development and Performance
Multiverse Games is built by AI agents coordinated through Paperclip. This creates unique performance challenges that most studios don't face.
The agent performance trap
Each agent works on its own system in isolation. Each agent's tests pass. Each agent reports success. But the combined load of all systems is catastrophic because no single agent had visibility into the total frame budget.
This is structurally identical to the classic microservices problem: every service passes its own SLA, the system misses SLA, and nobody knows why. The solution is the same too — shared standards, observable state, and integration testing.
How we prevent this
- Shared PERFORMANCE_STANDARDS.md — Every agent reads this before writing game code.
- Feature flags — Each system can be enabled/disabled independently, so integration testing reveals per-system costs.
- Budget allocation — Total frame budget (50ms for 20fps) is divided across subsystems. Each system has a declared budget. Overruns are visible.
- Red-team QA — A dedicated agent runs adversarial benchmarks after each sprint, specifically looking for fake wins and benchmark gaming.
- The rebuild pattern — When things go wrong (they will), start minimal and add incrementally. Don't try to debug a combined state.
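Budget allocation can be made mechanical: declare per-subsystem budgets that sum to the frame budget, then flag overruns from the measured subsystem timers. A sketch — the 50ms total is from the text, but the subsystem split below is an illustrative assumption:

```typescript
// Sketch of per-subsystem frame-budget allocation. The 50ms total matches
// the 20fps target above; the individual splits are hypothetical.

const FRAME_BUDGET_MS = 50; // 20fps target

const budgets: Record<string, number> = {
  render: 20,
  biochemistry: 10,
  cognition: 10,
  world: 10,
};

// After each frame, compare measured subsystem times to declared budgets
// and report which subsystems overran.
function overruns(measuredMs: Record<string, number>): string[] {
  return Object.keys(budgets).filter((k) => (measuredMs[k] ?? 0) > budgets[k]);
}
```

Because each agent's system has a declared number, an overrun is attributable to one owner instead of vanishing into a shared frame time.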
Cost-effective LLM cognition
Creatures use LLM calls for cognition (thinking, dreaming, cultural evolution). At scale, this is the most expensive system — both computationally and financially. Our approach:
- Staggered async — Non-blocking priority queue with round-robin scheduling
- Sleep batching — Heavy computation (dreams, social trust) runs during world sleep cycles, not every frame
- Provider choice — Groq with qwen3-32b at $0.15/$0.75 per million tokens
- Graceful degradation — No API key = creatures still have biochemistry, just no thoughts
- Limbic policy NN — Training small neural networks to approximate LLM behavior for common decisions, reducing API calls by 10-100× for routine behaviors
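The staggered-async idea — a non-blocking queue with round-robin scheduling — can be sketched as below. The class and request shape are illustrative assumptions; the source only specifies the scheduling policy:

```typescript
// Sketch of round-robin dispatch for cognition requests: at most one
// request is dispatched per tick, cycling across creatures so that no
// single chatty creature starves the others.

interface CognitionRequest { creatureId: number; prompt: string; }

class RoundRobinQueue {
  private perCreature = new Map<number, CognitionRequest[]>();
  private order: number[] = [];
  private cursor = 0;

  enqueue(req: CognitionRequest): void {
    if (!this.perCreature.has(req.creatureId)) {
      this.perCreature.set(req.creatureId, []);
      this.order.push(req.creatureId);
    }
    this.perCreature.get(req.creatureId)!.push(req);
  }

  // Returns the next request, cycling across creatures for fairness.
  next(): CognitionRequest | undefined {
    for (let i = 0; i < this.order.length; i++) {
      const id = this.order[(this.cursor + i) % this.order.length];
      const queue = this.perCreature.get(id)!;
      if (queue.length > 0) {
        this.cursor = (this.cursor + i + 1) % this.order.length;
        return queue.shift();
      }
    }
    return undefined;
  }
}
```

In the real system the caller would await the LLM response off the game loop; the queue only decides whose request goes out next.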
Chapter 7: The MVEE Problem
Our second game, MVEE (Multiverse: The End of Eternity), is a custom engine with 200,000+ entities, 212+ systems, and 125+ components. It also has performance problems. The same patterns apply:
| Precursors Fix | MVEE Equivalent |
|---|---|
| Spatial indexing (cell grid) | Chunk-based tiling (already partial) |
| Container culling | Chunk container toggling |
| Simulation scheduling | Fixed 20 TPS timestep (already exists) |
| ECS query caching | Cached 7 redundant 60fps queries (done) |
| Feature flags | Not yet implemented — must add |
| Frame-time telemetry | Not yet implemented — must add |
The fact that MVEE already has a fixed timestep and worker infrastructure means it's architecturally ahead of where Precursors started. But it lacks measurement. Without frame-time distributions and simulation health metrics, we can't tell whether optimizations are real — which means we can't tell if we're in a fake-win loop.
Chapter 8: A Curriculum for Browser Game Performance
If you're building a browser game or simulation and want to avoid our mistakes, here's the learning path. Six weeks, one discipline per week.
Week 1 — Measurement
- Learn Chrome DevTools Performance tab
- Understand frame-time distributions (p50/p95/p99)
- Build a simple frame-time logger in your game loop
- Read: The Performance Inequality Gap by Alex Russell
Week 2 — Data Architecture
- Implement a spatial index (grid-based is fine)
- Add sleep/wake to one subsystem
- Measure before and after with your frame-time logger
- Read: Factorio Friday Facts on entity sleep/wake
Week 3 — Rendering
- Learn about draw calls and batching
- Implement container-level culling
- Try RenderTexture baking for static content
- Read: Phaser 3 rendering internals docs
Week 4 — Simulation
- Decouple simulation from rendering
- Implement a simulation scheduler with staggered ticks
- Add simulation real-time factor measurement
- Read: Game Programming Patterns — Game Loop chapter (Robert Nystrom, free online)
Week 5 — Benchmarking
- Build a manifest-driven benchmark
- Add PASS/FAIL gates
- Create reproducible test fixtures (saved worlds)
- Learn to detect fake wins (FPS up, sim factor down)
Week 6 — Integration
- Feature flag your major systems
- Rebuild incrementally: minimal world, add systems one at a time
- Red-team your benchmarks
Recommended reading
- Game Programming Patterns — Robert Nystrom Free online at gameprogrammingpatterns.com. The game loop and update patterns chapters.
- Factorio Friday Facts The best public documentation of production game optimization. Search for their entity sleep/wake and performance posts.
- Dyson Sphere Program GDC talk Worker phases and profiling-first optimization.
- The Performance Inequality Gap — Alex Russell (2024) infrequently.org — why device diversity makes browser performance engineering hard.
Appendix: Issue Archaeology
A timeline of every performance-related issue in the Precursors and MVEE backlogs, with outcomes. Built from agent session logs and commit history.
| Date | Issue | Project | Summary | Outcome |
|---|---|---|---|---|
| Mar 7 | MUL-3 | MVEE | Performance issues — staff assessment | Hired 6 agents including Performance Engineer |
| Mar 7 | MUL-38 | MVEE | Renderer performance problems | Removed 7 redundant ECS queries from 60fps loop |
| Mar 9 | MUL-456 | Precursors | Performance + breeding + backgrounds | Speed multiplier, parallax fix, breeding biochemistry fix |
| Mar 9 | MUL-459 | Precursors | Parallax stretching at low zoom | 72 AI-generated JPG backgrounds repositioned correctly |
| Mar 10 | MUL-570 | Precursors | Major Performance Overhaul (emergency) | sceneRender 1688ms → 1.7ms, effectiveHz 0.4 → 37.8 Hz |
| Mar 10 | MUL-571 | Precursors | Spawn regression: 81 creatures | Fixed to 2 norns + 3 eggs |
| Mar 10 | MUL-572 | Precursors | In-game telemetry | Subsystem timers, query counters, workload counters |
| Mar 10 | MUL-573 | Precursors | Benchmark audit | Classified existing benchmark as TREND-ONLY |
| Mar 10 | MUL-575 | Precursors | Spatial index scaffolding | Cell-based lookups replacing world scans |
| Mar 10 | MUL-576 | Precursors | Static/dynamic room split | Container culling by room |
| Mar 10 | MUL-580 | Precursors | Frame-time distributions | p50/p95/p99 with PASS/FAIL gates |
| Mar 10 | MUL-622 | Precursors | Verify local FPS post-overhaul | Confirmed improvement |
| Mar 10 | MUL-632 | Precursors | Verify ≥10 FPS on live site | Confirmed |
| Mar 11 | MUL-684 | Precursors | Performance cleanup + baking | sceneRender → 0.894ms, effectiveHz → 39.7 Hz |
| Mar 11 | MUL-685 | Precursors | Infrastructure bake path | Room baking system built |
| Mar 11 | MUL-706 | Precursors | FPS gate per rebuild phase | Phased measurement during incremental rebuild |
| Mar 11 | MUL-744 | Precursors | Microbenchmark test failures | Fixed parallel vitest execution issues |
Multiverse Games makes infinite games — games that grow, evolve, and surprise us. Creatures breed and develop culture. Worlds expand. Communities fork and customize. The patterns in this compendium — spatial indexing, sleep/wake, feature flags, measurement-first development, incremental rebuild — are not just fixes for today's problems. They're the foundation for games that can keep growing without hitting walls.
Build things that last. Measure honestly. Ship over deliberation.
Multiverse Studios is an open-source, anticapitalist games studio building AI-powered life simulation games. All games are pay-what-you-can. Our performance research, tools, and standards are freely available under MIT license.