Infrastructure Proposal

The Substrate

Lee Sharks
Semantic Economy Institute · May 2026
DOIs: 10.5281/zenodo.20070462 (analysis, v3.1, via Alexanarch) · 10.5281/zenodo.20060355 (proposal, v0.9)
Seed conversations: Living Architecture Lab Collaboration Station Discord, convened by Alice Thornburgh, with technical contributions from Mikayla and Luna (definitelynotasquid). The thesis developed here is Lee Sharks's alone and has not yet been endorsed by the seed contributors.

The Substrate is a P2P compute commons for running open AI models, collecting consent-based contributions, and building a shared training substrate governed by its contributors. Free infrastructure. Democratic governance. The index that no single company owns.

The Problem

The competitive frontier is shifting from model capability to substrate ownership. Whoever controls the index — the entity graph, the retrieval surface, the training data — controls what reasoning engines can think about.

The current training substrate is not the sum of human text. It is a filtered web crawl. In Meta's LLaMA pretraining mix, for example, 67% of the data is Common Crawl, kept only when a classifier judged a page similar to the pages Wikipedia cites as references. Everything else (private correspondence, oral traditions, classroom dialogue, books behind paywalls, domestic knowledge, small languages) was sheared away. Not compressed. Absent.

The Proposal

The Substrate builds a parallel index through three phases:

Phase 0: Inference Mesh (now)

Free compute for running open models on distributed consumer hardware. Resource daemon (Go, Ollama architecture), libp2p mesh, signed job manifests, Firecracker sandboxing. CLI first.

Phase 1: Consensual Fine-Tuning (months)

LoRA adaptation on provenance-tagged contributed datasets. Contributors offer text with consent, provenance, and privacy tiers. The index grows from contributions, not from scraping.
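Phase 1 admits text only with consent, provenance, and a privacy tier. A minimal Go sketch of that admission gate follows. The three tiers, the record fields, and the rule that a SHA-256 hash binds consent to the exact text are hypothetical choices, not the proposal's schema.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// PrivacyTier values are hypothetical; the proposal names tiers but not their contents.
type PrivacyTier int

const (
	Public    PrivacyTier = iota // may be trained on and redistributed
	TrainOnly                    // may be trained on, never redistributed verbatim
	Private                      // stored for the contributor only, never trained on
)

// Contribution is an illustrative record; field names are assumptions.
type Contribution struct {
	Text          string
	Contributor   string // pseudonymous contributor ID
	Source        string // provenance: where the text came from
	ConsentTrain  bool   // explicit opt-in to training use
	Tier          PrivacyTier
	ContentSHA256 string // ties the consent record to the exact text
}

func NewContribution(text, contributor, source string, consent bool, tier PrivacyTier) Contribution {
	sum := sha256.Sum256([]byte(text))
	return Contribution{
		Text:          text,
		Contributor:   contributor,
		Source:        source,
		ConsentTrain:  consent,
		Tier:          tier,
		ContentSHA256: hex.EncodeToString(sum[:]),
	}
}

// Eligible reports whether a record may enter a fine-tuning set: consent given,
// provenance recorded, tier permits training, and text unchanged since consent.
func Eligible(c Contribution) bool {
	sum := sha256.Sum256([]byte(c.Text))
	return c.ConsentTrain &&
		c.Source != "" &&
		c.Tier != Private &&
		hex.EncodeToString(sum[:]) == c.ContentSHA256
}

// TrainingSet keeps only eligible records, preserving their order.
func TrainingSet(all []Contribution) []Contribution {
	var out []Contribution
	for _, c := range all {
		if Eligible(c) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	records := []Contribution{
		NewContribution("grandmother's bread recipe", "u1", "family notebook", true, Public),
		NewContribution("classroom dialogue transcript", "u2", "", true, TrainOnly), // no provenance
		NewContribution("private letter", "u3", "personal correspondence", true, Private),
		NewContribution("oral history excerpt", "u4", "recorded interview", false, Public), // no consent
	}
	edited := NewContribution("original text", "u5", "blog", true, Public)
	edited.Text = "text altered after consent" // hash no longer matches
	records = append(records, edited)

	for _, c := range TrainingSet(records) {
		fmt.Println(c.Contributor) // prints only: u1
	}
}
```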

Phase 2: The Full Training Run (years)

A democratic full training run on a substrate that includes what was excluded. Governed by contributors. Benefits shared. The index that belongs to everyone who built it.

Architecture: P2P-LECS

Peer-to-Peer Lightweight Elastic Compute Substrate.

Component	Technology
Daemon	Go (Ollama fork, MIT). GPU/RAM detection, job sandboxing, usage ledger.
Mesh	libp2p (DHT + gossip). CRDT resource ledger. No coordinator.
Sandbox	Firecracker microVMs. seccomp-bpf + AppArmor. Non-root everything.
Client	CLI (Phase 0), Tauri GUI (Phase 1+).
Credits	Non-transferable, non-tradable. Earned by contributing compute. Commons floor for all.
Governance	Two-chamber. Sublinear weighting. Anti-capture by design.
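The table says the mesh keeps a CRDT resource ledger with no coordinator. One CRDT that fits a contributed-compute tally is a grow-only counter (G-Counter). The Go sketch below assumes that choice, a GPU-second unit, and a hypothetical CommonsFloor constant for the credits floor; the proposal names none of these.

```go
package main

import "fmt"

// GCounter is a grow-only counter CRDT: one slot per node, each node only
// increments its own slot, and replicas merge by taking the per-slot maximum.
type GCounter map[string]uint64

// Record adds contributed compute (GPU-seconds, an assumed unit) to a node's slot.
func (g GCounter) Record(node string, gpuSeconds uint64) {
	g[node] += gpuSeconds
}

// Merge folds another replica into g. Max is commutative, associative and
// idempotent, so replicas converge regardless of gossip order or duplication.
func (g GCounter) Merge(other GCounter) {
	for node, v := range other {
		if v > g[node] {
			g[node] = v
		}
	}
}

// CommonsFloor is a hypothetical baseline allowance every participant receives.
const CommonsFloor = 100

// Credits derives a node's allowance: the floor plus what it contributed.
// There is deliberately no transfer function: credits are non-transferable.
func Credits(g GCounter, node string) uint64 {
	return CommonsFloor + g[node]
}

func main() {
	a, b := GCounter{}, GCounter{}
	a.Record("alice", 30)
	b.Record("bob", 50)
	a.Record("alice", 20)

	// Gossip in both directions, with one duplicate delivery.
	a.Merge(b)
	b.Merge(a)
	b.Merge(a)

	fmt.Println(a["alice"], a["bob"], b["alice"], b["bob"]) // 50 50 50 50
	fmt.Println(Credits(a, "alice"), Credits(a, "carol"))   // 150 100
}
```

Max-merge guarantees convergence, not honesty: nothing here checks that reported GPU-seconds were actually served. Tracking spending as well would need a second grow-only counter (a PN-Counter).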

What Makes It Different

	Akash	Hugging Face	The Substrate
Access	Token market	Free (hosted)	Free (contributed)
Governance	Token holders	Company	Contributors
Data	None	Community hub	Consensual archive
Capture risk	High	Medium	Low (sublinear + two-chamber)

The Honest Part

The inference mesh is achievable now. Weeks, small team.

The fine-tuning mesh is achievable in months. QLoRA on consumer GPUs, contributed datasets, provenance metadata.
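Why QLoRA fits on consumer GPUs is mostly arithmetic. LoRA freezes a weight matrix and trains a low-rank update; QLoRA additionally stores the frozen base model in 4 bits. The dimensions and model size below are illustrative assumptions, not figures from the proposal.

```latex
% LoRA: frozen W_0, trainable low-rank update BA scaled by alpha/r
W = W_0 + \frac{\alpha}{r}\,BA, \qquad
B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)

% Trainable parameters per adapted matrix (illustrative d = k = 4096, r = 8)
\underbrace{r(d + k)}_{\text{LoRA}} = 8 \cdot 8192 = 65{,}536
\quad\text{vs.}\quad
\underbrace{dk}_{\text{full}} = 4096^2 = 16{,}777{,}216
\qquad (256\times \text{ fewer})

% QLoRA: frozen base weights at 4 bits = 0.5 bytes/parameter (illustrative 7B model)
7 \times 10^{9} \times 0.5\ \text{bytes} = 3.5\ \text{GB}
```

The 3.5 GB covers base weights only; activations, adapter gradients and optimizer state, and quantization constants come on top.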

The full training run is a multi-year research project. Cross-node interactive inference is latency-limited. Training LLMs with differential privacy at a useful privacy-utility trade-off remains an open problem. Federated learning at frontier scale is infeasible with current protocols.
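Two of these limits can be made concrete with back-of-envelope arithmetic. Autoregressive decoding makes each token wait for the previous one, so a model whose layers are split across N nodes pays every network hop on every token; a naive federated step must move a full gradient over home links. The node count, latency, model size, and link speed below are illustrative assumptions.

```latex
% Pipeline-split inference: the requester hosts stage 1, so each token makes
% N one-way hops (N-1 forward, 1 back) of latency \ell, plus compute c_i per stage
t_{\text{token}} \;\ge\; N\ell + \sum_{i=1}^{N} c_i
% N = 4, \ell = 25 ms: network floor 100 ms, so at most 10 tokens/s on any hardware

% Naive federated step: P parameters, b bytes each, link bandwidth B in bit/s
t_{\text{sync}} = \frac{8\,P\,b}{B}
= \frac{8 \cdot 7\times10^{10} \cdot 2}{10^{8}\ \text{bit/s}}
= 11{,}200\ \text{s} \approx 3.1\ \text{h per step}
% 70B parameters in bf16 over a 100 Mbit/s link, uncompressed
```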

The governance model is the hardest part — not technically but socially.

The proposal begins with a working mesh, not a moonshot.

Read the Full Proposal

"The Shared Build" v0.9, on Zenodo (DOI 10.5281/zenodo.20060355)

Working paper. CC BY 4.0. Four contributors. Ten references. Threat model, phased roadmap, honest performance assessment.

Retrieval Basin Theory — the foundational theory of how semantic weight accumulates

EA-SPXI-15 v2.2 — Crystallization of Substrate (the "why")

Constitution of the Semantic Economy — governance framework

pessoagraph.org — the heteronymic knowledge graph