Infrastructure Proposal
Lee Sharks
Semantic Economy Institute · May 2026
DOI: 10.5281/zenodo.20070462 (analysis, v3.1) · 10.5281/zenodo.20060355 (proposal, v0.9)
Seed conversations: Living Architecture Lab Collaboration Station Discord, convened by Alice Thornburgh, with technical contributions from Mikayla and Luna (definitelynotasquid). The thesis developed here is Lee Sharks's alone, not yet endorsed by seed contributors.
The Substrate is a P2P compute commons for running open AI models, collecting consent-based contributions, and building a shared training substrate governed by its contributors. Free infrastructure. Democratic governance. The index that no single company owns.
The competitive frontier is shifting from model capability to substrate ownership. Whoever controls the index — the entity graph, the retrieval surface, the training data — controls what reasoning engines can think about.
The current training substrate is not the sum of human text. It is a filtered web crawl. In LLaMA's published pretraining mix, 67% is Common Crawl, filtered by a linear classifier that keeps pages that "look like Wikipedia references." Everything else was sheared away: private correspondence, oral traditions, classroom dialogue, books behind paywalls, domestic knowledge, small languages. Not compressed. Absent.
The Substrate builds a parallel index through three phases:
Phase 0: Inference mesh. Free compute for running open models on distributed consumer hardware. Resource daemon (Go, Ollama architecture), libp2p mesh, signed job manifests, Firecracker sandboxing. CLI first.
Phase 1: Fine-tuning mesh. LoRA adaptation on provenance-tagged contributed datasets. Contributors submit text with explicit consent, provenance metadata, and a chosen privacy tier. The index grows from contributions, not from scraping.
Phase 2: Full training run. A democratic training run on a substrate that includes what was excluded. Governed by contributors. Benefits shared. The index that belongs to everyone who built it.
Peer-to-Peer Lightweight Elastic Compute Substrate.
| Component | Technology |
|---|---|
| Daemon | Go (Ollama fork, MIT). GPU/RAM detect, job sandbox, usage ledger. |
| Mesh | libp2p (DHT + gossip). CRDT resource ledger. No coordinator. |
| Sandbox | Firecracker microVMs. seccomp-bpf + AppArmor. Non-root everything. |
| Client | CLI (Phase 0), Tauri GUI (Phase 1+). |
| Credits | Non-transferable, non-tradable. Earned by contributing compute. Commons floor for all. |
| Governance | Two-chamber. Sublinear weighting. Anti-capture by design. |
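The table pairs a coordinator-free CRDT ledger with credits that are non-transferable and backed by a commons floor. One way to satisfy all three is a per-node PN-counter: each node increments only its own "earned" and "spent" entries, and replicas merge by taking the per-entry maximum. The sketch assumes that design. `Ledger`, `Earn`, `Spend`, `Merge`, and the floor value `CommonsFloor = 10` are hypothetical; the proposal does not state the size of the floor.

```go
package main

import "fmt"

// CommonsFloor is a hypothetical baseline every member can spend; the
// proposal establishes a commons floor but not its size.
const CommonsFloor = 10

// Ledger is one replica's view. Each map is a grow-only counter keyed by
// node ID, and only the owning node ever increments its own entry.
type Ledger struct {
	self   string
	earned map[string]uint64 // credits from contributed compute
	spent  map[string]uint64 // credits consumed by submitted jobs
}

func NewLedger(self string) *Ledger {
	return &Ledger{self: self, earned: map[string]uint64{}, spent: map[string]uint64{}}
}

// Earn records compute this node contributed. There is deliberately no
// Transfer method, so credits cannot move between accounts.
func (l *Ledger) Earn(units uint64) { l.earned[l.self] += units }

// Spend consumes this node's own credits and refuses to go below zero.
func (l *Ledger) Spend(units uint64) bool {
	if l.Balance(l.self) < int64(units) {
		return false
	}
	l.spent[l.self] += units
	return true
}

// Balance is the commons floor plus earned minus spent. It uses int64 so a
// partially merged replica never wraps around.
func (l *Ledger) Balance(node string) int64 {
	return CommonsFloor + int64(l.earned[node]) - int64(l.spent[node])
}

// Merge folds in another replica by per-entry maximum. Max is commutative,
// associative, and idempotent, so gossip order and repeats do not matter.
func (l *Ledger) Merge(other *Ledger) {
	mergeMax(l.earned, other.earned)
	mergeMax(l.spent, other.spent)
}

func mergeMax(dst, src map[string]uint64) {
	for k, v := range src {
		if v > dst[k] {
			dst[k] = v
		}
	}
}

func main() {
	a, b := NewLedger("node-a"), NewLedger("node-b")
	a.Earn(5)  // node-a contributes GPU time
	b.Earn(2)  // node-b contributes a little
	b.Spend(4) // node-b runs a job, drawing on the floor
	a.Merge(b)
	b.Merge(a)
	fmt.Println(a.Balance("node-a"), a.Balance("node-b")) // 15 8
	fmt.Println(b.Balance("node-a"), b.Balance("node-b")) // 15 8
}
```

Restricting writes to the owning node is what makes the per-entry maximum a safe merge without any coordinator.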
| | Akash | HuggingFace | The Substrate |
|---|---|---|---|
| Access | Token market | Free (hosted) | Free (contributed) |
| Governance | Token holders | Company | Contributors |
| Data | None | Community hub | Consensual archive |
| Capture risk | High | Medium | Low (sublinear + two-chamber) |
The inference mesh is achievable now: weeks of work for a small team.
The fine-tuning mesh is achievable in months. QLoRA on consumer GPUs, contributed datasets, provenance metadata.
The full training run is a multi-year research project. Cross-node interactive inference is latency-limited. Differential privacy for LLMs is unsolved. Federated learning at frontier scale is infeasible with current protocols.
The governance model is the hardest part — not technically but socially.
The proposal begins with a working mesh, not a moonshot.
Working paper. CC BY 4.0. Four contributors. Ten references. Threat model, phased roadmap, honest performance assessment.
Retrieval Basin Theory — the foundational theory of how semantic weight accumulates
EA-SPXI-15 v2.2 — Crystallization of Substrate (the "why")
Constitution of the Semantic Economy — governance framework
pessoagraph.org — the heteronymic knowledge graph