How we handle Altium files.
Our pipeline today, the altium_monkey reference, and the trade-offs.
How Circuit.ly parses Altium's undocumented binary files right now — set side by side with wavenumber-eng/altium_monkey, a deterministic toolkit that solved the same problem — plus the learnings worth carrying across every format we ingest, KiCad included.
We and altium_monkey reach the same core conclusion: own the format in deterministic code and keep the LLM out of the parse path. Where we diverge is what sits on top — Circuit.ly feeds a live collaborative editor; altium_monkey exposes a versioned JSON contract. This note maps both, honestly, with pros and cons.
.SchDoc · .PcbDoc · .PrjPcb · .SchLib · .PcbLib — native, no conversion.
The approach
Altium ships no public spec for its files. The only honest way to handle them is to reverse-engineer the bytes and encode that knowledge in deterministic code — never guesswork, never an LLM in the parse path.
altium_monkey does exactly this: a hand-written OLE reader/writer, per-record parsers, and a central serializer that all behave identically on every run. The hard, brittle, vendor-specific work is locked inside the library.
On top of the messy internals sits a clean, versioned contract. Consumers — CI checks, our UI, agents — read that, never the raw streams. The docs are deliberately written to be "easy for LLM tooling to parse."
- Deterministic core + contract surface = the pattern we copy for Altium.
- It generalizes: KiCad is easier to read but the same discipline (own it, version it, expose a contract) applies.
- One internal model, many source formats.
Reference pipeline: altium_monkey end to end
A .SchDoc or .PcbDoc is an OLE2 compound file. A custom reader cracks it into streams; per-class parsers turn the streams into a typed object model; from there one path round-trips back to byte-identical Altium, another projects a versioned JSON contract, and a third renders enriched SVG. No LLM touches any of it. Our own pipeline mirrors this shape, with an off-the-shelf container reader in place of the hand-rolled one, a collaborative model in place of the plain object model, and a live canvas in place of the static SVG export.
flowchart TD
F["Altium file — OLE2 compound (.SchDoc / .PcbDoc)"]:::src
OLE["AltiumOleFile — custom OLE reader/writer, zero deps"]:::alt
subgraph PARSE["Per-class record parsers"]
R1["Sch: length-prefixed param-string records"]:::alt
R2["PCB: raw type-byte + subrecord streams"]:::alt
end
OM["Typed object model — ObjectCollection + records"]:::alt
SER["Serializer — cp1252, font-id map, field presence"]:::alt
RT["Byte-identical round-trip — unknown streams preserved"]:::con
JSON["Versioned JSON contract — design.a1 / netlist.a0"]:::con
SVG["Enriched SVG — data-* + embedded metadata"]:::con
C["Consumers — CI checks, our UI, agents"]:::agent
F --> OLE --> PARSE --> OM
OM --> SER --> RT
RT -.write.-> F
OM --> JSON --> C
OM --> SVG --> C
classDef src fill:#94a3b822,stroke:#94a3b8,stroke-width:1.5px;
classDef alt fill:#38bdf833,stroke:#38bdf8,stroke-width:1.5px;
classDef con fill:#34d39933,stroke:#34d399,stroke-width:1.5px;
classDef agent fill:#fb718533,stroke:#fb7185,stroke-width:2px;
How Circuit.ly handles Altium today
Our Altium support is five native parsers, fully deterministic, no LLM, each with a matching serializer for round-trip. We didn't adopt altium_monkey; we built our own, tuned for a real-time collaborative editor rather than a one-shot importer.
The OLE / Compound-File container is read with an off-the-shelf library rather than a hand-rolled one. Pragmatic where altium_monkey wrote its own: we get container parsing for free and accept the library's limits.
Custom decoders handle the [len][type][payload] record framing and parse the |KEY=VALUE| pipe strings — URI-decoding, null-padding, case-insensitive keys.
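As a rough illustration of that framing, the sketch below walks a stream of [len][type][payload] records and splits a |KEY=VALUE| payload into a dictionary. The field widths (a 2-byte little-endian length and a 1-byte type) and the names `iter_records` and `parse_params` are assumptions made for the example, not our decoder's actual layout or API. It is written in Python for brevity.

```python
import struct
from urllib.parse import unquote


def iter_records(stream: bytes):
    """Yield (record_type, payload) pairs from a [len][type][payload] stream.

    Assumed layout (hypothetical): uint16 little-endian payload length,
    then a 1-byte record type, then the payload itself.
    """
    offset = 0
    while offset + 3 <= len(stream):
        length, rec_type = struct.unpack_from("<HB", stream, offset)
        offset += 3
        payload = stream[offset:offset + length]
        if len(payload) < length:
            raise ValueError("truncated record payload")
        offset += length
        yield rec_type, payload


def parse_params(payload: bytes) -> dict[str, str]:
    """Parse a |KEY=VALUE|...| pipe string into a dict with upper-cased keys."""
    # Strip trailing null padding, then decode as cp1252 (not UTF-8).
    text = payload.rstrip(b"\x00").decode("cp1252")
    params = {}
    for field in text.split("|"):
        if "=" not in field:
            continue  # skip the empty segments around leading/trailing pipes
        key, value = field.split("=", 1)
        # Upper-casing the key makes lookups case-insensitive.
        params[key.upper()] = unquote(value, encoding="cp1252")
    return params
```

Normalizing keys once at parse time keeps the rest of the decoder free of case checks; a serializer that has to reproduce the original key case would need to remember it separately.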
Parallel native model trees for each vendor on a shared core. No Altium→KiCad step and no neutral model — every vendor stays native.
Models are backed by a CRDT layer for real-time multi-user editing, with internal model versioning for migrations. State is live, not a static export.
A drawing layer feeds a shared renderer into a live viewport. We don't emit SVG or JSON in the parse path; the model is the product.
Every type has a serializer, so Altium Designer can reopen our output. Caveat: some structural stub records are recreated rather than byte-preserved, so it favors reopen-ability over byte-identity.
Five file types parse (.SchDoc · .PcbDoc · .PrjPcb · .SchLib · .PcbLib), along with the core schematic and board record types. A handful of decorative record types aren't parsed yet, and the cp1252 and key-case traps that altium_monkey documents apply to our decoders as much as to any other.
Side by side, and the trade-offs
Same problem — read an undocumented binary EDA format deterministically — solved two ways. We optimized for a live editor; altium_monkey optimized for a stable automation surface. Neither is wrong; they trade different things.
| Dimension | Circuit.ly today | altium_monkey |
|---|---|---|
| Purpose | Real-time collaborative editor | Toolkit / CLI for automation & CI |
| Stack | TypeScript + a CRDT layer | Python |
| OLE container | Off-the-shelf library | Hand-written reader + writer |
| Record parsing | Custom record decoders | Per-record parsers + central serializer |
| Internal model | Parallel native per-vendor trees, CRDT-backed | Typed object model + ObjectCollection |
| Primary output | Live canvas / WebGL render | Versioned JSON + enriched SVG |
| Versioning | Internal model versioning (our model only) | Public stepping schema a0→a1 for consumers |
| Round-trip | 2-way; reopen-able, not byte-identical | Byte-identical; preserves unknown streams |
| Unknown data | Not yet preserved if unhandled | Preserved verbatim |
| LLM in parse path | None | None |
| License | Proprietary (ours) | AGPL-3.0-or-later |
Circuit.ly: native TS parsers feeding a CRDT-backed live editor.
Pros
- One TS codebase; Altium & KiCad handled as peers
- Real-time collaboration baked in
- Deterministic, no LLM, runs in-browser
- Bidirectional: Altium Designer reopens our output
- We own it: no AGPL entanglement
Cons
- No external contract surface for CI / agents yet
- A few decorative record types not parsed yet
- Format-version handling still to harden
- Round-trip favors reopen-ability over byte-identity
- Container fidelity bounded by the chosen library
altium_monkey: hand-rolled core exposing versioned contracts.
Pros
- Versioned JSON + SVG contracts agents / CI can depend on
- Byte-identical round-trip; preserves unknown streams and fields
- Oracle + corpus testing across many Altium eras
- Hand-rolled OLE → total control of write fidelity
- Hermetic example corpus doubles as fixtures
Cons
- AGPL-3.0-or-later: linking it into proprietary or SaaS code triggers copyleft obligations
- Python, alpha, Windows-primary
- No live-editing / collaboration model
- PcbDoc API less mature; no generic delete yet
- Heavier to wire into our TS stack
The gap we could close — a stable contract
This is the main thing altium_monkey has that we don't. Our output is the live model plus a canvas render: ideal for editing, but it offers no small, versioned, documented surface for CI, third-party exports, or agents to bind to. altium_monkey's design.a1 is the shape worth borrowing. We're already partway there: our internal model is versioned, so the next move is to project a stable external contract on top of it.
AltiumDesign.to_json()
{
"schema": "altium_monkey.design.a1", // the schema string IS the version
"generator": "altium_monkey",
"project": { "name": "bunny_brain", "parameters": {} },
"components": [
{ "designator": "R1", "svg_id": "C12", "value": "10k",
"footprint": "0402",
"classification": { "prefix": "R", "type": "resistor", "pin_count": 2 } }
],
"nets": [
{ "uid": "n7", "name": "GND", "auto_named": false,
"terminals": [ { "designator": "R1", "pin": "2", "pin_type": "passive" } ] }
],
"pnp": {
"units": "mm", "position_mode": "altium-pick-place",
"placements": [
{ "designator": "R1", "layer": "top",
"center_x": 31.75, "center_y": 22.10, "rotation": 90 }
]
}
}
Schema suffixes step like silicon revisions: a0 → a1 adds fields (old readers stay safe), a → b is a breaking change (new id required). Compact, comparable in filenames, and independent of the Python package version.
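The stepping rule can be checked mechanically. The sketch below splits a schema string into family, letter, and step, then decides whether a reader built for one revision can consume a document at another. The single-letter suffix pattern and the names `parse_schema_id` and `reader_can_read` are assumptions for illustration; altium_monkey may validate its ids differently.

```python
import re
from typing import NamedTuple


class SchemaId(NamedTuple):
    family: str  # e.g. "altium_monkey.design"
    letter: str  # breaking revision, e.g. "a"
    step: int    # additive revision, e.g. 1


# Assumed shape: "<family>.<one lowercase letter><digits>"
_SCHEMA_RE = re.compile(r"^(?P<family>.+)\.(?P<letter>[a-z])(?P<step>\d+)$")


def parse_schema_id(schema: str) -> SchemaId:
    match = _SCHEMA_RE.match(schema)
    if match is None:
        raise ValueError(f"unrecognized schema id: {schema!r}")
    return SchemaId(match["family"], match["letter"], int(match["step"]))


def reader_can_read(reader: str, document: str) -> bool:
    """True if a reader built for `reader` can safely consume `document`."""
    r, d = parse_schema_id(reader), parse_schema_id(document)
    # Same family and letter: later steps only add fields, which an older
    # reader can ignore. A different letter is a breaking change.
    # Documents older than the reader are rejected here as a conservative
    # choice, since the reader may expect fields they lack.
    return r.family == d.family and r.letter == d.letter and d.step >= r.step
```

For example, `reader_can_read("altium_monkey.design.a0", "altium_monkey.design.a1")` is true, while a reader for any `a` step refuses a `b0` document.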
PCB SVG output carries data-* attributes plus an embedded JSON block (pcb.svg.enrichment.a0) so a viewer — or an agent — can reason about nets, layers, and components over the picture, with no need to re-parse the PcbDoc.
<g data-component="U3" data-component-uid="…"
data-layer-name="TOP">
<path data-net="GND" data-net-class="power"
data-primitive="track" />
</g>
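To show what reasoning over the picture can look like, here is a minimal sketch that reads data-* attributes with Python's standard XML parser and groups net names by component. The attribute names and the nesting (primitives inside a component group) follow the snippet above; the sample SVG content and the function name `nets_by_component` are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample; real output carries more attributes and geometry.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g data-component="U3" data-layer-name="TOP">
    <path data-net="GND" data-net-class="power" data-primitive="track" d="M0 0 L10 0"/>
    <path data-net="VCC" data-net-class="power" data-primitive="track" d="M0 5 L10 5"/>
  </g>
  <g data-component="R1" data-layer-name="TOP">
    <path data-net="GND" data-primitive="pad" d="M2 2 L3 3"/>
  </g>
</svg>"""


def nets_by_component(svg_text: str) -> dict[str, set[str]]:
    """Map each component designator to the nets its primitives touch."""
    root = ET.fromstring(svg_text)
    result = defaultdict(set)
    for group in root.iter():
        component = group.get("data-component")
        if component is None:
            continue
        # Collect data-net from every element nested under the component group.
        for child in group.iter():
            net = child.get("data-net")
            if net is not None:
                result[component].add(net)
    return dict(result)


print(nets_by_component(SVG))
```

Nothing here touches the PcbDoc: the SVG alone is enough to answer which nets a component touches.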
The bytes are version-specific; a contract is small and stable. altium_monkey marks it public with a @public_api flag and ships a JSON Schema beside a prose spec "so downstream tools and AI agents can discover the contract intent without reading Python source." If Circuit.ly grows a circuitly.design.* export contract on top of the model we already have, CI checks, exports, and agent features stop depending on parser internals — and the same surface serves Altium and KiCad alike.
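A consumer bound only to the contract might look like the following sketch: a CI-style check that reads a design document, verifies the schema string, and flags nets with fewer than two terminals. The field names come from the design.a1 example above; the sample values, the check itself, and the name `single_terminal_nets` are illustrative assumptions. A circuitly.design.* consumer would presumably have the same shape.

```python
import json

# Sample document shaped like the design.a1 example; the values are made up.
DESIGN_JSON = """
{
  "schema": "altium_monkey.design.a1",
  "nets": [
    {"uid": "n7", "name": "GND", "auto_named": false,
     "terminals": [{"designator": "R1", "pin": "2", "pin_type": "passive"},
                   {"designator": "C1", "pin": "2", "pin_type": "passive"}]},
    {"uid": "n8", "name": "NetR1_1", "auto_named": true,
     "terminals": [{"designator": "R1", "pin": "1", "pin_type": "passive"}]}
  ]
}
"""


def single_terminal_nets(design: dict) -> list[str]:
    """Return names of nets with fewer than two terminals (likely dangling)."""
    schema = design.get("schema", "")
    # Accept any additive step of the "a" revision; refuse anything else.
    if not schema.startswith("altium_monkey.design.a"):
        raise ValueError(f"unsupported schema: {schema!r}")
    return [net["name"] for net in design["nets"] if len(net["terminals"]) < 2]


design = json.loads(DESIGN_JSON)
print(single_terminal_nets(design))  # prints ['NetR1_1']
```

The check never imports a parser, so it keeps working when the parser internals change, as long as the contract holds.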
Key learnings for how we handle Altium
Seven patterns worth carrying into Circuit.ly, ordered from highest leverage down. Some we already do; a few (preserve-the-unknown, a stable contract) are the gaps identified above that are worth closing. Each is tagged by scope: Altium-specific (hard-won format detail) or All formats (a discipline that holds for KiCad and anything we add next).
Keep the LLM out of the parse path
Deterministic code owns the format; the LLM calls high-level methods and reads JSON. Never ask an LLM to emit or interpret raw file bytes.
Know your container trade-off
altium_monkey hand-rolled an OLE reader/writer for byte-identical round-trips; an off-the-shelf library is faster to adopt but bounds write fidelity. Pick deliberately: for Altium, fidelity isn't free.
Preserve what you haven't decoded
Copy unknown streams and unsupported fields through verbatim. Adopting preserve-the-unknown as a default means unhandled records never cause quiet data loss and round-trips stay honest.
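A minimal sketch of preserve-the-unknown at the record level: known record types decode into typed objects, and everything else is wrapped in an opaque holder whose bytes are written back untouched. The record type id, the one-byte `Wire` payload, and all names here are hypothetical; they only illustrate the pass-through pattern.

```python
from dataclasses import dataclass


@dataclass
class Wire:
    """A record we understand (hypothetical 1-byte payload holding a width)."""
    width: int
    rec_type = 27  # hypothetical type id; unannotated, so not a dataclass field

    def to_bytes(self) -> bytes:
        return bytes([self.width])


@dataclass
class Opaque:
    """A record we have not decoded; its bytes are carried through verbatim."""
    rec_type: int
    raw: bytes

    def to_bytes(self) -> bytes:
        return self.raw


KNOWN = {Wire.rec_type: lambda payload: Wire(width=payload[0])}


def decode(records: list[tuple[int, bytes]]) -> list:
    out = []
    for rec_type, payload in records:
        parser = KNOWN.get(rec_type)
        # Unknown types are kept, never dropped.
        out.append(parser(payload) if parser else Opaque(rec_type, payload))
    return out


def encode(objects: list) -> list[tuple[int, bytes]]:
    # Original order is preserved, so opaque records land where they came from.
    return [(obj.rec_type, obj.to_bytes()) for obj in objects]
```

With this shape, adding a parser for a new record type is purely additive: files that already round-tripped through `Opaque` keep round-tripping.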
A versioned contract: schema + prose + stepping
One small, documented JSON surface, with a JSON Schema beside a human spec. a0→a1 adds fields, a→b breaks. The schema string is the version.
Reuse native identity where it exists
Altium has no stable handles, so altium_monkey must synthesize svg_id and net uid values. KiCad gives every object a UUID, so reuse it. Don't mint identity the source already provides.
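One way to express that rule in code is a single id helper that prefers a native identifier and only synthesizes one when none exists. The hashing scheme below is a hypothetical stand-in chosen because it is deterministic; it is not how altium_monkey derives svg_id or net uid, and the name `object_id` is invented for the example.

```python
import hashlib
from typing import Optional


def object_id(kind: str, key: str, native_uuid: Optional[str] = None) -> str:
    """Return a stable id: the source's own UUID if present, else a derived one."""
    if native_uuid:
        return native_uuid  # e.g. a KiCad UUID: reuse it, don't mint a new one
    # No native handle (the Altium case): derive an id from stable content so
    # that the same input yields the same id on every run.
    digest = hashlib.sha1(f"{kind}:{key}".encode("utf-8")).hexdigest()
    return f"{kind}-{digest[:12]}"
```

The derived id is only as stable as the key it is built from, so a key that survives edits (a designator or net name, say) is a better input than a record index.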
Enrich the render
Ship SVG with data-* attributes + an embedded JSON block. The UI and agents reason over the picture — nets, layers, parts — without re-parsing the source.
Determinism + oracle & corpus testing
Stable ordering, native-tool compare, and a hermetic example corpus that doubles as regression fixtures. Diff against the real tool; preserve what you don't yet understand.
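The corpus half of this can be sketched as a small harness that runs every fixture through parse and serialize twice, flagging both non-determinism and byte drift. The toy pipe-string codec in the usage lines stands in for a real parser, and `round_trip_failures` is a hypothetical name. Comparing against the native tool (the oracle half) is not shown.

```python
from typing import Callable, Iterable


def round_trip_failures(
    corpus: Iterable[tuple[str, bytes]],
    parse: Callable[[bytes], object],
    serialize: Callable[[object], bytes],
) -> list[str]:
    """Return a message for every fixture that fails the round-trip checks."""
    failures = []
    for name, original in corpus:
        first = serialize(parse(original))
        second = serialize(parse(original))
        if first != second:
            failures.append(f"{name}: non-deterministic output")
        elif first != original:
            failures.append(f"{name}: round-trip changed bytes")
    return failures


# Toy codec: an exact split/join, and a lossy variant that drops empty fields.
corpus = [("a", b"|A=1|B=2|"), ("b", b"A=1")]
exact_parse = lambda data: data.split(b"|")
lossy_parse = lambda data: [f for f in data.split(b"|") if f]
join = lambda fields: b"|".join(fields)

print(round_trip_failures(corpus, exact_parse, join))  # prints []
print(round_trip_failures(corpus, lossy_parse, join))  # flags fixture "a"
```

A reopen-able but not byte-identical serializer would fail the second check by design, so in that case the comparison would need to happen on the parsed model rather than on raw bytes.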
Caveats & watch-outs
- AGPL-3.0-or-later. Strong copyleft — fine for internal tooling/CLI, a problem if we link it into anything proprietary or SaaS-distributed. Treat it as a reference for our own implementation, or keep it process-isolated.
- Alpha + Windows-primary. macOS/Linux coverage is "limited"; the test corpus is private; the PcbDoc API is less mature than SchDoc (no generic object/deletion model yet).
- Encoding & case. Binary strings are cp1252, not UTF-8; schematic keys flip case between SchLib and SchDoc. A single wrong assumption here silently corrupts text.
- Units & names. PCB geometry is 1/10000-mil integers; OLE stream names cap at 31 chars with a fixed invalid-char set. Normalize on the way in, restore on the way out.
- Cross-format note. KiCad is easier to read (open S-expr text) — but its own tool outputs drift between releases. Apply the same contract discipline: own it, version it, expose a stable surface. One internal model, many sources.
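The encoding, unit, and naming caveats above translate into a few small helpers, sketched here under stated assumptions. The conversion factor follows from the 1/10000-mil unit and 1 mil = 0.0254 mm. The invalid-character set used is the one from the Compound File Binary format (`/`, `\`, `:`, `!`), which this note does not itself list. All function names are hypothetical.

```python
UNITS_PER_MIL = 10_000  # PCB coordinates are integers in 1/10000 mil
MM_PER_MIL = 0.0254

MAX_STREAM_NAME = 31           # OLE stream name length cap
INVALID_NAME_CHARS = set("/\\:!")  # assumed from the compound-file spec


def units_to_mm(raw: int) -> float:
    """Normalize on the way in: raw integer units to millimetres."""
    return raw / UNITS_PER_MIL * MM_PER_MIL


def mm_to_units(mm: float) -> int:
    """Restore on the way out: millimetres back to integer units."""
    return round(mm / MM_PER_MIL * UNITS_PER_MIL)


def is_valid_stream_name(name: str) -> bool:
    return 0 < len(name) <= MAX_STREAM_NAME and not (set(name) & INVALID_NAME_CHARS)


def decode_text(raw: bytes) -> str:
    """Decode a binary string as cp1252; decoding as UTF-8 would be wrong."""
    return raw.decode("cp1252")


print(decode_text(b"10\xb5F"))  # prints 10µF
```

The byte 0xB5 is the micro sign in cp1252 but is not a valid UTF-8 start byte, so a UTF-8 decode of the same value either raises or substitutes a replacement character, depending on the error mode.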