Circuit.ly · Engineering notes · Altium file handling

How we handle Altium files.
Our pipeline today, the altium_monkey reference, and the trade-offs.

How Circuit.ly parses Altium's undocumented binary files right now — set side by side with wavenumber-eng/altium_monkey, a deterministic toolkit that solved the same problem — plus the learnings worth carrying across every format we ingest, KiCad included.

We and altium_monkey reach the same core conclusion: own the format in deterministic code and keep the LLM out of the parse path. Where we diverge is what sits on top — Circuit.ly feeds a live collaborative editor; altium_monkey exposes a versioned JSON contract. This note maps both, honestly, with pros and cons.

Altium file types we parse

.SchDoc · .PcbDoc · .PrjPcb · .SchLib · .PcbLib — native, no conversion.

OLE2

The Altium container

A compound binary; we read it with an off-the-shelf library, then layer custom record decoders on top.

LLM / conversions in the parse path

Deterministic native parsers — and no Altium→KiCad conversion step.

2-way

Parse + serialize back to Altium

Round-trips so that Altium Designer can reopen the file, but stub records are recreated rather than preserved byte-for-byte.

The approach

Own the format in code

Altium ships no public spec for its files. The only honest way to handle them is to reverse-engineer the bytes and encode that knowledge in deterministic code — never guesswork, never an LLM in the parse path.

altium_monkey does exactly this: a hand-written OLE reader/writer, per-record parsers, and a central serializer that all behave identically on every run. The hard, brittle, vendor-specific work is locked inside the library.

Expose a stable surface

On top of the messy internals sits a clean, versioned contract. Consumers (CI checks, our UI, agents) read that, never the raw streams. altium_monkey's docs are deliberately written to be "easy for LLM tooling to parse."

Deterministic core + contract surface = the pattern we copy for Altium.
It generalizes: KiCad is easier to read but the same discipline (own it, version it, expose a contract) applies.
One internal model, many source formats.

Reference pipeline: altium_monkey end to end

A .SchDoc or .PcbDoc is an OLE2 compound file. A custom reader cracks it into streams, and per-class parsers turn those streams into a typed object model. From there, one path round-trips back to byte-identical Altium, another projects a versioned JSON contract, and a third renders enriched SVG. No LLM touches any of it. Our own pipeline mirrors this shape, with an off-the-shelf container reader in place of the hand-rolled one, a collaborative model in place of the plain object model, and a live canvas in place of the static SVG export.

flowchart TD
  F["Altium file — OLE2 compound (.SchDoc / .PcbDoc)"]:::src
  OLE["AltiumOleFile — custom OLE reader/writer, zero deps"]:::alt
  subgraph PARSE["Per-class record parsers"]
    R1["Sch: length-prefixed param-string records"]:::alt
    R2["PCB: raw type-byte + subrecord streams"]:::alt
  end
  OM["Typed object model — ObjectCollection + records"]:::alt
  SER["Serializer — cp1252, font-id map, field presence"]:::alt
  RT["Byte-identical round-trip — unknown streams preserved"]:::con
  JSON["Versioned JSON contract — design.a1 / netlist.a0"]:::con
  SVG["Enriched SVG — data-* + embedded metadata"]:::con
  C["Consumers — CI checks, our UI, agents"]:::agent

  F --> OLE --> PARSE --> OM
  OM --> SER --> RT
  RT -.write.-> F
  OM --> JSON --> C
  OM --> SVG --> C

  classDef src fill:#94a3b822,stroke:#94a3b8,stroke-width:1.5px;
  classDef alt fill:#38bdf833,stroke:#38bdf8,stroke-width:1.5px;
  classDef con fill:#34d39933,stroke:#34d399,stroke-width:1.5px;
  classDef agent fill:#fb718533,stroke:#fb7185,stroke-width:2px;

Legend: deterministic parse/serialize core · stable outputs (round-trip · contract · render) · consumers

How Circuit.ly handles Altium today

Our Altium support is five native parsers, fully deterministic, no LLM, each with a matching serializer for round-trip. We didn't adopt altium_monkey; we built our own, tuned for a real-time collaborative editor rather than a one-shot importer.

Container — we take the dep

The OLE / Compound-File container is read with an off-the-shelf library rather than a hand-rolled one. Pragmatic where altium_monkey wrote its own: we get container parsing for free and accept the library's limits.
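
Whichever library reads the container, a cheap guard before handing it a file is to check the compound-file signature. The sketch below is illustrative only: the helper name `looks_like_ole2` and the sample inputs are made up for this note and are not part of our codebase or of altium_monkey.

```python
import io

# Compound File Binary (OLE2) signature: the first 8 bytes of such a file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole2(stream) -> bool:
    """Return True if the stream starts with the OLE2 signature.

    Hypothetical helper for illustration; the stream position is restored.
    """
    start = stream.tell()
    header = stream.read(len(OLE2_MAGIC))
    stream.seek(start)  # leave the stream where we found it
    return header == OLE2_MAGIC


# Made-up inputs: a fake compound-file header and a KiCad-style text file.
fake_schdoc = io.BytesIO(OLE2_MAGIC + b"\x00" * 504)
kicad_text = io.BytesIO(b"(kicad_sch (version 20230121))")

print(looks_like_ole2(fake_schdoc), looks_like_ole2(kicad_text))
```

A check like this only tells us which container reader to dispatch to; it says nothing about whether the streams inside are ones we can decode.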

Records — custom decoders

Custom decoders handle the [len][type][payload] record framing and parse the |KEY=VALUE| pipe strings — URI-decoding, null-padding, case-insensitive keys.
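
To make the framing concrete, here is a minimal Python sketch of a decoder for framed records carrying |KEY=VALUE| strings. It is not our TypeScript implementation. The exact header layout (2-byte little-endian length, one reserved byte, one type byte) and the function names are assumptions chosen for illustration; only the overall [len][type][payload] shape, the cp1252 text, the null padding, the URI-decoding, and the case-insensitive keys come from the notes above.

```python
import struct
from urllib.parse import unquote


def read_records(buf: bytes):
    """Yield (type, payload) pairs from back-to-back framed records.

    Assumed header per record: <H length><B reserved><B type>.
    """
    offset = 0
    while offset + 4 <= len(buf):
        length, _reserved, rec_type = struct.unpack_from("<HBB", buf, offset)
        offset += 4
        payload = buf[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated record")
        offset += length
        yield rec_type, payload


def parse_params(payload: bytes) -> dict:
    """Decode a |KEY=VALUE| string into a dict with upper-cased keys."""
    text = payload.rstrip(b"\x00").decode("cp1252")  # strip null padding
    params = {}
    for field in text.split("|"):
        if not field:
            continue  # skip the empty piece before the leading pipe
        key, _, value = field.partition("=")
        # Upper-casing the key gives case-insensitive lookups.
        params[key.upper()] = unquote(value, encoding="cp1252")
    return params


def frame(rec_type: int, payload: bytes) -> bytes:
    """Build one record in the assumed layout (used for the demo only)."""
    return struct.pack("<HBB", len(payload), 0, rec_type) + payload


raw = frame(0, b"|RECORD=1|LibReference=R%20Small|Designator=R1\x00")
records = [parse_params(p) for t, p in read_records(raw) if t == 0]
print(records)
```

Normalizing keys on the way in means the serializer has to remember the original casing if it wants to write the file back unchanged, which is one reason round-trip fidelity is harder than parsing.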

Native models, no conversion

Parallel native model trees for each vendor on a shared core. No Altium→KiCad step and no neutral model — every vendor stays native.

Collaboration-first

Models are backed by a CRDT layer for real-time multi-user editing, with internal model versioning for migrations. State is live, not a static export.

Render — canvas / WebGL

A drawing layer feeds a shared renderer into a live viewport. We don't emit SVG or JSON in the parse path; the model is the product.

Two-way round-trip

Every type has a serializer, so Altium Designer can reopen our output. Caveat: some structural stub records are recreated rather than byte-preserved, so it favors reopen-ability over byte-identity.

Coverage today

Five file types parse (.SchDoc · .PcbDoc · .PrjPcb · .SchLib · .PcbLib) along with the core schematic and board record types. A handful of decorative record types aren't parsed yet. The cp1252 and key-case traps that altium_monkey documents are properties of the format, so they apply to our decoders too.

Side by side, and the trade-offs

Same problem — read an undocumented binary EDA format deterministically — solved two ways. We optimized for a live editor; altium_monkey optimized for a stable automation surface. Neither is wrong; they trade different things.

Dimension	Circuit.ly today	altium_monkey
Purpose	Real-time collaborative editor	Toolkit / CLI for automation & CI
Stack	TypeScript + a CRDT layer	Python
OLE container	Off-the-shelf library	Hand-written reader + writer
Record parsing	Custom record decoders	Per-record parsers + central serializer
Internal model	Parallel native per-vendor trees, CRDT-backed	Typed object model + `ObjectCollection`
Primary output	Live canvas / WebGL render	Versioned JSON + enriched SVG
Versioning	Internal model versioning (our model only)	Public stepping schema `a0→a1` for consumers
Round-trip	2-way; reopen-able, not byte-identical	Byte-identical; preserves unknown streams
Unknown data	Not yet preserved if unhandled	Preserved verbatim
LLM in parse path	None	None
License	Proprietary (ours)	AGPL-3.0-or-later

Circuit.ly approach

Native TS parsers feeding a CRDT-backed live editor.

Pros

One TS codebase; Altium & KiCad handled as peers
Real-time collaboration baked in
Deterministic, no LLM, runs in-browser
Bidirectional — Altium Designer reopens our output
We own it — no AGPL entanglement

Cons

No external contract surface for CI / agents yet
A few decorative record types not parsed yet
Format-version handling still to harden
Round-trip favors reopen-ability over byte-identity
Container fidelity bounded by the chosen library

altium_monkey approach

Hand-rolled core exposing versioned contracts.

Pros

Versioned JSON + SVG contracts agents / CI can depend on
Byte-identical round-trip; preserves unknown fields
Oracle + corpus testing across many Altium eras
Hand-rolled OLE → total control of write fidelity
Hermetic example corpus doubles as fixtures

Cons

AGPL-3.0 — can't link into proprietary / SaaS
Python, alpha, Windows-primary
No live-editing / collaboration model
PcbDoc API less mature; no generic delete yet
Heavier to wire into our TS stack

The gap we could close — a stable contract

This is the most valuable thing altium_monkey has that we don't. Our output is the live model plus a canvas render: perfect for editing, but with no small, versioned, documented surface for CI, third-party exports, or agents to bind to. altium_monkey's design.a1 is the shape worth borrowing. We're already partway there: our internal model is versioned, so the move is to project a stable external contract on top of it.

altium_monkey.design.a1, emitted by AltiumDesign.to_json() (the schema string is the version)

{
  "schema": "altium_monkey.design.a1",
  "generator": "altium_monkey",
  "project": { "name": "bunny_brain", "parameters": {} },
  "components": [
    { "designator": "R1", "svg_id": "C12", "value": "10k",
      "footprint": "0402",
      "classification": { "prefix": "R", "type": "resistor", "pin_count": 2 } }
  ],
  "nets": [
    { "uid": "n7", "name": "GND", "auto_named": false,
      "terminals": [ { "designator": "R1", "pin": "2", "pin_type": "passive" } ] }
  ],
  "pnp": {
    "units": "mm", "position_mode": "altium-pick-place",
    "placements": [
      { "designator": "R1", "layer": "top",
        "center_x": 31.75, "center_y": 22.10, "rotation": 90 }
    ]
  }
}
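
As an illustration of what a consumer can do with nothing but this JSON, the sketch below runs a small CI-style check: every net terminal must refer to a component that exists. The field names come from the sample above. The functions `load_design` and `dangling_terminals`, and the extra "U9" terminal, are made up for the example.

```python
import json

# Trimmed copy of the sample above, plus one made-up terminal ("U9")
# so the check has something to report.
DESIGN_JSON = """
{
  "schema": "altium_monkey.design.a1",
  "components": [
    { "designator": "R1", "value": "10k", "footprint": "0402" }
  ],
  "nets": [
    { "uid": "n7", "name": "GND", "auto_named": false,
      "terminals": [
        { "designator": "R1", "pin": "2", "pin_type": "passive" },
        { "designator": "U9", "pin": "4", "pin_type": "power" }
      ] }
  ]
}
"""


def load_design(text: str) -> dict:
    """Parse the contract and refuse anything outside the 'a' series."""
    design = json.loads(text)
    if not design["schema"].startswith("altium_monkey.design.a"):
        raise ValueError(f"unsupported schema: {design['schema']}")
    return design


def dangling_terminals(design: dict) -> list:
    """Return (net name, designator) pairs that point at unknown components."""
    known = {c["designator"] for c in design["components"]}
    return [
        (net["name"], term["designator"])
        for net in design["nets"]
        for term in net["terminals"]
        if term["designator"] not in known
    ]


problems = dangling_terminals(load_design(DESIGN_JSON))
print(problems)
```

The check never touches an OLE stream or a record decoder, which is the point of having the contract.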

Stepping version scheme

Schema suffixes step like silicon revisions: a0 → a1 adds fields (old readers stay safe), a → b is a breaking change (new id required). Compact, comparable in filenames, and independent of the Python package version.
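
A reader-side compatibility check for this scheme could look like the sketch below. It assumes the suffix is always one lowercase letter followed by digits, which matches the ids shown in this note but is not confirmed beyond them. The names `parse_schema` and `can_read` are hypothetical.

```python
import re

# "<name>.<letter><step>", e.g. "altium_monkey.design.a1" (assumed grammar).
_SCHEMA_RE = re.compile(r"^(?P<name>[a-z_.]+)\.(?P<letter>[a-z])(?P<step>\d+)$")


def parse_schema(schema_id: str):
    """Split a schema id into (name, series letter, step number)."""
    match = _SCHEMA_RE.match(schema_id)
    if match is None:
        raise ValueError(f"unrecognized schema id: {schema_id!r}")
    return match["name"], match["letter"], int(match["step"])


def can_read(reader_schema: str, document_schema: str) -> bool:
    """True when a reader built for reader_schema can safely read the document.

    Same name and letter are required. A newer step only adds fields, so an
    older reader stays safe. The reverse direction (new reader, older
    document) is not covered by the rule above, so it is rejected here.
    """
    r_name, r_letter, r_step = parse_schema(reader_schema)
    d_name, d_letter, d_step = parse_schema(document_schema)
    return r_name == d_name and r_letter == d_letter and d_step >= r_step


print(can_read("altium_monkey.design.a0", "altium_monkey.design.a1"))  # old reader, newer doc
print(can_read("altium_monkey.design.a1", "altium_monkey.design.b0"))  # breaking change
```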

Enriched render

PCB SVG output carries data-* attributes plus an embedded JSON block (pcb.svg.enrichment.a0) so a viewer — or an agent — can reason about nets, layers, and components over the picture, with no need to re-parse the PcbDoc.

pcb.svg.enrichment.a0 — element-level metadata travels with the render

<g data-component="U3" data-component-uid="…"
   data-layer-name="TOP">
  <path data-net="GND" data-net-class="power"
        data-primitive="track" />
</g>
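
A viewer or agent can pull structure back out of such a render with an ordinary XML parser. The sketch below uses Python's standard library. The attribute names are taken from the snippet above, while the second path, the `d` attributes, and the function name `nets_by_component` are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_NS = "{http://www.w3.org/2000/svg}"

# Made-up render fragment shaped like the snippet above.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g data-component="U3" data-layer-name="TOP">
    <path data-net="GND" data-net-class="power" data-primitive="track" d="M0 0 L10 0"/>
    <path data-net="VCC" data-net-class="power" data-primitive="track" d="M0 5 L10 5"/>
  </g>
</svg>"""


def nets_by_component(svg_text: str) -> dict:
    """Map each data-component group to the sorted nets found inside it."""
    root = ET.fromstring(svg_text)
    found = defaultdict(set)
    for group in root.iter(SVG_NS + "g"):
        component = group.get("data-component")
        if component is None:
            continue  # plain grouping element, no component metadata
        for element in group.iter():
            net = element.get("data-net")
            if net:
                found[component].add(net)
    return {name: sorted(nets) for name, nets in found.items()}


print(nets_by_component(SVG))
```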

Why it's worth borrowing

The bytes are version-specific; a contract is small and stable. altium_monkey marks it public with a @public_api flag and ships a JSON Schema beside a prose spec "so downstream tools and AI agents can discover the contract intent without reading Python source." If Circuit.ly grows a circuitly.design.* export contract on top of the model we already have, CI checks, exports, and agent features stop depending on parser internals — and the same surface serves Altium and KiCad alike.

Key learnings for how we handle Altium

Seven patterns worth carrying into Circuit.ly, ordered from highest leverage down. Some we already do; a few (preserve-the-unknown, a stable contract) are the gaps identified above that are worth closing. Each is tagged by scope: "Altium-specific" marks hard-won format detail, and "All formats" marks a discipline that holds for KiCad and anything we add next.

01 · All formats

Keep the LLM out of the parse path

Deterministic code owns the format; the LLM calls high-level methods and reads JSON. Never ask an LLM to emit or interpret raw file bytes.

02 · Altium-specific

Know your container trade-off

altium_monkey hand-rolled an OLE reader/writer for byte-identical round-trips; an off-the-shelf library is faster to adopt but bounds write fidelity. Pick deliberately: for Altium, fidelity isn't free.

03 · Altium-specific

Preserve what you haven't decoded

Copy unknown streams and unsupported fields through verbatim. Adopting preserve-the-unknown as a default means unhandled records never cause quiet data loss and round-trips stay honest.
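
One way to get this behavior is to give undecoded records their own opaque type that the serializer writes back untouched. The sketch below shows the idea at record level. The record type number, the `Label` and `Opaque` classes, and the sample bytes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union

LABEL_TYPE = 4  # hypothetical type id for a record we understand


@dataclass
class Label:
    """A decoded record: we know its fields and can edit them."""
    text: str


@dataclass
class Opaque:
    """An undecoded record: keep the type and raw bytes exactly as read."""
    rec_type: int
    raw: bytes


Record = Union[Label, Opaque]


def decode(rec_type: int, payload: bytes) -> Record:
    if rec_type == LABEL_TYPE:
        return Label(payload.decode("cp1252"))
    return Opaque(rec_type, payload)  # unknown: carry it through verbatim


def encode(record: Record):
    if isinstance(record, Label):
        return LABEL_TYPE, record.text.encode("cp1252")
    return record.rec_type, record.raw


source = [(4, b"GND"), (99, b"\x01\x02\xff")]
model = [decode(t, p) for t, p in source]
model[0].text = "AGND"  # edit the part we understand
written = [encode(r) for r in model]
print(written)
```

The edited label changes and the type-99 record comes back byte for byte, so an unhandled record costs nothing on the way out.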

04 · All formats

A versioned contract: schema + prose + stepping

One small, documented JSON surface, with a JSON Schema beside a human spec. a0→a1 adds fields, a→b breaks. The schema string is the version.

05 · All formats

Reuse native identity where it exists

Altium has no stable handles, so altium_monkey must synthesize svg_id and net uid values. KiCad gives every object a UUID, so reuse it. Don't mint identity the source already provides.
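
A single helper can encode that rule: prefer the source's identifier, and fall back to a deterministic content hash only when none exists. This is a sketch of one possible scheme, not how altium_monkey derives its ids. The function name, the fingerprint fields, and the sample UUID are invented for illustration.

```python
import hashlib


def stable_id(kind: str, fingerprint: tuple, source_uuid: str = None) -> str:
    """Return the source's own id when present, else a synthesized one.

    The synthesized id hashes the object kind plus a caller-chosen
    fingerprint, so it is identical on every run for the same input.
    """
    if source_uuid:
        return source_uuid  # e.g. a KiCad UUID: reuse, never re-mint
    joined = "\x1f".join((kind, *fingerprint))
    digest = hashlib.sha1(joined.encode("utf-8")).hexdigest()
    return f"{kind}-{digest[:12]}"


# Made-up examples: one object with a native UUID, one without.
kicad_id = stable_id("symbol", ("R1",),
                     source_uuid="3f2c1a9e-0000-4000-8000-000000000001")
altium_id = stable_id("net", ("GND", "Sheet1"))
print(kicad_id, altium_id)
```

The trade-off of a content-derived id is that it changes when the fingerprint changes (a renamed net, for instance), so the fingerprint fields need to be chosen with care.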

06 · All formats

Enrich the render

Ship SVG with data-* attributes + an embedded JSON block. The UI and agents reason over the picture — nets, layers, parts — without re-parsing the source.

07 · All formats

Determinism + oracle & corpus testing

Stable ordering, native-tool compare, and a hermetic example corpus that doubles as regression fixtures. Diff against the real tool; preserve what you don't yet understand.
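
Stable ordering is the cheapest of these to show. The sketch below canonicalizes a contract-shaped dict so that two runs that discover objects in different orders still emit identical bytes. The schema id "example.design.a0" and the function name are placeholders, and sorting by designator and net name is an assumption about what a sensible key would be.

```python
import json


def canonical_json(design: dict) -> str:
    """Serialize with sorted keys and sorted object lists."""
    ordered = dict(design)
    ordered["components"] = sorted(
        design.get("components", []), key=lambda c: c["designator"])
    ordered["nets"] = sorted(
        design.get("nets", []), key=lambda n: n["name"])
    return json.dumps(ordered, sort_keys=True,
                      separators=(",", ":"), ensure_ascii=False)


# Same content, discovered in a different order on two runs.
run_a = {"schema": "example.design.a0",
         "components": [{"designator": "R2"}, {"designator": "C1"}],
         "nets": [{"name": "VCC"}, {"name": "GND"}]}
run_b = {"nets": [{"name": "GND"}, {"name": "VCC"}],
         "components": [{"designator": "C1"}, {"designator": "R2"}],
         "schema": "example.design.a0"}

print(canonical_json(run_a) == canonical_json(run_b))
```

With output this stable, a regression fixture can be a plain text diff against the previous run.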

Caveats & watch-outs

Before we build on altium_monkey itself

AGPL-3.0-or-later. Strong copyleft — fine for internal tooling/CLI, a problem if we link it into anything proprietary or SaaS-distributed. Treat it as a reference for our own implementation, or keep it process-isolated.
Alpha + Windows-primary. macOS/Linux coverage is "limited"; the test corpus is private; the PcbDoc API is less mature than SchDoc (no generic object/deletion model yet).

Altium handling gotchas to carry forward

Encoding & case. Binary strings are cp1252, not UTF-8; schematic keys flip case between SchLib and SchDoc. A single wrong assumption here silently corrupts text.
Units & names. PCB geometry is 1/10000-mil integers; OLE stream names cap at 31 chars with a fixed invalid-char set. Normalize on the way in, restore on the way out.
Cross-format note. KiCad is easier to read (open S-expr text) — but its own tool outputs drift between releases. Apply the same contract discipline: own it, version it, expose a stable surface. One internal model, many sources.
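
The encoding and unit gotchas above are easy to demonstrate. The sketch below shows a cp1252 string being silently damaged by a lenient UTF-8 decode, and converts PCB coordinates between 1/10000-mil integers and millimetres. The helper names and sample values are made up; the only facts relied on are the ones stated above plus 1 mil = 0.0254 mm.

```python
UNITS_PER_MIL = 10_000   # PCB geometry is stored in 1/10000-mil integers
MM_PER_MIL = 0.0254      # 1 mil = 0.001 inch = 0.0254 mm


def units_to_mm(units: int) -> float:
    """Normalize on the way in: internal integer units to millimetres."""
    return units / UNITS_PER_MIL * MM_PER_MIL


def mm_to_units(mm: float) -> int:
    """Restore on the way out: millimetres back to integer units."""
    return round(mm / MM_PER_MIL * UNITS_PER_MIL)


# "10 µF ±5%" written with escapes so the source stays ASCII.
text = "10 \u00b5F \u00b15%"
raw = text.encode("cp1252")

decoded = raw.decode("cp1252")                      # correct
corrupted = raw.decode("utf-8", errors="replace")   # wrong codec, no error raised

print(decoded == text, corrupted == text)
print(units_to_mm(1_000_000), mm_to_units(2.54))
```

Keeping the integer units in the model and converting only at the edges avoids accumulating float error across repeated round-trips.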