Circuit.ly · Engineering notes · Altium file handling

How we handle Altium files.
Our pipeline today, the altium_monkey reference, and the trade-offs.

How Circuit.ly parses Altium's undocumented binary files right now — set side by side with wavenumber-eng/altium_monkey, a deterministic toolkit that solved the same problem — plus the learnings worth carrying across every format we ingest, KiCad included.

We and altium_monkey reach the same core conclusion: own the format in deterministic code and keep the LLM out of the parse path. Where we diverge is what sits on top — Circuit.ly feeds a live collaborative editor; altium_monkey exposes a versioned JSON contract. This note maps both, honestly, with pros and cons.

  • 5 Altium file types we parse: .SchDoc · .PcbDoc · .PrjPcb · .SchLib · .PcbLib, all native, no conversion.
  • OLE2, the Altium container: a compound binary; we read it with an off-the-shelf library, then layer custom record decoders on top.
  • 0 LLM / conversions in the parse path: deterministic native parsers, and no Altium→KiCad conversion step.
  • 2-way, parse + serialize back to Altium: round-trips so Altium Designer reopens the file, but stub records are recreated, not byte-identical.
01

The approach

Own the format in code

Altium ships no public spec for its files. The only honest way to handle them is to reverse-engineer the bytes and encode that knowledge in deterministic code — never guesswork, never an LLM in the parse path.

altium_monkey does exactly this: a hand-written OLE reader/writer, per-record parsers, and a central serializer that all behave identically on every run. The hard, brittle, vendor-specific work is locked inside the library.

Expose a stable surface

On top of the messy internals sits a clean, versioned contract. Consumers (CI checks, our UI, agents) read that, never the raw streams. altium_monkey's docs are deliberately written to be "easy for LLM tooling to parse."

  • Deterministic core + contract surface = the pattern worth copying for Altium.
  • It generalizes: KiCad is easier to read but the same discipline (own it, version it, expose a contract) applies.
  • One internal model, many source formats.
02

Reference pipeline: altium_monkey end to end

An .SchDoc/.PcbDoc is an OLE2 compound file. A custom reader cracks it into streams, and per-class parsers turn those streams into a typed object model. From there, one path round-trips back to byte-identical Altium, another projects a versioned JSON contract, and a third renders enriched SVG. No LLM touches any of it. Our own pipeline mirrors this shape, with an off-the-shelf container reader in place of the hand-rolled one, a collaborative model in place of the plain object model, and a live canvas in place of the static SVG export.

flowchart TD
  F["Altium file — OLE2 compound (.SchDoc / .PcbDoc)"]:::src
  OLE["AltiumOleFile — custom OLE reader/writer, zero deps"]:::alt
  subgraph PARSE["Per-class record parsers"]
    R1["Sch: length-prefixed param-string records"]:::alt
    R2["PCB: raw type-byte + subrecord streams"]:::alt
  end
  OM["Typed object model — ObjectCollection + records"]:::alt
  SER["Serializer — cp1252, font-id map, field presence"]:::alt
  RT["Byte-identical round-trip — unknown streams preserved"]:::con
  JSON["Versioned JSON contract — design.a1 / netlist.a0"]:::con
  SVG["Enriched SVG — data-* + embedded metadata"]:::con
  C["Consumers — CI checks, our UI, agents"]:::agent

  F --> OLE --> PARSE --> OM
  OM --> SER --> RT
  RT -.write.-> F
  OM --> JSON --> C
  OM --> SVG --> C

  classDef src fill:#94a3b822,stroke:#94a3b8,stroke-width:1.5px;
  classDef alt fill:#38bdf833,stroke:#38bdf8,stroke-width:1.5px;
  classDef con fill:#34d39933,stroke:#34d399,stroke-width:1.5px;
  classDef agent fill:#fb718533,stroke:#fb7185,stroke-width:2px;
      
Legend: Deterministic parse/serialize core · Stable outputs (round-trip · contract · render) · Consumers
03

How Circuit.ly handles Altium today

Our Altium support is five native parsers, fully deterministic, no LLM, each with a matching serializer for round-trip. We didn't adopt altium_monkey; we built our own, tuned for a real-time collaborative editor rather than a one-shot importer.

Container — we take the dep

The OLE / Compound-File container is read with an off-the-shelf library rather than a hand-rolled one. Pragmatic where altium_monkey wrote its own: we get container parsing for free and accept the library's limits.

Records — custom decoders

Custom decoders handle the [len][type][payload] record framing and parse the |KEY=VALUE| pipe strings — URI-decoding, null-padding, case-insensitive keys.
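The two steps above, framing and pipe-string parsing, can be sketched in a few lines. Python is used here purely for illustration (our decoders are TypeScript). The 4-byte header layout (3-byte little-endian length plus 1 type byte) and the names read_records and parse_params are assumptions made for this sketch, not a description of either codebase.

```python
import struct
from urllib.parse import unquote


def read_records(buf: bytes):
    """Yield (record_type, payload) pairs from [len][type][payload] framing.

    Assumed header: one little-endian uint32 whose low 3 bytes are the
    payload length and whose high byte is the record type.
    """
    offset = 0
    while offset + 4 <= len(buf):
        (header,) = struct.unpack_from("<I", buf, offset)
        length = header & 0x00FFFFFF
        rtype = header >> 24
        offset += 4
        payload = buf[offset:offset + length]
        if len(payload) < length:
            raise ValueError("truncated record")
        yield rtype, payload
        offset += length


def parse_params(payload: bytes) -> dict[str, str]:
    """Parse a |KEY=VALUE| pipe string into a dict with upper-cased keys."""
    # Strip null padding, then decode as cp1252 (not UTF-8).
    text = payload.rstrip(b"\x00").decode("cp1252")
    params: dict[str, str] = {}
    for field in text.split("|"):
        if not field:
            continue  # leading/trailing pipes produce empty fields
        key, _, value = field.partition("=")
        # Upper-casing gives case-insensitive lookups; values are URI-decoded.
        params[key.upper()] = unquote(value, encoding="cp1252")
    return params
```

Upper-casing keys on the way in is one way to get case-insensitive lookups; a serializer that must restore the original case would need to keep the raw key alongside it.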

Native models, no conversion

Parallel native model trees for each vendor on a shared core. No Altium→KiCad step and no neutral model — every vendor stays native.

Collaboration-first

Models are backed by a CRDT layer for real-time multi-user editing, with internal model versioning for migrations. State is live, not a static export.

Render — canvas / WebGL

A drawing layer feeds a shared renderer into a live viewport. We don't emit SVG or JSON in the parse path; the model is the product.

Two-way round-trip

Every type has a serializer, so Altium Designer can reopen our output. Caveat: some structural stub records are recreated rather than byte-preserved, so it favors reopen-ability over byte-identity.

Coverage today

Five file types parse (.SchDoc · .PcbDoc · .PrjPcb · .SchLib · .PcbLib) along with the core schematic and board record types. A handful of decorative record types aren't parsed yet, and the same cp1252 / case-mode traps altium_monkey documents about the format apply to any decoder.

04

Side by side, and the trade-offs

Same problem — read an undocumented binary EDA format deterministically — solved two ways. We optimized for a live editor; altium_monkey optimized for a stable automation surface. Neither is wrong; they trade different things.

Dimension | Circuit.ly today | altium_monkey
Purpose | Real-time collaborative editor | Toolkit / CLI for automation & CI
Stack | TypeScript + a CRDT layer | Python
OLE container | Off-the-shelf library | Hand-written reader + writer
Record parsing | Custom record decoders | Per-record parsers + central serializer
Internal model | Parallel native per-vendor trees, CRDT-backed | Typed object model + ObjectCollection
Primary output | Live canvas / WebGL render | Versioned JSON + enriched SVG
Versioning | Internal model versioning (our model only) | Public stepping schema a0→a1 for consumers
Round-trip | 2-way; reopen-able, not byte-identical | Byte-identical; preserves unknown streams
Unknown data | Not yet preserved if unhandled | Preserved verbatim
LLM in parse path | None | None
License | Proprietary (ours) | AGPL-3.0-or-later

Circuit.ly approach

Native TS parsers feeding a CRDT-backed live editor.

Pros
  • One TS codebase; Altium & KiCad handled as peers
  • Real-time collaboration baked in
  • Deterministic, no LLM, runs in-browser
  • Bidirectional — Altium Designer reopens our output
  • We own it — no AGPL entanglement
Cons
  • No external contract surface for CI / agents yet
  • A few decorative record types not parsed yet
  • Format-version handling still to harden
  • Round-trip favors reopen-ability over byte-identity
  • Container fidelity bounded by the chosen library
altium_monkey approach

Hand-rolled core exposing versioned contracts.

Pros
  • Versioned JSON + SVG contracts agents / CI can depend on
  • Byte-identical round-trip; preserves unknown fields
  • Oracle + corpus testing across many Altium eras
  • Hand-rolled OLE → total control of write fidelity
  • Hermetic example corpus doubles as fixtures
Cons
  • AGPL-3.0 — can't link into proprietary / SaaS
  • Python, alpha, Windows-primary
  • No live-editing / collaboration model
  • PcbDoc API less mature; no generic delete yet
  • Heavier to wire into our TS stack
05

The gap we could close — a stable contract

This is the one thing altium_monkey has that we don't. Our output is the live model + canvas render — perfect for editing, but there's no small, versioned, documented surface for CI, third-party exports, or agents to bind to. altium_monkey's design.a1 is the shape worth borrowing. We're already partway: our internal model is versioned; the move is to project a stable external contract on top of it.

altium_monkey.design.a1: emitted by AltiumDesign.to_json(); the schema string is the version
{
  "schema": "altium_monkey.design.a1",
  "generator": "altium_monkey",
  "project": { "name": "bunny_brain", "parameters": {} },
  "components": [
    { "designator": "R1", "svg_id": "C12", "value": "10k",
      "footprint": "0402",
      "classification": { "prefix": "R", "type": "resistor", "pin_count": 2 } }
  ],
  "nets": [
    { "uid": "n7", "name": "GND", "auto_named": false,
      "terminals": [ { "designator": "R1", "pin": "2", "pin_type": "passive" } ] }
  ],
  "pnp": {
    "units": "mm", "position_mode": "altium-pick-place",
    "placements": [
      { "designator": "R1", "layer": "top",
        "center_x": 31.75, "center_y": 22.10, "rotation": 90 }
    ]
  }
}
Stepping version scheme

Schema suffixes step like silicon revisions: a0 → a1 adds fields (old readers stay safe), while a → b is a breaking change (new id required). The scheme is compact, comparable in filenames, and independent of the Python package version.
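A consumer can turn the stepping rule into a small compatibility check. The sketch below is illustrative only: parse_schema and can_read are hypothetical names, and the rule that a reader also refuses documents older than the step it was built for is our own conservative assumption; the note above only says that old readers stay safe when fields are added.

```python
import re
from typing import NamedTuple


class SchemaId(NamedTuple):
    family: str   # e.g. "altium_monkey.design"
    letter: str   # breaking revision: a, b, ...
    step: int     # additive step within a letter: 0, 1, ...


_PATTERN = re.compile(r"^(?P<family>.+)\.(?P<letter>[a-z])(?P<step>\d+)$")


def parse_schema(schema: str) -> SchemaId:
    """Split 'family.a1' into (family, letter, step)."""
    m = _PATTERN.match(schema)
    if m is None:
        raise ValueError(f"not a stepping schema id: {schema!r}")
    return SchemaId(m["family"], m["letter"], int(m["step"]))


def can_read(reader: str, document: str) -> bool:
    """True if a reader built for `reader` may consume `document`.

    Same family and letter are required (a letter change is breaking).
    A newer step only adds fields, so an older reader stays safe.
    Assumption: a document older than the reader's step is refused,
    because it may lack fields the reader relies on.
    """
    r, d = parse_schema(reader), parse_schema(document)
    return r.family == d.family and r.letter == d.letter and d.step >= r.step
```

For example, can_read("netlist.a0", "netlist.a1") holds, while a reader built for any a-step should reject a b0 document outright rather than guess.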

Enriched render

PCB SVG output carries data-* attributes plus an embedded JSON block (pcb.svg.enrichment.a0) so a viewer — or an agent — can reason about nets, layers, and components over the picture, with no need to re-parse the PcbDoc.

pcb.svg.enrichment.a0 — element-level metadata travels with the render
<g data-component="U3" data-component-uid="…"
   data-layer-name="TOP">
  <path data-net="GND" data-net-class="power"
        data-primitive="track" />
</g>
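To show what "reasoning over the picture" can look like, here is a minimal sketch that reads the data-* attributes back out of an enriched SVG with Python's standard XML parser. The attribute names come from the snippet above; the sample document, its path data, and the function name nets_by_component are made up for illustration, and the embedded JSON block is not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the snippet above.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g data-component="U3" data-layer-name="TOP">
    <path data-net="GND" data-net-class="power" data-primitive="track" d="M0 0L1 1"/>
    <path data-net="VCC" data-net-class="power" data-primitive="track" d="M2 0L3 1"/>
  </g>
</svg>"""


def nets_by_component(svg_text: str) -> dict[str, set[str]]:
    """Map each data-component group to the set of nets drawn inside it."""
    root = ET.fromstring(svg_text)
    result: dict[str, set[str]] = {}
    for group in root.iter():
        component = group.get("data-component")
        if component is None:
            continue
        # Unprefixed attributes carry no namespace, so plain names work
        # even though the elements live in the SVG namespace.
        nets = {el.get("data-net") for el in group.iter() if el.get("data-net")}
        result.setdefault(component, set()).update(nets)
    return result
```

Nothing here touches the PcbDoc: the viewer or agent works entirely from the render.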
Why it's worth borrowing

The bytes are version-specific; a contract is small and stable. altium_monkey marks it public with a @public_api flag and ships a JSON Schema beside a prose spec "so downstream tools and AI agents can discover the contract intent without reading Python source." If Circuit.ly grows a circuitly.design.* export contract on top of the model we already have, CI checks, exports, and agent features stop depending on parser internals — and the same surface serves Altium and KiCad alike.
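As a thought experiment, a projection from an internal model to such a contract could be as small as the sketch below. Everything in it is hypothetical: the schema id circuitly.design.a0, the Component and Net stand-ins (our real models are CRDT-backed TypeScript), and the field selection, which simply mirrors a few keys from the altium_monkey example above.

```python
from dataclasses import dataclass, field

SCHEMA = "circuitly.design.a0"  # hypothetical id, following the stepping scheme


@dataclass
class Component:
    designator: str
    value: str
    footprint: str


@dataclass
class Net:
    name: str
    terminals: list[tuple[str, str]] = field(default_factory=list)  # (designator, pin)


def to_contract(components: list[Component], nets: list[Net]) -> dict:
    """Project model objects into a small, stable, JSON-ready dict.

    Output is sorted so the same design always yields the same document,
    regardless of the order objects were created in the editor.
    """
    return {
        "schema": SCHEMA,
        "generator": "circuitly",
        "components": [
            {"designator": c.designator, "value": c.value, "footprint": c.footprint}
            for c in sorted(components, key=lambda c: c.designator)
        ],
        "nets": [
            {
                "name": n.name,
                "terminals": [
                    {"designator": d, "pin": p} for d, p in sorted(n.terminals)
                ],
            }
            for n in sorted(nets, key=lambda n: n.name)
        ],
    }
```

The point of the sketch is the boundary: consumers would depend on the dict's shape and schema string, never on how the parser or the CRDT layer represents a component.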

06

Key learnings for how we handle Altium

Seven patterns worth carrying into Circuit.ly, ordered from highest leverage down. Some we already do; a few (preserve-the-unknown, a stable contract) are the gaps from §04 worth closing. Each is tagged by scope: Altium-specific (hard-won format detail) or All formats (a discipline that holds for KiCad and anything we add next).

01 · All formats

Keep the model out of the parse path

Deterministic code owns the format; the LLM calls high-level methods and reads JSON. Never ask a model to emit or interpret raw file bytes.

02 · Altium-specific

Know your container trade-off

altium_monkey hand-rolled an OLE reader/writer for byte-identical round-trips; an off-the-shelf library is faster to adopt but bounds write fidelity. Pick deliberately: for Altium, fidelity isn't free.

03 · Altium-specific

Preserve what you haven't decoded

Copy unknown streams and unsupported fields through verbatim. Adopting preserve-the-unknown as a default means unhandled records never cause quiet data loss and round-trips stay honest.
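A minimal sketch of preserve-the-unknown at the record level: anything without a decoder is wrapped in an opaque holder and written back byte for byte. The record type id 4 for a label, and the Label and Opaque classes, are invented for this example and do not correspond to real Altium record ids.

```python
from dataclasses import dataclass
from typing import Union

LABEL = 4  # hypothetical record type id, for illustration only


@dataclass(frozen=True)
class Label:
    text: str


@dataclass(frozen=True)
class Opaque:
    """A record we cannot decode yet; the payload is kept untouched."""
    rtype: int
    payload: bytes


Record = Union[Label, Opaque]


def decode(rtype: int, payload: bytes) -> Record:
    if rtype == LABEL:
        return Label(payload.decode("cp1252"))
    # No decoder: carry the original bytes instead of dropping the record.
    return Opaque(rtype, payload)


def encode(record: Record) -> tuple[int, bytes]:
    if isinstance(record, Label):
        return LABEL, record.text.encode("cp1252")
    # Opaque records are written back verbatim.
    return record.rtype, record.payload
```

With this default, adding a decoder later is a pure improvement: the record moves from Opaque to a typed class, and files written in the meantime never lost it.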

04 · All formats

A versioned contract: schema + prose + stepping

One small, documented JSON surface, with a JSON Schema beside a human spec. a0→a1 adds fields, a→b breaks. The schema string is the version.

05 · All formats

Reuse native identity where it exists

Altium has no stable handles, so altium_monkey must synthesize svg_id and net uid. KiCad gives every object a UUID, so reuse it. Don't mint identity the source already provides.
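One way to encode this rule is a single helper that prefers the native id and only falls back to a deterministic, content-derived one. This is a sketch under assumptions: the namespace URL is a placeholder, stable_id is a hypothetical name, and the choice of which fields to feed in for Altium objects is an open design question. It is not how altium_monkey derives svg_id.

```python
import uuid
from typing import Optional

# Placeholder namespace for synthesized ids; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/circuitly/ids")


def stable_id(native_uuid: Optional[str], *identity_parts: str) -> str:
    """Return the source's own UUID if it has one, else a deterministic one.

    identity_parts should be fields that survive a re-import unchanged
    (for example sheet name and designator); the same parts always map
    to the same id, so ids stay stable across parses.
    """
    if native_uuid:
        return str(uuid.UUID(native_uuid))  # validates and normalizes
    return str(uuid.uuid5(NAMESPACE, "\x1f".join(identity_parts)))
```

Because uuid5 is a hash of its inputs, two imports of the same Altium file produce the same ids, while a KiCad object keeps the UUID it already had.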

06 · All formats

Enrich the render

Ship SVG with data-* attributes + an embedded JSON block. The UI and agents reason over the picture — nets, layers, parts — without re-parsing the source.

07 · All formats

Determinism + oracle & corpus testing

Stable ordering, native-tool compare, and a hermetic example corpus that doubles as regression fixtures. Diff against the real tool; preserve what you don't yet understand.
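Stable ordering is what makes corpus fixtures usable as regression tests. The sketch below shows one common approach, canonical JSON plus a digest compared against a stored golden value. The function names are hypothetical, and the native-tool (oracle) comparison itself is out of scope here.

```python
import hashlib
import json


def canonical_bytes(doc: dict) -> bytes:
    """Serialize with sorted keys and fixed separators.

    sort_keys only orders object keys; list order is preserved as given,
    so the exporter must already emit lists in a stable order.
    """
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def digest(doc: dict) -> str:
    """SHA-256 of the canonical form; suitable as a golden-file value."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()


def matches_golden(doc: dict, golden_digest: str) -> bool:
    """True if the export is byte-for-byte what the fixture recorded."""
    return digest(doc) == golden_digest
```

A fixture run then reduces to parsing each corpus file, exporting, and asserting matches_golden; any nondeterminism in ordering shows up as a digest change.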

07

Caveats & watch-outs

Before we build on altium_monkey itself
  • AGPL-3.0-or-later. Strong copyleft — fine for internal tooling/CLI, a problem if we link it into anything proprietary or SaaS-distributed. Treat it as a reference for our own implementation, or keep it process-isolated.
  • Alpha + Windows-primary. macOS/Linux coverage is "limited"; the test corpus is private; the PcbDoc API is less mature than SchDoc (no generic object/deletion model yet).
Altium handling gotchas to carry forward
  • Encoding & case. Binary strings are cp1252, not UTF-8; schematic keys flip case between SchLib and SchDoc. A single wrong assumption here silently corrupts text.
  • Units & names. PCB geometry is 1/10000-mil integers; OLE stream names cap at 31 chars with a fixed invalid-char set. Normalize on the way in, restore on the way out.
  • Cross-format note. KiCad is easier to read (open S-expr text) — but its own tool outputs drift between releases. Apply the same contract discipline: own it, version it, expose a stable surface. One internal model, many sources.
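The units and stream-name gotchas above are easy to pin down in code. In this sketch the conversion follows directly from 1 mil = 0.0254 mm; the invalid-character set (/ \ : !) is taken from the Compound File Binary specification as we understand it and should be verified against the container library in use. All names are hypothetical.

```python
UNITS_PER_MIL = 10_000          # PCB geometry is stored in 1/10000-mil integers
MM_PER_MIL = 0.0254             # 1 mil = 0.001 inch = 0.0254 mm
MAX_STREAM_NAME = 31            # OLE stream names cap at 31 characters
INVALID_STREAM_CHARS = set("/\\:!")  # assumption: per the CFB specification


def units_to_mm(units: int) -> float:
    """Normalize on the way in: internal integer units to millimetres."""
    return units * MM_PER_MIL / UNITS_PER_MIL


def mm_to_units(mm: float) -> int:
    """Restore on the way out: round back to the integer grid."""
    return round(mm * UNITS_PER_MIL / MM_PER_MIL)


def is_valid_stream_name(name: str) -> bool:
    """Check length and forbidden characters before writing a stream.

    len() counts code points; names outside the Basic Multilingual Plane
    would need a UTF-16 length check instead.
    """
    if not 0 < len(name) <= MAX_STREAM_NAME:
        return False
    return not (set(name) & INVALID_STREAM_CHARS)
```

Keeping the integer value as the source of truth and converting only at the edges avoids accumulating float error across repeated round-trips.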

Source analyzed: github.com/wavenumber-eng/altium_monkey — Python toolkit for reading, writing, and rendering Altium files (AGPL-3.0). ~139k LOC · 184 modules · 84 examples.

Real schemas referenced: altium_monkey.design.a1, netlist.a0, pcb.svg.enrichment.a0. Circuit.ly engineering note — what we learn from how they handle Altium, applied to our own multi-format pipeline.