Foundry program · 2026 cohort open

Inference,
materialized.
Not rented.

We turn open-weight language models into application-specific silicon. The result is a sealed inference appliance you own — no API, no telemetry, no seat fees, no provider that can change the terms next quarter.

Process: N6 · N5 · roadmap to N3
Models: DeepSeek · Kimi · GLM · Qwen
Tape-out: Q3 2026

§ 01 — Thesis

General-purpose silicon is paying for flexibility you don't need.

A GPU is built to run any program. An inference workload is one program, executed a trillion times. The gap between those two facts is where the next thousand-fold improvement in efficiency lives — and where the case for ownership becomes unanswerable.

01

The weights are the program.

Open-weight models are not API endpoints. They are fully specified, frozen, and yours to embody in any substrate you choose. Silicon is one such substrate.

02

Inference is not training.

Training needs flexibility, gradient flow, experimentation. Inference needs one numerical recipe executed at maximum efficiency. Different problem, different machine.

03

Sealed is the new private.

If your model fits in a sealed appliance with no network egress, your data never leaves your facility. Privacy stops being a policy and becomes a wiring diagram.

04

The cloud is a balance sheet, not a destiny.

Per-token pricing made sense when nobody could afford the hardware. The hardware is now affordable. The pricing model is not.

§ 02 — Process

From open weights to sealed silicon, in seven steps.

We do not sell chips off a shelf. Each customer engagement compiles a specific model — frozen at a specific checkpoint — into a specific accelerator, packaged into a specific enclosure with a specific security posture. Below is the standard program.

01 / model

Model selection & freeze.

Pick an open-weight model. Pick a quantization. Pick a context window. That checkpoint is now your product.
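
One way to picture the freeze is a manifest that pins the three choices above to a hash of the weights. This is a minimal sketch, not our actual tooling: `freeze_manifest`, `manifest_id`, and every field name are hypothetical, and SHA-256 is simply an assumed choice of digest.

```python
import hashlib
import json


def freeze_manifest(weights: bytes, model: str, quantization: str, context_window: int) -> dict:
    """Return a record that pins one exact checkpoint and its serving choices.

    All field names here are hypothetical, chosen for illustration only.
    """
    return {
        "model": model,
        "quantization": quantization,
        "context_window": context_window,
        "weights_sha256": hashlib.sha256(weights).hexdigest(),
    }


def manifest_id(manifest: dict) -> str:
    """Hash the manifest itself, so one short ID names the whole frozen product."""
    # Canonical JSON (sorted keys, fixed separators) so equal manifests hash equally.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder bytes stand in for a real weight file.
    demo = freeze_manifest(b"\x00\x01\x02\x03", "example-model", "int8", 32768)
    print(manifest_id(demo))
```

Changing any one of the three picks, or a single byte of the weights, yields a different ID, which is the property a freeze needs.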

02 / profile

Workload profiling.

We instrument your real traffic. Token mixes, batch shapes, latency targets. The silicon is sized to the workload, not the spec sheet.
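
As a rough illustration of what a profile reduces to, the sketch below summarizes a request log into a prefill share and latency percentiles. The `Request` record, the `profile` function, and the nearest-rank percentile are assumptions made for the example; a real engagement would also capture batch shapes and arrival patterns.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical log record: one served request.
    prompt_tokens: int
    output_tokens: int
    latency_ms: float


def percentile(values, p):
    """Nearest-rank percentile; p is in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def profile(requests):
    """Summarize token mix and latency for a non-empty list of requests."""
    total_prompt = sum(r.prompt_tokens for r in requests)
    total_output = sum(r.output_tokens for r in requests)
    latencies = [r.latency_ms for r in requests]
    return {
        "requests": len(requests),
        # Fraction of all tokens that are prompt (prefill) tokens.
        "prefill_share": total_prompt / (total_prompt + total_output),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "max_prompt_tokens": max(r.prompt_tokens for r in requests),
    }
```

A high prefill share with long prompts points toward a different sizing than a chat-style mix dominated by decode.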

03 / floorplan

Architecture & floorplan.

Layer mapping, dataflow, memory hierarchy. SRAM-heavy for low-latency decode. HBM-heavy for long-context prefill.
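
The split between decode and prefill can be motivated with a back-of-envelope roofline estimate. The sketch below is a simplification under stated assumptions: each weight is fetched once for all tokens processed together, activation traffic is ignored, and the KV-cache formula assumes standard multi-head or grouped-query attention (models with compressed KV caches will differ). The function names and the machine figures in the usage note are hypothetical.

```python
def arithmetic_intensity(tokens_in_flight: int, bytes_per_weight: float) -> float:
    """FLOPs performed per byte of weights read.

    Each weight contributes one multiply and one add per token, and is assumed
    to be fetched once for all tokens processed together.
    """
    return 2.0 * tokens_in_flight / bytes_per_weight


def is_bandwidth_bound(intensity: float, peak_flops: float, bytes_per_second: float) -> bool:
    """True when the workload sits below the roofline ridge point."""
    # Ridge point = peak compute / memory bandwidth, in FLOPs per byte.
    return intensity < peak_flops / bytes_per_second


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int) -> int:
    """KV-cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value
```

With 8-bit weights, batch-1 decode has an intensity of 2 FLOPs per byte, far below the ridge point of any machine with, say, 1e15 FLOP/s and 3e12 B/s (about 333 FLOPs per byte), so decode speed tracks memory bandwidth. A 2,048-token prefill reaches 4,096 FLOPs per byte and is compute-bound, while its KV cache grows linearly with context, which is a capacity question.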

04 / rtl

RTL & verification.

Bit-exact match against your frozen reference implementation. No "close enough." Determinism is the product.
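
To make the distinction concrete, here is a minimal sketch of a bit-exact comparison next to a tolerance-based one. It is illustrative only: `float32_bits` and `first_mismatch` are hypothetical helpers, and float32 is an assumed output format (a quantized design would compare integer codes the same way).

```python
import math
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x after rounding to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def first_mismatch(reference, candidate):
    """Index of the first element whose bit pattern differs, or None if bit-exact."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch")
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if float32_bits(r) != float32_bits(c):
            return i
    return None


if __name__ == "__main__":
    reference = [1.0, 0.5]
    # Second element is off by one unit in the last place at float32 precision.
    candidate = [1.0, 0.5 + 2.0 ** -24]
    print(math.isclose(reference[1], candidate[1], rel_tol=1e-6))  # tolerance check passes
    print(first_mismatch(reference, candidate))  # bit-exact check reports index 1
```

A tolerance check accepts the one-ULP difference; the bit-level check does not, and it also distinguishes cases such as `0.0` and `-0.0` that compare equal numerically.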

05 / tape-out

Tape-out & first silicon.

Mask set, MPW or full reticle, first wafers. Twelve to sixteen weeks from sign-off to wafers in hand.

06 / appliance

Appliance integration.

Board, chassis, firmware, network isolation. A sealed unit that boots your model and serves nothing else.

07 / deliver

Delivery & custody.

Bonded transport to your facility. Tamper-evident seals. Source-code escrow. The chip is yours; we keep a key only for warranty silicon.

§ 03 — The math

Why a fixed-function accelerator beats a general-purpose GPU on inference.

A GPU spends most of its die area on things you do not use during inference: the rasterizer, the texture units, the FP64 path, the speculative scheduler. We delete every transistor that does not multiply a weight or move a token.

| Metric | General-purpose GPU | Hardagent-01 · DeepSeek-V4 build | Delta |
| --- | --- | --- | --- |
| Die area spent on inference-relevant logic | ~35 % | ~92 % | 2.6× |
| Tokens / second / watt (671B MoE, batch=1) | 0.45 | 14.2 | 31× |
| Idle power draw (model resident, no traffic) | ~280 W | ~40 W | ~7× |
| Time-to-first-token, 32k context | 2.4 s | 0.31 s | 7.7× |
| 5-year TCO (1B tok/day workload) | $18.2 M | $3.9 M | 4.7× |

Figures are simulation targets for the Hardagent-01 reference design. Independent silicon validation pending first wafers. Comparison baseline: H100 SXM5 at MLPerf Inference v4.1 reference settings.
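
The TCO and efficiency rows can be restated in per-token terms with simple arithmetic. The sketch below assumes a constant 1B tokens per day over five 365-day years and takes the table's simulation targets at face value; the function names are hypothetical.

```python
TOKENS_PER_DAY = 1_000_000_000  # workload from the TCO row
DAYS_PER_YEAR = 365             # assumption: no leap days, no downtime
YEARS = 5


def cost_per_million_tokens(tco_usd: float) -> float:
    """Five-year TCO spread evenly over every token served."""
    total_tokens = TOKENS_PER_DAY * DAYS_PER_YEAR * YEARS
    return tco_usd / total_tokens * 1_000_000


def joules_per_token(tokens_per_second_per_watt: float) -> float:
    """Tokens/s/W is tokens per joule, so energy per token is its reciprocal."""
    return 1.0 / tokens_per_second_per_watt


if __name__ == "__main__":
    for label, tco, efficiency in [
        ("General-purpose GPU", 18.2e6, 0.45),
        ("Hardagent-01", 3.9e6, 14.2),
    ]:
        print(label,
              round(cost_per_million_tokens(tco), 2), "USD per million tokens,",
              round(joules_per_token(efficiency), 3), "J per token")
```

Under those assumptions the table works out to roughly $9.97 versus $2.14 per million tokens, and about 2.2 J versus 0.07 J per token at batch=1.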

§ 04 — Engage

If your inference budget exceeds your headcount budget, we should talk.

The Hardagent program admits four customers per fabrication cohort. Engagement begins with a workload review under mutual NDA. Minimum order: one appliance rack. Maximum: one fab line.

§ 05 — Careers

Wanted: people who have shipped real silicon.

We are hiring physical-design, RTL, packaging, and firmware engineers with first-silicon experience on AI accelerators or HPC parts. Remote-friendly for design roles, on-site for bring-up. Equity is generous; optics are not.

RTL

Senior RTL Engineer — Tensor Core.

You will own the MAC array and dataflow. SystemVerilog; formal-verification experience preferred.

PD

Physical Design Lead.

Floorplan, P&R, timing closure for an 800+ mm² die. Bring your war stories.

FW

Firmware & Bring-up.

From JTAG to running a model. Hands-on, sleepless, satisfying.

PKG

Packaging & Substrate.

2.5D CoWoS, HBM stacking, thermal. Coordinate with the foundry directly.