Real-Time Code Generation¶

Status: shipped — Python codegen pipeline + C99 target + PIL parity bench. ARM Cortex-M7 / Zynq target profiles + cycle-budget enforcement are the natural follow-ups.

pulsim.codegen takes a Pulsim Circuit plus a fixed step dt and emits a self-contained C99 module that integrates the same dynamics on any target with a float and a basic libm. The generated model.c has no malloc, no globals, no platform dependencies — drop it into your control loop as-is.

This is the headline feature for HIL (Hardware-in-the-Loop) / RCP (Rapid Control Prototyping) workflows: simulate on the desktop, code-gen, run the same model on a Cortex-M7 / Zynq / microcontroller / FPGA-soft-core / RTOS without re-tuning anything.

TL;DR¶

import pulsim
from pulsim.codegen import generate

ckt = ...  # your existing Circuit
summary = generate(ckt, dt=1e-6, out_dir="gen/buck")
print(f"State: {summary.state_size}D, "
      f"stability radius {summary.stability_radius:.4f}")
print(f"ROM ≈ {summary.rom_estimate_bytes} B,  "
      f"RAM ≈ {summary.ram_estimate_bytes} B")

That writes three files:

File	Purpose
`model.h`	Header with `PulsimModel` struct + step API
`model.c`	Discrete-time matrices as `static const float[][]` + step body
`model_test.c`	Tiny PIL harness — reads `n_steps`, `u_const` from argv, prints the y trace

The step API is target-agnostic:

typedef struct PulsimModel {
    float x[PULSIM_STATE_SIZE];
} PulsimModel;

void pulsim_model_init(PulsimModel* m);
void pulsim_model_step(PulsimModel* m, const float* u, float* y);

Pipeline¶

        ┌─ DC OP            ┌─ matrix exponential
        ▼                   ▼
 Circuit → linearize → reduce → discretize → emit C99
            (E,A,B,C,D)  (drop alg. rows)  (A_d, B_d)   (model.c/h)
                                  ↓
                          stability check
                          rho(A_d) < 1

Linearize at the DC operating point via the C++ Simulator::linearize_around (shipped in add-frequency-domain-analysis). Returns continuous-time (E, A, B, C, D) in descriptor form.
Reduce the descriptor form to a regular state-space by eliminating the algebraic rows of E (V-source branch equations, etc). Outputs a regular (A_red, B_red) plus the state-projection matrix.
Discretize via the matrix exponential trick:

┌A B⎤ ┌A_d B_d⎤ ⎢ ⎥ ·dt → exp ⎢ ⎥ ⎣0 0⎦ ⎣ 0 I ⎦

Single expm call gives both A_d and B_d even when A is singular (integrator-only systems). 4. Stability check: every eigenvalue of A_d must sit inside the unit circle. Codegen fails loud with RuntimeError when rho(A_d) ≥ 1 and recommends shrinking dt. 5. Emit C99 from a Python template. Matrices are baked in as static const float[N][N]; the step body is a plain triply-nested loop (no BLAS dependency).

Target profiles¶

Target	Status
`c99`	shipped — portable baseline; tested via gcc on macOS / Linux
`arm-cortex-m7`	deferred — wants CMSIS-DSP optional matrix-vector + cycle-budget enforcement
`zynq-baremetal`	deferred — wants HW-IP-block stub + Xilinx tool integration

The generated c99 code is naturally portable: any C99 compiler including arm-none-eabi-gcc, xc32-gcc, riscv-gcc, etc. will compile it. The deferred targets are about target-specific optimization (CMSIS-DSP, custom NEON / SSE intrinsics) and toolchain integration (Makefiles, IDE projects), not about the C99 itself.

PIL parity (gate G.1)¶

test_codegen.py::test_codegen_pil_parity_against_native is the contract:

Generate model.c / model.h / model_test.c from a Pulsim Circuit.
Compile the harness with the system gcc (or cc).
Run the binary with constant input for N steps; capture the y trace.
Compute the same trace independently in Python via direct matrix multiplication using the summary.A_d / B_d / C / D.
Assert per-step output agreement within ±0.1 % relative tolerance (with a small absolute floor near zero) — gate G.1 of the change.

When no C compiler is on PATH the test gracefully pytest.skips so it survives bare CI environments.

Stability check¶

The codegen pipeline rejects any (circuit, dt) pair whose discrete state matrix has spectral radius ≥ 1. The error message names the spectral radius and recommends shrinking dt:

RuntimeError: codegen: discrete state matrix is unstable
(spectral radius 1.234567 ≥ 1.0). Choose a smaller dt or check the
circuit for sign-flipped passive elements.

The standard rule of thumb is |λ_max(A) · dt| ≤ ln(2) ≈ 0.693, which gives rho(A_d) ≤ 2. We enforce the tighter < 1 because anything ≥ 1 doesn't decay; the error margin is the user's call to make explicitly.

ROM / RAM estimate¶

CodegenSummary.rom_estimate_bytes is 4·(|A_d| + |B_d| + |C| + |D|) — the total number of floats baked in. RAM is 4·2·state_size (the state vector plus one scratch). These are coarse but accurate to ±20 % on the gcc -O2 baseline; tighter estimates require actually compiling for the target which is the deferred work.

Circuit	State	ROM	RAM
RC low-pass	1	56 B	8 B
RLC	2	192 B	16 B
Buck output stage (LC + R)	2	192 B	16 B
16-topology PFC	16	~ 4 KB	128 B

The 8-KB ROM / 512-B RAM gate from the proposal is comfortable for single-topology converters; multi-topology PWL state machines (switching between several A_d matrices per topology bitmask) push toward the budget on Cortex-M0 / M3, which is why the cycle-budget enforcement (Phase 6) is tracked as a follow-up.

Limitations / follow-ups¶

Multi-topology switching: today the codegen captures a single topology (the linearization at the DC OP). A production HIL model for a switching converter wants N discrete matrix sets keyed by the switch state bitmask, with a switch (topology) body in pulsim_model_step. Tracked alongside the refactor-pwl-switching-engine follow-up that surfaces the topology bitmask as a runtime input.
Cycle-budget enforcement (--max-cycles flag): requires a target-specific cycle model (Cortex-M7 in-order, Zynq A9, etc). Deferred.
YAML codegen: section: declarative config for dt, target, out_dir, reachable_topologies. Lands once the Circuit-variant integration parser dispatch is final.
CLI pulsim codegen <netlist.yaml>: the Python API at pulsim.codegen.generate(...) is final; the CLI wrapper is a thin argparse layer that can ship later.
PIL on Cortex-M7 via qemu-arm: works in principle (the C99 is portable), needs the qemu integration in CI to run.