EulerNPU

NPU Inference Composition & Simulation Stack

Define inference graphs with 123 operators across 13 groups and 10 data types. Compile spec.yaml to .npuart artifacts and simulate or deploy to Zynq-7020 FPGA hardware — all from a single CLI.

Open Source

Operator Set & Compilation Pipeline

123 operators, 13 groups, 10 data types — from spec to deployment artifact

Operator Groups (13)

A comprehensive operator set covering all common inference operations, organized into 13 logical groups.

Operators: 123 operators in 13 groups (arithmetic, activation, reduce, normalization, pooling, convolution, recurrent, attention, elementwise, shape, quantization, custom, control)
Data Types: 10 dtypes (float32, float16, bfloat16, int8, uint8, int16, int32, int64, bool, complex64)
Spec Format: spec.yaml, a declarative graph definition with typed edges and operator parameters
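As a sketch of what such a declarative definition could look like, here is an illustrative spec.yaml fragment. The field names (graph, nodes, edges, op, dtype) are assumptions for illustration only, not the authoritative schema; the schema enforced by eulernpu validate is the source of truth.

```yaml
# Illustrative sketch only — field names are assumed, not EulerNPU's
# authoritative schema. A small two-node graph with typed edges.
graph:
  name: tiny-mlp
  nodes:
    - id: fc1
      op: matmul            # arithmetic group
      params:
        transpose_b: true   # example operator parameter
    - id: act1
      op: relu              # activation group
  edges:
    - from: input
      to: fc1
      dtype: float16        # one of the 10 supported dtypes
    - from: fc1
      to: act1
      dtype: float16
```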

Compilation Pipeline

From YAML specification to deployable hardware artifact in a validated, reproducible pipeline.

Validate: schema and operator-compatibility checks
Compile: spec.yaml → IR → optimization passes → .npuart
Simulate: cycle-accurate simulation with profiling data
Deploy: Zynq-7020 FPGA target with board-smoke verification

Design Principles

Deterministic, reproducible, and auditable inference at every step

Declarative Specification

All inference graphs are defined in spec.yaml — human-readable, version-controllable, and diffable. No hidden state or implicit configuration.

Bit-Exact Reproducibility

Simulation results are bit-exact across runs. The same spec.yaml always produces the same .npuart artifact and the same inference outputs.
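One way to audit this property in CI is to compile the same spec.yaml twice and compare the resulting artifacts byte for byte. A minimal sketch, independent of EulerNPU itself (the function names here are illustrative helpers, not part of the tool):

```python
import hashlib
from pathlib import Path


def artifact_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a build artifact's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def builds_are_bit_exact(first: str, second: str) -> bool:
    """True if two .npuart artifacts are byte-identical."""
    return artifact_digest(first) == artifact_digest(second)
```

Compiling to two different output paths and asserting builds_are_bit_exact on them turns the reproducibility guarantee into a regression test.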

Hardware-First Validation

Board-smoke tests verify hardware compatibility before deployment. Calibration and profiling ensure real-world performance matches simulation.

CLI Reference

Single entry point eulernpu — 11 subcommands cover the entire workflow

validate

Validate spec.yaml schema, operator compatibility, and dtype constraints.

compile

Compile spec.yaml to .npuart artifact through the optimization pipeline.

run

Execute a compiled .npuart artifact with input data and produce outputs.

sim

Cycle-accurate simulation of the inference graph with timing data.

profile

Generate per-operator latency, memory, and throughput profiling reports.

explain

Human-readable summary of the graph structure, operator count, and data flow.

board-smoke

Run hardware compatibility smoke tests on the target FPGA board.

calibrate

Calibrate quantization parameters using representative input data.
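EulerNPU's calibration internals are not documented here; as background, a common scheme derives an asymmetric int8 scale and zero-point from the min/max range of representative data. A generic sketch of that scheme (not necessarily EulerNPU's actual algorithm):

```python
def calibrate_int8(samples):
    """Derive (scale, zero_point) for asymmetric int8 quantization
    from representative float values — a common min/max scheme."""
    lo, hi = min(samples), max(samples)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # range must contain zero
    scale = (hi - lo) / 255.0 or 1.0       # 255 = qmax(127) - qmin(-128)
    zero_point = round(-128 - lo / scale)  # maps lo onto qmin
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float value to int8 using the calibrated parameters."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))          # clamp to int8 range
```

The representative inputs fed to the calibrate subcommand would play the role of samples here: the wider and more realistic their range, the less clipping at the int8 boundaries.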

benchmark

Run throughput and latency benchmarks on compiled artifacts.

replay

Replay a recorded inference session for debugging and validation.

compress-cache

Compress and manage the compilation cache for faster rebuilds.

Tutorials

Step-by-step guides to get started with EulerNPU quickly

Tutorials coming soon.

Installation & Getting Started

Install EulerNPU and compile your first inference graph

Installation

# Install from a local checkout of the repository
pip install -e ".[dev]"

# Validate and compile
eulernpu validate spec.yaml
eulernpu compile spec.yaml -o model.npuart

Requirements

Python 3.12+

Zynq-7020 FPGA board (for hardware deployment)

Start NPU Inference Development with EulerNPU

From spec.yaml to hardware deployment, in a single CLI.

Get Started on GitHub
Contact Us