EulerForge CLI Reference
GitHub Repository: https://github.com/eulerwa/eulerforge
Korean version: cli.md
Global Options
--lang LANG — Set Output Language
Use --lang with any subcommand to change the CLI output language.
# Output in English
eulerforge --lang en train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml
# Output in Japanese
eulerforge --lang ja bench --preset configs/bench/sft_target_only.yml
| Language Code | Language |
|---|---|
| ko | 한국어 (default) |
| en | English |
| zh | 中文 |
| ja | 日本語 |
| es | Español |
You can also set the language via the EULERFORGE_LANG environment variable. The --lang flag takes precedence over the environment variable.
export EULERFORGE_LANG=en
eulerforge train --preset ... # Output in English
Commands
eulerforge train
Main training pipeline.
eulerforge train [OPTIONS]
| Option | Description |
|---|---|
| --preset PATH | YAML preset configuration file |
| --set KEY=VALUE | Override a configuration value (repeatable) |
| --output-dir DIR | Checkpoint output directory |
| --run-name NAME | Optional run name suffix |
| --print-config | Print the resolved configuration as JSON |
| --validate-only | Validate configuration only; do not train |
| --preflight | Load model, apply injection, verify phases — without training |
| --metrics-level LEVEL | Metrics level: minimal (default) or advanced (+ MoE routing stats) |
| --debug | Enable verbose debug logging |
| --debug-every N | Print debug logs every N steps (default: 50) |
| --debug-max-modules N | Maximum modules per debug section (default: 50) |
| --debug-topk-grad N | Number of top-K gradient norms to print (default: 20) |
| --debug-attn | Print attention projection summaries |
| --debug-trainable-names | Dump trainable parameter names |
Pipeline Continuation (Checkpoint Auto-Detection)
When model_name points to a previous EulerForge training checkpoint, the system automatically detects the base model and loads it correctly.
# SFT → DPO pipeline example
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_dpo.yml \
--set model_name=outputs/sft/final \
--output-dir outputs/dpo
How it works:
1. If model_name directory contains lora_info.json, it's recognized as an EulerForge LoRA checkpoint
2. Reads resolved_config.json from the parent directory to find the original base model
3. Auto-overrides the current config's injection settings (lora_r, lora_alpha, target_keywords, start_layer, etc.) with those from the previous checkpoint — prevents LoRA structure mismatch in cases like SFT(lora_r=48)→DPO(lora_r=24)
4. Loads the base model and applies LoRA injection (using checkpoint-derived parameters)
5. Restores the LoRA adapter weights from the previous checkpoint
Note: Auto-detection requires resolved_config.json. Only checkpoints produced by eulerforge train are supported. If you need a different LoRA structure for DPO, run it as a separate training session rather than through the pipeline.
Environment Variables
| Environment Variable | Description |
|---|---|
| EULERFORGE_MOE_PERF=1 | Enable per-module MoEFFN forward time/token distribution profiling. Outputs [MoEPerf] logs |
| EULERFORGE_MOE_DTYPE_DEBUG=1 | Enable dtype flow debug logging at MoEFFN merge points. Outputs [MoEDType] logs (hidden/router/weights/expert_out/buffer dtype) |
| EULERFORGE_TRAIN_DEBUG=1 | Verbose training loop debug: per-group LR, grad norm mean, param dtype distribution, dequantize stats (mean/std/min/max) on phase transitions. Outputs [TrainDebug]/[DequantDebug] logs |
eulerforge convert
Converts arbitrary JSONL into EulerForge standard raw JSONL. The core philosophy is --map-based mapping; recipes are provided as convenience shortcuts.
Three modes:
- Map mode (default): --map OUT_KEY=EXPR (repeatable) — general-purpose field mapping
- Recipe mode: --recipe <name> — built-in recipes for common data structure transformations
- Validate mode: --validate — validate input file schema without performing any conversion
# Inspect fields (explore input file structure)
eulerforge convert --task sft --input data/sft_10k.jsonl --print-sample-flat
# Map mode (flat fields)
eulerforge convert --task sft --input data/custom.jsonl --output data/out.jsonl \
--map prompt=instruction --map response=output
# Map mode (nested fields — automatic dot-path flattening)
eulerforge convert --task prompted_preference --input data/dpo_10k.jsonl --output data/dpo_10k_raw.jsonl \
--map prompt=instruction.value --map chosen=chosen.value --map rejected=rejected.value
# Recipe mode (messages array)
eulerforge convert --task sft --input data/sft_10k.jsonl --output data/sft_10k_raw.jsonl \
--recipe sft_messages --messages-expr json_record.messages
# Validate mode (schema validation without conversion)
eulerforge convert --task sft --input data/sft_10k_raw.jsonl --validate
| Option | Description |
|---|---|
| --task | Target task: sft, preference, prompted_preference |
| --input | Input JSONL file path |
| --output | Output standard raw JSONL path (not required for --validate / --print-sample-flat) |
| --map OUT_KEY=EXPR | Output key to field expression mapping (repeatable). E.g., --map prompt=instruction |
| --flatten dot\|none | Flatten mode: dot (default) flattens nested dicts to dot-keys; none navigates dot-paths directly |
| --join-sep SEP | Separator for joining list[str] values (default: newline) |
| --strict | Error if a resolved value is None, empty string, or empty list |
| --recipe | Built-in recipe name |
| --messages-expr EXPR | Dot-path to messages array (required with --recipe sft_messages) |
| --num-proc | Number of parallel workers (default: 1) |
| --overwrite | Overwrite output file if it already exists |
| --max-rows | Maximum number of rows to convert (sampling) |
| --validate | Validate-only mode: validate input schema, no conversion |
| --print-sample-flat | Print flattened key list from first 1-2 rows and exit (for field discovery) |
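The dot-key flattening behind --flatten dot can be sketched as a small recursive helper (the function name is illustrative; the behavior matches the nested-field example above, where chosen.value addresses {"chosen": {"value": ...}}):

```python
def flatten_dot(record: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dot-keyed entries, e.g.
    {"instruction": {"value": "hi"}} -> {"instruction.value": "hi"}."""
    out = {}
    for key, value in record.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten_dot(value, prefix=dotted + "."))
        else:
            out[dotted] = value
    return out
```

After flattening, a mapping like --map chosen=chosen.value reduces to a plain dictionary lookup on the flat record.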
Built-in Recipes
| Recipe | Task | Input Structure | Output |
|---|---|---|---|
| sft_messages | sft | Any dot-path to messages array (--messages-expr required) | {prompt, response} |
| sft_instruction_output | sft | {instruction, output} | {prompt, response} |
| dpo_nested_value | prompted_preference | {instruction.value, chosen.value, rejected.value} | {prompt, chosen, rejected} |
| sft_messages_v1 | sft | {json_record.messages: [{role,content}]} | {prompt, response} |
| dpo_nested_v1 | prompted_preference | {instruction.value, chosen.value, rejected.value} | {prompt, chosen, rejected} |
| passthrough_prompted_preference_v1 | prompted_preference | {prompt, chosen, rejected} | Same (extra keys removed) |
| passthrough_sft_prompt_response_v1 | sft | {prompt, response} | Same (extra keys removed) |
| passthrough_preference_v1 | preference | {chosen, rejected} | Same (extra keys removed) |
Details: tutorials/00_data_preprocessing.md
eulerforge preprocess
Converts raw JSONL into tokenized processed JSONL.
eulerforge preprocess --task TASK --input RAW.jsonl --output PROCESSED.jsonl --model-name MODEL
| Option | Description |
|---|---|
| --task | Task type: sft, preference, prompted_preference |
| --input | Input raw JSONL file path |
| --output | Output processed JSONL file path |
| --model-name | HuggingFace model/tokenizer name |
| --max-length | Maximum sequence length (default: 512) |
| --num-proc | Number of parallel workers (default: 50% of CPU cores) |
| --text-col | SFT text column name (default: text) |
| --prompt-col | Prompt column name (default: prompt) |
| --response-col | Response column name (default: response) |
| --chosen-col | Chosen column name (default: chosen) |
| --rejected-col | Rejected column name (default: rejected) |
Details: tutorials/00_data_preprocessing.md
eulerforge bench
Inference benchmark. Configure target/baseline/judge models via a YAML spec, sample from benchmark data, and compare inference results. The target supports API models (Ollama/OpenAI/Gemini) or locally trained HF checkpoints.
eulerforge bench [OPTIONS]
| Option | Description |
|---|---|
| --preset PATH | Bench YAML spec file path (required) |
| --set KEY=VALUE | Override a configuration value (repeatable) |
| --output-dir DIR | Results output directory |
| --validate-only | Validate configuration only |
| --dry-run | Extract data samples only (no model calls) |
| --target-output-dir PATH | Target local model: training output root directory |
| --checkpoint TYPE | Checkpoint type: final (default) \| latest \| best |
| --target-model-dir PATH | Target local model: specify HF save_pretrained directory directly |
| --target-device DEVICE | Target local model device override (e.g., cuda:0, cuda:1, cpu) |
Sequential Model Loading (OOM Prevention)
Models are loaded one at a time: each model is loaded, processes the entire dataset, and is then unloaded. No more than one model resides in GPU memory at any time.
Phase 1: Load target → run inference on all samples → unload
Phase 2: Load baseline → run inference on all samples → unload (when enabled)
Phase 3: Load judge → evaluate all samples → unload (when enabled)
Phase 4: Aggregate results → same output format as before
LocalHFClient.unload() deletes the model/tokenizer and calls torch.cuda.empty_cache(). For API clients (ChatClient/JudgeClient), unload() is a no-op.
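The phase loop above can be sketched with a minimal driver. The client class here is a stand-in for LocalHFClient and the API clients, not the real EulerForge interface; it only illustrates the load → infer-all → unload discipline:

```python
class FakeClient:
    """Stand-in client: load lazily, unload eagerly (real clients move a
    model on/off the GPU; API clients make unload() a no-op)."""
    def __init__(self, name: str):
        self.name, self.loaded = name, False
    def load(self):
        self.loaded = True
    def generate(self, prompt: str) -> str:
        return f"{self.name}:{prompt}"
    def unload(self):
        self.loaded = False  # real LocalHFClient: del model + empty_cache()

def run_bench(samples, target, baseline=None, judge=None):
    """Run each enabled phase over the full sample set before moving on,
    so at most one model is resident at a time."""
    results = {"target": [], "baseline": [], "judge": []}
    for phase, client in (("target", target), ("baseline", baseline), ("judge", judge)):
        if client is None:
            continue  # phase disabled in the YAML spec
        client.load()
        for s in samples:
            results[phase].append(client.generate(s))
        client.unload()
    return results
```
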
Execution Modes
| Condition | Mode | Output |
|---|---|---|
| Target only | target-only | Target responses |
| Target + baseline | comparison | Both outputs |
| Judge + target only | pointwise | Score (1-10) + explanation |
| Judge + target + baseline | pairwise | Winner (A/B/tie) + score + explanation |
Bench YAML Spec
Target specification methods (use exactly one):
bench:
task: sft # sft | preference
data_path: data/sft_1k_bench_raw.jsonl
sample:
k: 10
seed: 42
shuffle: true
generation:
max_new_tokens: 256
temperature: 0.7
top_p: 0.95
models:
target:
# Method A: API model (Ollama/OpenAI/Gemini)
provider: ollama # ollama | openai | gemini
model: "qwen3:0.6b"
base_url: "http://localhost:11434/v1"
# Method B: Training output directory (automatic checkpoint resolution)
# output_dir: "outputs/run_20260301_120000"
# checkpoint: "final" # final | latest | best (default: final)
# device: "auto" # default: auto
# dtype: "auto" # default: auto
# Method C: Specify HF model directory directly
# provider: local_hf # can be omitted (auto-configured)
# model_dir: "outputs/run_20260301_120000/final"
baseline:
enabled: false
provider: ollama
model: "qwen3:4b"
judge:
enabled: false
provider: ollama
model: "gemma3:27b"
mode: pointwise # pointwise | pairwise
mitigate_position_bias: true # pairwise A/B swap (2 rounds)
output:
out_dir: outputs/bench
save_jsonl: true
print_examples: true
print_max_chars: 1500
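mitigate_position_bias runs the pairwise judge twice with the A/B positions swapped. One plausible way to aggregate the two rounds is sketched below; the aggregation rule (agree → keep, disagree → tie) is an assumption for illustration, not the documented EulerForge behavior:

```python
def aggregate_pairwise(round1: str, round2: str) -> str:
    """round1 judges with A=target, B=baseline; round2 judges with positions
    swapped, so its verdict is flipped back before comparison."""
    flip = {"A": "B", "B": "A", "tie": "tie"}
    verdicts = (round1, flip[round2])
    if verdicts[0] == verdicts[1]:
        return verdicts[0]      # both rounds agree on the same model
    return "tie"                # disagreement suggests position bias
```
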
Usage Examples
# Using an API model
eulerforge bench --preset configs/bench/sft_target_only.yml
# Load final checkpoint from training output directory
eulerforge bench --preset configs/bench/sft_local.yml \
--target-output-dir outputs/run_20260301_120000
# Use latest checkpoint
eulerforge bench --preset configs/bench/sft_local.yml \
--target-output-dir outputs/run_20260301_120000 --checkpoint latest
# Specify model directory directly
eulerforge bench --preset configs/bench/sft_local.yml \
--target-model-dir outputs/run_20260301_120000/final
# Specify GPU (load target on GPU 1 when judge is using GPU 0)
eulerforge bench --preset configs/bench/sft_local.yml \
--target-output-dir outputs/run_20260301_120000 --target-device cuda:1
# Override configuration
eulerforge bench --preset configs/bench/sft_target_only.yml --set bench.sample.k=20
# Validate only
eulerforge bench --preset configs/bench/sft_target_only.yml --validate-only
# Preview samples (no model calls)
eulerforge bench --preset configs/bench/sft_target_only.yml --dry-run
Bench Data Format
| File | Keys | Task |
|---|---|---|
| sft_1k_bench_raw.jsonl | prompt, response | sft |
| dpo_1k_bench_raw.jsonl | prompt, chosen, rejected | preference |
Details: tutorials/11_bench.md
eulerforge eval (stub)
Perplexity evaluation (not yet implemented).
eulerforge grid
Hyperparameter grid / random / Bayesian search via Optuna. See the eulerforge grid section below for full documentation.
Training Types (training.type)
| Type | Description | Data Format | Reference Model |
|---|---|---|---|
| sft | Supervised Fine-Tuning (default) | {input_ids, attention_mask, labels} | Not required |
| dpo | Direct Preference Optimization | chosen/rejected pairs (DPO format) | Adapter disabled |
| orpo | Odds Ratio Preference Optimization | chosen/rejected pairs (DPO format) | Not required |
| rm | Reward Model (Bradley-Terry) | chosen/rejected pairs (DPO format) | Not required |
| ppo | PPO/RLHF | Prompt-only + reward model | Adapter disabled |
Common Training Settings
| Key | Alias | Default | Description |
|---|---|---|---|
| max_train_steps | max_steps | 10000 | Maximum training steps (micro-step basis) |
| batch_size | — | 4 | Batch size |
| grad_accum_steps | — | 1 | Gradient accumulation steps |
| lr | — | 1e-5 | Learning rate |
| weight_decay | — | 0.01 | Weight decay |
| warmup_steps | — | 500 | Warmup steps |
| max_grad_norm | — | 1.0 | Maximum gradient clipping norm |
| log_steps | — | 100 | Logging interval (micro-steps) |
| save_steps | — | 1000 | Checkpoint saving interval (micro-steps) |
| val_steps | — | save_steps value | Validation interval (micro-steps). Defaults to save_steps if unset |
Note: max_train_steps and max_steps are synonymous. If both are specified, max_steps takes precedence. In presets, max_train_steps is the canonical key.
Step Terminology:
- Micro-step: Each batch forward/backward constitutes 1 micro-step. max_steps is counted in micro-steps.
- Optimizer step: Occurs every grad_accum_steps batches. Total optimizer steps = max_steps / grad_accum_steps.
- Effective batch: batch_size x grad_accum_steps. The actual number of samples processed per optimizer step.
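The step arithmetic in one place (the numbers are illustrative, not defaults):

```python
batch_size, grad_accum_steps, max_steps = 4, 8, 10000

effective_batch = batch_size * grad_accum_steps   # samples per optimizer step
optimizer_steps = max_steps // grad_accum_steps   # max_steps counts micro-steps

assert effective_batch == 32
assert optimizer_steps == 1250
```
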
ORPO Options
training:
type: orpo
orpo_lambda: 1.0 # ORPO term weight (required, positive)
RM Options
training:
type: rm
# RewardHead is auto-generated (hidden_size → 1)
PPO Options
training:
type: ppo
ppo:
clip_range: 0.2 # PPO clipping range
kl_coef: 0.1 # KL penalty coefficient
epochs: 4 # PPO update epochs per batch
max_gen_len: 64 # Maximum generation length
temperature: 1.0 # Sampling temperature
reward_model:
model_name: "path/to/model" # Reward model path
checkpoint_path: "" # Checkpoint (optional)
Model Load Precision (model.load_precision)
Declare precision and quantization options for model loading in YAML.
model:
load_precision:
mode: int4 # fp32 | fp16 | bf16 | int8 | int4
compute_dtype: bf16 # Compute dtype for int8/int4 operations (fp16 | bf16)
quant_type: nf4 # int4 only: nf4 | fp4
double_quant: true # int4 only
dequantize_on_train: true # Auto-dequantize when base_ffn becomes trainable
Behavior by Mode
| Mode | torch_dtype | BitsAndBytes | Use Case |
|---|---|---|---|
| fp32 | float32 | — | CPU / debugging |
| fp16 | float16 | — | GPU memory savings |
| bf16 | bfloat16 | — | GPU default (Ampere+) |
| int8 | — | load_in_8bit | 8-bit quantization |
| int4 | — | load_in_4bit | 4-bit quantization (QLoRA) |
Phase Dequantize Policy
When dequantize_on_train=true (the default) and base_ffn becomes trainable during a phase transition,
quantized modules are replaced with nn.Linear and weights are restored to compute_dtype.
| Mode | Replacement Target | Dequantize Method |
|---|---|---|
| int4 | Linear4bit → nn.Linear | dequantize_4bit() |
| int8 | Linear8bitLt → nn.Linear | int8_vectorwise_dequant() |
| bf16/fp16/fp32 | No replacement | Set requires_grad directly |
Replacement is performed automatically just before requires_grad=True is set within set_trainable_by_groups().
Since quantized module forward() re-allocates int8/int4 data, simple parameter casting is insufficient — the module itself must be replaced.
Automatic target_layers scoping: When base_ffn is active and no explicit target_layers is specified, target layers are automatically derived from injection.start_layer/injection.num_layers. Only layers where MoE is injected are dequantized/unfrozen.
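Because the module itself must be swapped (not just its parameters cast), the core operation is "replace the object at a dotted path". A generic sketch of that swap, independent of bitsandbytes and illustrative only:

```python
def replace_module(root, dotted_name: str, factory):
    """Swap the attribute at dotted_name (e.g. "layers.mlp.up_proj") for a
    fresh object built by factory(old) — the same shape of operation used to
    turn a quantized Linear4bit back into nn.Linear before unfreezing."""
    *parents, leaf = dotted_name.split(".")
    obj = root
    for name in parents:
        obj = getattr(obj, name)
    setattr(obj, leaf, factory(getattr(obj, leaf)))
```

In the real pipeline, factory would build an nn.Linear from the dequantized weights in compute_dtype; here it is any callable taking the old module.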
Backward Compatibility
Using the legacy training.quant_bits produces a warning and automatic conversion:
- quant_bits: 4 → mode: int4
- quant_bits: 8 → mode: int8
- quant_bits: 16 → mode: bf16
Spec details: docs/fixtures/specs/load_precision_spec.md
Data Input (data Section)
EulerForge supports three standard data task types:
| Task | Raw Schema | Processed Schema |
|---|---|---|
| sft | {text} or {prompt, response} | {input_ids, attention_mask, labels} |
| preference | {chosen, rejected} | {chosen_input_ids, chosen_attention_mask, chosen_labels, rejected_input_ids, rejected_attention_mask, rejected_labels} |
| prompted_preference | {prompt, chosen, rejected} | Above + {prompt_input_ids, prompt_attention_mask, prompt_len} |
Configuration Examples
# Method 1: Already tokenized processed JSONL
data:
format: processed
path: data/sft_processed.jsonl
# Method 2: Raw JSONL (auto-preprocessed and cached during training)
data:
format: raw
task: prompted_preference
path: data/dpo_10k_raw.jsonl
max_length: 512
# num_proc: default = 50% of CPU cores (automatic)
# cache_dir: default = outputs/.cache (shared cache across runs)
#
# Cache filename convention: {input_filename}_{task}_{model_name}_len{max_length}.jsonl
# Example: dpo_10k_raw_prompted_preference_qwen3.5-0.8b-base_len512.jsonl
# Automatically reused when parameters match (no re-preprocessing)
# Method 3 (legacy): Specify processed_data_path directly
processed_data_path: data/sft_processed.jsonl
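The cache filename convention shown in Method 2 can be reproduced with a small helper. The function is hypothetical; the pattern itself comes from the comment above, and stripping the HF org prefix from the model name is an assumption inferred from the example:

```python
from pathlib import Path

def cache_filename(input_path: str, task: str, model_name: str, max_length: int) -> str:
    """{input_filename}_{task}_{model_name}_len{max_length}.jsonl"""
    stem = Path(input_path).stem
    model = model_name.split("/")[-1]  # assumption: drop the HF org prefix
    return f"{stem}_{task}_{model}_len{max_length}.jsonl"
```
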
Labels Masking Policy
- SFT text-only: labels = input_ids (full sequence causal LM)
- SFT prompt/response: Prompt token region set to -100; only the response contributes to the loss
- Preference: Full sequence labels for both chosen and rejected
- Prompted-Preference: Prompt tokens masked with -100; only completions contribute to loss/logp
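The prompt-masking rule for SFT prompt/response data, sketched on plain token-ID lists (the helper is illustrative; -100 is the standard ignore index for cross-entropy):

```python
IGNORE_INDEX = -100  # ignored by the cross-entropy loss

def build_labels(prompt_ids: list[int], response_ids: list[int]):
    """Return (input_ids, labels): labels mirror input_ids but mask the
    prompt region with -100, so only response tokens contribute to the loss."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels
```
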
Integration Test Data Policy
- Integration tests: use 1k data (sft_1k.jsonl, dpo_1k.jsonl)
- Tutorials: based on 10k data (sft_10k_raw.jsonl, dpo_10k_raw.jsonl)
- Bench tests: use *_1k_bench_raw.jsonl
Data Validation
Before training starts, the first 5 rows of the processed data are checked for required keys. On missing keys, a 3-line error is shown:
Data: processed dataset row 0 missing 'input_ids'
Fix: Run `eulerforge preprocess ...` or set data.format=raw with schema mapping
See: docs/tutorials/en/00_data_preprocessing.md
data.task / training.type Compatibility
When data.format=raw, the compatibility between data.task and training.type is automatically validated:
| data.task | Compatible training.type |
|---|---|
| sft | sft, ppo |
| preference | dpo, orpo, rm |
| prompted_preference | dpo, orpo, rm |
| prompt_only | ppo |
On mismatch, an error is raised:
Data Config: data.task='sft' is incompatible with training.type='dpo'. Task 'sft' produces data for sft, ppo training
Fix: Set --set training.type=sft or change data.task to match
See: docs/tutorials/en/00_data_preprocessing.md
Configuration Override Mechanism
Use --set with dot-path notation to override values:
--set training.lr=2e-5
--set injection.lora_r=32
--set training.phases.1.step=100 # Update a specific list element's field (sparse)
--set training.phases.0.trainable=[lora,attn_lora] # List-of-list element value
List index override: Use training.phases.N.field=value to update only the field at index N. Other elements, and other fields within that element, are preserved from the preset. Out-of-range indices raise a clear error.
Type Inference
| Value | Parsed As |
|---|---|
| true | bool |
| false | bool |
| null | None |
| 42 | int |
| 3.14 | float |
| 2e-5 | float |
| hello | str |
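The inference rules in the table can be sketched as a parser (illustrative, not the real implementation; the order matters — keywords, then int, then float, then str):

```python
def parse_set_value(raw: str):
    """Infer the type of a --set value: true/false/null keywords first,
    then int, then float (which covers scientific notation like 2e-5),
    falling back to str."""
    keywords = {"true": True, "false": False, "null": None}
    if raw in keywords:
        return keywords[raw]
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw
```
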
Precedence
CLI --set overrides > YAML preset values
Output and Checkpoints
Training produces the following directory structure:
outputs/run_YYYYMMDD_HHMMSS/
├── resolved_config.json # Full resolved configuration (for reproducibility)
├── checkpoint-latest/ # Latest checkpoint
│ ├── model files...
│ ├── tokenizer files...
│ └── training_state.pt # Optimizer, scheduler, step, best_val_loss
├── checkpoint-best/ # Best validation loss checkpoint
│ └── ...
└── final/ # Final model after training completes
└── ...
checkpoint-best is saved at val_steps intervals and checkpoint-latest at save_steps intervals, independently. Example: with val_steps=500 and save_steps=1000, best is saved at step 500 and latest at step 1000.
Automatic Training Resume
If checkpoint-latest/training_state.pt exists, training resumes automatically from that point:
- Restores micro_step, optimizer_step, best_val_loss
- LR scheduler fast-forward: the cosine schedule is immediately restored to the optimizer step at resume (no re-warmup from 0)
- ETA calculation: remaining time is estimated from steps taken since resume, not from total steps
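The scheduler fast-forward amounts to evaluating the schedule at the restored optimizer step instead of step 0. A sketch with a standard linear-warmup + cosine-decay curve (an assumption; the exact EulerForge schedule may differ):

```python
import math

def cosine_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to base_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Resuming at optimizer step N just evaluates cosine_lr(N, ...) directly —
# there is no second warmup ramp from 0.
```
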
MoE Stability Validation
When using MoE strategies (mixture_lora, moe_expert_lora), the validator automatically performs stability checks.
Required Parameters
| Key | Role | Recommended Value | Rationale |
|---|---|---|---|
| moe.router_z_loss_coef | Router logit stabilization | 0.001 | ST-MoE: prevents softmax overflow |
| moe.load_balance.type | Load balancing policy | aux_loss | Prevents routing collapse |
| moe.load_balance.aux_loss_coef | Auxiliary loss weight | 0.01 | Required when type=aux_loss |
Optional Parameters
| Key | Role | Default | Recommended Range |
|---|---|---|---|
| moe.capacity_factor_train | Expert capacity upper bound during training | Model default | 1.0-2.0 (ST-MoE: 1.25) |
| moe.capacity_factor_eval | Expert capacity upper bound during evaluation | Model default | >= train value (ST-MoE: 2.0) |
| moe.router_dtype | Router computation precision | float32 | float32 (float16/bfloat16 can be numerically unstable) |
| moe.load_balance.bias_update_speed | Expert bias adaptation speed | — | Required when type=aux_loss_free (0.001) |
Error Message Format
MoE validation failures are reported in 3-line format:
MoE Config: moe.router_z_loss_coef is required for MoE strategies.
Fix: Set moe.router_z_loss_coef: 0.001 (ST-MoE recommended)
See: docs/tutorials/en/09_moe_stability_and_validation.md
Warnings (Non-fatal)
| Condition | Meaning |
|---|---|
| router_z_loss_coef = 0 | z-loss disabled — may be unstable in large-scale training |
| router_z_loss_coef > 0.1 | Unusually large value — excessive router constraint |
| load_balance.type = none | No load balancing — risk of routing collapse |
| aux_loss_coef = 0 | Effectively disables load balancing |
| router_dtype = float16/bfloat16 | Potential numerical instability in router softmax |
| capacity_factor_eval < train | Increased token dropping during evaluation |
| No phase includes router | Router not trained — cannot adapt to data |
Preflight Checks
--preflight performs runtime checks after loading the model:
eulerforge train --preset PRESET.yml --preflight
| Check Item | Description |
|---|---|
| Group parameter count | Error if a group referenced by a phase has 0 parameters |
| target_layers range | Error if indices exceed the model's layer count |
Details: tutorials/09_moe_stability_and_validation.md
Logging and Metrics
The training loop logs through a two-tier metrics system.
Metrics Levels
| Level | Recorded Items |
|---|---|
| minimal (default) | step, main_loss, total_loss, aux_loss, lr, grad_norm, throughput, tokens/samples, training-type-specific metrics |
| advanced | minimal + MoE routing stats (token_frac, entropy, importance_cv, router_logit_max, etc.) |
Minimal Metrics
| Tag | Description |
|---|---|
| train/main_loss | Primary loss (SFT/DPO/ORPO/RM/PPO) |
| train/total_loss | Total loss (main + aux * weight) |
| train/aux_loss | Sum of MoE auxiliary losses |
| train/learning_rate | Current learning rate |
| train/grad_norm | Global L2 gradient norm |
| train/tokens_seen | Cumulative training tokens (labels != -100) |
| train/samples_seen | Cumulative processed samples (preference: counted as pairs) |
| train/optimizer_step | Cumulative optimizer steps |
| train/micro_step | Cumulative micro-steps |
| train/effective_batch | Effective batch size (batch_size x grad_accum_steps) |
| DPO: train/reward_margin | Chosen-rejected reward margin |
| ORPO: train/sft_loss, train/orpo_loss | Individual SFT/ORPO losses |
| PPO: train/kl, train/reward_mean | KL divergence, mean reward |
Advanced Metrics (MoE Only)
| Tag | Description |
|---|---|
| moe/token_frac_mean | Mean token fraction per expert |
| moe/token_frac_std | Standard deviation of token fraction per expert |
| moe/token_frac_max | Fraction of the most selected expert |
| moe/entropy_mean | Mean router entropy |
| moe/importance_cv | Importance coefficient of variation (imbalance detection) |
| moe/aux_loss_total | Total aux loss sum |
| moe/router_logit_max | Maximum router logit (numerical explosion detection) |
| moe/num_moe_modules | Number of MoE modules |
LoRA Handoff Logging
When lora_handoff is configured, the following milestone events are automatically logged during training:
| Event | Log Message | Timing |
|---|---|---|
| Schedule init | [Handoff] Schedule: expert_lora(step 4000→6000, ...) | At training start |
| Fade start | [Handoff] expert_lora fade started at step 4000 (curve=cosine, ...) | When start_step is reached |
| Fade complete | [Handoff] expert_lora fade complete at step 6000 (scale=0.0000) | When start_step + duration_steps is reached |
| Freeze | [Handoff] expert_lora frozen at step 6000 (scale=0.0000) | When end_action=freeze |
| Ramp start | [Handoff] base_ffn_ramp started at step 2000 (LR x1.0→x3.0) | When ramp start_step is reached |
| Ramp complete | [Handoff] base_ffn_ramp complete at step 4000 (LR x3.0) | When ramp end_step is reached |
Each event is logged exactly once (one-shot). attn_lora follows the same pattern.
Periodic logs (at log_steps intervals) automatically include handoff state:
[Phase2] Step 400/600 (micro 4500/6000) | Loss: 0.1234 | LR: 1.00e-05 | ... | Handoff[expert_lora=0.75, attn_lora=0.80, ffn_lr_mult=2.50]
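The fade scale in the periodic log (e.g. expert_lora=0.75) can be computed as a cosine interpolation from 1.0 to 0.0 over the fade window. The formula below is the standard cosine fade and is assumed to match curve=cosine; the function name is illustrative:

```python
import math

def fade_scale(step: int, start_step: int, duration_steps: int) -> float:
    """Cosine fade from 1.0 at start_step to 0.0 at start_step + duration_steps."""
    if step <= start_step:
        return 1.0
    if step >= start_step + duration_steps:
        return 0.0
    t = (step - start_step) / duration_steps
    return 0.5 * (1.0 + math.cos(math.pi * t))
```
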
TensorBoard Integration
TensorBoard logging is an optional dependency:
pip install eulerforge[tb]
Configuration example:
logging:
metrics_level: advanced # "minimal" | "advanced"
tensorboard:
enabled: true # Enable TensorBoard logging (default: false)
log_dir: "outputs/tb" # Log directory
log_interval: 50 # Write to TensorBoard every N steps
max_experts_log: 16 # Log detailed stats for top N experts in advanced mode
Override metrics level via CLI:
eulerforge train --preset PRESET.yml --metrics-level advanced
If tensorboard is not installed, a warning is printed and training proceeds normally.
Details: tutorials/10_metrics_monitoring.md
eulerforge grid
Basic Usage
# Validate spec only (dry-run)
eulerforge grid configs/grid/sft_random_search.yml --dry-run
# Run
eulerforge grid configs/grid/sft_random_search.yml
# Specify project root explicitly
eulerforge grid configs/grid/sft_random_search.yml --project-root /path/to/project
Options
| Option | Default | Description |
|---|---|---|
| spec | (required) | Grid search YAML spec file path |
| --dry-run | false | Validate spec only; do not run training |
| --project-root DIR | cwd | Base directory for resolving relative paths |
Optuna Dependency
pip install eulerforge[hpo]
If optuna is not installed, a message suggesting pip install eulerforge[hpo] is shown and the process exits.
Optuna Sampler by Method
| Method | Sampler | Notes |
|---|---|---|
| grid | GridSampler | Discrete spaces only (categorical / choices) |
| random | RandomSampler | Supports both continuous and discrete |
| bayes | TPESampler | Supports both continuous and discrete |
Combining method: "grid" with type: "float" and low/high is an error. To search continuous floats in grid mode, use the choices: [val1, val2] format.
Output
<output_root>/
├── trial_0000/
│ ├── metrics.jsonl # Per-step metrics (train/total_loss, etc.)
│ └── checkpoint-latest/
├── trial_0001/
│ └── ...
├── summary.json # Overall results (best_trial + all_trials)
└── summary.csv
Spec format details: docs/fixtures/specs/grid_search_spec.md
Python API: eulerforge.loader
Public API for loading EulerForge-trained checkpoints directly from Python.
from eulerforge import load_model
load_model(path, *, checkpoint, device, dtype, load_precision) -> LoadedModel
| Parameter | Default | Description |
|---|---|---|
| path | (required) | Path to a run_dir or checkpoint_dir |
| checkpoint | "final" | Checkpoint selection for run_dir: final \| best \| latest |
| device | "auto" | "auto" \| "cpu" \| "cuda" \| "cuda:0", etc. |
| dtype | "auto" | "auto" \| "float32" \| "bfloat16", etc. |
| load_precision | None | Load precision: None \| "fp32" \| "fp16" \| "bf16" \| "int8" \| "int4" |
Return Type
@dataclass
class LoadedModel:
model: nn.Module # eval mode
tokenizer: PreTrainedTokenizer
metadata: ModelMetadata
@dataclass
class ModelMetadata:
strategy: str # "dense_lora" | "moe_expert_lora" | "mixture_lora" | "none"
backbone: str # "qwen3" | "llama" | "gemma3" | ""
path_type: str # "run_dir" | "checkpoint_dir"
checkpoint_dir: str # Actual checkpoint directory path
lora_config: dict | None # {"lora_r": int, "lora_alpha": float}
structure_preserved: bool # Whether MoE/MixtureLoRA structure is preserved
load_precision: str | None # "fp32" | "fp16" | "bf16" | "int8" | "int4" | None
Automatic Path Classification
| Path Type | Detection Criteria | Behavior |
|---|---|---|
| run_dir | resolved_config.json exists | Resolves subdirectory via checkpoint parameter |
| checkpoint_dir | config.json exists | Loads directly |
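The classification criteria in the table reduce to two file checks; a minimal sketch (the helper is illustrative, not part of the public API):

```python
from pathlib import Path

def classify_path(path: str) -> str:
    """resolved_config.json marks a run_dir; config.json marks a checkpoint_dir."""
    p = Path(path)
    if (p / "resolved_config.json").exists():
        return "run_dir"
    if (p / "config.json").exists():
        return "checkpoint_dir"
    raise ValueError(f"{path}: neither resolved_config.json nor config.json found")
```
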
Usage Examples
from eulerforge import load_model
# 1. Load from training output directory (final checkpoint)
result = load_model("outputs/run_20260311_163425")
print(result.metadata.strategy) # "moe_expert_lora"
print(result.metadata.backbone) # "qwen3"
# 2. Load best checkpoint
result = load_model("outputs/run_20260311_163425", checkpoint="best")
# 3. Specify checkpoint directory directly
result = load_model("outputs/run_20260311_163425/final", device="cuda:0")
# 4. Quantized loading (int4/int8)
result = load_model("outputs/run_20260311_163425", load_precision="int4")
result = load_model("outputs/run_20260311_163425", load_precision="bf16")
# 5. Inference
messages = [{"role": "user", "content": "Hello"}]
text = result.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = result.tokenizer(text, return_tensors="pt").to(result.model.device)
with torch.no_grad():
out = result.model.generate(**inputs, max_new_tokens=128)
print(result.tokenizer.decode(out[0], skip_special_tokens=True))
# 6. Using metadata
if result.metadata.structure_preserved:
print("This checkpoint preserves the MoE structure")
if result.metadata.lora_config:
print(f"LoRA r={result.metadata.lora_config['lora_r']}")
Loading Strategies
| Strategy | Loading Method | Structure Preserved |
|---|---|---|
| dense_lora | LoRA merge → dense model | No |
| moe_expert_lora | Reconstruct MoE architecture (expert+router) | Yes |
| mixture_lora | Reconstruct MixtureLoRA architecture | Yes |
| none | Load via from_pretrained() directly | N/A |
Note: moe_expert_lora and mixture_lora require resolved_config.json. Without it, the system falls back to averaging experts into a dense model (with a WARNING).
eulerforge export-hf
Exports an EulerForge checkpoint as a HuggingFace Transformers-compatible model.
Basic Usage
# dense_lora → merged HF model
eulerforge export-hf --checkpoint outputs/run_20260311_163425 --output ./exported
# MoE → custom_moe HF model (load with trust_remote_code=True)
eulerforge export-hf --checkpoint outputs/run_moe --output ./exported_moe
# dry-run (print plan only)
eulerforge export-hf --checkpoint outputs/run --output ./out --dry-run
# validate-only
eulerforge export-hf --checkpoint outputs/run --output ./out --validate-only
Options
| Option | Default | Description |
|---|---|---|
| --checkpoint | (required) | Path to run_dir or checkpoint_dir |
| --output | (required) | Export output directory (error if it already exists) |
| --format | auto | auto \| merged \| custom_moe |
| --select-checkpoint | final | final \| best \| latest |
| --dtype | auto | auto \| fp32 \| fp16 \| bf16 |
| --safe-serialization | True | Whether to use safetensors (disable with --no-safe-serialization) |
| --copy-tokenizer | True | Whether to copy the tokenizer (disable with --no-copy-tokenizer) |
| --dry-run | False | Print plan only; no actual export |
| --validate-only | False | Validate only |
| --skip-diversity-check | False | Skip expert diversity check (experts may not be differentiated with lightweight training) |
Export Format by Strategy
| Strategy | format=auto | Result | HF Loading |
|---|---|---|---|
| dense_lora | merged | Standard dense HF model | from_pretrained(path) |
| mixture_lora | custom_moe | base+router+N LoRA experts | from_pretrained(path, trust_remote_code=True) |
| moe_expert_lora | custom_moe | N expert FFN+router | from_pretrained(path, trust_remote_code=True) |
Key point: Exporting the moe_expert_lora/mixture_lora strategies as merged destroys the expert structure. format=merged with an MoE strategy raises a ValueError.
Spec details: docs/fixtures/specs/export_hf_spec.md
eulerforge pretrain (Plugin)
Plugin command: pretrain is provided via the plugin system. It is registered in the CLI only when eulerforge.plugins.pretrain_plugin is present. Excluding this module from a public distribution automatically hides the command.
Performs scratch pretraining on a HuggingFace-format model exported by EulerStack or a similar tool, using raw text data. This is a full-parameter causal LM training pipeline, completely separate from the train command (which applies LoRA/MoE injection).
eulerforge pretrain [OPTIONS]
| Option | Description |
|---|---|
| --preset PATH | Pretrain YAML preset path (required) |
| --set KEY=VALUE | Config override (repeatable) |
| --output-dir DIR | Output directory |
| --validate-only | Validate config only, no training |
pretrain vs train differences
| Item | train (fine-tuning) | pretrain (scratch) |
|---|---|---|
| Model load | from_pretrained (with trained weights) | from_pretrained (initialized weights) |
| Injection | LoRA/MoE applied | None (all parameters trainable) |
| Phase schedule | freeze/unfreeze control | None (everything trainable) |
| Data | instruction/preference format | raw text (packed chunking) |
| Forbidden keys | — | injection, moe, backbone → error |
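Packed chunking (packing: true) concatenates all tokenized documents into one stream and cuts it into fixed-length chunks. A minimal sketch; dropping the trailing remainder is a common convention and an assumption here:

```python
def pack_chunks(token_streams: list[list[int]], max_length: int) -> list[list[int]]:
    """Concatenate token streams and split into chunks of exactly max_length,
    discarding the final partial chunk."""
    flat = [tok for stream in token_streams for tok in stream]
    n_chunks = len(flat) // max_length
    return [flat[i * max_length:(i + 1) * max_length] for i in range(n_chunks)]
```
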
Pretrain YAML preset structure
# Device
device: "cuda:0" # cuda:0, cuda:1, cpu
# Model (EulerStack export directory)
model_dir: "outputs/full_hybrid_moe"
trust_remote_code: true
# Tokenizer (specify separately if not in model_dir)
tokenizer: "gpt2"
# Data
data:
path: "data/dolma_10k.jsonl"
text_column: "text" # text key in JSONL
max_length: 1024
packing: true # packed chunking (concatenate → split to fixed length)
# Training
training:
max_steps: 500
batch_size: 2
grad_accum_steps: 4
lr: 3.0e-4
weight_decay: 0.1
warmup_steps: 50
max_grad_norm: 1.0
log_steps: 10
save_steps: 250
dtype: "float32" # Hybrid models (Hyena/RetNet) require float32
amp: false # FFT ops don't support bf16 → disabled
seed: 42
Usage examples
# Basic run
eulerforge pretrain --preset configs/presets/pretrain/eulerstack_hybrid_moe.yml
# Override settings
eulerforge pretrain --preset ... --set training.max_steps=1000 --set training.lr=1e-4
# Validate only
eulerforge pretrain --preset ... --validate-only
# Specify output directory
eulerforge pretrain --preset ... --output-dir outputs/my_pretrain
Output structure
outputs/pretrain_YYYYMMDD_HHMMSS/
├── pretrain_config.json # Config snapshot
├── metrics.jsonl # Per-step loss, lr
├── checkpoint-250/ # Mid-training checkpoint
│ ├── config.json
│ ├── model.safetensors
│ └── tokenizer files...
└── final/ # Final model
├── config.json
├── model.safetensors
└── tokenizer files...
After pretraining, point eulerforge train's model_name to the final/ directory for LoRA fine-tuning.