Home > EulerForge > CLI Reference

EulerForge CLI Reference

GitHub Repository: https://github.com/eulerwa/eulerforge

Korean version: cli.md

Global Options

--lang LANG — Set Output Language

Use --lang with any subcommand to change the CLI output language.

# Output in English
eulerforge --lang en train --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml

# Output in Japanese
eulerforge --lang ja bench --preset configs/bench/sft_target_only.yml
Language Code Language
ko 한국어 (default)
en English
zh 中文
ja 日本語
es Español

You can also set the language via the EULERFORGE_LANG environment variable. The --lang flag takes precedence over the environment variable.

export EULERFORGE_LANG=en
eulerforge train --preset ...    # Output in English

Commands

eulerforge train

Main training pipeline.

eulerforge train [OPTIONS]
Option Description
--preset PATH YAML preset configuration file
--set KEY=VALUE Override a configuration value (repeatable)
--output-dir DIR Checkpoint output directory
--run-name NAME Optional run name suffix
--print-config Print the resolved configuration as JSON
--validate-only Validate configuration only; do not train
--preflight Load model, apply injection, verify phases — without training
--metrics-level LEVEL Metrics level: minimal (default) or advanced (+ MoE routing stats)
--debug Enable verbose debug logging
--debug-every N Print debug logs every N steps (default: 50)
--debug-max-modules N Maximum modules per debug section (default: 50)
--debug-topk-grad N Number of top-K gradient norms to print (default: 20)
--debug-attn Print attention projection summaries
--debug-trainable-names Dump trainable parameter names

Pipeline Continuation (Checkpoint Auto-Detection)

When model_name points to a previous EulerForge training checkpoint, the system automatically detects the base model and loads it correctly.

# SFT → DPO pipeline example
eulerforge train --preset configs/presets/qwen3.5_0.8b_dense_lora_dpo.yml \
    --set model_name=outputs/sft/final \
    --output-dir outputs/dpo

How it works:
1. If the model_name directory contains lora_info.json, it is recognized as an EulerForge LoRA checkpoint.
2. resolved_config.json is read from the parent directory to find the original base model.
3. The current config's injection settings (lora_r, lora_alpha, target_keywords, start_layer, etc.) are auto-overridden with those from the previous checkpoint, preventing LoRA structure mismatches in cases like SFT(lora_r=48) → DPO(lora_r=24).
4. The base model is loaded and LoRA injection is applied using the checkpoint-derived parameters.
5. The LoRA adapter weights are restored from the previous checkpoint.

Note: Auto-detection requires resolved_config.json. Only checkpoints produced by eulerforge train are supported. If you need a different LoRA structure for DPO, run it as a separate training session rather than through the pipeline.
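The detection steps above can be sketched in Python (a minimal illustration; the helper name and the returned fields are assumptions, not the actual implementation):

```python
import json
from pathlib import Path

def detect_eulerforge_checkpoint(model_name):
    """Return checkpoint info if model_name is an EulerForge LoRA checkpoint, else None.

    A directory counts as a checkpoint when it contains lora_info.json; the
    original base model and injection settings are read from
    resolved_config.json in the parent run directory. (Illustrative sketch.)
    """
    ckpt = Path(model_name)
    if not (ckpt / "lora_info.json").exists():
        return None  # plain HF model: load normally
    resolved = ckpt.parent / "resolved_config.json"
    if not resolved.exists():
        raise FileNotFoundError(
            "resolved_config.json is required for checkpoint auto-detection"
        )
    config = json.loads(resolved.read_text())
    return {
        "base_model": config["model_name"],        # original base model
        "injection": config.get("injection", {}),  # lora_r, target_keywords, ...
        "adapter_dir": str(ckpt),                  # adapter weights restored from here
    }
```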

Environment Variables

Environment Variable Description
EULERFORGE_MOE_PERF=1 Enable per-module MoEFFN forward time/token distribution profiling. Outputs [MoEPerf] logs
EULERFORGE_MOE_DTYPE_DEBUG=1 Enable dtype flow debug logging at MoEFFN merge points. Outputs [MoEDType] logs (hidden/router/weights/expert_out/buffer dtype)
EULERFORGE_TRAIN_DEBUG=1 Verbose training loop debug: per-group LR, grad norm mean, param dtype distribution, dequantize stats (mean/std/min/max) on phase transitions. Outputs [TrainDebug]/[DequantDebug] logs

eulerforge convert

Converts arbitrary JSONL into EulerForge standard raw JSONL. The core philosophy is --map-based mapping; recipes are provided as convenience shortcuts.

Three modes:
- Map mode (default): --map OUT_KEY=EXPR (repeatable), general-purpose field mapping
- Recipe mode: --recipe <name>, built-in recipes for common data structure transformations
- Validate mode: --validate, validate the input file schema without performing any conversion

# Inspect fields (explore input file structure)
eulerforge convert --task sft --input data/sft_10k.jsonl --print-sample-flat

# Map mode (flat fields)
eulerforge convert --task sft --input data/custom.jsonl --output data/out.jsonl \
    --map prompt=instruction --map response=output

# Map mode (nested fields — automatic dot-path flattening)
eulerforge convert --task prompted_preference --input data/dpo_10k.jsonl --output data/dpo_10k_raw.jsonl \
    --map prompt=instruction.value --map chosen=chosen.value --map rejected=rejected.value

# Recipe mode (messages array)
eulerforge convert --task sft --input data/sft_10k.jsonl --output data/sft_10k_raw.jsonl \
    --recipe sft_messages --messages-expr json_record.messages

# Validate mode (schema validation without conversion)
eulerforge convert --task sft --input data/sft_10k_raw.jsonl --validate
Option Description
--task Target task: sft, preference, prompted_preference
--input Input JSONL file path
--output Output standard raw JSONL path (not required for --validate / --print-sample-flat)
--map OUT_KEY=EXPR Output key to field expression mapping (repeatable). E.g., --map prompt=instruction
--flatten dot|none Flatten mode: dot (default) flattens nested dicts to dot-keys; none navigates dot-paths directly
--join-sep SEP Separator for joining list[str] values (default: newline)
--strict Error if a resolved value is None, empty string, or empty list
--recipe Built-in recipe name
--messages-expr EXPR Dot-path to messages array (required with --recipe sft_messages)
--num-proc Number of parallel workers (default: 1)
--overwrite Overwrite output file if it already exists
--max-rows Maximum number of rows to convert (sampling)
--validate Validate-only mode: validate input schema, no conversion
--print-sample-flat Print flattened key list from first 1-2 rows and exit (for field discovery)
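The dot-path flattening behind --map and --flatten dot can be illustrated with a short sketch (an approximation of the behavior, not the shipped implementation):

```python
def flatten_dot(record, prefix=""):
    """Flatten nested dicts into dot-keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dot(value, path))
        else:
            flat[path] = value
    return flat

def apply_map(record, mapping, join_sep="\n"):
    """Resolve each --map OUT_KEY=EXPR against the flattened record.

    list[str] values are joined with join_sep, mirroring --join-sep.
    """
    flat = flatten_dot(record)
    out = {}
    for out_key, expr in mapping.items():
        value = flat[expr]
        if isinstance(value, list):
            value = join_sep.join(value)
        out[out_key] = value
    return out
```

For example, apply_map({"instruction": {"value": "Q"}}, {"prompt": "instruction.value"}) yields {"prompt": "Q"}, matching the nested-field example above.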

Built-in Recipes

Recipe Task Input Structure Output
sft_messages sft Any dot-path to messages array (--messages-expr required) {prompt, response}
sft_instruction_output sft {instruction, output} {prompt, response}
dpo_nested_value prompted_preference {instruction.value, chosen.value, rejected.value} {prompt, chosen, rejected}
sft_messages_v1 sft {json_record.messages: [{role,content}]} {prompt, response}
dpo_nested_v1 prompted_preference {instruction.value, chosen.value, rejected.value} {prompt, chosen, rejected}
passthrough_prompted_preference_v1 prompted_preference {prompt, chosen, rejected} Same (extra keys removed)
passthrough_sft_prompt_response_v1 sft {prompt, response} Same (extra keys removed)
passthrough_preference_v1 preference {chosen, rejected} Same (extra keys removed)

Details: tutorials/00_data_preprocessing.md

eulerforge preprocess

Converts raw JSONL into tokenized processed JSONL.

eulerforge preprocess --task TASK --input RAW.jsonl --output PROCESSED.jsonl --model-name MODEL
Option Description
--task Task type: sft, preference, prompted_preference
--input Input raw JSONL file path
--output Output processed JSONL file path
--model-name HuggingFace model/tokenizer name
--max-length Maximum sequence length (default: 512)
--num-proc Number of parallel workers (default: 50% of CPU cores)
--text-col SFT text column name (default: text)
--prompt-col Prompt column name (default: prompt)
--response-col Response column name (default: response)
--chosen-col Chosen column name (default: chosen)
--rejected-col Rejected column name (default: rejected)

Details: tutorials/00_data_preprocessing.md

eulerforge bench

Inference benchmark. Configure target/baseline/judge models via a YAML spec, sample from benchmark data, and compare inference results. The target supports API models (Ollama/OpenAI/Gemini) or locally trained HF checkpoints.

eulerforge bench [OPTIONS]
Option Description
--preset PATH Bench YAML spec file path (required)
--set KEY=VALUE Override a configuration value (repeatable)
--output-dir DIR Results output directory
--validate-only Validate configuration only
--dry-run Extract data samples only (no model calls)
--target-output-dir PATH Target local model: training output root directory
--checkpoint TYPE Checkpoint type: final (default) | latest | best
--target-model-dir PATH Target local model: specify HF save_pretrained directory directly
--target-device DEVICE Target local model device override (e.g., cuda:0, cuda:1, cpu)

Sequential Model Loading (OOM Prevention)

Each model is loaded, processes the entire dataset, and is then unloaded before the next one is loaded. No more than one model resides in GPU memory at any time.

Phase 1: Load target → run inference on all samples → unload
Phase 2: Load baseline → run inference on all samples → unload (when enabled)
Phase 3: Load judge → evaluate all samples → unload (when enabled)
Phase 4: Aggregate results → same output format as before

LocalHFClient.unload() deletes the model/tokenizer and calls torch.cuda.empty_cache(). For API clients (ChatClient/JudgeClient), unload() is a no-op.
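The sequential phase loop can be sketched as follows (the run() method is an assumed interface for illustration; only unload() is documented above):

```python
def run_bench_phases(clients, samples):
    """Run target/baseline/judge one at a time so at most one model holds GPU memory.

    Each client exposes run(samples) and unload(); for API clients unload()
    is a no-op, for LocalHFClient it frees the model/tokenizer and calls
    torch.cuda.empty_cache(). (Sketch under assumed interfaces.)
    """
    results = {}
    for phase in ("target", "baseline", "judge"):
        client = clients.get(phase)
        if client is None:
            continue  # baseline/judge may be disabled
        try:
            results[phase] = client.run(samples)  # inference on ALL samples
        finally:
            client.unload()  # always release memory before the next phase
    return results
```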

Execution Modes

Condition Mode Output
Target only target-only Target responses
Target + baseline comparison Both outputs
Judge + target only pointwise Score (1-10) + explanation
Judge + target + baseline pairwise Winner (A/B/tie) + score + explanation

Bench YAML Spec

Target specification methods (use exactly one):

bench:
  task: sft                                  # sft | preference
  data_path: data/sft_1k_bench_raw.jsonl
  sample:
    k: 10
    seed: 42
    shuffle: true
  generation:
    max_new_tokens: 256
    temperature: 0.7
    top_p: 0.95
  models:
    target:
      # Method A: API model (Ollama/OpenAI/Gemini)
      provider: ollama                       # ollama | openai | gemini
      model: "qwen3:0.6b"
      base_url: "http://localhost:11434/v1"

      # Method B: Training output directory (automatic checkpoint resolution)
      # output_dir: "outputs/run_20260301_120000"
      # checkpoint: "final"                  # final | latest | best (default: final)
      # device: "auto"                       # default: auto
      # dtype: "auto"                        # default: auto

      # Method C: Specify HF model directory directly
      # provider: local_hf                   # can be omitted (auto-configured)
      # model_dir: "outputs/run_20260301_120000/final"
    baseline:
      enabled: false
      provider: ollama
      model: "qwen3:4b"
    judge:
      enabled: false
      provider: ollama
      model: "gemma3:27b"
      mode: pointwise                        # pointwise | pairwise
      mitigate_position_bias: true           # pairwise A/B swap (2 rounds)
  output:
    out_dir: outputs/bench
    save_jsonl: true
    print_examples: true
    print_max_chars: 1500

Usage Examples

# Using an API model
eulerforge bench --preset configs/bench/sft_target_only.yml

# Load final checkpoint from training output directory
eulerforge bench --preset configs/bench/sft_local.yml \
  --target-output-dir outputs/run_20260301_120000

# Use latest checkpoint
eulerforge bench --preset configs/bench/sft_local.yml \
  --target-output-dir outputs/run_20260301_120000 --checkpoint latest

# Specify model directory directly
eulerforge bench --preset configs/bench/sft_local.yml \
  --target-model-dir outputs/run_20260301_120000/final

# Specify GPU (load target on GPU 1 when judge is using GPU 0)
eulerforge bench --preset configs/bench/sft_local.yml \
  --target-output-dir outputs/run_20260301_120000 --target-device cuda:1

# Override configuration
eulerforge bench --preset configs/bench/sft_target_only.yml --set bench.sample.k=20

# Validate only
eulerforge bench --preset configs/bench/sft_target_only.yml --validate-only

# Preview samples (no model calls)
eulerforge bench --preset configs/bench/sft_target_only.yml --dry-run

Bench Data Format

File Keys Task
sft_1k_bench_raw.jsonl prompt, response sft
dpo_1k_bench_raw.jsonl prompt, chosen, rejected preference

Details: tutorials/11_bench.md

eulerforge eval (stub)

Perplexity evaluation (not yet implemented).

eulerforge grid

Hyperparameter grid / random / Bayesian search via Optuna. See the eulerforge grid section below for full documentation.


Training Types (training.type)

Type Description Data Format Reference Model
sft Supervised Fine-Tuning (default) {input_ids, attention_mask, labels} Not required
dpo Direct Preference Optimization chosen/rejected pairs (DPO format) Adapter disabled
orpo Odds Ratio Preference Optimization chosen/rejected pairs (DPO format) Not required
rm Reward Model (Bradley-Terry) chosen/rejected pairs (DPO format) Not required
ppo PPO/RLHF Prompt-only + reward model Adapter disabled

Common Training Settings

Key Alias Default Description
max_train_steps max_steps 10000 Maximum training steps (micro-step basis)
batch_size - 4 Batch size
grad_accum_steps - 1 Gradient accumulation steps
lr - 1e-5 Learning rate
weight_decay - 0.01 Weight decay
warmup_steps - 500 Warmup steps
max_grad_norm - 1.0 Maximum gradient clipping norm
log_steps - 100 Logging interval (micro-steps)
save_steps - 1000 Checkpoint saving interval (micro-steps)
val_steps - save_steps value Validation interval (micro-steps). Defaults to save_steps if unset

Note: max_train_steps and max_steps are synonymous. If both are specified, max_steps takes precedence. In presets, max_train_steps is the canonical key.

Step Terminology:
- Micro-step: each batch forward/backward constitutes 1 micro-step. max_steps is counted in micro-steps.
- Optimizer step: occurs every grad_accum_steps batches. Total optimizer steps = max_steps / grad_accum_steps.
- Effective batch: batch_size x grad_accum_steps. The actual number of samples processed per optimizer step.
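The step arithmetic can be checked with a tiny helper (values below are examples, not defaults):

```python
def step_budget(max_steps, batch_size, grad_accum_steps):
    """Relate micro-steps, optimizer steps, and effective batch size."""
    return {
        "micro_steps": max_steps,                          # one per forward/backward
        "optimizer_steps": max_steps // grad_accum_steps,  # one per accumulation window
        "effective_batch": batch_size * grad_accum_steps,  # samples per optimizer step
    }
```

For instance, step_budget(10000, 4, 8) gives 10000 micro-steps, 1250 optimizer steps, and an effective batch of 32.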

ORPO Options

training:
  type: orpo
  orpo_lambda: 1.0  # ORPO term weight (required, positive)

RM Options

training:
  type: rm
  # RewardHead is auto-generated (hidden_size → 1)

PPO Options

training:
  type: ppo
  ppo:
    clip_range: 0.2     # PPO clipping range
    kl_coef: 0.1        # KL penalty coefficient
    epochs: 4           # PPO update epochs per batch
    max_gen_len: 64     # Maximum generation length
    temperature: 1.0    # Sampling temperature
  reward_model:
    model_name: "path/to/model"  # Reward model path
    checkpoint_path: ""          # Checkpoint (optional)

Model Load Precision (model.load_precision)

Declare precision and quantization options for model loading in YAML.

model:
  load_precision:
    mode: int4               # fp32 | fp16 | bf16 | int8 | int4
    compute_dtype: bf16      # Compute dtype for int8/int4 operations (fp16 | bf16)
    quant_type: nf4           # int4 only: nf4 | fp4
    double_quant: true        # int4 only
    dequantize_on_train: true # Auto-dequantize when base_ffn becomes trainable

Behavior by Mode

Mode torch_dtype BitsAndBytes Use Case
fp32 float32 - CPU / debugging
fp16 float16 - GPU memory savings
bf16 bfloat16 - GPU default (Ampere+)
int8 - load_in_8bit 8-bit quantization
int4 - load_in_4bit 4-bit quantization (QLoRA)

Phase Dequantize Policy

When dequantize_on_train=true (the default) and base_ffn becomes trainable during a phase transition, quantized modules are replaced with nn.Linear and weights are restored to compute_dtype.

Mode Replacement Target Dequantize Method
int4 Linear4bit → nn.Linear dequantize_4bit()
int8 Linear8bitLt → nn.Linear int8_vectorwise_dequant()
bf16/fp16/fp32 No replacement Set requires_grad directly

Replacement is performed automatically just before requires_grad=True is set within set_trainable_by_groups(). Since quantized module forward() re-allocates int8/int4 data, simple parameter casting is insufficient — the module itself must be replaced.

Automatic target_layers scoping: When base_ffn is active and no explicit target_layers is specified, target layers are automatically derived from injection.start_layer/injection.num_layers. Only layers where MoE is injected are dequantized/unfrozen.
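The automatic scoping rule amounts to a small derivation, sketched here with assumed config key locations (training.target_layers, injection.start_layer, injection.num_layers):

```python
def resolve_dequant_layers(config, num_model_layers):
    """Layers to dequantize/unfreeze when base_ffn becomes trainable.

    An explicit training.target_layers wins; otherwise the range is derived
    from injection.start_layer / injection.num_layers so that only layers
    where MoE is injected are touched. (Illustrative sketch.)
    """
    explicit = config.get("training", {}).get("target_layers")
    if explicit is not None:
        return list(explicit)
    inj = config.get("injection", {})
    start = inj.get("start_layer", 0)
    count = inj.get("num_layers", num_model_layers - start)
    end = min(start + count, num_model_layers)
    return list(range(start, end))
```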

Backward Compatibility

Using the legacy training.quant_bits produces a warning and automatic conversion:
- quant_bits: 4 → mode: int4
- quant_bits: 8 → mode: int8
- quant_bits: 16 → mode: bf16

Spec details: docs/fixtures/specs/load_precision_spec.md


Data Input (data Section)

EulerForge supports three standard data task types:

Task Raw Schema Processed Schema
sft {text} or {prompt, response} {input_ids, attention_mask, labels}
preference {chosen, rejected} {chosen_input_ids, chosen_attention_mask, chosen_labels, rejected_input_ids, rejected_attention_mask, rejected_labels}
prompted_preference {prompt, chosen, rejected} Above + {prompt_input_ids, prompt_attention_mask, prompt_len}

Configuration Examples

# Method 1: Already tokenized processed JSONL
data:
  format: processed
  path: data/sft_processed.jsonl

# Method 2: Raw JSONL (auto-preprocessed and cached during training)
data:
  format: raw
  task: prompted_preference
  path: data/dpo_10k_raw.jsonl
  max_length: 512
  # num_proc: default = 50% of CPU cores (automatic)
  # cache_dir: default = outputs/.cache (shared cache across runs)
  #
  # Cache filename convention: {input_filename}_{task}_{model_name}_len{max_length}.jsonl
  # Example: dpo_10k_raw_prompted_preference_qwen3.5-0.8b-base_len512.jsonl
  # Automatically reused when parameters match (no re-preprocessing)

# Method 3 (legacy): Specify processed_data_path directly
processed_data_path: data/sft_processed.jsonl
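The cache filename convention from the comments above can be reproduced with a short helper (the model-name sanitization shown, dropping the org prefix and lowercasing, is a guess that matches the documented example):

```python
from pathlib import Path

def cache_filename(input_path, task, model_name, max_length):
    """Build {input_filename}_{task}_{model_name}_len{max_length}.jsonl.

    Sanitizes the model name for filesystem safety; the exact rule used by
    EulerForge is an assumption here.
    """
    stem = Path(input_path).stem
    model = model_name.split("/")[-1].lower()
    return f"{stem}_{task}_{model}_len{max_length}.jsonl"
```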

Labels Masking Policy

Integration Test Data Policy

Data Validation

Before training starts, the first 5 rows of the processed dataset are checked for required keys. If a key is missing, a 3-line error is shown:

Data: processed dataset row 0 missing 'input_ids'
Fix: Run `eulerforge preprocess ...` or set data.format=raw with schema mapping
See: docs/tutorials/en/00_data_preprocessing.md

data.task / training.type Compatibility

When data.format=raw, the compatibility between data.task and training.type is automatically validated:

data.task Compatible training.type
sft sft, ppo
preference dpo, orpo, rm
prompted_preference dpo, orpo, rm
prompt_only ppo

On mismatch, an error is raised:

Data Config: data.task='sft' is incompatible with training.type='dpo'. Task 'sft' produces data for sft, ppo training
Fix: Set --set training.type=sft or change data.task to match
See: docs/tutorials/en/00_data_preprocessing.md
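The compatibility table translates directly into a validation helper (a sketch mirroring the documented error message, not the actual validator):

```python
COMPATIBLE = {
    "sft": {"sft", "ppo"},
    "preference": {"dpo", "orpo", "rm"},
    "prompted_preference": {"dpo", "orpo", "rm"},
    "prompt_only": {"ppo"},
}

def check_task_type(data_task, training_type):
    """Raise ValueError on an incompatible data.task / training.type pair."""
    allowed = COMPATIBLE[data_task]
    if training_type not in allowed:
        raise ValueError(
            f"data.task='{data_task}' is incompatible with "
            f"training.type='{training_type}'. Task '{data_task}' produces "
            f"data for {', '.join(sorted(allowed))} training"
        )
```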

Configuration Override Mechanism

Use --set with dot-path notation to override values:

--set training.lr=2e-5
--set injection.lora_r=32
--set training.phases.1.step=100  # Update a specific list element's field (sparse)
--set training.phases.0.trainable=[lora,attn_lora]  # List-of-list element value

List index override: Use training.phases.N.field=value to update only the field at index N. Other elements and other fields within that element are preserved from the preset. Out-of-range indices raise a clear error.
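The sparse list-index override can be sketched as follows (illustrative; the real parser also applies the type inference described below):

```python
def set_dot_path(config, path, value):
    """Apply one --set KEY=VALUE override in place.

    Numeric path segments index into lists; only the addressed field changes,
    and sibling fields/elements are preserved from the preset. Out-of-range
    indices raise an error. (Sketch.)
    """
    keys = path.split(".")
    node = config
    for key in keys[:-1]:
        if isinstance(node, list):
            idx = int(key)
            if idx >= len(node):
                raise IndexError(f"list index {idx} out of range in '{path}'")
            node = node[idx]
        else:
            node = node[key]
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
```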

Type Inference

Value Parsed As
true bool
false bool
null None
42 int
3.14 float
2e-5 float
hello str
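The inference rules above correspond to a parser roughly like this (sketch):

```python
def parse_set_value(raw):
    """Infer the type of a --set value: bool, None, int, float, or str."""
    literals = {"true": True, "false": False, "null": None}
    if raw in literals:
        return literals[raw]
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)  # handles 3.14 and scientific notation like 2e-5
    except ValueError:
        return raw  # fall back to plain string
```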

Precedence

CLI --set overrides > YAML preset values

Output and Checkpoints

Training produces the following directory structure:

outputs/run_YYYYMMDD_HHMMSS/
├── resolved_config.json    # Full resolved configuration (for reproducibility)
├── checkpoint-latest/      # Latest checkpoint
│   ├── model files...
│   ├── tokenizer files...
│   └── training_state.pt   # Optimizer, scheduler, step, best_val_loss
├── checkpoint-best/        # Best validation loss checkpoint
│   └── ...
└── final/                  # Final model after training completes
    └── ...

checkpoint-best is saved at val_steps intervals, and checkpoint-latest is saved at save_steps intervals, independently. Example: with val_steps=500, save_steps=1000 — best is saved at step 500, latest at step 1000.

Automatic Training Resume

If checkpoint-latest/training_state.pt exists, training resumes automatically from that point:
- Restores micro_step, optimizer_step, and best_val_loss.
- LR scheduler fast-forward: the cosine schedule is immediately restored to the optimizer step at resume (no re-warmup from 0).
- ETA calculation: remaining time is estimated from steps taken since resume, not from total steps.
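The scheduler fast-forward amounts to recomputing the LR at the stored optimizer step instead of replaying warmup. A sketch, assuming linear warmup plus cosine decay (EulerForge's exact curve may differ):

```python
import math

def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """LR of a linear-warmup + cosine-decay schedule at an absolute optimizer step."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

def resume_scheduler(optimizer_step, base_lr, warmup_steps, total_steps):
    """On resume, jump straight to the stored optimizer step: no re-warmup from 0."""
    return lr_at_step(optimizer_step, base_lr, warmup_steps, total_steps)
```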


MoE Stability Validation

When using MoE strategies (mixture_lora, moe_expert_lora), the validator automatically performs stability checks.

Required Parameters

Key Role Recommended Value Rationale
moe.router_z_loss_coef Router logit stabilization 0.001 ST-MoE: prevents softmax overflow
moe.load_balance.type Load balancing policy aux_loss Prevents routing collapse
moe.load_balance.aux_loss_coef Auxiliary loss weight 0.01 Required when type=aux_loss

Optional Parameters

Key Role Default Recommended Range
moe.capacity_factor_train Expert capacity upper bound during training Model default 1.0-2.0 (ST-MoE: 1.25)
moe.capacity_factor_eval Expert capacity upper bound during evaluation Model default >= train value (ST-MoE: 2.0)
moe.router_dtype Router computation precision float32 float32 (float16/bfloat16 can be numerically unstable)
moe.load_balance.bias_update_speed Expert bias adaptation speed - 0.001 (required when type=aux_loss_free)

Error Message Format

MoE validation failures are reported in 3-line format:

MoE Config: moe.router_z_loss_coef is required for MoE strategies.
Fix: Set moe.router_z_loss_coef: 0.001 (ST-MoE recommended)
See: docs/tutorials/en/09_moe_stability_and_validation.md

Warnings (Non-fatal)

Condition Meaning
router_z_loss_coef = 0 z-loss disabled — may be unstable in large-scale training
router_z_loss_coef > 0.1 Unusually large value — excessive router constraint
load_balance.type = none No load balancing — risk of routing collapse
aux_loss_coef = 0 Effectively disables load balancing
router_dtype = float16/bfloat16 Potential numerical instability in router softmax
capacity_factor_eval < train Increased token dropping during evaluation
No phase includes router Router not trained — cannot adapt to data

Preflight Checks

--preflight performs runtime checks after loading the model:

eulerforge train --preset PRESET.yml --preflight
Check Item Description
Group parameter count Error if a group referenced by a phase has 0 parameters
target_layers range Error if indices exceed the model's layer count

Details: tutorials/09_moe_stability_and_validation.md


Logging and Metrics

The training loop logs through a two-tier metrics system.

Metrics Levels

Level Recorded Items
minimal (default) step, main_loss, total_loss, aux_loss, lr, grad_norm, throughput, tokens/samples, training-type-specific metrics
advanced minimal + MoE routing stats (token_frac, entropy, importance_cv, router_logit_max, etc.)

Minimal Metrics

Tag Description
train/main_loss Primary loss (SFT/DPO/ORPO/RM/PPO)
train/total_loss Total loss (main + aux * weight)
train/aux_loss Sum of MoE auxiliary losses
train/learning_rate Current learning rate
train/grad_norm Global L2 gradient norm
train/tokens_seen Cumulative training tokens (labels != -100)
train/samples_seen Cumulative processed samples (preference: counted as pairs)
train/optimizer_step Cumulative optimizer steps
train/micro_step Cumulative micro-steps
train/effective_batch Effective batch size (batch_size x grad_accum_steps)
DPO: train/reward_margin Chosen-rejected reward margin
ORPO: train/sft_loss, train/orpo_loss Individual SFT/ORPO losses
PPO: train/kl, train/reward_mean KL divergence, mean reward

Advanced Metrics (MoE Only)

Tag Description
moe/token_frac_mean Mean token fraction per expert
moe/token_frac_std Standard deviation of token fraction per expert
moe/token_frac_max Fraction of the most selected expert
moe/entropy_mean Mean router entropy
moe/importance_cv Importance coefficient of variation (imbalance detection)
moe/aux_loss_total Total aux loss sum
moe/router_logit_max Maximum router logit (numerical explosion detection)
moe/num_moe_modules Number of MoE modules

LoRA Handoff Logging

When lora_handoff is configured, the following milestone events are automatically logged during training:

Event Log Message Timing
Schedule init [Handoff] Schedule: expert_lora(step 4000→6000, ...) At training start
Fade start [Handoff] expert_lora fade started at step 4000 (curve=cosine, ...) When start_step is reached
Fade complete [Handoff] expert_lora fade complete at step 6000 (scale=0.0000) When start_step + duration_steps is reached
Freeze [Handoff] expert_lora frozen at step 6000 (scale=0.0000) When end_action=freeze
Ramp start [Handoff] base_ffn_ramp started at step 2000 (LR x1.0→x3.0) When ramp start_step is reached
Ramp complete [Handoff] base_ffn_ramp complete at step 4000 (LR x3.0) When ramp end_step is reached

Each event is logged exactly once (one-shot). attn_lora follows the same pattern.

Periodic logs (at log_steps intervals) automatically include handoff state:

[Phase2] Step 400/600 (micro 4500/6000) | Loss: 0.1234 | LR: 1.00e-05 | ... | Handoff[expert_lora=0.75, attn_lora=0.80, ffn_lr_mult=2.50]
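The adapter scale reported in the Handoff[...] state can be sketched as a fade function (the curve shapes below are assumptions for illustration):

```python
import math

def handoff_scale(step, start_step, duration_steps, curve="cosine"):
    """Adapter output scale during a lora_handoff fade.

    1.0 before start_step, 0.0 once start_step + duration_steps is reached,
    smoothly interpolated in between. (Sketch; actual curves may differ.)
    """
    if step <= start_step:
        return 1.0
    if step >= start_step + duration_steps:
        return 0.0
    t = (step - start_step) / duration_steps
    if curve == "cosine":
        return 0.5 * (1.0 + math.cos(math.pi * t))
    return 1.0 - t  # linear fallback
```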

TensorBoard Integration

TensorBoard logging is an optional dependency:

pip install eulerforge[tb]

Configuration example:

logging:
  metrics_level: advanced      # "minimal" | "advanced"
  tensorboard:
    enabled: true              # Enable TensorBoard logging (default: false)
    log_dir: "outputs/tb"      # Log directory
  log_interval: 50             # Write to TensorBoard every N steps
  max_experts_log: 16          # Log detailed stats for top N experts in advanced mode

Override metrics level via CLI:

eulerforge train --preset PRESET.yml --metrics-level advanced

If tensorboard is not installed, a warning is printed and training proceeds normally.

Details: tutorials/10_metrics_monitoring.md


eulerforge grid

Basic Usage

# Validate spec only (dry-run)
eulerforge grid configs/grid/sft_random_search.yml --dry-run

# Run
eulerforge grid configs/grid/sft_random_search.yml

# Specify project root explicitly
eulerforge grid configs/grid/sft_random_search.yml --project-root /path/to/project

Options

Option Default Description
spec (required) Grid search YAML spec file path
--dry-run false Validate spec only; do not run training
--project-root DIR cwd Base directory for resolving relative paths

Optuna Dependency

pip install eulerforge[hpo]

If optuna is not installed, a message suggesting pip install eulerforge[hpo] is shown and the process exits.

Optuna Sampler by Method

Method Sampler Notes
grid GridSampler Discrete spaces only (categorical / choices)
random RandomSampler Supports both continuous and discrete
bayes TPESampler Supports both continuous and discrete

Combining method: "grid" + type: "float" + low/high is an error. To search continuous floats in grid mode, use the choices: [val1, val2] format.
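The grid/float restriction can be expressed as a validation check (sketch; the param-spec shape follows the documented YAML and is otherwise an assumption):

```python
def validate_search_space(method, params):
    """Reject continuous ranges under method='grid' (GridSampler is discrete-only).

    Each param spec is a dict like {"type": "float", "low": ..., "high": ...}
    or {"choices": [...]}.
    """
    for name, spec in params.items():
        if method == "grid" and spec.get("type") == "float" and "low" in spec:
            raise ValueError(
                f"param '{name}': method 'grid' does not support float low/high; "
                f"use choices: [val1, val2] instead"
            )
```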

Output

<output_root>/
├── trial_0000/
│   ├── metrics.jsonl       # Per-step metrics (train/total_loss, etc.)
│   └── checkpoint-latest/
├── trial_0001/
│   └── ...
├── summary.json            # Overall results (best_trial + all_trials)
└── summary.csv

Spec format details: docs/fixtures/specs/grid_search_spec.md


Python API: eulerforge.loader

Public API for loading EulerForge-trained checkpoints directly from Python.

from eulerforge import load_model

load_model(path, *, checkpoint, device, dtype, load_precision) -> LoadedModel

Parameter Default Description
path (required) Path to a run_dir or checkpoint_dir
checkpoint "final" Checkpoint selection for run_dir: final | best | latest
device "auto" "auto" | "cpu" | "cuda" | "cuda:0", etc.
dtype "auto" "auto" | "float32" | "bfloat16", etc.
load_precision None Load precision: None | "fp32" | "fp16" | "bf16" | "int8" | "int4"

Return Type

@dataclass
class LoadedModel:
    model: nn.Module           # eval mode
    tokenizer: PreTrainedTokenizer
    metadata: ModelMetadata

@dataclass
class ModelMetadata:
    strategy: str              # "dense_lora" | "moe_expert_lora" | "mixture_lora" | "none"
    backbone: str              # "qwen3" | "llama" | "gemma3" | ""
    path_type: str             # "run_dir" | "checkpoint_dir"
    checkpoint_dir: str        # Actual checkpoint directory path
    lora_config: dict | None   # {"lora_r": int, "lora_alpha": float}
    structure_preserved: bool  # Whether MoE/MixtureLoRA structure is preserved
    load_precision: str | None # "fp32" | "fp16" | "bf16" | "int8" | "int4" | None

Automatic Path Classification

Path Type Detection Criteria Behavior
run_dir resolved_config.json exists Resolves subdirectory via checkpoint parameter
checkpoint_dir config.json exists Loads directly
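The classification criteria in the table can be sketched as a small helper (illustrative, not the shipped loader):

```python
from pathlib import Path

def classify_model_path(path):
    """Classify a load_model() path by the marker files it contains."""
    p = Path(path)
    if (p / "resolved_config.json").exists():
        return "run_dir"         # resolve final/best/latest via `checkpoint`
    if (p / "config.json").exists():
        return "checkpoint_dir"  # load directly
    raise ValueError(f"{path}: neither resolved_config.json nor config.json found")
```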

Usage Examples

from eulerforge import load_model

# 1. Load from training output directory (final checkpoint)
result = load_model("outputs/run_20260311_163425")
print(result.metadata.strategy)     # "moe_expert_lora"
print(result.metadata.backbone)     # "qwen3"

# 2. Load best checkpoint
result = load_model("outputs/run_20260311_163425", checkpoint="best")

# 3. Specify checkpoint directory directly
result = load_model("outputs/run_20260311_163425/final", device="cuda:0")

# 4. Quantized loading (int4/int8)
result = load_model("outputs/run_20260311_163425", load_precision="int4")
result = load_model("outputs/run_20260311_163425", load_precision="bf16")

# 5. Inference
messages = [{"role": "user", "content": "Hello"}]
text = result.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = result.tokenizer(text, return_tensors="pt").to(result.model.device)
with torch.no_grad():
    out = result.model.generate(**inputs, max_new_tokens=128)
print(result.tokenizer.decode(out[0], skip_special_tokens=True))

# 6. Using metadata
if result.metadata.structure_preserved:
    print("This checkpoint preserves the MoE structure")
if result.metadata.lora_config:
    print(f"LoRA r={result.metadata.lora_config['lora_r']}")

Loading Strategies

Strategy Loading Method Structure Preserved
dense_lora LoRA merge → dense model No
moe_expert_lora Reconstruct MoE architecture (expert+router) Yes
mixture_lora Reconstruct MixtureLoRA architecture Yes
none Load via from_pretrained() directly N/A

Note: moe_expert_lora and mixture_lora require resolved_config.json. Without it, the system falls back to averaging experts into a dense model (with a WARNING).


eulerforge export-hf

Exports an EulerForge checkpoint as a HuggingFace Transformers-compatible model.

Basic Usage

# dense_lora → merged HF model
eulerforge export-hf --checkpoint outputs/run_20260311_163425 --output ./exported

# MoE → custom_moe HF model (load with trust_remote_code=True)
eulerforge export-hf --checkpoint outputs/run_moe --output ./exported_moe

# dry-run (print plan only)
eulerforge export-hf --checkpoint outputs/run --output ./out --dry-run

# validate-only
eulerforge export-hf --checkpoint outputs/run --output ./out --validate-only

Options

Option Default Description
--checkpoint (required) Path to run_dir or checkpoint_dir
--output (required) Export output directory (error if it already exists)
--format auto auto | merged | custom_moe
--select-checkpoint final final | best | latest
--dtype auto auto | fp32 | fp16 | bf16
--safe-serialization True Whether to use safetensors (disable with --no-safe-serialization)
--copy-tokenizer True Whether to copy the tokenizer (disable with --no-copy-tokenizer)
--dry-run False Print plan only; no actual export
--validate-only False Validate only
--skip-diversity-check False Skip expert diversity check (experts may not be differentiated with lightweight training)

Export Format by Strategy

Strategy format=auto Result HF Loading
dense_lora merged Standard dense HF model from_pretrained(path)
mixture_lora custom_moe base+router+N LoRA experts from_pretrained(path, trust_remote_code=True)
moe_expert_lora custom_moe N expert FFN+router from_pretrained(path, trust_remote_code=True)

Key point: Exporting moe_expert_lora/mixture_lora strategies as merged destroys the expert structure. format=merged + MoE raises a ValueError.
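The format resolution and the merged+MoE guard can be sketched as:

```python
MOE_STRATEGIES = {"moe_expert_lora", "mixture_lora"}

def resolve_export_format(strategy, requested="auto"):
    """Pick the HF export format for a strategy; refuse merged export of MoE
    strategies, which would destroy the expert structure. (Sketch.)"""
    if requested == "auto":
        return "custom_moe" if strategy in MOE_STRATEGIES else "merged"
    if requested == "merged" and strategy in MOE_STRATEGIES:
        raise ValueError(
            f"format=merged destroys the expert structure of '{strategy}'; "
            f"use format=custom_moe"
        )
    return requested
```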

Spec details: docs/fixtures/specs/export_hf_spec.md


eulerforge pretrain (Plugin)

Plugin command: pretrain is provided via the plugin system. It is registered in the CLI only when eulerforge.plugins.pretrain_plugin is present. Excluding this module from a public distribution automatically hides the command.

Performs scratch pretraining on a HuggingFace-format model exported by EulerStack or a similar tool, using raw text data. This is a full-parameter causal LM training pipeline, completely separate from the train command (which applies LoRA/MoE injection).

eulerforge pretrain [OPTIONS]
Option Description
--preset PATH Pretrain YAML preset path (required)
--set KEY=VALUE Config override (repeatable)
--output-dir DIR Output directory
--validate-only Validate config only, no training

pretrain vs train differences

Item train (fine-tuning) pretrain (scratch)
Model load from_pretrained (with trained weights) from_pretrained (initialized weights)
Injection LoRA/MoE applied None (all parameters trainable)
Phase schedule freeze/unfreeze control None (everything trainable)
Data instruction/preference format raw text (packed chunking)
Forbidden keys - injection, moe, backbone → error

Pretrain YAML preset structure

# Device
device: "cuda:0"             # cuda:0, cuda:1, cpu

# Model (EulerStack export directory)
model_dir: "outputs/full_hybrid_moe"
trust_remote_code: true

# Tokenizer (specify separately if not in model_dir)
tokenizer: "gpt2"

# Data
data:
  path: "data/dolma_10k.jsonl"
  text_column: "text"        # text key in JSONL
  max_length: 1024
  packing: true              # packed chunking (concatenate → split to fixed length)

# Training
training:
  max_steps: 500
  batch_size: 2
  grad_accum_steps: 4
  lr: 3.0e-4
  weight_decay: 0.1
  warmup_steps: 50
  max_grad_norm: 1.0
  log_steps: 10
  save_steps: 250
  dtype: "float32"           # Hybrid models (Hyena/RetNet) require float32
  amp: false                 # FFT ops don't support bf16 → disabled
  seed: 42
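The packed chunking enabled by data.packing (concatenate, then split to fixed length) can be sketched as follows; integer token ids stand in for tokenizer output, and dropping the short trailing remainder is an assumption about the exact behavior:

```python
def pack_sequences(token_streams, max_length):
    """Concatenate all token ids, then split into fixed-length chunks.

    A trailing remainder shorter than max_length is dropped, the usual
    choice for packed pretraining. (Illustrative sketch.)
    """
    flat = []
    for ids in token_streams:
        flat.extend(ids)
    return [
        flat[i : i + max_length]
        for i in range(0, len(flat) - max_length + 1, max_length)
    ]
```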

Usage examples

# Basic run
eulerforge pretrain --preset configs/presets/pretrain/eulerstack_hybrid_moe.yml

# Override settings
eulerforge pretrain --preset ... --set training.max_steps=1000 --set training.lr=1e-4

# Validate only
eulerforge pretrain --preset ... --validate-only

# Specify output directory
eulerforge pretrain --preset ... --output-dir outputs/my_pretrain

Output structure

outputs/pretrain_YYYYMMDD_HHMMSS/
├── pretrain_config.json         # Config snapshot
├── metrics.jsonl                # Per-step loss, lr
├── checkpoint-250/              # Mid-training checkpoint
│   ├── config.json
│   ├── model.safetensors
│   └── tokenizer files...
└── final/                       # Final model
    ├── config.json
    ├── model.safetensors
    └── tokenizer files...

After pretraining, point eulerforge train's model_name to the final/ directory for LoRA fine-tuning.