EulerForge

An LLM fine-tuning toolkit that trains dense models in an MoE style

A research-oriented fine-tuning framework that injects LoRA into HuggingFace models and lets you train a dense model as a Mixture-of-LoRAs or MoE Expert LoRA structure. On top of a familiar dense-SFT flow, EulerForge adds Dense → MoE conversion and phase scheduling, so expert specialization, routing, and MoE stability can be studied reproducibly on a modest GPU budget — without rewriting model code. A single YAML preset carries you through SFT → DPO/ORPO → RM → PPO.

What EulerForge focuses on

Rather than aiming to be a general-purpose SFT framework, EulerForge focuses on expressing an MoE research flow as standardized configuration.

Problems we focus on

  • Dense → MoE conversion as a first-class citizen
  • Reproducible experiments around routing and expert specialization
  • Stable large-model fine-tuning via phase scheduling
  • Consistent observation of MoE router stability and aux loss

Design choices

  • Experiments live in YAML configuration, not model code
  • Preflight + MoE stability checks catch issues before training starts
  • Multi-stage pipeline with automatic checkpoint handoff (SFT → DPO/ORPO → RM → PPO), not separate scripts
  • 4-bit / 8-bit quantized training so experiments fit a modest GPU budget
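
These choices come together in a single preset file. A minimal sketch of what one might look like (the key names here are illustrative assumptions, not the actual schema; see configs/presets/ for real presets):

```yaml
# Illustrative preset sketch — key names are assumptions, not the real schema
model:
  name: Qwen/Qwen3.5-0.8B     # HuggingFace model id
  quantization: nf4           # 4-bit training via bitsandbytes
injection:
  strategy: moe_expert_lora   # which MoE-style structure to inject
  num_experts: 8
  lora_rank: 16
training:
  type: sft                   # first stage of the pipeline
```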

Core capabilities

The four pillars that make EulerForge a good fit for MoE research

1. Dense → MoE conversion

Via mixture_lora and moe_expert_lora injection, any dense Qwen / Llama / Gemma model becomes an MoE-style trainable model. No model-code rewrites required.

2. LoRA Handoff + Phase Scheduling

Staged unfreezing (router → LoRA → base FFN) that makes large-model fine-tuning stable and reproducible.

3. One preset, full pipeline

SFT → DPO / ORPO → RM → PPO as one command sequence, with automatic base / LoRA detection between stages.

4. Preflight + MoE stability validation

Catches configuration errors and MoE router-collapse risk before a single GPU cycle is burned.

Four injection strategies

Start from the same dense backbone and decide what kind of MoE experiment to run — in one YAML line.

dense_lora

Classic LoRA adapters — the fastest path to domain adaptation. Ideal as a baseline control against MoE variants.

mixture_lora

A router plus multiple LoRA experts. Turns a dense model into a token-routed, multi-task structure.

moe_expert_lora

Replace the FFN with an MoE block and inject LoRA into each expert (DeepSeek-style). Converts a dense backbone into a full MoE training target.

native_moe_expert_lora

Inject LoRA into each expert of an already-MoE model such as Mixtral or Gemma 4 MoE for efficient fine-tuning.
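
Because the strategy is just a configuration value, switching between these four experiments is a one-line change. A sketch, assuming a hypothetical injection.strategy key:

```yaml
injection:
  strategy: mixture_lora   # dense_lora | mixture_lora | moe_expert_lora | native_moe_expert_lora
```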

Five training paths — in one pipeline

SFT → DPO / ORPO → RM → PPO. Checkpoints from each stage flow automatically into the next.

Training type | Description
SFT | Supervised Fine-Tuning — the baseline alignment stage
DPO | Direct Preference Optimization — no reference model, memory efficient
ORPO | Odds Ratio Preference Optimization — single-forward-pass alignment
RM | Reward Model (Bradley-Terry)
PPO | Proximal Policy Optimization — final RLHF stage
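
The handoff between stages can be pictured as each preset pointing at the previous stage's output directory. A hedged sketch with hypothetical key names:

```yaml
# Hypothetical DPO preset consuming the SFT stage's checkpoint
training:
  type: dpo
  init_from: outputs/run_YYYYMMDD_HHMMSS   # base vs. LoRA checkpoint detected automatically
```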

Dense → MoE conversion pipeline

EulerForge automates every step required to turn a HuggingFace dense model into an MoE training target.

# One YAML preset converts a dense Qwen into an MoE Expert LoRA target
eulerforge train --preset configs/presets/qwen3.5_0.8b_moe_expert_lora_sft.yml

# Internally:
# 1. Load HuggingFace AutoModel (bnb 4/8-bit optional)
# 2. Backbone adapter locates FFN / attention modules
# 3. Replace FFN with MoE block; inject LoRA per expert
# 4. Phase scheduler: router warmup -> LoRA -> base FFN
# 5. Preflight + MoE stability validation before training

YAML preset → Config + preflight → Base model load → MoE injection → Phase-scheduled train → HF export

Phase Scheduling & LoRA Handoff

Stage which parameters are trainable over time — large-model fine-tuning becomes stable and reproducible.

Router warmup

Early in training, only the router is trainable so that the token-to-expert distribution stabilizes. Without it, router / expert collapse is common.

LoRA-only phase

Once the router is stable, only the expert LoRAs are trained. The base FFN remains frozen.

LoRA Handoff (fade / ramp)

Gradually fade the LoRA weights while their knowledge is handed off to the base FFN, preserving the MoE structure at inference time while reducing dependence on the adapters.
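
The three phases above could be declared as a schedule in the preset. A sketch with assumed key names and step counts (the real schema may differ):

```yaml
# Illustrative phase schedule — names and step counts are assumptions
phases:
  - name: router_warmup
    steps: 500
    trainable: [router]        # only the router learns token-to-expert routing
  - name: lora_only
    steps: 3000
    trainable: [expert_lora]   # base FFN stays frozen
  - name: handoff
    steps: 1500
    trainable: [expert_lora, base_ffn]
    lora_fade: linear          # ramp LoRA contribution down as the base FFN absorbs it
```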

Supported backbones

The same injection / training code works across every backbone family.

Backbone | Models
Qwen | Qwen2 / Qwen3 / Qwen3.5 (dense)
Llama | Llama 2 / Llama 3 / Llama 3.2, TinyLlama, Mistral
Gemma 3 | Gemma 3 1B / 4B (dense)
Gemma 4 | Gemma 4 dense (e2b / e4b) + native MoE (26b a4b)
Mixtral | Mixtral 8x7B / 8x22B (native MoE)
Quantized training | nf4 / int4 / int8 via bitsandbytes

Selected presets

Ready-to-run YAML presets live under configs/presets/.

Preset | Strategy | Training
qwen3.5_0.8b_dense_lora_sft.yml | Dense LoRA | SFT
qwen3.5_0.8b_mixture_lora_sft.yml | Mixture-of-LoRAs | SFT
qwen3.5_0.8b_moe_expert_lora_sft.yml | MoE Expert LoRA | SFT
qwen3.5_0.8b_moe_expert_lora_dpo.yml | MoE Expert LoRA | DPO
llama3_1b_moe_expert_lora_sft_handoff.yml | MoE Expert LoRA + Handoff | SFT
gemma3_4b_moe_expert_lora_orpo_handoff.yml | MoE Expert LoRA + Handoff | ORPO
gemma4_26b_a4b_native_expert_lora_sft.yml | Native MoE Expert LoRA | SFT
mixtral_native_expert_lora_sft.yml | Native MoE Expert LoRA | SFT

Install & quickstart

v0.1.0 — Requirements: Python ≥ 3.9, PyTorch ≥ 2.1, Transformers ≥ 5.5.

# Install from source (v0.1.0)
git clone https://github.com/eulerwa/eulerforge && cd eulerforge && pip install -e .

# Optional: HPO / TensorBoard extras
pip install -e .[hpo]
pip install -e .[tb]

# 1. Train — Dense LoRA SFT on Qwen3.5-0.8B
eulerforge train \
  --preset configs/presets/qwen3.5_0.8b_dense_lora_sft.yml \
  --set data.path=data/sft_10k_raw.jsonl --set data.max_length=512

# 2. Evaluate (target / baseline / judge)
eulerforge bench --preset configs/bench/sft_with_judge.yml \
  --target-output-dir outputs/run_YYYYMMDD_HHMMSS

# 3. Export as a standard HF directory
eulerforge export-hf \
  --checkpoint outputs/run_YYYYMMDD_HHMMSS --output ./exported

# 4. Load in Python
# from eulerforge import load_model
# result = load_model("outputs/run_YYYYMMDD_HHMMSS")

Tutorials & CLI reference

Step-by-step guides and the complete command surface

Internationalized CLI

Log output in five languages, so every member of a collaborating team can use the same tool in their own language.

eulerforge --lang ko train --preset PRESET.yml
eulerforge --lang en train --preset PRESET.yml
eulerforge --lang zh train --preset PRESET.yml
eulerforge --lang ja train --preset PRESET.yml
eulerforge --lang es train --preset PRESET.yml

Start your MoE experiments with EulerForge

v0.1.0 released — open source, reproducible research.

Get started on GitHub · Contact Us