An LLM fine-tuning toolkit that trains dense models in an MoE style
A research-oriented fine-tuning framework that injects LoRA into HuggingFace models and lets you train a dense model as a Mixture-of-LoRAs or MoE Expert LoRA structure. On top of a familiar dense-SFT flow, EulerForge adds Dense → MoE conversion and phase scheduling, so expert specialization, routing, and MoE stability can be studied reproducibly on a modest GPU budget — without rewriting model code. A single YAML preset carries you through SFT → DPO/ORPO → RM → PPO.
Rather than aiming to be a general-purpose SFT framework, EulerForge puts its weight on expressing an MoE research flow as standardized configuration.
The four pillars that make EulerForge a good fit for MoE research
Via `mixture_lora` and `moe_expert_lora` injection, turn any dense Qwen / Llama / Gemma into an MoE-style trainable model. No model-code rewrites required.
Staged unfreezing (router → LoRA → base FFN) that makes large-model fine-tuning stable and reproducible.
SFT → DPO / ORPO → RM → PPO as one command sequence, with automatic base / LoRA detection between stages.
Catches configuration errors and MoE router-collapse risk before a single GPU cycle is burned.
Start from the same dense backbone and decide what kind of MoE experiment to run — in one YAML line.
| Strategy | What it does |
|---|---|
| `dense_lora` | Classic LoRA adapters: the fastest path to domain adaptation. Ideal as a baseline control against MoE variants. |
| `mixture_lora` | Router + multiple LoRA experts. Turns a dense model into a token-level multi-task-routed structure. |
| `moe_expert_lora` | Replaces the FFN with an MoE block and injects LoRA into each expert (DeepSeek-style). Converts a dense backbone into a full MoE training target. |
| `native_moe_expert_lora` | Injects LoRA into each expert of an already-MoE model such as Mixtral or Gemma 4 MoE for efficient fine-tuning. |
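In a preset, that one-line choice might look like the following sketch (field names here are illustrative guesses, not EulerForge's confirmed schema):

```yaml
model:
  name_or_path: Qwen/Qwen3.5-0.8B   # same dense backbone for every variant
strategy: moe_expert_lora           # swap for dense_lora / mixture_lora / native_moe_expert_lora
```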
SFT → DPO / ORPO → RM → PPO. Checkpoints from each stage flow automatically into the next.
| Training type | Description |
|---|---|
| SFT | Supervised Fine-Tuning — the baseline alignment stage |
| DPO | Direct Preference Optimization — no reference model, memory efficient |
| ORPO | Odds Ratio Preference Optimization — single-forward-pass alignment |
| RM | Reward Model (Bradley-Terry) |
| PPO | Proximal Policy Optimization — final RLHF stage |
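EulerForge's actual checkpoint-detection code isn't shown in this overview; the following is a minimal Python sketch of the stage-chaining idea, with a `run_dir/<stage>/final` output layout assumed purely for illustration:

```python
from pathlib import Path
from typing import Optional

STAGES = ["sft", "dpo", "rm", "ppo"]  # ORPO occupies the same slot as DPO

def next_stage_input(run_dir: str, stage: str) -> Optional[str]:
    """Return the checkpoint a stage should resume from: the output of
    the most recent earlier stage that has finished, or None, meaning
    the stage starts from the base model."""
    idx = STAGES.index(stage)
    for earlier in reversed(STAGES[:idx]):
        ckpt = Path(run_dir) / earlier / "final"
        if ckpt.exists():
            return str(ckpt)
    return None
```

The point is that each stage only needs the run directory; which checkpoint it inherits is derived, not hand-specified.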
EulerForge automates every step required to turn a HuggingFace dense model into an MoE training target.
Stage which parameters are trainable over time, so large-model fine-tuning becomes stable and reproducible.
1. Router warm-up: early in training, only the router is trainable so that the token-to-expert distribution stabilizes. Without this phase, router / expert collapse is common.
2. Expert LoRA training: once the router is stable, only the expert LoRAs are trained; the base FFN remains frozen.
3. Handoff: LoRA weights are gradually faded while their knowledge is handed off to the base FFN, keeping the MoE structure at inference while reducing LoRA dependence.
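The schedule above boils down to a phase-dependent selection rule over parameter names. The name patterns below (`gate`, `lora_`, `.mlp.`) are assumptions for illustration, not EulerForge's actual conventions; the selection logic itself is the point:

```python
def trainable_param_names(param_names, phase):
    """Select which parameters train in each phase of the schedule:
    'router' (warm-up), 'experts' (LoRA only), 'handoff' (LoRA + base FFN).
    Name substrings are illustrative; real checkpoints may differ."""
    rules = {
        "router":  lambda n: "router" in n or "gate" in n,
        "experts": lambda n: "lora_" in n,
        "handoff": lambda n: "lora_" in n or ".mlp." in n,
    }
    return {n for n in param_names if rules[phase](n)}
```

In a real trainer this set would drive `requires_grad` toggling before each phase begins.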
The same injection / training code works across every backbone family.
| Backbone | Models |
|---|---|
| Qwen | Qwen2 / Qwen3 / Qwen3.5 (dense) |
| Llama | Llama 2 / Llama 3 / Llama 3.2, TinyLlama, Mistral |
| Gemma 3 | Gemma 3 1B / 4B (dense) |
| Gemma 4 | Gemma 4 dense (e2b / e4b) + native MoE (26b a4b) |
| Mixtral | Mixtral 8x7B / 8x22B (native MoE) |
| Quantized training | nf4 / int4 / int8 via bitsandbytes |
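A quantized run might be configured along these lines (the field names are a guess for illustration; the authoritative schema is whatever the shipped presets use):

```yaml
quantization:
  mode: nf4              # or int4 / int8, via bitsandbytes
  compute_dtype: bfloat16
  double_quant: true
```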
Ready-to-run YAML presets live under configs/presets/.
| Preset | Strategy | Training |
|---|---|---|
| `qwen3.5_0.8b_dense_lora_sft.yml` | Dense LoRA | SFT |
| `qwen3.5_0.8b_mixture_lora_sft.yml` | Mixture-of-LoRAs | SFT |
| `qwen3.5_0.8b_moe_expert_lora_sft.yml` | MoE Expert LoRA | SFT |
| `qwen3.5_0.8b_moe_expert_lora_dpo.yml` | MoE Expert LoRA | DPO |
| `llama3_1b_moe_expert_lora_sft_handoff.yml` | MoE Expert LoRA + Handoff | SFT |
| `gemma3_4b_moe_expert_lora_orpo_handoff.yml` | MoE Expert LoRA + Handoff | ORPO |
| `gemma4_26b_a4b_native_expert_lora_sft.yml` | Native MoE Expert LoRA | SFT |
| `mixtral_native_expert_lora_sft.yml` | Native MoE Expert LoRA | SFT |
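As a shape reference, a preset along the lines of `qwen3.5_0.8b_moe_expert_lora_sft.yml` might combine the pieces like this. Every key name and value below is an illustrative guess, not the documented schema:

```yaml
model:
  name_or_path: Qwen/Qwen3.5-0.8B
strategy: moe_expert_lora
moe:
  num_experts: 8
  top_k: 2
lora:
  r: 16
  alpha: 32
training:
  type: sft
  dataset: your-org/instruct-mix   # hypothetical dataset id
  learning_rate: 2.0e-4
  epochs: 3
```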
v0.1.0 — Requirements: Python ≥ 3.9, PyTorch ≥ 2.1, Transformers ≥ 5.5.
Step-by-step guides and the complete command surface
Five-language log output — a collaborating team can use the same tool in their own language.
v0.1.0 released — open source, reproducible research.
Get started on GitHub · Contact Us