An LLM fine-tuning toolkit that trains dense models in an MoE style
A research-oriented fine-tuning framework that injects LoRA into HuggingFace models and lets you train a dense model as a Mixture-of-LoRAs or MoE Expert LoRA structure. On top of a familiar dense-SFT flow, EulerForge adds Dense → MoE conversion and phase scheduling, so expert specialization, routing, and MoE stability can be studied reproducibly on a modest GPU budget — without rewriting model code. A single YAML preset carries you through SFT → DPO/ORPO → RM → PPO.
Rather than aiming to be a general-purpose SFT framework, EulerForge puts its weight on expressing an MoE research flow as standardized configuration.
The four pillars that make EulerForge a good fit for MoE research
Via `mixture_lora` and `moe_expert_lora` injection, turn any dense Qwen / Llama / Gemma into an MoE-style trainable model. No model-code rewrites required.
Staged unfreezing (router → LoRA → base FFN) that makes large-model fine-tuning stable and reproducible.
SFT → DPO / ORPO → RM → PPO as one command sequence, with automatic base / LoRA detection between stages.
Catches configuration errors and MoE router-collapse risk before a single GPU cycle is burned.
Start from the same dense backbone and decide what kind of MoE experiment to run — in one YAML line.
| Strategy | What it does |
|---|---|
| `dense_lora` | Classic LoRA adapters: the fastest path to domain adaptation. Ideal as a baseline control against MoE variants. |
| `mixture_lora` | Router + multiple LoRA experts. Turns a dense model into a token-level multi-task-routed structure. |
| `moe_expert_lora` | Replaces the FFN with an MoE block and injects LoRA into each expert (DeepSeek-style). Converts a dense backbone into a full MoE training target. |
| `native_moe_expert_lora` | Injects LoRA into each expert of an already-MoE model such as Mixtral or Gemma 4 MoE for efficient fine-tuning. |
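In a preset, that one-line choice might look like the following sketch (field names here are illustrative guesses, not EulerForge's confirmed schema):

```yaml
model:
  name_or_path: Qwen/Qwen3.5-0.8B   # same dense backbone for every variant
strategy: moe_expert_lora           # swap for dense_lora / mixture_lora / native_moe_expert_lora
```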
SFT → DPO / ORPO → RM → PPO. Checkpoints from each stage flow automatically into the next.
| Training type | Description |
|---|---|
| SFT | Supervised Fine-Tuning — the baseline alignment stage |
| DPO | Direct Preference Optimization — no reference model, memory efficient |
| ORPO | Odds Ratio Preference Optimization — single-forward-pass alignment |
| RM | Reward Model (Bradley-Terry) |
| PPO | Proximal Policy Optimization — final RLHF stage |
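EulerForge's actual checkpoint-detection code isn't shown in this overview; the following is a minimal Python sketch of the stage-chaining idea, with a `run_dir/<stage>/final` output layout assumed purely for illustration:

```python
from pathlib import Path
from typing import Optional

STAGES = ["sft", "dpo", "rm", "ppo"]  # ORPO occupies the same slot as DPO

def next_stage_input(run_dir: str, stage: str) -> Optional[str]:
    """Return the checkpoint a stage should resume from: the output of
    the most recent earlier stage that has finished, or None, meaning
    the stage starts from the base model."""
    idx = STAGES.index(stage)
    for earlier in reversed(STAGES[:idx]):
        ckpt = Path(run_dir) / earlier / "final"
        if ckpt.exists():
            return str(ckpt)
    return None
```

The point is that each stage only needs the run directory; which checkpoint it inherits is derived, not hand-specified.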
EulerForge automates every step required to turn a HuggingFace dense model into an MoE training target.
Stage which parameters are trainable over time, so large-model fine-tuning becomes stable and reproducible.
1. Router warm-up: early in training, only the router is trainable so that the token-to-expert distribution stabilizes. Without this phase, router / expert collapse is common.
2. Expert LoRA training: once the router is stable, only the expert LoRAs are trained; the base FFN remains frozen.
3. Handoff: LoRA weights are gradually faded while their knowledge is handed off to the base FFN, keeping the MoE structure at inference while reducing LoRA dependence.
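The schedule above boils down to a phase-dependent selection rule over parameter names. The name patterns below (`gate`, `lora_`, `.mlp.`) are assumptions for illustration, not EulerForge's actual conventions; the selection logic itself is the point:

```python
def trainable_param_names(param_names, phase):
    """Select which parameters train in each phase of the schedule:
    'router' (warm-up), 'experts' (LoRA only), 'handoff' (LoRA + base FFN).
    Name substrings are illustrative; real checkpoints may differ."""
    rules = {
        "router":  lambda n: "router" in n or "gate" in n,
        "experts": lambda n: "lora_" in n,
        "handoff": lambda n: "lora_" in n or ".mlp." in n,
    }
    return {n for n in param_names if rules[phase](n)}
```

In a real trainer this set would drive `requires_grad` toggling before each phase begins.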
The same injection / training code works across every backbone family.
| Backbone | Models |
|---|---|
| Qwen | Qwen2 / Qwen3 / Qwen3.5 (dense) |
| Llama | Llama 2 / Llama 3 / Llama 3.2, TinyLlama, Mistral |
| Gemma 3 | Gemma 3 1B / 4B (dense) |
| Gemma 4 | Gemma 4 dense (e2b / e4b) + native MoE (26b a4b) |
| Mixtral | Mixtral 8x7B / 8x22B (native MoE) |
| Quantized training | nf4 / int4 / int8 via bitsandbytes |
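A quantized run might be configured along these lines (the field names are a guess for illustration; the authoritative schema is whatever the shipped presets use):

```yaml
quantization:
  mode: nf4              # or int4 / int8, via bitsandbytes
  compute_dtype: bfloat16
  double_quant: true
```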
Ready-to-run YAML presets live under configs/presets/.
| Preset | Strategy | Training |
|---|---|---|
| `qwen3.5_0.8b_dense_lora_sft.yml` | Dense LoRA | SFT |
| `qwen3.5_0.8b_mixture_lora_sft.yml` | Mixture-of-LoRAs | SFT |
| `qwen3.5_0.8b_moe_expert_lora_sft.yml` | MoE Expert LoRA | SFT |
| `qwen3.5_0.8b_moe_expert_lora_dpo.yml` | MoE Expert LoRA | DPO |
| `llama3_1b_moe_expert_lora_sft_handoff.yml` | MoE Expert LoRA + Handoff | SFT |
| `gemma3_4b_moe_expert_lora_orpo_handoff.yml` | MoE Expert LoRA + Handoff | ORPO |
| `gemma4_26b_a4b_native_expert_lora_sft.yml` | Native MoE Expert LoRA | SFT |
| `mixtral_native_expert_lora_sft.yml` | Native MoE Expert LoRA | SFT |
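As a shape reference, a preset along the lines of `qwen3.5_0.8b_moe_expert_lora_sft.yml` might combine the pieces like this. Every key name and value below is an illustrative guess, not the documented schema:

```yaml
model:
  name_or_path: Qwen/Qwen3.5-0.8B
strategy: moe_expert_lora
moe:
  num_experts: 8
  top_k: 2
lora:
  r: 16
  alpha: 32
training:
  type: sft
  dataset: your-org/instruct-mix   # hypothetical dataset id
  learning_rate: 2.0e-4
  epochs: 3
```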
v0.1.0 — Requirements: Python ≥ 3.9, PyTorch ≥ 2.1, Transformers ≥ 5.5.
Step-by-step guides and the complete command surface
Five-language log output — a collaborating team can use the same tool in their own language.
v0.1.0 released — open source, reproducible research.
Get started on GitHub · Contact Us