A low-cost, low-memory end-to-end fine-tuning and inference workflow for large MoE models with KTransformers, LLaMA-Factory, and SGLang.
Nov 1, 2025