Explore our latest articles on LLM inference, optimization techniques, and system architecture.
KT-FT v0.6.1 connects MoE SFT and local SGLang serving into an end-to-end loop; split LoRA serving bridges KT expert adapters and SGLang non-expert adapters for Qwen3.5 MoE.
By integrating Mooncake into OpenClaw's real inference path, we improved fast-path latency and sharply reduced time-to-first-token (TTFT) tail latency in multi-session, long-context workloads, turning a system that was usually fast but occasionally slow into one that feels consistently smooth.
Mooncake is now part of the PyTorch Ecosystem, complementing PyTorch-native LLM serving with high-performance disaggregated data transfer and storage.
A low-cost, low-memory end-to-end fine-tuning and inference workflow for large MoE models with KTransformers, LLaMA-Factory, and SGLang.