Essentially, a decoder-only Transformer model transforms data from any modality into a KVCache, making it a central element of LLM serving optimizations, including but not limited to caching, scheduling, compression, and offloading. KVCache.AI is a collaborative effort with leading industry partners such as Approaching.AI and Moonshot AI, focused on developing effective and practical techniques that benefit both academic research and open-source development.
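To make the role of the KVCache concrete, here is a minimal, illustrative sketch (not KVCache.AI or Mooncake code) of single-head autoregressive decoding: each new token's key and value tensors are appended to a growing cache and reused at every later step, which is exactly the object that serving systems then cache, schedule, compress, or offload. All names (`decode_step`, `kv_cache`, the weight matrices) are hypothetical and chosen only for this example.

```python
# Minimal sketch, assuming PyTorch: why decode-time attention produces a KV cache.
import torch

def decode_step(x_t, W_q, W_k, W_v, kv_cache):
    """One autoregressive decode step for a single attention head.

    x_t:      (d_model,) embedding of the newest token
    kv_cache: dict with "k" and "v" tensors of shape (t, d_head)
    """
    q = x_t @ W_q                                   # query for the new token only
    k = x_t @ W_k                                   # its key ...
    v = x_t @ W_v                                   # ... and value
    kv_cache["k"] = torch.cat([kv_cache["k"], k[None, :]])  # append, never recompute
    kv_cache["v"] = torch.cat([kv_cache["v"], v[None, :]])
    scores = kv_cache["k"] @ q / kv_cache["k"].shape[-1] ** 0.5
    attn = torch.softmax(scores, dim=0)             # attend over all cached positions
    return attn @ kv_cache["v"]                     # context vector for the new token

if __name__ == "__main__":
    d_model, d_head = 16, 8
    W_q, W_k, W_v = (torch.randn(d_model, d_head) for _ in range(3))
    cache = {"k": torch.empty(0, d_head), "v": torch.empty(0, d_head)}
    for _ in range(4):                              # decode four tokens
        decode_step(torch.randn(d_model), W_q, W_k, W_v, cache)
    print(cache["k"].shape)                         # torch.Size([4, 8]): cache grows per token
```

Because the cache grows linearly with sequence length (and multiplies across layers, heads, and concurrent requests), managing it efficiently is what the projects below focus on.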
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations. Made with Approaching.AI!
Mooncake, a delicious mooncake made with Moonshot AI!
A KVCache-centric disaggregated architecture for LLM serving.