KV Cache Hit Rate Simulator

Trace

Block size

Customize trace Choose JSONL / JSONL.GZ Drop or choose a JSONL / JSONL.GZ trace. Each valid line needs hash_ids and input_length; block_size is optional when set above. hash_ids must be decimal integers or decimal strings. Requests are replayed in file order, so sort production traces by timestamp first. Computed locally in your browser.

Example

{"block_size":64,"hash_ids":[2001,2002],"input_length":128}
{"block_size":64,"hash_ids":[2003,2004],"input_length":128}
{"block_size":64,"hash_ids":[2005,2006],"input_length":128}

Do you know you can deploy this tool locally? Check our repo.

We also provide a Python version for large local traces.

Model family Model KV precision

KV Cache Eviction

When adding a new block would exceed the selected memory budget, the simulator evicts one cached block based on the following policies: FIFO, LRU, or Optimal.

Want to see the KV cache memory breakdown? Try our KV Cache Size Calculator.

Want to scale KV cache storage across your inference cluster? Try Mooncake.

Found an issue or have feedback? Open an issue and let us know.

Calculating...

KV Cache Hit Rate Simulator

KV Cache Hit Rate

Ideal Prefill Throughput Speedup