Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token

Rajveer Bachkaniwala, Chengqi Luo, Richard So, Divya Mahajan, Kexin Rong

The Ninth Annual Conference on Machine Learning and Systems (MLSys'26)
