Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token
MLSys '26
Rajveer Bachkaniwala, Chengqi Luo, Richard So, Divya Mahajan, Kexin Rong
[PDF] [Project Page] [Code] [Slides] [Deep Wiki]
Lotus: Characterize Architecture Level CPU-based Preprocessing in Machine Learning Pipelines
HotInfra '24
Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong, Ada Gavrilovska
[PDF] [Project Page] [Code] [Slides] [Deep Wiki]
Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling
IISWC '24 · 🏆 Best paper nominee
Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong, Ada Gavrilovska
[PDF] [Project Page] [Code] [Slides] [Deep Wiki]