1. Problem
ML pipelines often experience significant slowdowns in the data preprocessing stage. Existing tools either lack visibility into fine-grained operation timings or cannot link high-level Python functions with low-level hardware behavior, making bottlenecks hard to diagnose.
2. Motivation
Preprocessing can consume up to 65% of training time, leaving GPUs underutilized. No existing tool bridges the semantic gap between Python-level operations and CPU microarchitectural activity, which limits actionable insights for optimization.
3. Contribution / Solution
The authors introduce Lotus, a profiling framework consisting of:
- LotusTrace: Records fine-grained timings of individual preprocessing steps in PyTorch’s DataLoader, including operations that finish in under 10ms, with minimal overhead.
- LotusMap: Bridges Python and C++ layers by mapping preprocessing operations to their corresponding low-level C/C++ functions, enabling correlation with hardware counters from tools like Intel VTune or AMD uProf.
Together, they provide full-stack visibility into preprocessing performance at both the software and hardware level.
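The core idea behind per-step timing can be sketched by wrapping each transform in a high-resolution timer. This is a minimal illustration, not Lotus's implementation or API: `TimedCompose`, `summary`, and the toy transforms are hypothetical names, and plain lists stand in for images so the sketch runs without PyTorch.

```python
import time
from collections import defaultdict
from typing import Any, Callable, Dict, List


class TimedCompose:
    """Hypothetical Compose-style pipeline that records the wall-clock
    duration of every individual transform call (in nanoseconds)."""

    def __init__(self, transforms: List[Callable[[Any], Any]]) -> None:
        self.transforms = transforms
        # op name -> list of per-call durations in nanoseconds
        self.timings: Dict[str, List[int]] = defaultdict(list)

    def __call__(self, sample: Any) -> Any:
        for t in self.transforms:
            # Functions expose __name__; callable objects fall back to their class name.
            name = getattr(t, "__name__", type(t).__name__)
            start = time.perf_counter_ns()
            sample = t(sample)
            self.timings[name].append(time.perf_counter_ns() - start)
        return sample

    def summary(self) -> Dict[str, float]:
        """Mean duration per op, in microseconds."""
        return {name: sum(d) / len(d) / 1_000 for name, d in self.timings.items()}


# Toy transforms standing in for real image operations (hypothetical).
def to_float(x: List[int]) -> List[float]:
    return [float(v) for v in x]


def normalize(x: List[float]) -> List[float]:
    m = max(x)
    return [v / m for v in x]


pipe = TimedCompose([to_float, normalize])
out = pipe([1, 2, 4])  # → [0.25, 0.5, 1.0]
print(pipe.summary())   # mean µs per op; values vary by machine
```

With `num_workers > 0`, each DataLoader worker runs in its own process with its own copy of the dataset and its transforms, so a real tool would presumably need to persist per-worker records (for example, to log files) rather than keep them in memory as this sketch does.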
4. Results / Observations
- Short-lived ops dominate: Most preprocessing operations take less than 10ms; many are under 100µs, making them invisible to traditional profilers.
- High variance in batch times: Variability in image sizes and randomness in transforms lead to a 5–15% standard deviation in batch times, complicating resource provisioning.
- Out-of-order arrivals hurt performance: Shared queues between DataLoader workers cause batches to arrive out of order, so the main process must wait for the next in-order batch, which delays the GPU's consumption of data.
- Diminishing returns with more workers (cores): Adding DataLoader workers initially reduces job time, but beyond a threshold (e.g., 20 workers), extra workers mainly add CPU contention and yield minimal end-to-end gains.
- Lower overhead than existing profilers: Compared to py-spy, austin, and Scalene, Lotus incurs less than 2% runtime overhead while providing richer, fine-grained batch-level instrumentation.
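The out-of-order effect can be illustrated with a small simulation, assuming batches must be handed to the training loop in index order, as PyTorch's multi-worker DataLoader does by default. The function name and the arrival times below are hypothetical, not measurements from the paper.

```python
from typing import Dict, List, Tuple


def simulate_in_order_delivery(arrivals: List[Tuple[int, float]]) -> Dict[int, float]:
    """Given (batch_index, arrival_time) pairs in arrival order, return how long
    each batch sat ready in a reorder buffer before it could be delivered,
    assuming batches are consumed strictly in index order."""
    buffered: Dict[int, float] = {}  # batch index -> time it became ready
    waits: Dict[int, float] = {}     # batch index -> time held back
    next_idx = 0                     # next batch the training loop expects
    for idx, t in arrivals:
        buffered[idx] = t
        # Release every batch that is now contiguous with the last delivered one.
        while next_idx in buffered:
            waits[next_idx] = t - buffered.pop(next_idx)
            next_idx += 1
    return waits


# Batches 1 and 2 finish early, but batch 0 (e.g., a large image) arrives late.
waits = simulate_in_order_delivery([(1, 10.0), (2, 12.0), (0, 30.0), (3, 31.0)])
print(waits)  # {0: 0.0, 1: 20.0, 2: 18.0, 3: 0.0}
```

Batches 1 and 2 were ready at t=10 and t=12 but could not be consumed until batch 0 arrived at t=30, so the GPU idles even though later batches are already prepared.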