1. Problem

As ML pipelines grow more complex, data preprocessing (loading, decoding, and transforming inputs) has become a major bottleneck, often consuming more compute and power than expected. Yet most profiling tools overlook this stage, and in particular how it interacts with the CPU hardware.

2. Motivation

Today’s ML systems are built on heterogeneous infrastructure. To make informed hardware decisions (e.g., which CPUs to choose), we need detailed visibility into how preprocessing performs, not just at the code level but down to the CPU microarchitecture.

3. Contribution / Solution

Lotus is a lightweight profiling tool purpose-built for ML preprocessing. It captures:

Lotus makes it easy to answer: Where is preprocessing slow? Is the CPU saturated? Would more cores help?
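
Lotus's own interface is not shown in this summary, so the following is only a rough illustration of the kind of per-stage measurement that helps answer the first two questions. It is a minimal sketch using the Python standard library, not Lotus's implementation: the `StageTimer` class and the toy `load`, `decode`, and `transform` functions are hypothetical names introduced here. The sketch records wall-clock and CPU time per stage. A CPU-to-wall ratio near 1.0 suggests the stage is compute-bound on one core, while a much lower ratio suggests it is waiting on I/O or something else.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Hypothetical helper: accumulates wall-clock and CPU time per stage."""

    def __init__(self):
        self.wall = defaultdict(float)   # seconds of elapsed time per stage
        self.cpu = defaultdict(float)    # seconds of process CPU time per stage
        self.calls = defaultdict(int)    # number of times each stage ran

    @contextmanager
    def stage(self, name):
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        try:
            yield
        finally:
            self.wall[name] += time.perf_counter() - wall_start
            self.cpu[name] += time.process_time() - cpu_start
            self.calls[name] += 1

    def slowest(self):
        """Name of the stage with the largest total wall-clock time."""
        return max(self.wall, key=self.wall.get)

    def report(self):
        """Rows of (stage, calls, wall seconds, cpu/wall ratio), slowest first."""
        rows = []
        for name in sorted(self.wall, key=self.wall.get, reverse=True):
            wall = self.wall[name]
            busy = self.cpu[name] / wall if wall > 0 else 0.0
            rows.append((name, self.calls[name], wall, busy))
        return rows


# Toy stand-ins for real preprocessing stages (assumptions for illustration).
def load(index):
    return bytes(range(256)) * 64


def decode(raw):
    return list(raw)


def transform(pixels):
    return [p / 255.0 for p in pixels]


timer = StageTimer()
for i in range(20):
    with timer.stage("load"):
        raw = load(i)
    with timer.stage("decode"):
        pixels = decode(raw)
    with timer.stage("transform"):
        sample = transform(pixels)

for name, calls, wall, busy in timer.report():
    print(f"{name:10s} calls={calls:3d} wall={wall:.4f}s cpu/wall={busy:.2f}")
```

Timing of this kind only covers the code level. The microarchitectural visibility described in the motivation (for example, hardware performance counters) needs a dedicated tool and is outside what this sketch measures.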

4. Results / Observations