MLtraq: Track your AI experiments at hyperspeed

Slides from my talk at Munich MLOps Community Meetup #7

Every second spent waiting for initializations, every obscure delay that hinders high-frequency logging, and every limit on what you can track kills an experiment. Wouldn't it be nice to start tracking in nearly zero time? What if we could track more and faster, even handling arbitrarily large, complex Python objects with ease?

In this talk, I will present the results of comparative benchmarks covering Weights & Biases, MLflow, FastTrackML, Neptune, Aim, Comet, and MLtraq. You will learn their strengths and weaknesses, what makes them slow and fast, and what sets MLtraq apart, making it 100x faster and capable of handling tens of thousands of experiments.

The talk will be inspiring and valuable for anyone interested in AI/ML experimentation and portable, safe serialization of Python objects.

Michele Dallachiesa

April 11, 2024

Transcript

  1. Scope of this talk: https://mltraq.com/benchmarks/speed/
     You will learn:
     • What experiment tracking is
     • What makes different frameworks fast and slow
     • How to select an experiment tracker for your projects
  2. Experimentation
     • Definition: “The process of systematically changing and testing different input values in an algorithm to observe their impact on performance, behavior, or outcomes.”
  3. Experiment tracking
     • Definition: “The process of recording the inputs, outputs, and performance metrics of an experiment.”
     • Examples: Code, notebooks, scripts, environment setup, parameters, configurations, evaluation metrics, model weights, system stats, inputs, outputs, accuracy, prompts, cost metadata, ...
  4. Applications of experiment tracking
     • Explore and understand the impact of different algorithms, parameters, and datasets on performance
     • Automation and observability: live monitoring of long-running experiments, reproducibility, documentation, collaboration, ...
  5. Modelling experiments
     • An experiment is a collection of runs
     • A run is an instantiation of the experiment with a fixed set of inputs (sketched below)
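     A minimal sketch of this model in plain Python. The class and attribute names are illustrative only, not MLtraq's API: an experiment holds a list of runs, and each run fixes its inputs and accumulates whatever is tracked.

        from dataclasses import dataclass, field
        from typing import Any


        @dataclass
        class Run:
            """One instantiation of an experiment with a fixed set of inputs."""
            inputs: dict[str, Any]
            tracked: dict[str, Any] = field(default_factory=dict)

            def track(self, key: str, value: Any) -> None:
                # Record an output, metric, or artifact produced by this run.
                self.tracked[key] = value


        @dataclass
        class Experiment:
            """A collection of runs exploring the same question."""
            name: str
            runs: list[Run] = field(default_factory=list)

            def add_run(self, **inputs: Any) -> Run:
                run = Run(inputs=inputs)
                self.runs.append(run)
                return run


        # Example: vary the learning rate across runs and track a metric.
        experiment = Experiment(name="lr-sweep")
        for lr in (0.1, 0.01, 0.001):
            run = experiment.add_run(lr=lr)
            run.track("accuracy", 0.9)  # placeholder value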
  6. Why tracking speed matters: Initialization (1/3)
     • Slow imports negatively impact development, CI/CD tests, and debugging speed
     • High run-initialization times limit our ability to experiment with hundreds of thousands of runs
     Wouldn't it be nice to start tracking almost instantly?
  7. Why tracking speed matters: High frequency (2/3)
     • At times, it's necessary to record metrics that occur frequently (loss, reward, state, ...)
     • Workarounds to handle too much information come at a cost in complexity, completeness, and accuracy: threading, downsampling, summarization, and histograms (see the example below)
     What if we could avoid these limitations altogether?
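     As a concrete example of the completeness/accuracy cost, two common workarounds are to keep only every N-th value or to collapse the series into summary statistics. A generic sketch, not tied to any particular tracker:

        import statistics

        def downsample(values, every_n=100):
            # Keep only every N-th value: cheaper to log, but detail is lost.
            return values[::every_n]

        def summarize(values):
            # Replace the full series with a few statistics: cheapest, least complete.
            return {
                "count": len(values),
                "mean": statistics.fmean(values),
                "min": min(values),
                "max": max(values),
            }

        losses = [1.0 / (step + 1) for step in range(10_000)]  # e.g. a per-step loss
        print(len(downsample(losses)))  # 100 values instead of 10,000
        print(summarize(losses))        # 4 numbers instead of 10,000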
  8. Why tracking speed matters: Large, complex objects (3/3)
     • Python data structures (dictionaries, lists, tuples), NumPy arrays, data frames, datasets, model weights, time series, forecasts, media files such as images, audio recordings, and videos, ...
     • Existing solutions are primitive and slow, relying on technology (JSON, uuencoding) from 25-40 years ago
     What if we could track more with fewer constraints?
  9. MLtraq
     • A new open-source experiment tracker designed to work with any SQL database, fast and interoperable
     • Serialization powered by native SQL database types, NumPy, PyArrow, and safe Python pickles (the design idea is illustrated below)
     • Funding: You can star the project on GitHub and/or hire me to make your experiments run faster
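     To illustrate the design direction (a sketch of the general idea, not MLtraq's actual code): serialize a NumPy array with the binary NPY format and store it as a BLOB next to native SQL columns in SQLite, so values round-trip with their dtype and shape instead of going through JSON.

        import io
        import sqlite3

        import numpy as np

        # Serialize an array with NumPy's binary NPY format (numpy.lib.format, via np.save).
        array = np.random.rand(1_000)
        buffer = io.BytesIO()
        np.save(buffer, array)

        # Store it as a BLOB next to regular SQL columns.
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE runs (id TEXT, accuracy REAL, weights BLOB)")
        db.execute("INSERT INTO runs VALUES (?, ?, ?)", ("run-1", 0.9, buffer.getvalue()))

        # Load it back with dtype and shape intact.
        blob = db.execute("SELECT weights FROM runs").fetchone()[0]
        restored = np.load(io.BytesIO(blob))
        assert np.array_equal(array, restored)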
  10. Benchmarking experiment tracking frameworks
      Frameworks (latest update: 2024.03.06)
      • Weights & Biases (0.16.3)
      • MLflow (2.11.0)
      • FastTrackML (0.5.0b2)
      • Neptune (1.9.1)
      • Aim (3.18.1)
      • Comet (3.38.1)
      • MLtraq (0.0.125)
      Varying
      • Value type: float, ndarray
      • Count of values
      • Count of runs
      • Array length
      How
      • As MLtraq experiments!
      • 10 independent runs (see the timing sketch below)
      • Local storage
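     The actual benchmarks are implemented as MLtraq experiments (see the linked article and notebooks); as a rough sketch of the measurement idea, each workload can be timed over repeated independent runs with a wall-clock harness like this:

        import statistics
        import time


        def benchmark(workload, repeats=10):
            # Time a workload several times and report wall-clock statistics.
            timings = []
            for _ in range(repeats):
                start = time.perf_counter()
                workload()
                timings.append(time.perf_counter() - start)
            return {"median_s": statistics.median(timings), "min_s": min(timings)}


        def track_one_value():
            # Placeholder for "track 1 run and 1 value" with a given framework.
            pass


        print(benchmark(track_one_value))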
  11. How much time to track 1 run and 1 value?
      What takes most of the time?
      • W&B: threading, IPC
      • MLflow: Alembic migration
      • Aim: threading, RocksDB
      • Comet: threading
      • FastTrackML: fast, but requires a running server
      • MLtraq: SQLite operations
      • Neptune: direct writes to the filesystem
      Start-up time, Neptune vs W&B: 400x
  12. How much time to track 1 run and 100-10K values? (High-frequency tracking)
      • The entity-attribute-value (EAV) database model with no batching kills MLflow/FastTrackML performance: one row per tracked value, e.g. (experiment ID/name, “accuracy”, 0.85); see the SQLite sketch below
      • Source: https://community.intersystems.com/post/entity-attribute-value-model-relational-databases-should-globals-be-emulated-tables-part-1
      MLtraq vs MLflow: 355x
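     A sketch of why batching matters with an entity-attribute-value layout, using SQLite (table and column names are illustrative, not MLflow's actual schema): one row per tracked value, inserted either one statement and one transaction at a time, or as a single batched insert. The gap grows further on disk, where every commit pays for a sync.

        import sqlite3
        import time

        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE metrics (run_id TEXT, key TEXT, value REAL, step INTEGER)")
        rows = [("run-1", "accuracy", 0.85, step) for step in range(10_000)]

        # One INSERT and one commit per value: many round trips.
        start = time.perf_counter()
        for row in rows:
            db.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)", row)
            db.commit()
        unbatched = time.perf_counter() - start

        # Same rows, one batched statement in a single transaction.
        start = time.perf_counter()
        db.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?)", rows)
        db.commit()
        batched = time.perf_counter() - start

        print(f"unbatched: {unbatched:.3f}s, batched: {batched:.3f}s")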
  13. How much time to track 10 runs and 1 value?
      • Threading/IPC is expensive for W&B
      MLtraq vs W&B: 1563x
  14. How much time to track 1K runs and 1K values?
      What makes MLtraq faster
      • SQLite vs filesystem
      • Safe pickling vs JSON (see the safe-unpickling sketch below)
      MLtraq vs Neptune: 23x
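     “Safe pickling” refers to restricting what unpickling is allowed to reconstruct. A minimal sketch of the general technique from the Python documentation's “restricting globals” recipe (not MLtraq's implementation): override pickle.Unpickler.find_class with an allow-list so a stored blob cannot instantiate arbitrary classes on load.

        import collections
        import io
        import pickle


        class SafeUnpickler(pickle.Unpickler):
            # Only allow-listed (module, name) pairs may be reconstructed on load.
            ALLOWED = {("builtins", "complex"), ("builtins", "frozenset")}

            def find_class(self, module, name):
                if (module, name) not in self.ALLOWED:
                    raise pickle.UnpicklingError(f"forbidden: {module}.{name}")
                return super().find_class(module, name)


        # Plain containers and numbers never go through find_class and load fine.
        data = {"accuracy": 0.85, "steps": [1, 2, 3]}
        assert SafeUnpickler(io.BytesIO(pickle.dumps(data))).load() == data

        # A class outside the allow-list is rejected instead of being executed.
        try:
            SafeUnpickler(io.BytesIO(pickle.dumps(collections.Counter()))).load()
        except pickle.UnpicklingError as exc:
            print(exc)  # forbidden: collections.Counter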
  15. How much time to track 10^6 float64 values (8MB)? (Tracking large objects)
      • MLtraq: Pickle, numpy.lib.format
      • W&B: wandb.Table, JSON format
      • Neptune: JSON, uuencoded binary blob
      • MLflow: mlflow.log_text, binary blob
      • FastTrackML: c.log_text, binary blob
      • Aim: run.track, binary blob
      • Comet: run.log_text, binary blob
      Binary blob = weak semantics! (see the format comparison below)
      MLtraq vs W&B: 113x
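     To make the format gap concrete, here is a quick sketch comparing 10^6 float64 values encoded as JSON text versus NumPy's binary NPY format (this is not the benchmark code, and absolute numbers depend on the machine):

        import io
        import json
        import time

        import numpy as np

        values = np.random.rand(1_000_000)  # 10^6 float64 values, 8 MB in memory

        # JSON: decimal text; larger, slower, and the float64 dtype is lost.
        start = time.perf_counter()
        as_json = json.dumps(values.tolist()).encode()
        json_s = time.perf_counter() - start

        # NPY: binary format preserving dtype and shape.
        start = time.perf_counter()
        buffer = io.BytesIO()
        np.save(buffer, values)
        npy_s = time.perf_counter() - start

        print(f"JSON: {len(as_json) / 1e6:.1f} MB in {json_s:.3f}s")
        print(f"NPY:  {len(buffer.getvalue()) / 1e6:.1f} MB in {npy_s:.3f}s")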
  16. How much time to track up to 10^9 int8 values (1GB)?
      • Write speed of np.zeros(size, dtype=np.int8)
      • Variants: MLtraq-fs vs MLtraq-db-mem vs MLtraq-db-fs
  17. Conclusion
      • Trade-offs: threading/IPC, data storage design, batching vs streaming
      • Uuencoding and JSON-like formats are slow with poor semantics; the future is native types with PyArrow
      • Beyond “tracking speed”: backward compatibility, cloud, backend, third-party integrations, reporting, complete model lifecycle management, ...
      • Disclaimer: lots of simplifications in these slides; check out the full article and notebooks for details!