

MLtraq: Track your ML/AI experiments at hyperspeed

With every second spent waiting on slow initializations, with obscure delays hindering high-frequency logging, and with limits on what you can track, an experiment dies. Wouldn't it be nice to load and start tracking in nearly zero time? What if we could track more, and faster, handling even arbitrarily large and complex Python objects with ease?

In this talk, I will present the results of comparative benchmarks covering Weights & Biases, MLflow, FastTrackML, Neptune, Aim, Comet, and MLtraq. You will learn their strengths and weaknesses, what makes each of them slow or fast, and what sets MLtraq apart, making it 100x faster and capable of handling tens of thousands of experiments.

This presentation will not only be enlightening for those involved in AI/ML experimentation but will also be invaluable for anyone interested in the efficient and safe serialization of Python objects.

Michele Dallachiesa

July 15, 2024

Transcript

  1. Why tracking? To explore and understand the impact on performance of varying
     algorithms, parameters, and datasets. The experimentation process: hypothesis,
     design model, train model, evaluate model.
  2. Tracking: code, notebooks, scripts, environment setup, parameters, configurations,
     inputs, metrics, model weights, system stats, outputs, LLM prompts, cost, metadata,
     predictions, git commit, version, author, images, generated text, audio, video,
     debug data.
  3. What is an experiment? (Diagram: an Experiment containing Run 1, Run 2, ..., Run N.)
     • Experiment: a collection of runs
     • Run: an instantiation of the experiment with varying inputs
       results1 = train_eval(inputs1)
       results2 = train_eval(inputs2)
       ...
  4. Example of experiment (see the sketch below):
     • Classifier: DummyClassifier, LogisticRegression, DecisionTreeClassifier,
       RandomForestClassifier
     • Dataset: Iris, Digits, Wine
     • Random seed: 1, 2, ..., 10
     4 x 3 x 10 configurations ⇒ 120 runs
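     A minimal sketch of how this grid could be expressed with the MLtraq calls shown
     on slide 8 plus scikit-learn; it assumes that experiment.run() opens a fresh run
     on each call, which the deck does not confirm:

     import itertools
     from sklearn.datasets import load_iris, load_digits, load_wine
     from sklearn.dummy import DummyClassifier
     from sklearn.linear_model import LogisticRegression
     from sklearn.tree import DecisionTreeClassifier
     from sklearn.ensemble import RandomForestClassifier
     from sklearn.model_selection import cross_val_score
     from mltraq import create_session

     session = create_session("sqlite:///local.db")
     experiment = session.create_experiment("classifiers")

     classifiers = [DummyClassifier, LogisticRegression,
                    DecisionTreeClassifier, RandomForestClassifier]
     loaders = [load_iris, load_digits, load_wine]
     seeds = range(1, 11)

     # 4 classifiers x 3 datasets x 10 seeds = 120 runs
     for cls, loader, seed in itertools.product(classifiers, loaders, seeds):
         X, y = loader(return_X_y=True)
         with experiment.run() as run:  # assumption: each call opens a new run
             run.fields.classifier = cls.__name__
             run.fields.dataset = loader.__name__
             run.fields.seed = seed
             run.fields.accuracy = cross_val_score(
                 cls(random_state=seed), X, y, cv=5).mean()

     experiment.persist()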
  5. Solutions for experiment tracking: MLflow 51%, W&B 45%, Comet 3%, Aim 1%,
     Neptune 1%, others < 1% (percentage of PyPI monthly downloads as a proxy for
     market share). 😕 Slowness and type limitations.
  6. Beyond float and bytes (see the sketch below):
     • Containers: dict, list, set, tuple
     • Scalars: int, str, time, bool
     • Arrays: NumPy, data frames
     • ...
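     A sketch of what tracking such types might look like with the API from slide 8;
     it assumes run.fields accepts arbitrary serializable Python objects, as the deck
     claims, and that create_session() without arguments uses a default local database:

     import numpy as np
     import pandas as pd
     from mltraq import create_session

     session = create_session()  # assumption: defaults to a local SQLite database
     experiment = session.create_experiment("rich_types")

     with experiment.run() as run:
         # Containers and scalars, not just floats
         run.fields.params = {"lr": 0.1, "layers": [64, 32], "stages": ("fit", "eval")}
         # NumPy arrays and data frames as first-class values
         run.fields.confusion = np.array([[50, 2], [3, 45]])
         run.fields.history = pd.DataFrame({"epoch": [1, 2], "loss": [0.9, 0.4]})

     experiment.persist()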
  7. MLtraq is flexible: execute and persist locally or remotely. (Architecture diagram:
     tracking and execution on compute nodes against a private DB, with copies to a
     team DB and a public DB for reporting.)
  8. from mltraq import create_session

     session = create_session("sqlite:///local.db")
     experiment = session.create_experiment("test")

     with experiment.run() as run:
         run.fields.accuracy = 0.9

     experiment.persist()

     session.db.query("SELECT * FROM experiment_test")

     ╭────────────────┬──────────────┬──────────╮
     │ id_experiment  │ id_run       │ accuracy │
     ├────────────────┼──────────────┼──────────┤
     │ 4d4c4f7a...    │ 457f89c38... │ 0.9      │
     ╰────────────────┴──────────────┴──────────╯
  9. Let's experiment! Three benchmarks: start-up time, high-frequency tracking, and
     large-object tracking (see the profiling sketch below).
     • Full analysis at https://mltraq.com/benchmarks/speed
     • Tracking speed of floats (scalars and NumPy arrays)
     • Statistical profiling with Pyinstrument
     • Results averaged over 10 repeated runs
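     A minimal sketch of how the statistical profiling could be wired up with
     Pyinstrument; track_floats is a hypothetical stand-in for a benchmark body:

     from pyinstrument import Profiler

     def track_floats(n=100_000):
         # Hypothetical benchmark body: accumulate n float values.
         values = []
         for i in range(n):
             values.append(float(i))
         return values

     profiler = Profiler()  # sampling profiler: low overhead, call-stack statistics
     profiler.start()
     track_floats()
     profiler.stop()
     print(profiler.output_text(unicode=True))  # where did the time go?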
 10. Start-up time: how much time to track 1-10 float values? (Benchmark chart: up to
     1.6s.) What takes most of the time?
     • W&B: threading, IPC
     • MLflow: Alembic migration
     • Aim: threading, RocksDB
     • Comet: threading
     • MLtraq: SQLite operations
     • Neptune: direct writes to FS
     A timing sketch follows below.
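     A simple way to measure this kind of start-up cost, assuming mltraq is installed;
     the same pattern applies to any of the other trackers:

     import time

     t0 = time.perf_counter()
     import mltraq  # cold import is part of the start-up cost

     session = mltraq.create_session()  # assumption: default local database
     experiment = session.create_experiment("startup")
     with experiment.run() as run:
         run.fields.accuracy = 0.9
     experiment.persist()

     print(f"start-up + 1 tracked float: {time.perf_counter() - t0:.3f}s")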
 11. Start-up time, exponentially worse with more runs: how much time to track 1 float
     on each of 100 runs? Up to 208s.
 12. High-frequency tracking: how much time to track 100-100K float values? (Benchmark
     chart: up to 5.8s.) See the sketch below.
     • MLflow uses an entity-attribute-value model: one row per value, keyed by Run ID
       and attribute name (e.g., "accuracy" = 0.85)
     • DB INSERT at every .log_metric(...) call
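     The access pattern the slide describes, written out with MLflow's public API; at
     high frequency, each call can translate into a round trip to the backing store:

     import mlflow

     with mlflow.start_run():
         for step in range(100_000):
             # One entity-attribute-value record per call: (run, "accuracy", value, step)
             mlflow.log_metric("accuracy", 0.85, step=step)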
 13. Tracking large objects: how much time to track 1 million float64 values?
     (Benchmark chart: up to 2.4s.) Storage formats:
     • MLtraq: safe Pickle, NumPy
     • W&B: JSON
     • Neptune: JSON, binary blob
     • MLflow: binary blob
     • Aim: binary blob
     • Comet: binary blob
     Binary blob ⇒ weak semantics. A comparison sketch follows below.
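     A small sketch of why text formats such as JSON lose to binary serialization on
     1 million float64 values, in both size and encoding time:

     import json
     import pickle
     import time

     import numpy as np

     values = np.random.default_rng(0).random(1_000_000)  # 1M float64 values

     t0 = time.perf_counter()
     as_json = json.dumps(values.tolist())  # text: drops dtype and shape, inflates size
     t_json = time.perf_counter() - t0

     t0 = time.perf_counter()
     as_pickle = pickle.dumps(values, protocol=5)  # binary: keeps dtype and shape
     t_pickle = time.perf_counter() - t0

     print(f"JSON:   {len(as_json) / 1e6:.1f} MB in {t_json:.2f}s")
     print(f"pickle: {len(as_pickle) / 1e6:.1f} MB in {t_pickle:.3f}s")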
 14. Safe-pickling and safe-unpickling. (Diagram: Python objects ⇄ Pickle binary
     format.) If dangerous opcodes are encountered, an exception is raised. A sketch
     of the idea follows below.
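     A minimal sketch of restricted unpickling using only the standard library; it
     illustrates the idea, not MLtraq's actual implementation. Opcodes that resolve
     globals (the usual path to arbitrary code execution) raise instead of executing:

     import io
     import pickle

     class SafeUnpickler(pickle.Unpickler):
         # Allow-list of globals that may be resolved; everything else raises.
         ALLOWED = {("builtins", "set"), ("builtins", "frozenset")}

         def find_class(self, module, name):
             if (module, name) in self.ALLOWED:
                 return super().find_class(module, name)
             raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")

     def safe_loads(data: bytes):
         return SafeUnpickler(io.BytesIO(data)).load()

     # Plain containers and scalars round-trip fine...
     print(safe_loads(pickle.dumps({"accuracy": 0.9, "tags": ["a", "b"]})))
     # ...while a payload referencing, e.g., os.system raises UnpicklingError.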
 15. Tracking large objects: how much time to track 1 billion int8 values? (A timing
     sketch follows below.)
     • Write speed of np.zeros(size, dtype=np.int8)
     • MLtraq-fs: direct write to filesystem
     • MLtraq-db-mem: in-memory SQLite DB
     • MLtraq-db-fs: SQLite DB stored on filesystem
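     A rough sketch of the filesystem-vs-SQLite comparison, scaled down to 100 million
     int8 values so it runs comfortably on a laptop:

     import sqlite3
     import time

     import numpy as np

     data = np.zeros(100_000_000, dtype=np.int8)  # scaled down from 1 billion

     # Direct write to the filesystem (the "MLtraq-fs" variant)
     t0 = time.perf_counter()
     data.tofile("blob.bin")
     t_fs = time.perf_counter() - t0

     # BLOB insert into an on-disk SQLite database (the "MLtraq-db-fs" variant)
     t0 = time.perf_counter()
     con = sqlite3.connect("blob.db")
     con.execute("CREATE TABLE IF NOT EXISTS t (b BLOB)")
     con.execute("INSERT INTO t VALUES (?)", (data.tobytes(),))
     con.commit()
     con.close()
     t_db = time.perf_counter() - t0

     print(f"filesystem: {t_fs:.2f}s  sqlite-on-fs: {t_db:.2f}s")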
 16. Conclusion:
     • No one size fits all: threading or IPC, web API or DB, storage design, batching
       and streaming
     • Use native SQL and Python types and PyArrow: uuencoding and JSON-like formats
       are slow, with poor semantics
     • Impact: contributed to making the new W&B SDK 36-88% faster!