Gustavo Pantuza
September 01, 2021

Danian: tail latency reduction of networking application through an O(1) scheduler

Paper presented at: ISCC 2021

Abstract:
Allocating cores to application threads is a non-trivial and computationally costly problem on Unix systems. The Caladan scheduler aims to reduce the cost of allocating threads and cores at microsecond scale. Danian uses memoization to optimize Caladan's thread-picking algorithm, which selects the best thread for a given core. These improvements directly affect applications distributed across a data center network. The cost of the thread-picking operation dropped from O(n) to O(1), CPU time fell by 7%, and tail latency fell by 3% in the Caladan Synthetic experiment and by 5% in the Netperf experiment.
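The O(n) to O(1) change described in the abstract can be illustrated with a minimal sketch. It assumes a simplified model that is not taken from the Caladan or Danian sources: the `struct thread` and `struct proc` layouts, the `NCORES` constant, and the functions `pick_scan`, `pick_memo`, `record_run` and `forget_thread` are all hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCORES 4 /* hypothetical core count for this sketch */

struct thread {
    bool active;         /* true while the thread is running on a core */
    unsigned int core;   /* core the thread last ran on */
    struct thread *next; /* next entry in the idle list */
};

struct proc {
    struct thread *idle_head;        /* singly linked list of idle threads */
    struct thread *last_run[NCORES]; /* memoization array: core -> thread */
};

/* O(n): walk the idle list looking for a thread that last ran on `core`. */
static struct thread *pick_scan(struct proc *p, unsigned int core)
{
    assert(core < NCORES);
    for (struct thread *th = p->idle_head; th != NULL; th = th->next)
        if (th->core == core)
            return th;
    return p->idle_head; /* no match: any idle thread (may be NULL) */
}

/* O(1): consult the memoized slot first, then fall back to the list head. */
static struct thread *pick_memo(struct proc *p, unsigned int core)
{
    assert(core < NCORES);
    struct thread *th = p->last_run[core];
    if (th != NULL && !th->active)
        return th;
    return p->idle_head;
}

/* Join/run hook: remember which thread ran on `core`. */
static void record_run(struct proc *p, struct thread *th, unsigned int core)
{
    assert(core < NCORES);
    th->core = core;
    p->last_run[core] = th;
}

/* Leave hook: drop a departing thread from its memoized slot. */
static void forget_thread(struct proc *p, struct thread *th)
{
    if (th->core < NCORES && p->last_run[th->core] == th)
        p->last_run[th->core] = NULL;
}
```

In this model the lookup cost no longer depends on the number of idle threads. The price is the extra bookkeeping in `record_run` and `forget_thread`, which must run whenever a thread is placed on a core or leaves the process.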

Keywords:
Real Time Communication Services,
Distributed Systems Architecture and Management,
Optimization and Management,
Network Reliability,
Network Design

Transcript

  1. Danian: Tail latency reduction of networking application through an O(1) scheduler
    Gustavo Pantuza, Lucas A. C. Bleme, Marcos Augusto M. Vieira, Luiz Filipe M. Vieira
    26th IEEE Symposium on Computers and Communications, Athens, Greece, September 5-8, 2021 (IEEE ISCC 2021)
  2. Agenda
    ▪ Introduction
    ▪ Thread scheduling
    ▪ Caladan
    ▪ Danian
    ▪ Experiments
    ▪ Results
    ▪ Future work
    ▪ Conclusion
  3. Introduction
    ▪ Tail at scale (2013)
    ▪ Shenango (2019)
    ▪ Caladan (2020)
    ▪ Danian (2021)
    [Figure: hypothetical example of latency percentiles p50, p95, p99 with values 1 ms, 5 ms, 10 ms]
  4. Caladan
    ▪ Schedules threads onto CPUs
    ▪ Runs on top of DPDK
    ▪ Reads control signals every 5 μs
    ▪ Implemented inside Shenango
  5. Danian
    “In the 5000 years between the events of the Arrakis Revolt and the time the Lost Ones returned from The Scattering, Caladan's name was shortened to Dan, and all things pertaining to Dan were known as Danian.”
    Source: https://dune.fandom.com/wiki/Caladan
  6. Danian
    ▪ Works inside Caladan ksched
    ▪ Adds a memoization array
    ▪ Intercepts thread join/leave
    ▪ Algorithm to assign CPU→thread
    ▪ O(n) → O(1)
    Source: https://dune.fandom.com/wiki/Caladan
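  7. Danian
    static struct thread *
    sched_pick_last_kthread(struct proc *p, unsigned int core)
    {
        struct thread *th;
        /* memoized entry: the thread recorded in last_run for this core */
        th = p->last_run[core];
        if (!th->active) {
            /* the memoized thread is not active, so return it directly */
            return th;
        }
        /* otherwise fall back to the tail of the idle list */
        return list_tail(&p->idle_threads, struct thread, idle_link);
    }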
  8. Conclusion
    ▪ Thread picking from O(n) to O(1)
    ▪ Memoization using LRU policy
    ▪ -5% on tail latency (p99)
    ▪ -15% CPU usage
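The p50, p95 and p99 labels on the slides are latency percentiles, and p99 is the tail latency the deck reports on. As a rough sketch of how such figures can be derived from raw samples, the block below uses the nearest-rank method. The deck does not say how its percentiles were computed, so the method and the names `cmp_double` and `percentile` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: orders doubles ascending. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile. Sorts `samples` in place and returns the
 * value at rank ceil(n * pct / 100), with ranks counted from 1.
 */
static double percentile(double *samples, size_t n, unsigned int pct)
{
    assert(n > 0 && pct >= 1 && pct <= 100);
    qsort(samples, n, sizeof samples[0], cmp_double);
    size_t rank = (n * pct + 99) / 100; /* integer ceiling */
    return samples[rank - 1];
}
```

Calling `percentile(latencies, n, 99)` on an array of request latencies returns the value that 99% of requests did not exceed. Because the function sorts its input in place, callers that need the original order should pass a copy.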