cpm at PerlCon 2019

Shoichi Kaji
August 09, 2019

PerlCon 2019, Rīga, Latvia, 7-9 August

Transcript

  1. Agenda
     • 5 features of cpm
     • The internals of cpm: why cpm is fast
     • Toward cpm version 1.0
  2. › curl -sL https://git.io/cpm > cpm
     › chmod +x cpm
     › ./cpm -V
     cpm 0.983 (./cpm)
     This is a self-contained version, 0.983
  3. language: perl
     perl:
       - "5.30"
     install:
       - curl -sL https://git.io/cpm | perl - install -g
     script:
       - prove -l t

     It is also easy to use cpm in CI services such as Travis CI.
  4. • You can change cpm's resolvers via the --resolver option
     • Let's say you have a DarkPAN that contains your private distributions, but not all of the CPAN distributions.
     • In fact, other CPAN clients do not work well in such a case, but cpm does.
  5. › cpm install \
         --resolver 02package,http://your-darkpan \
         --resolver metadb \
         Module1 Module2 ...

     Resolve against your DarkPAN first and, if that fails, fall back to the normal metadb resolver.
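The fallback order described on this slide can be illustrated with a small resolver chain. This is a minimal sketch in Python, not cpm's Perl implementation: the names `make_index_resolver` and `resolve_with_fallback`, the in-memory indexes, and the tarball URLs are all hypothetical stand-ins for a DarkPAN index and the public metadb.

```python
from typing import Callable, Dict, List, Optional

# A resolver maps a module name to a distribution URL, or None if it does not know it.
Resolver = Callable[[str], Optional[str]]


def make_index_resolver(index: Dict[str, str]) -> Resolver:
    """Hypothetical resolver backed by an in-memory index."""
    def resolve(module: str) -> Optional[str]:
        return index.get(module)
    return resolve


def resolve_with_fallback(module: str, resolvers: List[Resolver]) -> Optional[str]:
    """Try each resolver in order; the first one that knows the module wins."""
    for resolver in resolvers:
        found = resolver(module)
        if found is not None:
            return found
    return None


# Made-up data: the DarkPAN only knows private modules, the public index knows the rest.
darkpan = make_index_resolver(
    {"My::Private": "http://your-darkpan/My-Private-0.01.tar.gz"}
)
public = make_index_resolver(
    {"Module1": "https://public.example/Module1-1.00.tar.gz"}
)

print(resolve_with_fallback("My::Private", [darkpan, public]))
print(resolve_with_fallback("Module1", [darkpan, public]))
```

The order of the list plays the role of the order of the `--resolver` options: a module missing from the first index is simply looked up in the next one.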
  6. • Leon Timmermans has proposed a new concept for CPAN distribution installation called static-install
     • It is much simpler, safer, and faster than the traditional approach (i.e., executing Makefile.PL)
     • cpm supports static-install
     • I wrote a blog post about static install
  7. • cpm keeps builds in ~/.perl-cpm and never fetches or builds them again
     • This is (of course!) inspired by Carmel
     • This makes cpm even faster!
  8. 5 features of cpm
     • 1. fast
     • 2. self-contained
     • 3. flexible resolvers
     • 4. static install
     • 5. prebuilt
  9. Programming paradigms
     • To make a program fast, there are several programming paradigms
     • Multi Thread
       • I don't think it is a good idea to use threads in perl5. Oops.
     • Event Driven
       • This is a good choice. But once we adopt an event loop, we cannot use synchronous code anymore. This means that we cannot rely on cpanminus code. Oops.
     • Multi Process
       • Let's use this
  10. Multi Process
     • Let's use the multi-process paradigm. So the next questions are:
     • Q1: How do we pass data from the master to the workers and vice versa?
     • Q2: The master needs to know as soon as possible that workers have finished their jobs. How do we achieve this?
     (Diagram: one master process and three workers)
  11. IPC
     • Q1: How do we pass data between processes?
     • Idea1: files
       • Other processes do not detect file changes quickly. It appears inotify is "slow".
     • Idea2: TCP/IP
       • A good choice. However, because the master and worker processes are on the same host, we do not necessarily need TCP/IP.
     • Idea3: pipes
       • Let's use this. We should prepare 2 pipes: master -> worker and worker -> master
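The two-pipe arrangement can be sketched as follows. This is a minimal illustration in Python rather than Perl, assuming a POSIX system with fork; the function name `run_worker_once` and the line-based `done:<job>` message format are hypothetical and are not taken from cpm.

```python
import os


def run_worker_once(job: str) -> str:
    """Fork one worker, send it a job over one pipe, read the result from another."""
    job_r, job_w = os.pipe()        # master -> worker
    result_r, result_w = os.pipe()  # worker -> master

    pid = os.fork()
    if pid == 0:
        # Worker process: keep only the ends it needs.
        try:
            os.close(job_w)
            os.close(result_r)
            with os.fdopen(job_r) as jobs, os.fdopen(result_w, "w") as results:
                line = jobs.readline().rstrip("\n")
                results.write(f"done:{line}\n")  # pretend the job was performed
        finally:
            os._exit(0)  # never return into the master's code path

    # Master process: close the ends that belong to the worker.
    os.close(job_r)
    os.close(result_w)
    with os.fdopen(job_w, "w") as jobs, os.fdopen(result_r) as results:
        jobs.write(job + "\n")
        jobs.flush()
        reply = results.readline().rstrip("\n")
    os.waitpid(pid, 0)
    return reply
```

Each pipe is one-directional, which is why two are needed: the master writes to the first and reads from the second, and the worker does the opposite.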
  12. select
     • How does the master detect which workers are finished?
     • Workers send their results to the master via a pipe, so a readable pipe also means that the worker has finished.
     • So, if the master monitors the pipes with select(2), it can quickly detect which workers are finished
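To make the role of select(2) concrete, here is a hedged Python sketch (not cpm's Perl code) in which the master learns the order in which workers finish purely by watching their result pipes. The function name `finish_order` and the use of sleep as a stand-in for real work are assumptions for illustration.

```python
import os
import select
import time


def finish_order(delays):
    """Fork one worker per delay; return worker indices in the order select() reports them."""
    pipes = {}  # read end of a result pipe -> worker index
    pids = []
    for index, delay in enumerate(delays):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            try:
                os.close(r)
                for fd in pipes:       # read ends inherited from earlier workers
                    os.close(fd)
                time.sleep(delay)      # stand-in for real work
                os.write(w, b"ok\n")   # sending a result doubles as "I am finished"
            finally:
                os._exit(0)
        os.close(w)
        pipes[r] = index
        pids.append(pid)

    order = []
    while pipes:
        # Block until at least one result pipe becomes readable.
        readable, _, _ = select.select(list(pipes), [], [])
        for fd in readable:
            os.read(fd, 1024)
            order.append(pipes.pop(fd))
            os.close(fd)
    for pid in pids:
        os.waitpid(pid, 0)
    return order
```

For example, `finish_order([0.4, 0.0, 0.2])` starts three workers, and the master is woken as each one writes its result, without polling files or sleeping.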
  13. Wrap-up so far
     • We adopt the multi-process paradigm
     • Connect the master with each worker via 2 pipes
     • The master monitors the pipes with select(2) so that it quickly detects which workers are finished
     (Diagram: a master process connected to three workers by pipes, monitored with select)
  14. Actual code of cpm
     • The master "calculates" jobs
     • Get ready (= finished) workers here (internally, we select on the pipes!)
     • The connection between the master and workers is modularized as the Parallel::Pipes module
     • And send a job to the ready worker
     • These mere 11 lines of code make cpm fast!
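The slide's code is not part of this transcript, so the loop it annotates can only be approximated. The following is a sketch in Python of the same master/worker pattern, assuming a POSIX system; it is not cpm's code and does not use the Parallel::Pipes API. The `Worker` class, `run_jobs`, and the fixed job list are hypothetical (in cpm the master computes new jobs as results arrive).

```python
import os
import select


class Worker:
    """A forked child connected to the master by two pipes."""

    def __init__(self, work, siblings):
        job_r, job_w = os.pipe()        # master -> worker
        result_r, result_w = os.pipe()  # worker -> master
        self.pid = os.fork()
        if self.pid == 0:
            try:
                os.close(job_w)
                os.close(result_r)
                for other in siblings:  # drop pipe ends inherited from the master
                    other.jobs.close()
                    other.results.close()
                with os.fdopen(job_r) as jobs, os.fdopen(result_w, "w") as results:
                    for line in iter(jobs.readline, ""):  # until the master closes the pipe
                        results.write(work(line.rstrip("\n")) + "\n")
                        results.flush()
            finally:
                os._exit(0)
        os.close(job_r)
        os.close(result_w)
        self.jobs = os.fdopen(job_w, "w")
        self.results = os.fdopen(result_r)


def run_jobs(jobs, work, num_workers=3):
    """Distribute jobs over num_workers (at least 1) processes and collect the results."""
    workers = []
    for _ in range(num_workers):
        workers.append(Worker(work, workers))
    pending, results = list(jobs), []
    idle, busy = list(workers), {}  # busy: result fd -> worker
    while pending or busy:
        # Send a job to every worker that is ready.
        while pending and idle:
            worker = idle.pop()
            worker.jobs.write(pending.pop(0) + "\n")
            worker.jobs.flush()
            busy[worker.results.fileno()] = worker
        # Wait until at least one worker has finished (its result pipe is readable).
        readable, _, _ = select.select(list(busy), [], [])
        for fd in readable:
            worker = busy.pop(fd)
            results.append(worker.results.readline().rstrip("\n"))
            idle.append(worker)
    for worker in workers:
        worker.jobs.close()     # end of input lets each worker exit
        worker.results.close()
    for worker in workers:
        os.waitpid(worker.pid, 0)
    return results
```

A call such as `run_jobs(["a", "b", "c"], str.upper, num_workers=2)` returns the upper-cased strings in completion order, so callers that need a stable order should sort or key the results.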
  15. Toward cpm version 1.0
     • Make distributions "first-class objects"
     • Traditionally we install CPAN modules into one specific directory. So, after installation, we cannot see which distribution a module comes from, and it is hard to re-use distributions
     • On the other hand, MIYAGAWA has introduced the concept of "central repositories" in his project Carmel
     • Let's keep each distribution separately
     • Once cpm implements this, cpm can, for example, easily install only runtime dependencies
  16. Wrap up
     • cpm is a fast CPAN client
     • It also has some other interesting features
     • cpm uses the system calls fork/pipe/select effectively so that it installs CPAN modules fast
     • cpm 1.0 will treat distributions as "first-class objects"