Why a new CPAN client cpm is fast

Shoichi Kaji
June 22, 2016

YAPC::NA 2016 Orlando, Florida, 2016.06.22
(original https://www.slideshare.net/skaji/why-a-new-cpan-client-cpm-is-fast)

Transcript

  1. Me
    • Shoichi Kaji
    • Tokyo, Japan
    • pause/github: skaji
    • Perl5: cpm, App::FatPacker::Simple, Mojo::SlackRTM
    • Perl6: mi6, Frinfon, evalbot in Slack :)
  2. Agenda
    • What is cpm, and why?
    • cpanm vs. cpm
    • The internals of cpm
      • divide the installing process into pieces
      • learn from the Go language
    • Roadmap
  3. Why a new CPAN client?
    • Yes, I always use cpanm to install CPAN modules. It’s awesome!
    • Because cpanm installs modules in series, it takes quite a lot of time to install a module that has many dependencies
  4. Why a new CPAN client?
    • So I created cpm
    • Actually cpm is not a new CPAN client, but it uses cpanm in parallel, so that it can install CPAN modules much faster
  5. cpm

  6. First, let’s think simple
    $ cat modules | xargs cpanm
    Can we just use xargs to parallelize cpanm? NO, WE CAN’T.
  7. The problem with $ cat modules | xargs cpanm
    • The modules to be installed are not determined in advance.
    • Even if you have a list of modules to be installed, the cpanm workers will break unless you synchronize them
    • So we have to
      • (1) divide the installing process of a CPAN module into pieces that can be executed individually
      • (2) synchronize cpanm workers in some way
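The ordering constraint behind these two requirements can be sketched in Go, the language the talk later borrows its concurrency ideas from. This is only an illustration, not cpm's code: the `ready` helper and the dependency graph are hypothetical. It shows that a distribution may be handed to a worker only once everything it depends on is installed, which a flat xargs pipeline has no way to enforce.

```go
package main

import "fmt"

// ready reports whether every dependency in deps is already installed.
// Hypothetical helper for illustration; not part of cpm or cpanm.
func ready(deps []string, installed map[string]bool) bool {
	for _, d := range deps {
		if !installed[d] {
			return false
		}
	}
	return true
}

func main() {
	// Made-up dependency graph: App needs Foo and Bar, Foo needs Bar.
	deps := map[string][]string{
		"App": {"Foo", "Bar"},
		"Foo": {"Bar"},
		"Bar": {},
	}
	installed := map[string]bool{}

	fmt.Println(ready(deps["App"], installed)) // false: nothing installed yet
	fmt.Println(ready(deps["Bar"], installed)) // true: no dependencies

	installed["Bar"] = true
	fmt.Println(ready(deps["Foo"], installed)) // true: Bar is installed
}
```

A scheduler would run a check like this before dispatching each job, so that workers never build a distribution whose prerequisites are still in flight.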
  8. (1) Divide installing process of CPAN modules

        sub installing_process {
            my $module = shift;

            # 1. resolve
            # query cpanmetadb
            my $dist_url = resolve($module);

            # 2. fetch (and extract)
            # wget && tar xzf && read META.json
            my ($dir, @configure_deps) = fetch($dist_url);
            install_module($_) for @configure_deps;

            # 3. configure
            # perl Makefile.PL/Build.PL && read MYMETA.json
            my @deps = configure($dir);
            install_module($_) for @deps;

            # 4. install
            # make install (or ./Build install)
            install($dir);
        }

    I divided the process into 4 jobs:
      * resolve
      * fetch
      * configure
      * install
    which are independent
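One way to picture the four jobs as independently schedulable units is a small state machine. The Go sketch below is an illustration under assumptions, not cpm's implementation: the `Kind` type, the `Job` struct, the `Done` marker, and the `next` function are all hypothetical names.

```go
package main

import "fmt"

// Kind names one of the four job types from the slide.
type Kind int

const (
	Resolve Kind = iota
	Fetch
	Configure
	Install
	Done // hypothetical terminal marker: nothing left to do
)

// String makes a Kind print as its job name.
func (k Kind) String() string {
	return [...]string{"resolve", "fetch", "configure", "install", "done"}[k]
}

// Job is a unit of work that any idle worker could pick up.
type Job struct {
	Kind   Kind
	Target string // the module this job belongs to
}

// next returns the job that follows j for the same module.
// In a real client, fetch and configure would also produce new resolve
// jobs for the dependencies they discover; that part is omitted here.
func next(j Job) Job {
	if j.Kind >= Install {
		return Job{Kind: Done, Target: j.Target}
	}
	return Job{Kind: j.Kind + 1, Target: j.Target}
}

func main() {
	// Walk one module through all four jobs in order.
	for j := (Job{Kind: Resolve, Target: "Some::Module"}); j.Kind != Done; j = next(j) {
		fmt.Println(j.Kind, j.Target)
	}
}
```

Because each job carries everything needed to run it, a different worker can execute each step, which is what makes the pieces parallelizable.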
  9. Take a look at the Go language…
    Go introduces two concurrency primitives:
      * goroutines
      * channels
    They are very simple but powerful.

        package main

        import "fmt"

        // work receives jobs from in and sends one result per job to out.
        func work(in <-chan string, out chan<- string) {
            for {
                job := <-in
                // do work with job
                out <- "result of " + job
            }
        }

        func main() {
            in := make(chan string)
            out := make(chan string)
            go work(in, out) // start the worker as a goroutine
            in <- "job"
            result := <-out
            fmt.Println(result)
        }
  10. Take a look at the Go language…

        // Uses work() from the previous slide; needs import "fmt".
        func main() {
            in1 := make(chan string)
            out1 := make(chan string)
            go work(in1, out1)
            in2 := make(chan string)
            out2 := make(chan string)
            go work(in2, out2)
            in1 <- "job1"
            in2 <- "job2"
            // select proceeds with whichever channel is ready first
            select {
            case result1 := <-out1:
                // do something with result1
                fmt.Println(result1)
            case result2 := <-out2:
                // do something with result2
                fmt.Println(result2)
            }
        }

    It is very easy to increase workers.
    You can use select to await multiple channels simultaneously.
  11. The internals of cpm
    (Diagram: a Master process connected by pipes to three cpanm workers, waiting on the pipes with select)
    cpanm worker
      1. get job via pipe
      2. work, work, work!
      3. send result via pipe
    Master
      1. prepare pipes for workers by pipe(2)
      2. launch workers by fork(2) and connect them with pipes
      3. loop {
           calculate jobs and send jobs to idle workers.
           if all workers are busy, then wait for them and receive results by select(2)
         }
  12. Roadmap
    • Last year I talked with Tatsuhiko Miyagawa about cpanm 2.0 (menlo)
    • Then he said “why don’t you merge cpm into cpanm itself?”
    • I was very happy to hear that!
  13. Roadmap
    • So if you all find that cpm is useful and stable, then cpm should be merged into cpanm 2.0
    • Before merging, there are some problems that need to be resolved:
      • The log file is very messy
    • I would highly appreciate your feedback!