Parsing a distribution name is sometimes hard

LT at PerlCon 2019

Kenichi Ishigaki

August 09, 2019

Transcript

  1. Usually it's easy

    I/IS/ISHIGAKI/Module-CPANTS-Analyse-1.01.tar.gz
    S/SK/SKAJI/Perl6/App-Mi6-0.0.2.tar.gz

    • The leading part (I/IS/ISHIGAKI, S/SK/SKAJI) is the author's directory, based on their ID
    • An optional next part (Perl6) is a subdirectory under the author's directory
    • Then comes the name of the distribution (Module-CPANTS-Analyse, App-Mi6)
    • The last part before the extension is the version of the distribution (1.01, 0.0.2)
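
The layout on slide 1 can be sketched in plain Perl. This is only an illustration: `split_cpan_path` is a hypothetical helper name, not something taken from CPAN::DistnameInfo or Parse::Distname, and it assumes author IDs consist of uppercase letters, digits and hyphens. It stops at the file name, because splitting the name from the version is the hard part the rest of the talk is about.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any CPAN module): split a path like
# "S/SK/SKAJI/Perl6/App-Mi6-0.0.2.tar.gz" into its structural parts.
sub split_cpan_path {
    my ($path) = @_;

    # Author's directory: first letter / first two letters / full ID.
    # Assumption: IDs consist of uppercase letters, digits and hyphens.
    my (undef, undef, $cpanid, $rest) =
        $path =~ m{\A([A-Z])/(\1[A-Z0-9-])/(\2[A-Z0-9-]*)/(.+)\z};
    return unless defined $cpanid;

    # Whatever precedes the last slash is a subdirectory (may be absent).
    my ($subdir, $filename) = $rest =~ m{\A(?:(.+)/)?([^/]+)\z};

    return { cpanid => $cpanid, subdir => $subdir, filename => $filename };
}

for my $path (qw(
    I/IS/ISHIGAKI/Module-CPANTS-Analyse-1.01.tar.gz
    S/SK/SKAJI/Perl6/App-Mi6-0.0.2.tar.gz
)) {
    my $parts = split_cpan_path($path) or next;
    printf "%s | %s | %s\n",
        $parts->{cpanid}, $parts->{subdir} // '(none)', $parts->{filename};
}
```
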
  2. CPAN::DistnameInfo

    I've been using a patched version for CPANTS but I didn't want to repeat that for CPAN::Groonga
  3. CPAN::DistnameInfo

    I was going to ping the gang, but I thought twice: let's test it with BackPAN first
  4. CPAN::DistnameInfo says...

    my $path = "E/ER/ERWANMAS/v0.10.zip";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "ERWANMAS",
      "dist" : "v",
      "distvname" : "v0.10",
      "extension" : "zip",
      "filename" : "v0.10.zip",
      "maturity" : "released",
      "pathname" : "E/ER/ERWANMAS/v0.10.zip",
      "version" : "0.10"
    }
  5. Or ...

    my $path = "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "SONNY",
      "dist" : "DBIx-Class-InflateColumn",
      "distvname" : "DBIx-Class-InflateColumn-S3",
      "extension" : "tar.gz",
      "filename" : "DBIx-Class-InflateColumn-S3.tar.gz",
      "maturity" : "released",
      "pathname" : "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz",
      "version" : "S3"
    }

    But really?
  6. More delicate cases

    my $path = "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "HARPREET",
      "dist" : "XMS-MotifSetv",
      "distvname" : "XMS-MotifSetv1.0",
      "extension" : "tar.gz",
      "filename" : "XMS-MotifSetv1.0.tar.gz",
      "maturity" : "released",
      "pathname" : "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz",
      "version" : "1.0"
    }
  7. More delicate cases

    my $path = "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "MPERRY",
      "dist" : "Config-INI-Reader",
      "distvname" : "Config-INI-Reader-Encrypted2",
      "extension" : "tar.gz",
      "filename" : "Config-INI-Reader-Encrypted2.tar.gz",
      "maturity" : "released",
      "pathname" : "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz",
      "version" : "Encrypted2"
    }
  8. More delicate cases

    my $path = "C/CA/CAFFIEND/font_ft2_0.1.0.tgz";
    say encode_json({ CPAN::DistnameInfo->new($path)->properties });

    {
      "cpanid" : "CAFFIEND",
      "dist" : "font_ft",
      "distvname" : "font_ft2_0.1.0",
      "extension" : "tgz",
      "filename" : "font_ft2_0.1.0.tgz",
      "maturity" : "released",
      "pathname" : "C/CA/CAFFIEND/font_ft2_0.1.0.tgz",
      "version" : "2_0.1.0"
    }
  9. Why does this happen?

    • CPAN::DistnameInfo looks for a distribution name and a version at the same time (using a regex)
    • But it might be better to look for a version first, then treat the rest as the name
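
The "version first, then the rest is the name" idea from slide 9 can be sketched as follows. This is a deliberately simplified illustration and not Parse::Distname's actual code: `split_version_first` is a hypothetical helper, and it assumes a version is a dotted number with an optional leading "v" and that only a handful of archive extensions occur.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Parse::Distname's implementation.
# Step 1: strip the archive extension.
# Step 2: look for a version at the end of what is left.
# Step 3: treat everything before the version as the distribution name.
sub split_version_first {
    my ($filename) = @_;

    # Assumption: only these extensions need to be handled.
    (my $base = $filename) =~ s/\.(?:tar\.gz|tar\.bz2|tgz|zip)\z//;

    # Assumption: a version is a dotted number, optionally prefixed with "v",
    # optionally separated from the name by "-" or "_".
    if ($base =~ s/[-_]?(v?\d+(?:\.\d+)+)\z//) {
        return { dist => $base, version => $1 };
    }

    # No recognisable version: the whole base is the name.
    return { dist => $base, version => undef };
}

for my $file (qw(v0.10.zip XMS-MotifSetv1.0.tar.gz font_ft2_0.1.0.tgz)) {
    my $info = split_version_first($file);
    printf "%s: dist=\"%s\" version=\"%s\"\n",
        $file, $info->{dist}, $info->{version} // '';
}
# v0.10.zip:               dist ""             version "v0.10"
# XMS-MotifSetv1.0.tar.gz: dist "XMS-MotifSet" version "v1.0"
# font_ft2_0.1.0.tgz:      dist "font_ft2"     version "0.1.0"
```

Because this sketch insists on a dot in the version, it leaves a name such as Config-INI-Reader-Encrypted2 unsplit. The real module applies further heuristics and exceptions for such cases.
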
  10. Parse::Distname

    https://metacpan.org/release/Parse-Distname

    So I wrote a new module as a PoC, instead of applying a breaking change to the existing code
  11. Let's see

    my $path = "E/ER/ERWANMAS/v0.10.zip";
    say encode_json({ Parse::Distname->new($path)->properties });

    {
      "cpanid" : "ERWANMAS",
    - "dist" : "v",
    + "dist" : "",
      "distvname" : "v0.10",
      "extension" : "zip",
      "filename" : "v0.10.zip",
      "maturity" : "released",
      "pathname" : "E/ER/ERWANMAS/v0.10.zip",
    - "version" : "0.10"
    + "version" : "v0.10"
    }
  12. Let's see

    my $path = "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

    {
      "cpanid" : "SONNY",
    - "dist" : "DBIx-Class-InflateColumn",
    + "dist" : "DBIx-Class-InflateColumn-S3",
      "distvname" : "DBIx-Class-InflateColumn-S3",
      "extension" : "tar.gz",
      "filename" : "DBIx-Class-InflateColumn-S3.tar.gz",
      "maturity" : "released",
      "pathname" : "S/SO/SONNY/DBIx-Class-InflateColumn-S3.tar.gz",
    - "version" : "S3"
    + "version" : null
    }
  13. Let's see

    my $path = "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

    {
      "cpanid" : "HARPREET",
    - "dist" : "XMS-MotifSetv",
    + "dist" : "XMS-MotifSet",
      "distvname" : "XMS-MotifSetv1.0",
      "extension" : "tar.gz",
      "filename" : "XMS-MotifSetv1.0.tar.gz",
      "maturity" : "released",
      "pathname" : "H/HA/HARPREET/XMS-MotifSetv1.0.tar.gz",
    - "version" : "1.0"
    + "version" : "v1.0"
    }
  14. Let's see

    my $path = "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

    {
      "cpanid": "MPERRY",
    - "dist": "Config-INI-Reader",
    + "dist": "Config-INI-Reader-Encrypted",
      "distvname": "Config-INI-Reader-Encrypted2",
      "extension": "tar.gz",
      "filename": "Config-INI-Reader-Encrypted2.tar.gz",
      "maturity": "released",
      "pathname": "M/MP/MPERRY/Config-INI-Reader-Encrypted2.tar.gz",
    - "version": "Encrypted2"
    + "version": "2"
    }
  15. Let's see

    my $path = "C/CA/CAFFIEND/font_ft2_0.1.0.tgz";
    say encode_json({ Parse::Distname->new($path)->properties });

    {
      "cpanid": "CAFFIEND",
    - "dist": "font_ft",
    + "dist": "font_ft2",
      "distvname": "font_ft2_0.1.0",
      "extension": "tgz",
      "filename": "font_ft2_0.1.0.tgz",
      "maturity": "released",
      "pathname": "C/CA/CAFFIEND/font_ft2_0.1.0.tgz",
    - "version": "2_0.1.0"
    + "version": "0.1.0"
    }
  16. Fixed 200+ cases

    • Out of 330000+ BackPAN distributions
    • Most cases are ancient or accidental, and often removed already
    • See https://github.com/charsbar/Parse-Distname/blob/master/xt/walk_through.t for details
    • Parse::Distname also contains a few patches for CPAN::DistnameInfo
  17. May not be perfect yet

    my $path = "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

    {
      "cpanid" : "CDRAKE",
    - "dist" : "Crypt",
    + "dist" : "Crypt-MatrixSSL",
      "distvname" : "Crypt-MatrixSSL3",
      "extension" : "tar.gz",
      "filename" : "Crypt-MatrixSSL3.tar.gz",
      "maturity" : "released",
      "pathname" : "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz",
    - "version" : "MatrixSSL3"
    + "version" : "3"
    }

    Looks better, but...
  18. Fixed this morning (0.04)

    my $path = "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz";
    say encode_json({ Parse::Distname->new($path)->properties });

    {
      "cpanid" : "CDRAKE",
    - "dist" : "Crypt",
    + "dist" : "Crypt-MatrixSSL3",
      "distvname" : "Crypt-MatrixSSL3",
      "extension" : "tar.gz",
      "filename" : "Crypt-MatrixSSL3.tar.gz",
      "maturity" : "released",
      "pathname" : "C/CD/CDRAKE/Crypt-MatrixSSL3.tar.gz",
    - "version" : "MatrixSSL3"
    + "version" : null
    }

    ... by making it an exception
  19. Dogfooding

    • I have started using this for CPANTS and CPAN::Groonga
    • If everything goes well...?
  20. Caveats for migration

    • Distribution name may become empty (and your database may complain about this)
    • Internal hash keys have changed
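
The first caveat can be handled with a small guard before a row is written. The following is only one possible approach, sketched under assumptions: `storable_dist_name` is a hypothetical helper, the hash keys mirror the JSON shown on the slides, and falling back to `distvname` is a design choice made for this illustration, not a recommendation from the module.

```perl
use strict;
use warnings;

# Hypothetical guard: given a hash reference shaped like the properties
# shown on the slides, return a non-empty name that is safe to store in a
# column that rejects empty strings.
sub storable_dist_name {
    my ($info) = @_;
    my $dist = $info->{dist};

    # The normal case: the parser found a distribution name.
    return $dist if defined $dist && length $dist;

    # Assumed fallback: use the name-plus-version string instead.
    return $info->{distvname};
}

# Values copied from slide 11, where the parsed name is empty.
my %erwanmas = (dist => "", distvname => "v0.10", version => "v0.10");
print storable_dist_name(\%erwanmas), "\n";   # prints "v0.10"
```

Whether a fallback, a NULL, or skipping the row is appropriate depends on how the database uses the name, so treat the fallback as a placeholder for that decision.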