Monday, 12 October 2009

CPAN download counter? +1!

From time to time people (including me, a few years back) ask questions like: what are the 'top' (or 'most popular', 'most widely used', or 'most downloaded') dists/modules on CPAN? Is there a download counter for each dist/module? One such question came from prz, a budding module author.

The answer is that there isn't one, because CPAN is just a bunch of static files. The upside is that CPAN is very easily mirrored (e.g. via FTP or rsync, or offline via CDs) and served (e.g. via FTP, HTTP, or a local filesystem). The downside is that there isn't much room for intelligence/logic on the serving side.

To implement this feature, we can put some stats-gathering code on the client side, as Debian has been doing for a while; in fact, you can already see the list of most widely installed Perl modules from that data. Or we can add some stats to search.cpan.org, such as the most viewed/clicked/downloaded dists and modules, and maybe the top search keywords. Not representative of all mirrors, sure, but it's better than nothing.
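
As a rough illustration of the server-side idea, here is a minimal Python sketch (not something search.cpan.org or any mirror actually runs) that counts successful dist downloads from web server access logs and sums the counts from several mirrors. It assumes the logs are in Common Log Format and relies on the standard CPAN layout of authors/id/A/AU/AUTHOR/Dist-Name-1.23.tar.gz; the function names, the EXAMPLE author, and the sample dists are all made up for this example.

```python
import re
from collections import Counter

# Request and status fields of a Common Log Format line, e.g.
# 10.0.0.1 - - [12/Oct/2009:10:00:00 +0000] "GET /path HTTP/1.1" 200 1234
LOG_LINE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')

# Dist tarballs live under authors/id/A/AU/AUTHOR/Dist-Name-1.23.tar.gz.
# Splitting the name from the version this way is a heuristic (assumption).
DIST_PATH = re.compile(
    r"/authors/id/\w/\w\w/[\w-]+/(?:.*/)?"
    r"(?P<dist>[A-Za-z][\w-]*?)-v?\d[\w.]*\.(?:tar\.gz|tgz|zip)$"
)


def count_downloads(log_lines):
    """Count successful (HTTP 200) dist downloads in one mirror's log."""
    counts = Counter()
    for line in log_lines:
        request = LOG_LINE.search(line)
        if not request or request.group("status") != "200":
            continue  # not a GET, or the download failed
        dist = DIST_PATH.search(request.group("path"))
        if dist:  # skip index files and other non-dist paths
            counts[dist.group("dist")] += 1
    return counts


def merge_mirrors(per_mirror_counts):
    """Sum the per-dist counters reported by several mirrors."""
    total = Counter()
    for counts in per_mirror_counts:
        total.update(counts)
    return total


if __name__ == "__main__":
    # Made-up sample log lines from two hypothetical mirrors.
    mirror_a = [
        '10.0.0.1 - - [12/Oct/2009:10:00:00 +0000] '
        '"GET /authors/id/E/EX/EXAMPLE/Foo-Bar-1.23.tar.gz HTTP/1.1" 200 1000',
    ]
    mirror_b = [
        '10.0.0.9 - - [12/Oct/2009:11:00:00 +0000] '
        '"GET /authors/id/E/EX/EXAMPLE/Foo-Bar-1.23.tar.gz HTTP/1.1" 200 1000',
    ]
    total = merge_mirrors([count_downloads(mirror_a), count_downloads(mirror_b)])
    for dist, n in total.most_common(20):
        print(dist, n)
```

Numbers produced this way only cover the mirrors that share their logs, and they count whole-repository mirroring traffic the same as ordinary installs, so they are a rough signal at best.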

A download counter, or at least a Popular/Top Downloads list, is a common feature on download/catalog/shopping/news sites, from freshmeat and Download.com to Amazon and the iTunes Store. It is so common that many users expect it as a standard feature.

It's not hard to imagine why people like to know what's popular, what everybody else is using/doing, what's in, what's hot. It's a social side of human nature. And it's beneficial to know which modules are getting downloaded and used more, so that development effort can be directed to the more important stuff. Volunteers can surely take the top modules list into consideration when picking which project to spend their valuable time on.

What I'm not very clear on, though, is why many programming language communities, aside from PHP's, don't like this particular feature. Do we hate competition, do we hate popularity contests, or are we just plain lazy?

Anyway, efforts like CPANHQ might soon make Top/most-$foo module lists, and more, possible. Yay!

6 comments:

  1. Speaking as the maintainer of one of the fastest CPAN mirrors, one of the reasons that no-one has implemented a counter feature is that it requires a tool to monitor downloads and then send that information to a central resource, which can then collate it. If someone had the time and motivation to do it, I would think many mirror admins would be happy to run it.

    With search.cpan.org, you have to remember that it is a distributed system too. There isn't one web server; there are several around the world to reduce latency. It would potentially be possible to aggregate server logs, but again no-one has had the time or motivation to do it.

    The problem with being a distributed system is that gathering the information requires the numerous small parts to all work together; otherwise any information you provide is going to be inaccurate.

    Another reason perhaps why no-one has implemented it is the Flash Crowd effect. Some module featuring in a top 20 most viewed/downloaded list doesn't mean it is the best module for the job. It just means that because it featured in the top 20, several hundred/thousand people have now viewed/downloaded it to see what the fuss was about, thus sustaining its position in the top 20.

    Replies
    1. Everything you said is right. But still, wouldn't it be nice to have a small counter near the "Download" option (we already have the "size" option)?

      @"Another reason perhaps why no-one has implemented it is the Flash Crowd effect. Some module featuring in a top 20 most viewed/downloaded list doesn't mean it is the best module for the job."

      Agreed, but for that purpose we have ratings/reviews. We should have a counter so that the author of a module can track the number of downloads. It will help users too.

      I'm planning to make it btw.

  2. "Another reason perhaps why no-one has implemented it is the Flash Crowd effect. Some module featuring in a top 20 most viewed/downloaded list doesn't mean it is the best module for the job. It just means that because it featured in the top 20, several hundred/thousand people have now viewed/downloaded it to see what the fuss was about, thus sustaining its position in the top 20."

    Agreed. Personally, I find the ratings and reviews far more interesting than aggregate download statistics. I wish more folks would take time to rate / review modules and -- when necessary -- to use the annotation feature to update the documentation.

  3. Flash effects aside, there is also some subtlety needed to identify people running minicpan (which pulls the entire repository) and factor them out from people pulling modules for installation.

  4. This comment has been removed by the author.

  5. If some module in the top 20 were so mediocre that it should not even be there, wouldn't there be a natural reaction from the community? E.g. actively promoting an alternative, improving the module, forking the module, or creating an alternative top-N list based on some other criteria. Wouldn't that reaction itself bring positive results?

    As mentioned in the article, we can add stats-gathering code on the client side (as in: the CPAN::* modules and/or the command-line tools). That way the CPAN::Mini module can skip counting downloads when mirroring. We can also track the number of installations/upgrades/other activities.
