Wednesday, 28 July 2010

Startup overhead still matters

We all love Moose, and the subject of this question could have been phrased better, but why do I get the feeling that not many people write plain CGI or command-line scripts in Perl (ones that get executed many times) anymore? After all, didn't Perl begin as a sysadmin tool, and only get picked up as the darling of CGI/web programming in the mid-1990s?

There are still many cases where, and reasons why, Perl scripts need to be run many times as short-lived processes instead of as persistent, long-running ones:

  • It's much more stable (I've often needed to kill, ulimit, or periodically restart a persistent Perl process because after days it grows to 500+ MB).

  • Sometimes CGI is all you get (especially in shared hosting environments, which is related to the stability point above).

  • Sometimes you need to run the scripts for many users, and it's not feasible (e.g. memory-wise) to let them all run persistently.

  • Many old scripts are designed that way.

  • Some environments require them to be that way (e.g. scripts listed in .qmail files are run for every incoming mail, scripts run by tcpserver are started for every incoming connection, etc.).


There used to be projects like PersistentPerl or SpeedyPerl that let us easily make a Perl script persistent just by changing the shebang line (e.g. from #!perl to #!pperl), but these projects are no longer actively developed, probably due to lack of demand, or because this kind of deployment tends to cause subtle bugs (I got bitten by this a couple of times in the past). You can't just convert a script that was designed and written for one-off runs into a long-running one without expecting some bugs, anyway.
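As a hypothetical illustration of the kind of subtle bug meant here (the names below are made up): a one-off script can safely accumulate into globals, because every run starts from a fresh interpreter, but under a persistent runner the same variables quietly survive into the next "run".

```perl
use strict;
use warnings;

our @seen;    # implicitly reset only when the process exits

sub handle_request {
    my ($item) = @_;
    push @seen, $item;    # fine for one-off runs; leaks when persistent
    return scalar @seen;
}

# In a persistent runner, one process serves both "runs":
handle_request('first run');                  # returns 1, as designed
our $count = handle_request('second run');    # returns 2 -- state leaked
```

The author of the original script never had to reset @seen, because process exit did it for him; persistence silently removes that guarantee.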

And the Perl compiler (B::*, *.pmc) is also now deprecated, probably because it does not save that much startup cost after all (the fact that Perl has phasers like BEGIN/CHECK blocks means it has to execute code as it compiles it anyway).
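A tiny core-Perl illustration of why phasers make "compile once, reuse the result" hard: the code below runs during and right after compilation, so a compiler that only cached the parse output would still have to execute it.

```perl
use strict;
use warnings;

our @order;

BEGIN { push @order, 'BEGIN' }    # runs as soon as the compiler reaches it
CHECK { push @order, 'CHECK' }    # runs when compilation of the file ends

push @order, 'runtime';           # ordinary code, runs last
```

Running this records BEGIN, then CHECK, then runtime: the BEGIN block has already executed before the rest of the file is even done compiling.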

And thus we're stuck with having to accept the startup cost of parsing and compiling on every script run. That's why startup cost matters. On our servers awstats runs many thousands of times every day (2000-5000 sites x 10+ HTML pages), and since it's a giant script (10k-ish lines) it has a startup overhead of almost 1s. I really would like to shave this startup overhead, as it is a significant part of server load.
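To put a rough number on the fixed part of that cost, here is a sketch (core Perl only) that measures how long it takes just to spawn a fresh perl that does nothing; the time to parse and compile a big script like awstats comes on top of this baseline.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Spawn a fresh perl ($^X is the currently running interpreter) that
# does nothing, and time the whole fork + exec + startup + exit cycle.
my $t0 = [gettimeofday];
system($^X, '-e', '1') == 0
    or die "could not run $^X: $?";
our $bare = tv_interval($t0);

printf "bare perl startup: %.4f s\n", $bare;
```

Replacing the '-e', '1' arguments with the path to an actual script shows its full startup overhead, compilation included.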

To this day, many of my scripts/programs are still deployed as one-off command-line scripts. And that's why, instead of Moose, I use Mouse (or Any::Moose, to be exact) whenever I can. And so far I can.

2 comments:

  1. I've often wondered why FastCGI hasn't actually taken off more as a generic mechanism for this problem.

    Ignoring the world of CGI and webapps for a moment, FastCGI is nothing more than a way for an application to remain persistent and handle multiple, possibly concurrent sessions, each taking environment variables plus a STDIN stream as input, and producing STDOUT + STDERR streams as output. It ought to be trivial to write a little connector that just shoves %ENV + @ARGV into a new FastCGI request to some persistent application and connects the standard I/O streams.

  2. @leonerd: Yeah, a few years back I too wondered why FastCGI didn't take off in general. The Perl folks were too focused on Apache/mod_perl, and the PHP folks on mod_php. And aside from Apache, I remember there were only one or two other webservers that supported FastCGI (one of them commercial; I forget the name).

    Only after alternative webservers like lighty, nginx, etc. became more popular, along with Rails, Django, etc., did FastCGI get more attention.
