Wednesday, 28 July 2010

Startup overhead still matters

We all love Moose, and the subject of this question could have been phrased better, but why do I get the feeling that not many people write pure CGI or command-line scripts in Perl (the kind that get executed many times) anymore? After all, didn't Perl begin as a tool for sysadmins, and only in the mid-1990s get picked up as the darling of CGI/web programming?

There are still many cases where, and reasons why, Perl scripts need to be run many times as short-lived processes (instead of as a persistent, long-running one).

  • It's much more stable (I've often needed to kill, ulimit, or periodically restart a Perl process because after days it had grown to 500+ MB).

  • Sometimes CGI is all you get (especially in a shared hosting environment, which is related to the first point).

  • Sometimes you need to run the scripts for many users, and it's not feasible (e.g. memory-wise) to let them all run persistently.

  • Many old scripts are designed that way.

  • Some environments require them that way (e.g. scripts run from .qmail are run for every incoming mail, scripts run by tcpserver are started for every incoming connection, etc.).

There used to be projects like PersistentPerl or SpeedyPerl that let us easily make a Perl script persistent just by changing the shebang line (e.g. from #!perl to #!pperl), but these projects are no longer actively developed, probably due to lack of demand, or because this kind of deployment tends to cause subtle bugs (I did get bitten by this a couple of times in the past). You can't just convert a script that was designed and written as a one-off run into a long-running one without expecting some bugs, anyway.

And the Perl compiler (B::*, *.pmc) is also now deprecated, probably because it does not save that much startup cost after all (the fact that Perl has phasers like BEGIN/CHECK blocks means it has to execute code while it compiles anyway).

And thus we're stuck with having to accept the startup cost of parsing and compiling on every script run. That's why startup cost matters. On our servers awstats runs many thousands of times every day (2000-5000 sites x 10+ HTML pages), and since it's a giant script (10k-ish lines) it has a startup overhead of almost 1s. I would really like to shave this startup overhead, as it is a significant part of server load.
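
To put a number on this kind of overhead, here is a minimal sketch (not the way the awstats figure above was measured) that times how long a fresh perl process takes to start, with and without loading a module. The helper name avg_startup, the run count, and the choice of POSIX as the module under test are all assumptions made for illustration; swap in whatever module or script you care about.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Average wall-clock seconds needed to start a fresh perl process with the
# given command-line arguments, measured over $runs runs.
# (avg_startup is a hypothetical helper, not part of any library.)
sub avg_startup {
    my ($runs, @perl_args) = @_;
    my $start = time;
    for (1 .. $runs) {
        # $^X is the path of the currently running perl binary
        system($^X, @perl_args) == 0
            or die "perl @perl_args failed: $?";
    }
    return (time - $start) / $runs;
}

my $runs     = 20;
my $baseline = avg_startup($runs, '-e', '1');            # bare interpreter
my $with_mod = avg_startup($runs, '-MPOSIX', '-e', '1'); # plus one module

printf "baseline:    %.4f s\n", $baseline;
printf "with POSIX:  %.4f s\n", $with_mod;
printf "module cost: %.4f s per run\n", $with_mod - $baseline;
```

The numbers vary a lot between machines, so treat them as relative. Taking the figures above at face value (2000-5000 sites x 10 pages, at roughly 1s of startup each), that is on the order of 20,000-50,000 seconds of startup work per day, out of the 86,400 seconds a day has.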

To this day, many of my scripts/programs are still deployed as one-off command-line scripts. And that's why instead of Moose I use Mouse (or Any::Moose, to be exact) whenever I can. And so far I can.

Monday, 05 July 2010

Spot the error

use Data::Rmap qw(:all);
use JSON;
use Data::Dump;
use Clone;
use boolean;

my $arg = from_json(q{{"1":true,"2":false}});
# convert JSON booleans to boolean's booleans
rmap_all { bless $_,"boolean" if ref($_) =~ /^JSON::(XS|PP)::Boolean$/ }, $arg;
dd $arg;

Hint: it's one character long.

In fact, this piece of code is full of Perl's traps (from, obviously, Perl's lack of booleans to, less obviously, having to clone and rmap not working). It disgusts me.