Tuesday, 27 October 2009

Using mod_perl *aside* from deploying web apps?

History has it that mod_perl was (is?) marketed as a high performance alternative to Perl CGI, and that virtually all success stories mentioned on the website are about using it to deploy web apps.

This is unfortunate. First of all, mod_perl is about much more than running web apps: it is actually a tool to customize and extend Apache using Perl instead of C. There's so much you can do with it.

Second, I believe mod_perl is suboptimal for running web apps: it makes your Apache processes fat (and thus you'll often need a front-end proxy), it's tied to Apache (so you can't experiment with other webservers), it's relatively more complex to configure and tune compared to, say, FastCGI (and thus potentially more insecure), and it's just too damn powerful if you just want a fast CGI.

mod_perl is also theoretically more insecure because it bundles the web server and the application engine together instead of cleanly separating them. Some of your Perl code might run as root too. All of this is unnecessary if you just want to run web apps.

Here at work we have been using mod_perl for years, not for deploying web apps but for creating custom Apache handlers in Perl (basically it was because my C sucks). The handlers do the following:


  • connect to CGI daemon (because we also write our own CGI daemon, which is much more paranoid in some respects but also more flexible in others like running PHP scripts under different configurations);
  • filter URLs using our own rules (this can be done using a series of regexps with mod_rewrite, but it is much more readable and comfortable when done with full-fledged Perl code);
  • authenticate hosting users;
  • do per-vhost aliases (mod_alias can also do this, but we are using mass/dynamic virtual hosts);
  • etc.
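The URL filtering item above can be illustrated with a minimal sketch in plain Perl. It is not our production handler: the rule table, the rule descriptions, and the `check_url` name are all made up for illustration. In mod_perl, a function like this would presumably be called from a handler (for example one registered with `PerlTransHandler`); that wiring is left out so the sketch runs without Apache.

```perl
use strict;
use warnings;

# Hypothetical rule table: each rule is [description, code ref].
# A rule's code ref returns true when the URL path should be blocked.
my @RULES = (
    [ 'path traversal'   => sub { $_[0] =~ m{(?:^|/)\.\.(?:/|$)} } ],
    [ 'hidden dot-files' => sub { $_[0] =~ m{/\.(?!well-known/)[^/]+} } ],
    [ 'overlong URL'     => sub { length($_[0]) > 2048 } ],
);

# Returns the description of the first matching rule,
# or undef (in scalar context) if the path is allowed.
sub check_url {
    my ($path) = @_;
    for my $rule (@RULES) {
        my ($desc, $test) = @$rule;
        return $desc if $test->($path);
    }
    return;
}

for my $path ('/index.html', '/a/../etc/passwd', '/.htpasswd') {
    my $hit = check_url($path);
    printf "%-20s %s\n", $path, defined $hit ? "blocked ($hit)" : "allowed";
}
```

Compared with a chain of mod_rewrite regexps, each rule here has a name and can contain arbitrary logic (lengths, lookups, per-user settings).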


The servers on which we run mod_perl are shared hosting servers, and we allow users to install their own .htaccess files, so we have to patch mod_perl to keep users from using the Perl* directives. This is because mod_perl does not have something like mod_ruby's RubyRestrictDirectives. This kind of functionality is available as a build-time configuration.

So there you have it, an unusual/unpopular application of mod_perl: customizing Apache in a shared hosting environment.

Anyone else using mod_perl not to deploy web apps?

Trying out Padre 0.47

This is written after a few days of trying Padre. Normally I use emacs, joe, Komodo Edit/IDE, kate, and recently geany, so this post is basically about comparing Padre's editing features with these other editors.

Note: Yup, 0.48 has already been out for a couple of weeks, but I haven't been able to install it due to segfaults in libwx*. Incidentally, my box was recently upgraded from Jaunty to Karmic, so that might have caused the problem. Anyway, I did read 0.48's ChangeLog just in case.

First and foremost, I'm blown away! Between my first try a few months back and 0.47, Padre has transformed into a very usable editor/IDE, and it's pretty fast too, faster than Komodo IDE/Edit which is notoriously sluggish. Kudos to the hardworking Padre team, you guys rock!

Here are the features which I find still missing:


  • a button (or keyboard shortcut) to quickly toggle on/off the directory tree.
  • next-tab and prev-tab functionality. There is next-file and prev-file, but after you rearrange the tabs, those two become less useful. Maybe assign Ctrl-PgUp/PgDn or Alt-Left/Right for this?
  • justify/reflow paragraph/manual word wrap. A la emacs' M-q or joe's C-kj.
  • autodelete trailing whitespace when saving (and autoadd a newline at the end if missing).
  • remember folding state when reopening files.
  • (Bug?) the "autofold POD when folding is enabled" feature is very nice, but right now is behaving rather strangely. It only works after I open Preferences and hit Save. Opening or saving files does not automatically fold PODs as advertised.
  • maybe include something like Rx Toolkit in Komodo? Because I do write a *lot* of regexes when coding in Perl.
  • (Suggestion) soft characters for automatic bracket completion. I find this feature in Komodo very very nice as I seem to have the habit of typing { or (, and then after thinking a while cancel it.


I'd gladly submit these into Padre's Trac if someone would give me an account. Last time I visited #padre no one had the admin rights to do this. Signup from the website is temporarily disabled due to an overwhelming amount of spam.

Anyway, keep up the wonderful work, guys! Looking forward to using Padre more and more often in the future. Envisioning writing Padre config files in Perl, just like in emacs using Lisp... Life will be good...

Tuesday, 20 October 2009

HTTP-style sub return

In a subroutine, we often want to return the status of operation (success/failure/error code) as well as the result of the operation.

When a function does not need to return any result, it can just return the status, usually as an integer scalar. In C and Unix the convention is to return 0 for success and non-zero for the error code. In Perl and many other languages, it is the other way around: zero/false/undef for failure and true values for success. You can also return the result right there and then, as long as the result evaluates to true. This becomes a problem if the result can legitimately be zero/false/undef.

Alternatively, you can return the result instead. How do we return the error code then? Usually via some global variable like $? and $! in Perl. This has drawbacks of its own, like in multithreaded/reentrant code.

It is thus safer to return both the status and the result separately and explicitly, e.g. using a 2-element array:

($status, $result) = foo(...);


For roughly a year now, I have been adopting something like the above, with what I call HTTP-style return convention. Instead of 2, I return a 3-element array in my subs:

return ($status, $extra_info, $result);


$status is a 3-digit integer value, with values taken as much as possible from the HTTP spec: 200 means success, 4xx means a generic "client side" (i.e. caller side) error like missing or invalid arguments, 404 means not found, 403 means forbidden, 5xx means an error on the "server side" (our side, the sub side), 501 means unimplemented, and so on.

$extra_info is a hashref which contains, well... extra info, like an error string, debugging messages, or intermediate results. This is the equivalent of HTTP response headers. It can also be undef if the sub does not offer extra information, which avoids creating an unnecessary hashref.

$result is the actual result.

An example code:

my @resp = process_stuff(@stuffs);
if ($resp[0] != 200) {
    die "Failed ($resp[1]{errstr})";
} else {
    print "Number of stuffs input: ", scalar(@stuffs), "\n";
    print "Number of stuffs processed: ", $resp[2], "\n";
}


Another example:

my @resp = search_stuff($stuff);
if ($resp[0] == 404) {
    die "Stuff $stuff not found";
} elsif ($resp[0] != 200) {
    die "Failed";
} else {
    print "Stuff $stuff found in $resp[2]";
}
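For completeness, here is a minimal sketch of what the callee side might look like. The body of `search_stuff` and the `%LOCATION` data are made up for illustration; only the return convention is the point.

```perl
use strict;
use warnings;

# Hypothetical data store, for illustration only.
my %LOCATION = (hammer => 'toolbox', stapler => 'desk drawer');

# Returns ($status, $extra_info, $result) following the HTTP-style convention.
sub search_stuff {
    my ($stuff) = @_;

    # 400: caller-side error (missing argument)
    return (400, { errstr => 'Missing argument: stuff' }, undef)
        unless defined $stuff && length $stuff;

    # 404: not found
    return (404, { errstr => "No such stuff: $stuff" }, undef)
        unless exists $LOCATION{$stuff};

    # 200: success; nothing extra to report, so no hashref is created
    return (200, undef, $LOCATION{$stuff});
}

for my $stuff ('hammer', 'unicorn') {
    my ($status, $extra, $result) = search_stuff($stuff);
    if ($status == 200) {
        print "$stuff found in $result\n";
    } else {
        print "$stuff: failed with status $status ($extra->{errstr})\n";
    }
}
```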


It is a bit confusing for readers not familiar with this style, but I find this clear as it is (i.e., without declaring/using constants for 200, 404, etc).

The advantages of this return style:



  • You can return the status, the result, and extra info all together.

  • The convention for the status codes is already familiar to many. HTTP has been so popular for most of the lifetime of Perl that most Perl programmers who have dabbled in CGI or Internet programming should be familiar with it. Even many nonprogrammers know what 404 or 500 mean since they often see these while browsing. The 2xx, 4xx, 5xx convention is also used in other protocols like SMTP.

    If you are not, you can always use a comment or a constant to explain the meaning of the numbers.

  • You can easily wrap your sub later in a REST API or web service. Just pass $status as the HTTP status code, (selected) pairs in $extra_info as HTTP response headers, and $result (possibly encoded in JSON/YAML/whatever) as the response body.

  • A bonus, because Perl has contexts, you can also do this:

    if (wantarray) {
        return ($status, $extra_info, $result);
    } else {
        return $result;
    }


    so that you can fall back to a very simple style when all the other stuff is not needed:

    my $result = foo(...); # don't care about status, assume always success
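The context trick from the last item can be seen end to end in the following sketch. The `count_words` sub is hypothetical and exists only to show both calling styles side by side.

```perl
use strict;
use warnings;

# Hypothetical sub: counts the whitespace-separated words in a string.
sub count_words {
    my ($text) = @_;
    my ($status, $extra_info, $result);

    if (defined $text) {
        my @words = split ' ', $text;
        ($status, $extra_info, $result) = (200, undef, scalar @words);
    } else {
        ($status, $extra_info, $result) = (400, { errstr => 'No text given' }, undef);
    }

    # List context: the full triple. Scalar context: just the result.
    return wantarray ? ($status, $extra_info, $result) : $result;
}

my $n = count_words('the quick brown fox');                       # scalar context
my ($status, undef, $count) = count_words('the quick brown fox'); # list context

print "scalar context: $n\n";
print "list context: status=$status, result=$count\n";
```

Two caveats: in scalar context a failure is indistinguishable from a legitimately undefined result, and a call placed inside another function's argument list is in list context, so it yields the whole triple there.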




I find this return style somewhat relevant in light of PSGI's deservedly speedy rise to popularity.
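To make the PSGI connection concrete, here is a minimal sketch of turning a ($status, $extra_info, $result) triple into a PSGI response, which is also a three-element structure: status, an arrayref of header pairs, and an arrayref of body chunks. The `triple_to_psgi` name and the `X-Error` header are assumptions made for this sketch, and JSON is just one possible encoding.

```perl
use strict;
use warnings;
use JSON::PP ();

# Convert an HTTP-style triple into a PSGI response:
# [ $status, [ header pairs ], [ body chunks ] ].
sub triple_to_psgi {
    my ($status, $extra_info, $result) = @_;
    my @headers = ('Content-Type' => 'application/json');

    # Expose selected extra info as a response header
    # (the header name X-Error is hypothetical).
    if ($extra_info && defined $extra_info->{errstr}) {
        push @headers, 'X-Error' => $extra_info->{errstr};
    }

    # allow_nonref lets plain scalars and undef be encoded too
    my $body = JSON::PP->new->canonical->allow_nonref->encode($result);
    return [ $status, \@headers, [ $body ] ];
}

my $res = triple_to_psgi(200, undef, { found_in => 'toolbox' });
print "$res->[0] $res->[2][0]\n";
```

A PSGI app would then be little more than a code ref that calls the sub and returns `triple_to_psgi(...)` on its result.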

Wednesday, 14 October 2009

Dumping content to files using Log::Dispatch::Dir

Logging frameworks like Log::Log4perl and Log::Dispatch are great. They relieve you of the burden of reinventing your own (which, admit it, will probably suck more), and they decouple your code from logging details (which can be changed later independently). You just say "I want to log something something" in the code, and you can later configure whether those messages are actually written to the logs, where those logs are, how the messages are formatted, etc., without changing your log-using code.

Logging can be used, to some extent, to replace debugging or to aid it. Each log message is usually only a single line, like "Starting foo ...", "Ended foo ...", "The value of foo is $val", although we can also log large data structure dumps or file contents.

When writing web robots like HTML scrapers or interfaces to online banking sites, which are particularly fragile, it is often convenient to save each server's full response into a separate file so you can easily check each step by opening the saved file in a browser.

If you want to use Log::Log4perl or Log::Dispatch for this, you can too, using Log::Dispatch::Dir. This module will write each log message to a separate file in a specified log directory.
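The underlying idea is simple enough to sketch with core modules only. This is not Log::Dispatch::Dir's implementation; the `dump_to_dir` name and the file naming scheme (timestamp, PID, counter) are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Time::HiRes ();

my $counter = 0;

# Write $message to its own new file inside $dir and return the file's path.
# The file name combines a timestamp, the PID, and a per-process counter.
sub dump_to_dir {
    my ($dir, $message) = @_;
    my $name = sprintf '%.6f-%d-%04d.log', Time::HiRes::time(), $$, ++$counter;
    my $path = File::Spec->catfile($dir, $name);
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $message;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Demo: three "responses" end up in three separate files.
my $dir = tempdir(CLEANUP => 1);
dump_to_dir($dir, "<html>response #$_</html>\n") for 1 .. 3;

opendir my $dh, $dir or die "Cannot read $dir: $!";
my @files = grep { !/^\./ } readdir $dh;
closedir $dh;
print scalar(@files), " dump files written\n";
```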

An example, in Finance::Bank::ID::Base I have code like this:


$self->logger_dump->trace(
    "<!-- result of mech request #$i ($method ".Dump($args)."):\n".
    $mech->response->status_line."\n".
    $mech->response->headers->as_string."\n".
    "-->\n".
    $mech->content
);


where $mech is a WWW::Mechanize object. I use $self->logger for "normal" log messages and $self->logger_dump specifically for dumping contents. Both the logger and logger_dump attributes can be supplied by the module user, e.g.:


my $ibank = Finance::Bank::ID::BCA->new(
    ...
    logger => Log::Log4perl->get_logger("Messages"),
    logger_dump => Log::Log4perl->get_logger("Dumps"),
);


and the Log::Log4perl configuration is something like this:


log4perl.logger.Messages=TRACE, SCREEN, LOGFILE
log4perl.logger.Dumps=TRACE, LOGDIR

log4perl.appender.SCREEN=Log::Log4perl::Appender::ScreenColoredLevels
log4perl.appender.SCREEN.layout=PatternLayout
log4perl.appender.SCREEN.layout.ConversionPattern=[%r] %m%n

log4perl.appender.LOGFILE=Log::Log4perl::Appender::File
log4perl.appender.LOGFILE.filename=/path/to/logs/main.log
log4perl.appender.LOGFILE.layout=PatternLayout
log4perl.appender.LOGFILE.layout.ConversionPattern=[%d] %m%n

log4perl.appender.LOGDIR=Log::Dispatch::Dir
log4perl.appender.LOGDIR.dirname=/path/to/logs/dumps
log4perl.appender.LOGDIR.layout=PatternLayout
log4perl.appender.LOGDIR.layout.ConversionPattern=%m


This is convenient enough for me, but in the future I want to do some MIME checking on the log messages, so Log::Dispatch::Dir can automatically add a suitable file extension, e.g. .html, .txt, .jpg, etc.
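One way such a check might work is to sniff the first bytes of each message. The sketch below is only an idea, not a feature of Log::Dispatch::Dir: `guess_extension` is a hypothetical helper and it recognizes just a handful of well-known signatures.

```perl
use strict;
use warnings;

# Guess a file extension by sniffing the start of the content.
# Only a few well-known signatures are checked; anything else becomes .txt.
sub guess_extension {
    my ($content) = @_;
    return '.jpg' if $content =~ /\A\xFF\xD8\xFF/;
    return '.png' if $content =~ /\A\x89PNG\r\n\x1A\n/;
    return '.gif' if $content =~ /\AGIF8[79]a/;
    return '.pdf' if $content =~ /\A%PDF-/;
    # Allow leading whitespace and HTML comments (such as a dump header)
    # before the doctype or the <html> tag.
    return '.html'
        if $content =~ /\A\s*(?:<!--.*?-->\s*)*<(?:!DOCTYPE\s+html|html)\b/is;
    return '.txt';
}

my %sample = (
    html_dump => "<!-- result of request #1 -->\n<html><body>hi</body></html>",
    gif       => "GIF89a\x01\x00\x01\x00",
    text      => "just some text",
);
for my $name (sort keys %sample) {
    print "$name: ", guess_extension($sample{$name}), "\n";
}
```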

Checking BCA and Mandiri accounts with Perl

Finally, I got around to releasing the Finance::Bank::ID::BCA and Finance::Bank::ID::Mandiri modules. You can now check your BCA and Mandiri accounts with Perl!

Years ago I wrote a similar script for KlikBCA, but it used a combination of curl and wget. After KlikBCA changed its layout/program from ASP to Java, I never updated that script until a few months ago. Finally, last week and this week I found the time to turn the code into modules.

Oh, actually part of the code was first written in PHP ;-p. But the PHP code is for the office and is not released (and probably never will be, because I'm too lazy to maintain PHP code for the public).

Monday, 12 October 2009

Planet CPAN

Recently I have been enjoying Iron Man blog posts that talk about some particular CPAN module like HTML::FormHandler, Finance::Quote, Term::ProgressBar, local::lib, Log::Log4perl, Dist::Zilla, even the good ol' Digest::MD5.

I'm hoping though that even more people (authors and users alike) would blog more about CPAN modules, because although we arguably have one of the richest sets of interfaces to our wonderful software library, with more than 16000 modules it's nearly impossible to browse them all. Feature blog posts certainly help people stumble on interesting stuff even if they don't follow Recent CPAN Uploads.

Since ~ 95% of all interesting things in the Perl world are happening inside CPAN (not to belittle the huge efforts of the p5p team or the Parrot & Perl 6 designers/implementors, of course) shouldn't we be blogging more about it?

Filtering RSS

On a somewhat unrelated note, lately I've also grown tired of all the Microsoft/Windows/Vista/7/8/9 stories filling up the Slashdot RSS feed. Most are irrelevant to me as I use Linux; besides, those news items are really minor/unimportant/of the marketing type, like rereading old Vista reviews, an intentionally ambiguous 128-bit version of a future Windows, or repeated news items telling me that products are being delayed yet again. Who cares?

Nowadays I'm using Google Reader on a cellphone to read most feeds, so less junk would be nice. Google Reader doesn't have filtering yet, so I ended up using Yahoo! Pipes. Filtering and doing other stuff to feeds (and other kinds of data like CSV) is surprisingly quick and easy using this web-based visual editor. A user-friendly Perl- and Unix-killer? :-) Just go to pipes.yahoo.com, click on My Pipes, create a new pipe, do some drag-and-drops and text field filling, publish your pipe, and get RSS. Perfect (here are two examples of pipes I've created: slashdot-noms and slashdot-nofb). Looking forward to more Microsoft-free news reading ahead.

CPAN download counter? +1!

From time to time people (including me once, a few years back) would ask questions like: what are the 'top' (or 'most popular' or 'widely used' or most downloaded) dists/modules on CPAN? Is there a download counter for each dist/module? Like this one from prz, a budding module author.

The answer is that there isn't one, because CPAN is just a bunch of static files. The upside is that CPAN is very easily mirrored (e.g. via FTP or rsync or offline via CDs) and served (e.g. via FTP, HTTP, or local filesystem). The downside is that there isn't a place for much intelligence/logic on the serving side.

To implement this feature, we can put some stats gathering code on the client side, like what Debian has been doing for a while; in fact you can already see the list of most widely installed Perl modules from the data. Or we can add some stats to search.cpan.org like most viewed/clicked/downloaded dists and modules, and maybe top search keywords. Not representative of all mirrors, sure, but it's better than nothing.
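On the serving side, the counting itself would be the easy part. Here is a minimal sketch that tallies downloads per dist from request paths, assuming the usual /authors/id/A/AU/AUTHOR/Dist-Name-1.23.tar.gz layout. The `count_downloads` name and the sample paths are made up, and a real version would read the paths from a mirror's access log.

```perl
use strict;
use warnings;

# Count downloads per distribution from a list of request paths.
# Returns a hashref of dist name => number of downloads.
sub count_downloads {
    my (@paths) = @_;
    my %count;
    for my $path (@paths) {
        # Capture the dist name: everything before the trailing -VERSION.ext
        next unless $path =~
            m{/authors/id/.+/([^/]+)-v?[\d._]+\.(?:tar\.gz|tgz|zip)\z};
        $count{$1}++;
    }
    return \%count;
}

# Made-up sample requests
my $count = count_downloads(
    '/authors/id/S/SA/SAMPLE/Foo-Bar-1.02.tar.gz',
    '/authors/id/S/SA/SAMPLE/Foo-Bar-1.03.tar.gz',
    '/authors/id/O/OT/OTHER/Baz-0.5.tar.gz',
    '/modules/02packages.details.txt.gz',
);

# Most downloaded first, ties broken alphabetically
for my $dist (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    print "$dist: $count->{$dist}\n";
}
```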

Download counter, or at least Popular/Top Downloads, is a common feature on download/catalog/shopping/news sites, from freshmeat and Download.com, to Amazon and iTunes Store. So common that many users expect it to be there as a standard feature.

It's not hard to imagine why people like to know what's popular, what everybody else is using/doing, what's in, what's hot. It's a social side of human nature. And it's beneficial to know which modules are getting downloaded and used more, to direct development efforts to the more important stuff. Volunteers can surely take the top modules list as one consideration when picking which project to spend their valuable time on.

What I'm not very clear on, though, is why, aside from PHP's, many programming language communities don't like this particular feature. Do we hate competition, do we hate popularity contests, or are we just plain lazy?

Anyway, efforts like CPANHQ might soon make the Top/most $foo modules, and more, possible. Yay!

Tuesday, 06 October 2009

Moving from bzr to git

As with a lot of coders out there, I've moved on from no source control, to CVS, to Subversion, and recently to one of the distributed ones. In fact, I tried both Bazaar and git roughly at the same time a couple of years back and have been quite comfortably using both to manage my projects.

Last week though, I migrated all but one of my projects to git. The only bzr repository left is the one for $work where I want to avoid retraining people to use git since they are not coders.

I had no real complaints with bzr. Sure, it's generally slower than git, the repository size is slightly larger than git's, and branching is a bit more cumbersome than in git, but all of those have never bothered me enough to part with bzr.

The clincher, however, came when I needed to write a post-commit hook to post my commit messages to an internal web-based bulletin board. In bzr you need to write a plugin in Python. Not that there's anything wrong with it, but I don't think I would even want a Perl-based SCM where I need to write a Perl plugin for everything when a two-line shell script will do. Add to that the fact that bzr doesn't provide a template/skeleton for the plugin, so I had to spend a few minutes googling around the plugin API (in addition to refreshing my memory on Python syntax), etc.
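In git, by contrast, a hook is just an executable file at .git/hooks/post-commit. A shell script would do; the sketch below uses Perl only to stay in one language for the examples on this blog. The bulletin board URL and the form field names are made up, so treat this as an outline rather than the hook I actually run.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical endpoint of the internal bulletin board.
my $BOARD_URL = 'http://bulletin.example.com/post';

# Build the form fields to post (field names are assumptions).
sub build_post {
    my ($hash, $author, $message) = @_;
    $message =~ s/\s+\z//;                              # trim trailing whitespace
    my $subject = (split /\n/, $message, 2)[0] // '';   # first line only
    return {
        subject => sprintf('[commit %s] %s', substr($hash, 0, 7), $subject),
        body    => "Author: $author\n\n$message",
    };
}

# Only talk to git and the network when actually installed as the hook.
if ($0 =~ m{(?:^|/)post-commit\z}) {
    chomp(my $hash   = `git log -1 --pretty=format:%H`);
    chomp(my $author = `git log -1 --pretty=format:%an`);
    my $message      = `git log -1 --pretty=format:%B`;

    my $res = HTTP::Tiny->new(timeout => 10)
        ->post_form($BOARD_URL, build_post($hash, $author, $message));
    warn "post-commit: posting failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
}
```

To try it, save the file as .git/hooks/post-commit in a repository and make it executable.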

I think I'm much more comfortable with git nowadays.

Some Indonesian-specific modules to check SIM, NPWP, NIK

Today for some reason I finally got motivated enough to write and release Business-ID-{NIK,SIM,NPWP} to CPAN. Been wanting to for years, but haven't got around to it due to lack of challenge and, more importantly, reference. For example, so far I haven't found a publicly available and authoritative document explaining the numbering scheme and the valid area codes for the SIM. I had to do some deducing (read: guessing) from some thousands of actual user-submitted SIM, NPWP, and NIK/KTP numbers (some of them most certainly bogus). Isn't it sad?

If you encounter some valid numbers being rejected by the modules, or vice versa, feel free to report them as bugs.
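To give an idea of the kind of check involved, here is a minimal sketch for the NIK only. It is not the code in Business-ID-NIK. It assumes the commonly described layout: 16 digits, made up of a 6-digit area code, the date of birth as DDMMYY (with 40 added to the day for women), and a 4-digit serial. The `parse_nik` name and the sample number are made up, and no area code table is consulted.

```perl
use strict;
use warnings;

# Parse a NIK according to the assumed layout described above.
# Returns a hashref of fields, or undef (in scalar context) if it looks invalid.
sub parse_nik {
    my ($nik) = @_;
    return unless defined $nik
        && $nik =~ /\A(\d{6})(\d{2})(\d{2})(\d{2})(\d{4})\z/;
    my ($area, $dd, $mm, $yy, $serial) = ($1, $2, $3, $4, $5);

    # Days above 40 indicate a female holder
    my $gender = $dd > 40 ? 'F' : 'M';
    $dd -= 40 if $gender eq 'F';

    return unless $dd >= 1 && $dd <= 31 && $mm >= 1 && $mm <= 12;

    return {
        area_code  => $area,
        gender     => $gender,
        day        => $dd + 0,
        month      => $mm + 0,
        year_2digit => $yy,
        serial     => $serial,
    };
}

# Made-up number, for illustration only
my $info = parse_nik('3273015708900001');
print $info
    ? "area $info->{area_code}, $info->{gender}, born $info->{day}/$info->{month}/$info->{year_2digit}\n"
    : "invalid\n";
```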