Friday, 19 November 2010

Outputting pretty data structures in console programs

Our application has a command-line interface to its API for convenient access from the shell/console. It used to output API result data in YAML:



# /c/sbin/spanel api --yaml File list --account steven --volume data --dir /public
---
dir:
  atime: '1270675429'
  ctime: '1285916065'
  gid: 1023
  group: steven
  is_dir: 1
  mtime: '1285916065'
  perms: 493
  uid: 1012
  url: ~
  user: steven
entries:
  -
    atime: '1284665000'
    ctime: '1289609859'
    dir: /public
    gid: 1023
    group: steven
    icon_file: folder.gif
    inode: 1984908
    is_dir: 1
    is_link: 0
    mtime: '1289609859'
    name: git
    perms: 493
    reldir: ''
    size: 4096
    uid: 1012
    url: ~
    user: steven
  -
    atime: '1270675424'
    ctime: '1285130140'
    dir: /public
    gid: 1023
    group: steven
    icon_file: text.gif
    inode: 1976727
    is_dir: 0
    is_link: 0
    mtime: '1155368486'
    name: .htaccess
    perms: 436
    reldir: ''
    size: 48
    uid: 1012
    url: ~
    user: steven
  -
    atime: '1270675424'
    ctime: '1285130140'
    dir: /public
    gid: 1023
    group: steven
    icon_file: unknown.gif
    inode: 1976725
    is_dir: 0
    is_link: 0
    mtime: '1155368397'
    name: .htaccess~
    perms: 436
    reldir: ''
    size: 63
    uid: 1012
    url: ~
    user: steven
total_num_entries: 3
url: ~


YAML is relatively readable compared to JSON or (shudder) XML, but I soon grew tired of reading YAML for data that should be tabulated and better formatted for human consumption.



Thus, Data::Format::Pretty::Console. The idea is for me, a lazy programmer, to throw data structures of various kinds at it and have it display them nicely for console viewing. The command-line interface now shows nicely formatted text for API result data by default (but still provides the --yaml and --json options). Please bear with this blog post's misformatting and assume it's all pretty:



dir:
.---------------------.
| key | value |
+--------+------------+
| atime | 1270675429 |
| ctime | 1285916065 |
| gid | 1023 |
| group | steven |
| is_dir | 1 |
| mtime | 1285916065 |
| perms | 493 |
| uid | 1012 |
| url | |
| user | steven |
'--------+------------'

entries:
.----------------------------------------------------------------------------------------------------------------------------------------------------------------------.
| atime | ctime | dir | gid | group | icon_file | inode | is_dir | is_link | mtime | name | perms | reldir | size | uid | url | user |
+------------+------------+---------+------+--------+-------------+---------+--------+---------+------------+------------+-------+--------+------+------+-----+--------+
| 1284665000 | 1289609859 | /public | 1023 | steven | folder.gif | 1984908 | 1 | 0 | 1289609859 | git | 493 | | 4096 | 1012 | | steven |
| 1270675424 | 1285130140 | /public | 1023 | steven | text.gif | 1976727 | 0 | 0 | 1155368486 | .htaccess | 436 | | 48 | 1012 | | steven |
| 1270675424 | 1285130140 | /public | 1023 | steven | unknown.gif | 1976725 | 0 | 0 | 1155368397 | .htaccess~ | 436 | | 63 | 1012 | | steven |
'------------+------------+---------+------+--------+-------------+---------+--------+---------+------------+------------+-------+--------+------+------+-----+--------'

total_num_entries:
3

url:
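
For the curious, here is a minimal sketch in core Perl of how a flat hash could be rendered as a boxed key/value table like the "dir" one above. It is not how Data::Format::Pretty::Console is implemented; the function name format_kv_table is made up for this illustration, and it only handles flat hashes.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper (not part of Data::Format::Pretty::Console):
# render a flat hash as a boxed two-column table with sorted keys.
sub format_kv_table {
    my ($hash) = @_;
    my @rows = map { [ $_, defined $hash->{$_} ? "$hash->{$_}" : '' ] }
               sort keys %$hash;

    # Column widths: the widest cell, but never narrower than the header.
    my $kw = max(3, map { length $_->[0] } @rows);   # 3 = length of "key"
    my $vw = max(5, map { length $_->[1] } @rows);   # 5 = length of "value"

    my $fmt   = "| %-${kw}s | %-${vw}s |\n";
    my $inner = ('-' x ($kw + 2)) . '+' . ('-' x ($vw + 2));

    my $out = '.' . ('-' x length $inner) . ".\n";
    $out .= sprintf($fmt, 'key', 'value');
    $out .= '+' . $inner . "+\n";
    $out .= sprintf($fmt, @$_) for @rows;
    $out .= "'" . $inner . "'\n";
    return $out;
}

print format_kv_table({ gid => 1023, group => 'steven', url => undef });
```

Undefined values are shown as empty cells, which matches how the url row appears in the output above.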


Wednesday, 17 November 2010

Comparison of INI-format modules on CPAN

I'm not terribly happy with the state of Perl/CPAN support for the INI file format.

I need to modify php.ini files programmatically from Perl: setting register_globals to On/Off, adding/removing extensions (via the extension=foo lines), adding/removing functions from the disable_functions list, etc. So I would like to find a CPAN module that can just set/unset a parameter and leave formatting/comments alone as much as possible.

Turns out that among the dozen or so INI modules on CPAN, a few do not write at all (e.g. Config::INI::Access or Config::Format::INI). And a few that do write do so a la dump: they just rewrite the whole INI file from the in-memory structure, so all comments and formatting (even ordering, in some cases) are lost. Examples: Config::INI::Writer and Tie::Cfg. And, last time I tried, I couldn't even install Config::IniHash from the CPAN client. Not good.

So I ended up with Config::IniFiles. And I needed to patch two features in before I could even read php.ini and write to it properly. This is an old module which, although still maintained, probably needs a rewrite or modernization. One reviewer on CPAN Ratings also wrote that this module fails in edge cases and that the test suite is incomplete.

But, at least this module gets the work done. It tries to maintain comments, and even has a host of other features like delta, multiline, default section, etc. Most features seem to be rather exotic to me personally, but then none of the other INI modules on CPAN has the basic features that I needed.
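
To make the requirement concrete, here is a rough sketch in core Perl of the "change one parameter, leave everything else alone" behaviour I was looking for. It is not Config::IniFiles code; set_ini_param is a hypothetical name, the section name is assumed to be non-empty, and the sketch ignores multiline values and drops any inline comment on the line it rewrites.

```perl
use strict;
use warnings;

# Hypothetical helper: set KEY to VALUE inside [SECTION] of an INI text,
# rewriting only the matching line so other lines, comments, and ordering
# are preserved. If the key is missing it is appended to the section.
sub set_ini_param {
    my ($text, $section, $key, $value) = @_;
    my @out;
    my $current = '';   # section we are in ('' before the first header)
    my $done    = 0;
    for my $line (split /^/m, $text) {
        if ($line =~ /^\s*\[([^\]]*)\]/) {
            # Leaving the target section without a match: add the key now.
            if (!$done && $current eq $section) {
                push @out, "$key = $value\n";
                $done = 1;
            }
            $current = $1;
        }
        elsif (!$done && $current eq $section
               && $line =~ /^([ \t]*\Q$key\E[ \t]*=[ \t]*)/) {
            # Keep the original "key = " prefix, replace only the value.
            $line = $1 . $value . "\n";
            $done = 1;
        }
        push @out, $line;
    }
    if (!$done) {
        # Target section was the last one, or did not exist at all.
        push @out, "\n" if @out && $out[-1] !~ /\n\z/;
        push @out, '[' . $section . "]\n" if $current ne $section;
        push @out, "$key = $value\n";
    }
    return join '', @out;
}

my $ini = <<'END';
; PHP settings
[PHP]
; never enable this in production
register_globals = On
[Date]
date.timezone = UTC
END

print set_ini_param($ini, 'PHP', 'register_globals', 'Off');
```

A real module would of course need to handle much more (quoting, repeated keys such as extension=, include directives), which is exactly why I would rather not maintain this myself.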

(Short, grossly incomplete) comparison of Perl logging frameworks

After doing this post on comparison of Perl serialization modules, I intended to continue with other comparisons, and even thought of setting up a wiki or creating/maintaining a section on the Official Perl 5 Wiki, which already has a Recommended Modules section, although not much comparison has been written for each recommendation. (Btw, I just noticed a change of domain for the Wiki, from perlfoundation.org to socialtext.net).

But of course other tasks soon took precedence, so until the Wiki idea is realized, I thought I'd just continue posting on the blog as usual.

There are **seriously** too many Perl logging frameworks out there. As the Log4perl FAQ mentions, "Writing a logging module is like a rite of passage for every Perl programmer, just like writing your own templating system."

So I'm not going to pretend that I've evaluated even half of the logging modules that are on CPAN. Instead I'm just going to include a few worth mentioning.

Log::Dispatch and Log::Log4perl. Arguably the two most popular Perl logging frameworks are Log::Dispatch and Log::Log4perl. They are like the Moose of logging frameworks: mature, feature-rich, flexible, with a lot of support/extra/3rd-party modules, but... "slow". I quote the slow part because, first of all, speed is not an issue for the majority of applications. And second, they are not that slow compared to other modules until they actually log stuff to output. For example, calling debug() at the warn level runs at around 1.5 million calls/sec with Log4perl and 3 million calls/sec with Log::Fast. But for actual logging, Log::Fast can be 10-45 times faster than these two.

Log::Any. For most people, Log::Dispatch and Log4perl should suffice. I personally haven't been able to produce a case where I can't customize Log4perl the way I want, which shows the flexibility of the module. So the only flexibility left to add is a thin wrapper that lets you switch logging frameworks (kind of like Any::Moose for logging). There are a few of these on CPAN, but I prefer Log::Any (and I've also made a thin wrapper for *that*, Log::Any::App). RJBS also made one: Log::Dispatchouli. You might be interested in using it if you are interested in using String::Flogger.

Performance-wise, as with Moose, there are alternatives: Log::Fast, for one. There are also a few other minimalistic frameworks, but many of them are not flexible at all, so I do not recommend them unless your application is really performance-critical.

I've most probably left out a lot of possibly interesting alternatives.

Friday, 01 October 2010

Sometimes you *don't* want circular checking

I use the nifty Data::Rmap to "flatten" DateTime objects into strings so they can be exported to JSON and handled outside Perl. But Data::Rmap's circular checking skips any reference it has already visited, so this:

$ perl -MDateTime -MData::Rmap=:all -MData::Dump \
-e'$d = DateTime->now; $doc = [$d, $d];
rmap_ref { $_ = $_->ymd if UNIVERSAL::isa($_, "DateTime") } $doc;
dd $doc'


produces something like this:

["2010-10-01", ...unconverted DateTime object...]

For now I work around this by defeating Data::Rmap's circular checking, though I wonder if there's a better way.

$ perl -MDateTime -MData::Rmap=:all -MData::Dump \
-e'$d = DateTime->now; $doc = [$d, $d];
rmap_ref { $_[0]{seen} = {}; $_ = $_->ymd if UNIVERSAL::isa($_, "DateTime") } $doc;
dd $doc'


will correctly produce:

["2010-10-01", "2010-10-01"]
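
One possible alternative, sketched here on the assumption that the data only contains repeated references and no true cycles, is a small hand-written walker that keeps no "seen" table at all. The names flatten_objects and My::Date are hypothetical; My::Date merely stands in for DateTime so the example runs with core modules only.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

{
    # Stand-in for DateTime (hypothetical), so no CPAN module is needed.
    package My::Date;
    sub new { my ($class, $ymd) = @_; return bless { ymd => $ymd }, $class }
    sub ymd { return $_[0]{ymd} }
}

# Replace, in place, every object of class $class found inside plain arrays
# and hashes under $data with the result of $code->($object). Nothing is
# remembered between visits, so repeated references are all converted, but
# a truly circular structure would make this recurse forever.
sub flatten_objects {
    my ($data, $class, $code) = @_;
    return $code->($data) if blessed($data) && $data->isa($class);
    if (ref $data eq 'ARRAY') {
        for my $elem (@$data) {
            $elem = flatten_objects($elem, $class, $code);
        }
    }
    elsif (ref $data eq 'HASH') {
        for my $val (values %$data) {
            $val = flatten_objects($val, $class, $code);
        }
    }
    return $data;
}

my $d   = My::Date->new('2010-10-01');
my $doc = [ $d, $d, { when => $d } ];
flatten_objects($doc, 'My::Date', sub { $_[0]->ymd });
print join(', ', $doc->[0], $doc->[1], $doc->[2]{when}), "\n";
```

The trade-off is the obvious one: this gives up exactly the protection that Data::Rmap's checking provides, so it is only suitable when the shape of the data is known.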

Thursday, 30 September 2010

Yet another stupid mistake #1

While refactoring some data from an array @foo to a hash %foo, I used 'each' to iterate over the hash, but forgot to change the 'for' statement to 'while'. So I ended up with something like:

$ perl -MData::Dump -E'%a=(a=>1, b=>2);
for (my ($k, $v) = each %a) { $_ = "$k x"; dd {k=>$k, v=>$v, "\$_"=>$_} }'


And this is nasty because for(@ary) aliases $_ to each element in @ary, and in this case it modifies $k (quiz #1: and $v too, do you know why?) right under your nose! Thus the results are really messed up:

{ "\$_" => "a x", "k" => "a x", "v" => 1 }
{ "\$_" => "a x x", "k" => "a x", "v" => "a x x" }


Not to mention the loop stops after processing two items (quiz #2: do you know why?). But you might not notice that until you add some pairs to %a and wonder why they don't get processed.

The error message Perl gives is not really helpful, to say the least :)
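
For reference, this is the loop I meant to write. It is a minimal sketch; the hash contents and the @pairs array are just for illustration.

```perl
use strict;
use warnings;

my %a = (a => 1, b => 2, c => 3);

# "while" re-evaluates each() on every pass and ends when it returns an
# empty list, so every pair in the hash is visited exactly once.
my @pairs;
while (my ($k, $v) = each %a) {
    push @pairs, "$k=$v";
}

# each() returns pairs in no particular order, hence the sort.
print join(', ', sort @pairs), "\n";   # prints "a=1, b=2, c=3"
```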

Thursday, 23 September 2010

Comparison of Perl serialization modules

A while ago I needed a Perl data serializer with some requirements (supports circular references and Regexp objects out of the box, and produces consistent/canonical output because the output will be hashed). Here's my rundown of currently available Perl data serialization modules. A note: the labels fast/slow are relative to each other and are not the result of extensive benchmarking.

Data::Dumper. The grand-daddy of Perl serialization modules. Produces Perl code with adjustable indentation level (default is lots of indentation, so output is verbose). Slow. Available in core since the early days of Perl 5 (5.005 to be exact). To unserialize, we need to do eval(), which might not be good for security. Usually the first choice for many Perl programmers when it comes to serialization and arguably the most popular module for that purpose.

Storable. Fast. Produces compact, portable binary output. Also available in the core distribution. Does not support Regexp objects out of the box (though adding support for that requires only a few lines). The binary format changed several times in the past without backward compatibility in newer versions of the module, giving people a major PITA. Supposedly stabilized now.

YAML::XS. Fast. Verbose YAML output (currently doesn't seem to have an option to output inline YAML). In my personal experience this module sometimes behaved weirdly and died with a cryptic error in the past, but I guess it's pretty stable now.

There are other YAML implementations like YAML::Syck (also pretty speedy) and the old Pure-Perl YAML.pm and partial implementation YAML::Tiny. The last two might not be a good choice for general serialization needs.

Data::Dump. Very slow. Produces nicely indented Perl output. The strength of this module is in pretty output and flexibility in customizing the formatting process. Based on Data::Dump I've hacked two other specialized modules: Data::Dump::PHP for producing PHP code, and Data::Dump::Partial to produce compact and partial Perl output for logging.

XML::Dumper. Produces *very* verbose (as is the case with all XML) XML output. Slow. Aside from the XML format, I don't think there's a reason why you should choose this over the others.

JSON::XS. Fast, outputs pretty compact but still readable code, but does not support circular references or Regexp objects.

JSYNC. Slow, outputs JSON and in addition supports circular references but not yet Regexp objects.

FreezeThaw. Slow, produces compact output but not as compact as Storable. Does not support Regexp objects out of the box.

Apart from these there are many other choices too, but I personally don't think any of them is interesting enough to be a favorite. For example, last time I checked PHP::Serialization (and all the other PHP-related modules) does not support circular references. There's also, for example, Data::Pond: cute concept but of little practical use as it is even more limited than JSON format.

There are also numerous alternatives to Data::Dumper/Data::Dump, producing Perl or Perl-like code or indented formatted output, but they either cannot be unserialized back to data structures (so they are more formatting modules than serialization modules) or focus on pretty printing instead of speed. In general I think most Data::Dumper-like modules are slow when it comes to serializing data.

In conclusion, choice is good but I have not found my perfect general serialization module yet. My two favorites are Storable and YAML::XS. If JSYNC were faster and supported Regexp, or if YAML::XS or YAML::Syck could output inline/compact YAML, that would be as near to perfect as I would like.

Hope this comparison is useful. Corrections and additions welcome.
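
To illustrate two of the requirements (canonical output for hashing, and circular references), here is a minimal sketch using Storable with its documented $Storable::canonical setting plus the core Digest::MD5 module. The digest_of helper is a made-up name, and Regexp objects are deliberately left out since they are not supported out of the box.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use Digest::MD5 qw(md5_hex);

# Ask Storable to emit hash keys in sorted order, so that hashes with the
# same contents serialize to the same bytes.
$Storable::canonical = 1;

# Hypothetical helper: a digest of the serialized form of a structure.
sub digest_of { return md5_hex(freeze($_[0])) }

# Same contents, different insertion order.
my (%h1, %h2);
$h1{$_} = 1 for qw(a b c d e);
$h2{$_} = 1 for qw(e d c b a);
print digest_of(\%h1) eq digest_of(\%h2) ? "same digest\n" : "different digest\n";

# A circular structure survives a freeze/thaw round trip.
my $node = { name => 'root' };
$node->{self} = $node;
my $copy = thaw(freeze($node));
print $copy->{self} == $copy ? "cycle preserved\n" : "cycle lost\n";
```

Note that canonical mode only addresses key ordering; values that look equal but are stored differently internally (say, a number versus a numeric string) can still serialize differently.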

Perl vs PHP (a bit of credit to PHP)

Just read this blog post. Comments are disabled, so I thought I'd add a blog post.

There are endless ways we can sneer at PHP's deficiencies, but since 5.3 PHP already supports anonymous subroutines, via the function (args) { ... } syntax. So:

$longestLine = max(
    array_map(
        create_function('$a', 'return strlen($a);'),
        explode("\n", $str)
    )
);


can be rewritten as:

$longestLine = max(
    array_map(
        function($a) { return strlen($a); },
        explode("\n", $str)
    )
);


though the example is not a good one since it might as well be:

$longestLine = max(
    array_map(
        'strlen',
        explode("\n", $str)
    )
);

Wednesday, 01 September 2010

Book review: Catalyst 5.8 The Perl MVC Framework

Book information
Title: Catalyst 5.8 The Perl MVC Framework.
Subtitle: Build Scalable and extendable web applications using the Agile MVC framework.
Author: Antano Solar John.
Publisher: Packt Publishing.
Country: UK/India.
Year: 2010.

This book is a follow-up to the 2007 Catalyst book by Jonathan Rockway (a member of the Catalyst core developer team). I have no idea how much of the content has changed between the two.

About the review(er)
This is a review on the electronic (PDF) edition of the book. I am a Perl developer and a CPAN author, but have not used Catalyst (or most other recent web frameworks, for that matter) before.

About Catalyst
So far I've managed to avoid learning about web frameworks and continue to create web applications the old way (CGI/CGI::Fast, direct DBI/SQL, a homemade simple templating language, and recently lots of jQuery and CSS play). Part of this is due to laziness, and part due to lack of need. I've never needed to create complex web applications in Perl. And the apparently heavy learning curve and complexities of Catalyst, Mojo, Dancer, etc just make me say don't bother.

But, thanks to this book, I found out that a Catalyst project is not unlike a Perl CPAN module, with files/subdirectories like Makefile.PL, Changes, README, lib/, t/, etc. You can now even manage your project with Dist::Zilla (not mentioned in the book, though, as the plugin for this is new).

The good
This book is only about 200 (instead of 500+) pages long, which I appreciate. The preface is concise, and the explanations in the chapters are straightforward enough. The author uses clear and simple English sentences instead of long complex ones. The organization of topics into chapters is quite appropriate.

Missing topics
I didn't find any mention of Strawberry Perl, only ActivePerl. The examples all use SQLite and no other databases. I wish AJAX and integration with one or more JavaScript frameworks like jQuery (and thus, CSS) were discussed more, as this is now very popular and common. But that would add significantly to the length of the book.

The first chapter on MVC also deserves to be expanded.

There is no comparison whatsoever with any other Perl web frameworks or other non-Perl frameworks like Django and Rails.

I would've liked a chapter/subchapter on performance tuning and benchmarking (there is a 'Performance considerations' section in the Deployment chapter but that only covers the choice of webserver).

Plack/PSGI is not yet covered in this edition, which is a pity.

The rather bad
The author gives CPAN links to pages of specific release versions, e.g. http://search.cpan.org/~ash/DBIx-Class-0.08013/lib/DBIx/Class/Schema/Versioned.pm, which tend to break as new releases are added and old releases are removed from CPAN. But this is understandable because currently CPAN only provides http://search.cpan.org/dist/DBIx-Class/ and not something like http://search.cpan.org/dist/DBIx-Class/current/pod/Foo/Bar.pm. search.cpan.org does provide a more stable URL: http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Manual/FAQ.pod

The author also uses 2-space indent instead of 4, which I suspect is because he also uses Ruby/Rails.

The really ugly
The general editing of the book, and especially the code/output formatting, is the deal breaker here. I have not found another book that fares equally poorly in this regard.

The first paragraph of the preface already contains two very off-putting typos: "Frednic Brooks" (of Mythical Man-Month fame) and "MOOSE". Boxes drawn with ASCII characters which should align become wrapped and misaligned. When long lines of code/output are wrapped, it is not clear which lines are wrapped and which are just new lines (some visual indicator should've been added, like a + or \ sign, line numbers, or a striped background/lines).

There is a plain error in YAML syntax on p. 67 and a plain wrong MySQL configuration on p. 69.

Code formatting/editing is atrocious, with __PACKAGE__ sometimes becoming PACKAGE or __Package__. Blank lines (which are significant for POD) are removed. And there are some garbage/random characters added in a few places. Totally unacceptable.

Verdict
Unfortunately I cannot recommend this book due to the utterly poor code formatting. I have no major problem with the content though.

Coding Style As A Failure Of Language Design?

Read this older blog post the other day. Hilarious at best, creepy at worst.

Arbitrary limitations should not be added to a general-purpose programming language unless for a really good reason. Do you really want to code in a language that forces you to indent with 2 spaces, never cross the 80-column line, or require/forbid whitespace here and there? And besides, is there any language (no matter how strict its syntax is) which does not have some sort of coding style?

Friday, 27 August 2010

Wishlist for a service framework and/or manager

I maintain code for a few daemons/services written in Perl (most of them serve requests by forking/preforking). Reading a post on Ubic, I started to feel that I am reinventing a lot of wheels. Currently I do the following by writing my own code, as much of which as possible I hope can be offloaded to CPAN in the future:

  • Autorestart when process size is becoming too big. We need to do this gracefully, meaning wait until there are no more clients being serviced, unless process size really gets too big, in which case we need to restart immediately. Checking period can be configured.

  • Autorestart if script or modules change on disk. Also needs to be done gracefully. This is usually recommended only for development environments, but I use it in production too, for ease of deployment. But we need to check first (e.g. via "perl -c" or "eval + require") whether the new code from disk is okay.

  • Avoid duplicate instances (currently always using Proc::PID::File, but I'm open to a better mechanism).

  • Limit client concurrency. Sometimes this is simple (a single limit for all clients) and sometimes not so much (different limits for different IP/IP blocks/authenticated users/groups/etc).

  • Reap dead child processes and maintain a count of child processes.

  • Handle timed out clients. This is rather cumbersome with blocking I/O.

  • Write init script. This is the part I dislike the most, since there are tons of different OS flavors out there, and with more recent efforts like upstart, launchd, systemd, sooner or later I will certainly have to write different init scripts. I wish there were something equivalent to PSGI/Plack for general services, which could plug my code into whatever service manager might be out there.
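
As an example of the kind of boilerplate involved, here is a minimal sketch of the "reap dead children and keep a count" item, using only core POSIX. The names spawn_child, reap_children, and %children are made up for this illustration and are not taken from any of my daemons; a real service would typically call the reaper from its main loop or a SIGCHLD handler.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %children;   # pid => start time, one entry per live child

# Fork a child that runs $code and then exits without running the
# parent's END blocks or flushing the parent's buffers.
sub spawn_child {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $code->();
        POSIX::_exit(0);
    }
    $children{$pid} = time;
    return $pid;
}

# Collect every child that has already exited, without blocking, and
# return how many children are still alive.
sub reap_children {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $children{$pid};
    }
    return scalar keys %children;
}

spawn_child(sub { }) for 1 .. 3;
sleep 1 while reap_children() > 0;
print "all children reaped\n";
```

The count returned by reap_children() is also what a concurrency limit would be checked against before accepting another client.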

Wednesday, 25 August 2010

Random Perl wishlists #1: uncapture modifier, require+import, backtick function

Uncapture modifier. The new /r regexp substitution modifier in Perl 5.13.2 indicates that there might be hope for even more modifiers in the future. A possible modifier might be one that ignores all capture groups, which can solve this guy's problem.

require that can also import. I wonder why "require" doesn't also support importing like "use" does (use MODULE_NAME LIST...), since it already supports "require MODULE_NAME". This way, whenever I want to defer loading some modules, I could just replace "use" with "require" and put the statement under some "if" statement.

The backtick function. Do you often find yourself having to use a temporary variable like this, $cmd = "some longish and ".quote("complexly formed")."command"; before doing backtick `$cmd`? This is because, unlike in PHP, we have system() but no backtick(). Most probably a remnant from shell. There is Capture::Tiny for a more general solution, but of course it's a bit more cumbersome.
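
Until such features exist, the last two wishes can be approximated in plain Perl. The following is only a sketch: backtick and total are hypothetical names, the list form of pipe open is assumed to be available on the platform, and the deferred import simply calls import() by hand after require.

```perl
use strict;
use warnings;

# Hypothetical backtick(): run a command given as a list and return its
# standard output. No shell is involved, so arguments need no quoting.
sub backtick {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                      # slurp mode
    my $out = <$fh>;
    close $fh;
    return defined $out ? $out : '';
}

# $^X is the running perl, used here as a portable stand-in command.
print backtick($^X, '-e', 'print join "|", @ARGV', 'some arg', 'with "quotes"'), "\n";

# Deferred loading: "require" at run time, then do the import that "use"
# would have done at compile time.
sub total {
    require List::Util;
    List::Util->import('sum');
    return sum(@_);                # resolved at run time, after the import
}

print total(1, 2, 3), "\n";        # prints "6"
```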

Wednesday, 28 July 2010

Startup overhead still matters

We all love Moose, and the subject of this question could have been phrased better, but why do I get the feeling that not many people write pure CGI or command-line scripts in Perl (that get executed many times) anymore? After all, didn't Perl begin as a tool for sysadmins and only in the mid-1990s get picked up as the darling of CGI/web programming?

There are still many cases where/reasons why Perl scripts need to be run many times (instead of persistently long running).

  • It's much more stable (I've often needed to kill or ulimit or periodically restart a Perl process because after days it grows to 500+ MB).

  • Sometimes CGI is all you get (especially in shared hosting environment, which is related to point 1).

  • Sometimes you need to run the scripts for many users, and it's not feasible (e.g. memory-wise) to let them all run persistently.

  • Many old scripts are designed that way.

  • Some environments require them that way (e.g. scripts run in .qmail are run for every incoming mail, scripts run by tcpserver are started for every incoming connection, etc).


There used to be projects like PersistentPerl or SpeedyPerl to let us easily make a Perl script persistent by just changing the shebang line (e.g. from #!perl to #!pperl), but these projects are currently not actively developed, probably due to lack of demand (?), or because this kind of deployment tends to cause subtle bugs (I did get bitten by this a couple of times in the past). You can't just convert a script that is designed/written to be a one-off run into a long-running one without expecting some bugs, anyway.

And the Perl compiler (B::*, *.pmc) is also now deprecated, probably because it does not save that much startup cost after all (the fact that Perl has phasers like BEGIN/CHECK blocks means it has to execute code as it compiles anyway).

And thus we're stuck with having to accept the startup cost of parsing & compiling for every script run. That's why startup cost matters. On our servers awstats runs many thousands of times every day (2000-5000 sites x 10+ HTML pages), and since it's a giant script (10k-ish lines) it has a startup overhead of almost 1s. I would really like to shave this startup overhead, as it is a significant part of server load.

To this day many of my scripts/programs are still deployed as one-off command-line scripts. And that's why instead of Moose I use Mouse (or Any::Moose, to be exact) whenever I can. And so far I can.

Monday, 05 July 2010

Spot the error

use Data::Rmap qw(:all);
use JSON;
use Data::Dump;
use Clone;
use boolean;

my $arg = from_json(q{{"1":true,"2":false}});
# convert JSON booleans to boolean's booleans
rmap_all { bless $_,"boolean" if ref($_) =~ /^JSON::(XS|PP)::Boolean$/ }, $arg;
dd $arg;


Hint: it's one character long.

In fact, this piece of code is full of Perl's traps (from Perl's lack of booleans obviously, to less obviously having to clone and rmap not working), it disgusts me.

Tuesday, 29 June 2010

Reduce in Perl

Perl has had grep/map/sort since probably forever (actually, sort() first appeared in Perl 2.0). But even now, reduce is still not a builtin in Perl 5 (though it is available via List::Util), so doing reduce probably does not come as naturally to Perl programmers. Meanwhile Ruby, JavaScript, and even PHP have their reduce operation built in.

But then, reduce is "not really that useful" (you can just replace it with a simple for loop). So much so that Python 3.0 removed the function from the global namespace and reduced it (pun intended) to a mere member of functools. I guess reduce is really handy only if you are in a heavily functional language that lacks procedural basics.

This can be thought of as a testament to the level of language design skill that Larry has.

The rather funny thing is that Perl 6 has, in addition to the reduce() List method, the reduce ([]) metaoperator as well.
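
For those who have not used it, here is a small sketch of List::Util's reduce next to the equivalent plain loop. The sample numbers are arbitrary.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @nums = (3, 9, 4, 7);

# reduce folds the list pairwise: $a holds the running result and $b the
# next element.
my $sum = reduce { $a + $b } @nums;
my $max = reduce { $a > $b ? $a : $b } @nums;

# The same sum written as a simple loop.
my $loop_sum = 0;
$loop_sum += $_ for @nums;

print "$sum $max $loop_sum\n";   # prints "23 9 23"
```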

Wednesday, 23 June 2010

Perl vs JavaScript

Here are some notes I made while hacking on Language::Expr::Compiler::JS. Of course, there are a million differences between the two, but these focus mostly on operators and types. Hope it can be useful.

  • Double vs single quotes. There are practically no functional differences between double-quoted string and single-quoted one in JavaScript. In Perl, single quotes do not interpret escape sequences other than \\ and \', but in JavaScript both single- and double-quoted strings interpret the same set of escape sequences.

  • String escape sequences. Perl does not support JavaScript's \v (vertical tab), while JavaScript does not support \N{NAME} (named Unicode character), \e and \c[ (escape/control), \a (alarm bell).

  • Two undefs. JavaScript has two special nothingness/undefinedness: null and undefined. Strangely, null == undefined and they are equal to themselves, but they are not equal to any other values (including 0, '', false). The difference between the two is just this: undefined is not a keyword but a global variable (you can assign to it, but of course you shouldn't). If you want less confusion, just use null.

  • Behaviour of "+". Since JavaScript only has "+" (while Perl has "+" and "."), you should be aware that "+" in JavaScript coerces to strings when one of the operands is a string (e.g. 1 + "2" becomes "12"). In Perl, "+" coerces to numbers. So be careful when mixing numbers and strings.

  • You need to explicitly "return" value from a function. But there's a cute shortcut for one statement functions introduced in JavaScript 1.8: 'function (x) x*3' which is equivalent to 'function (x) { return x*3 }' so in this case you don't need the "return".

  • Boolean. JavaScript has real boolean. In Perl you can use 'boolean' from CPAN which gives you practically the same stuff.

  • JavaScript lacks a lot of Perl convenience operators, including <=> cmp, low-precedenced and/or/not, //, =~, ~~, qq(), qx(), qw(), m//, s///, **, etc.
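
A short sketch of the Perl side of two of the points above (the "+" versus "." split, and a couple of the convenience operators). The variable names are arbitrary.

```perl
use strict;
use warnings;

# "+" always adds numerically and "." always concatenates, whatever the
# operands look like.
my $added  = 1 + "2";
my $joined = 1 . "2";
print "$added $joined\n";                       # prints "3 12"

# Numeric comparison operator, typically used for sorting.
my @sorted = sort { $a <=> $b } (10, 9, 100);
print "@sorted\n";                              # prints "9 10 100"

# Defined-or picks the right side only when the left side is undef,
# so a legitimate 0 is kept.
my %opt   = (retries => 0);
my $tries = $opt{retries} // 3;
my $wait  = $opt{timeout} // 30;
print "$tries $wait\n";                         # prints "0 30"
```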



All in all, I think JavaScript is quite a nice and simple language with familiar syntax (at least compared to PHP). It also has lexical variables, anonymous functions, OO, etc.

Tuesday, 22 June 2010

JSYNC is brilliant!

It's a brilliant idea: bank on JSON's popularity and more widespread implementations, and add some of the important YAML features not present in JSON on top of it. The result is JSYNC, along with its preliminary CPAN module. (I'd probably have picked a different name and chosen something like "\" for the prefix instead of ".", but hey, it's not my project :-)

A few months ago I was really desperate with the YAML situation in Perl. We have the largest number of YAML implementations, but none of them are good enough compared to Ruby's libsyck. I even contemplated converting all my YAML documents to JSON, but of course that plan was cancelled because JSON doesn't even support references or objects.

Here's to hoping JSYNC will rocket to popularity soon enough. Ingy++.

Wednesday, 09 June 2010

Custom dumping *in* Data::Dump

After blogging about my small patch to Data::Dump, I contacted Gisle Aas. He was quite responsive and eventually came up with a new release (1.16) of Data::Dump containing the cool new filter feature. My previous example, after being converted to use the new feature, becomes:

$ perl -MData::Dump=dumpf -MDateTime -e'dumpf(DateTime->now, sub { my ($ctx, $oref) = @_; return unless $ctx->class eq "DateTime"; {dump=>qq([$oref])} })'
[2010-06-09T12:22:58]


This filter mechanism is quite generic and allows you to do some other tricks like switching classes, adding comments, and ignoring/hiding hash keys. The interface is also pleasant to work with, although starting with this release the "no OO interface" motto should perhaps be changed to "just a little bit of OO interface" :-)

Aren't we glad that stable and established modules like this are still actively maintained and getting new features.

Thanks, Gisle!

Wednesday, 02 June 2010

Waiting for interesting statistics from the 2010 census

A few weeks ago the newspaper PR wrote that the 2010 Population Census, besides counting heads, also aims to "find unique facts." One example (and the only one given): finding the oldest person. And sure enough, over the past few weeks there have been several articles about a grandmother in such-and-such village who is 115 years old, whose record was then broken by someone aged 120, then 125, and so on. The latest, if I'm not mistaken, was over 140 (although all these age claims are based on word of mouth alone, without even a birth certificate or other written proof).

Of course, besides just finding the oldest person, there are a great many interesting things that could be extracted from this census data. For example, I hope BPS can publish a list of the most common first and last names, as the US census bureau does.

When I put together the initial version of the Perl module Locale::ID::GuessGender::FromFirstName, I had trouble finding a usable database of names, so I took the 1000 most common first names from my office's customer database. Of course, if more representative data were available, such as from the population census, that would be far better.

Given the chance, I'd be willing to process the raw data ;-)

Thursday, 27 May 2010

Optimizing for $money?

While some sites optimize for speed and bandwidth usage, others do the opposite. About a decade ago, when the Internet connection at the office (then at a pathetic speed of 128-256kbps) was experiencing a serious slowdown, I noticed that images and even flag icons from the Summer Olympic website were deliberately made totally uncacheable by setting the Expires header to a past date. Apparently IBM is still doing the same trick for the Grand Slam sites.

$ cctrl() { perl -MLWP::UserAgent -E'$ua=LWP::UserAgent->new; $res=$ua->get($ARGV[0]); say $res->header("cache-control")' $1; }

RSS icon, only cacheable for several hours:

$ cctrl http://www.rolandgarros.com/images/nav/rgr_nv_00000g3.gif
max-age=15000

In fact all content photos are also cacheable for several hours only, despite already having unique URLs.

Yellow button, which surely won't change a lot (and has a fairly unique URL anyway), cacheable only to a little over 11 minutes!

$ cctrl http://www.rolandgarros.com/images/misc/rgr_ms_00000g2.gif
max-age=700

Compare to:

$ cctrl http://www.facebook.com/images/app_icons/newsfeed.gif
max-age=2592000

or even:

$ cctrl http://www.ibm.com/i/v16/t/ibm-logo.gif
max-age=2592000
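
To put those max-age values (which are in seconds) into perspective, here is a small sketch that converts them to days/hours/minutes. The humanize_seconds helper is a made-up name for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: express a number of seconds as days, hours,
# minutes, and seconds, skipping units that are zero.
sub humanize_seconds {
    my ($s) = @_;
    my @parts;
    for my $unit ([ 'd', 86400 ], [ 'h', 3600 ], [ 'm', 60 ], [ 's', 1 ]) {
        my ($label, $size) = @$unit;
        my $n = int($s / $size);
        next unless $n;
        push @parts, $n . $label;
        $s -= $n * $size;
    }
    return @parts ? join(' ', @parts) : '0s';
}

printf "%8d => %s\n", $_, humanize_seconds($_) for 15000, 700, 2592000;
```

So the Roland Garros images expire after 4h 10m and 11m 40s respectively, while the Facebook and IBM icons are cacheable for 30 days.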

Friday, 21 May 2010

Custom class dumping for Data::Dump

I'm using DateTime objects a lot these days: any time I get some date/time data from outside of Perl, the first thing I do is convert it to a DateTime object, to avoid calculation/formatting hassles later.



However, the dumps are not pretty.



% perl -MDateTime -MData::Dump -e'dd [DateTime->now]'
[
  bless({
    formatter => undef,
    local_c => {
      day => 21,
      day_of_quarter => 51,
      day_of_week => 5,
      day_of_year => 141,
      hour => 8,
      minute => 55,
      month => 5,
      quarter => 2,
      second => 36,
      year => 2010,
    },
    local_rd_days => 733913,
    local_rd_secs => 32136,
    locale => bless({
      "default_date_format_length" => "medium",
      "default_time_format_length" => "medium",
      en_complete_name => "English United States",
      en_language => "English",
      en_territory => "United States",
      id => "en_US",
      native_complete_name => "English United States",
      native_language => "English",
      native_territory => "United States",
    }, "DateTime::Locale::en_US"),
    offset_modifier => 0,
    rd_nanosecs => 0,
    tz => bless({ name => "UTC" }, "DateTime::TimeZone::UTC"),
    utc_rd_days => 733913,
    utc_rd_secs => 32136,
    utc_year => 2011,
  }, "DateTime"),
]



It gets worse when you have several records, each with a DateTime object in it.



That's why I added a couple of mechanisms that let us customize a class's dump.



$ perl -Ilib -MDateTime -MData::Dump -e'$Data::Dump::CUSTOM_CLASS_DUMPERS{"DateTime"} = sub { "$_[0]" }; dd [DateTime->now]'

[2010-05-21T08:57:45]



or:



$ perl -Ilib -MDateTime -MData::Dump -e'package DateTime; sub dump { "$_[0]" }; package main; dd [DateTime->now]'

[2010-05-21T08:58:09]
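
To show the idea without needing my patched Data::Dump, here is a minimal sketch of a dumper that checks a per-class registry first and a dump() method second. It is not Data::Dump's code: tiny_dump(), the %CUSTOM_CLASS_DUMPERS hash in package main, and the My::Date stand-in class are all made up for this illustration, and only core modules are used.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical registry: class name => code ref returning a dump string.
our %CUSTOM_CLASS_DUMPERS;

# A deliberately tiny dumper (not Data::Dump itself). For objects it tries
# the registry, then a dump() method, and only then the generic dump.
sub tiny_dump {
    my ($data) = @_;
    return 'undef' unless defined $data;

    if (my $class = blessed $data) {
        if (my $custom = $CUSTOM_CLASS_DUMPERS{$class}) {
            return $custom->($data);
        }
        return $data->dump if $data->can('dump');
    }

    my $type = reftype($data) // '';
    if ($type eq 'ARRAY') {
        return '[' . join(', ', map { tiny_dump($_) } @$data) . ']';
    }
    if ($type eq 'HASH') {
        return '{' . join(', ',
            map { "$_ => " . tiny_dump($data->{$_}) } sort keys %$data) . '}';
    }
    # Plain scalars: numbers bare, everything else quoted.
    return $data =~ /\A-?\d+(?:\.\d+)?\z/ ? $data : qq{"$data"};
}

# Stand-in for DateTime so the example has no CPAN dependencies.
{
    package My::Date;
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
    sub ymd {
        my $self = shift;
        return sprintf '%04d-%02d-%02d', @$self{qw(year month day)};
    }
}

$CUSTOM_CLASS_DUMPERS{'My::Date'} = sub { $_[0]->ymd };
print tiny_dump([ My::Date->new(year => 2010, month => 5, day => 21), 'x', 42 ]), "\n";
```

The registry wins over the method so that the person doing the dumping can override whatever the class author chose.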

I know some other dumpers on CPAN probably have this ability, but I like Data::Dump's output.

If you want to take a look at a couple of small patches to Data::Dump: http://github.com/sharyanto/data-dump

I've also contacted Gisle Aas to ask what he thinks of it.

Kamis, 13 Mei 2010

On RJBS's automatic version numbering scheme

Every time I browse through CPAN recent uploads and see versions of modules with RJBS's automatic numbering scheme, like 2.100920 or 1.091200, I tend to read them as 2.(noise) and 1.(more noise).

The problem is that it doesn't look like a date at all (is there any country or region using day of year in their date??). I've never been bothered enough with this though, as I don't use this scheme myself, but have always had a suspicion that this obfuscation is deliberate for some deep reason.

It turns out that it's just a matter of space saving and a floating point issue. I'm not convinced, though. Is x.100513n (YYMMDD, 6 digits + 1 digit serial = 7 digits) really that much longer than x.10133n (YYDDD, 5 digits + 1 digit serial = 6 digits)? Is there a modern platform where Perl's numbers are represented as 32-bit single-precision floating point (only about 7 decimal digits of precision), so that it becomes a problem when n grows to 2 digits under the YYMMDD scheme?

Based on past experiences, since it is unlikely that I will do more than 20 releases in one month (usually even only once or twice a month or less frequently), if I were to adopt a date-based automatic versioning policy, perhaps I'll pick x.YYMMn where n is omitted for the first release, and then 1..9, and then 91..99 (and then 991..999 and so on). This way, most releases have the shortest number of digits. I don't "incur cost" for the first few releases (which anyway will be all there is, most of the time).

1.1005
1.10051
1.10052
...
1.10059
1.100591
1.100592
...
1.100599
1.1005991
...
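
The numbering rule above can be sketched in a few lines. This is only an illustration of the x.YYMMn idea; release_suffix() and dated_version() are names I made up for it, and the release counter is 1-based.

```perl
use strict;
use warnings;

# Serial suffix for the $n-th release (1-based) within a period:
# 1st => "", 2nd..10th => 1..9, 11th..19th => 91..99, 20th.. => 991.., etc.
sub release_suffix {
    my ($n) = @_;
    die "release number must be a positive integer\n"
        unless defined $n && $n =~ /\A[1-9]\d*\z/;
    return '' if $n == 1;
    my $k = $n - 2;    # 0-based index among the releases that carry a suffix
    return ('9' x int($k / 9)) . ($k % 9 + 1);
}

# Build x.YYMMn from plain integers supplied by the caller.
sub dated_version {
    my ($major, $yy, $mm, $n) = @_;
    return sprintf('%d.%02d%02d%s', $major, $yy, $mm, release_suffix($n));
}

print dated_version(1, 10, 5, $_), "\n" for 1, 2, 10, 11, 19, 20;
```

Because every new suffix only appends digits after a run of 9s, the versions still sort correctly when compared as decimal numbers.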


In fact, I bet most modules have only a few releases per year. So how about this scheme, x.YYn:

1.10 - first release of the year
1.101 - second
1.102 - third
...
1.109 - tenth
1.1091 - eleventh
1.1092 - 12th
...
1.1099 - 19th
1.10991 - 20th


Or how about x.Dn (releases per decade) or even x.Cn (releases per century)? :-)

My brain prefers that I don't use long version numbers. Except when the version number is long because of some date (e.g. to indicate freshness of release). But why torture ourselves with a date that we need several seconds to parse in our head?


So I'll stick with 0.01, 0.02, 0.03, ... for now.

Guessing the gender of Indonesians from first names

As promised in a blog post a few months ago, today I am releasing Locale-ID-GuessGender-FromFirstName. The module name turned out rather long, didn't it? :-p

That is because, going forward, along with a planned companion module, Locale-ID-ParseName-Person, we can also guess a person's gender from other attributes of the name, for example from the salutation (Bapak/Ibu/Bung/Mbak), from religious titles (H/Hj), from regional naming patterns (e.g. I Ketut/Ni Ayu), etc.

The accuracy and completeness of this first release cannot be relied upon yet, but it is ready to be played with. I have added about 1000 common names from the office's client database (because it is hard to find a better database, unlike in the US where one can take data from the census bureau). A (very) simple heuristic algorithm has also been added, along with an algorithm that looks names up on Google.

Does anyone have spare time to write a simple CGI script, or a Facebook application, as a web interface to this module? It could also collect more data and corrections. I'd like to do it myself, I'm just lazy :p

Rabu, 12 Mei 2010

perlmv: Renaming files with Perl code

perlmv is a script that I have personally been using all the time for years, but I only uploaded it to CPAN today. The concept is very simple: rename files by manipulating $_ in the Perl code you specify. For example, to rename all .avi files to lowercase:

$ perlmv -de '$_=lc' *.avi

The -d option is for dry-run, so that we can test our code before actually renaming the files. If you are sure that the code is correct, remove the -d (or replace it with -v, for verbose).
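
For the curious, the core idea (compile the code once, run it with $_ set to each file name, rename if the name changed) can be sketched as below. This is not perlmv's actual source; rename_with_code() and its dry_run option are hypothetical names for this illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Spec;

# Rename each file by running $code (a string of Perl) with $_ set to the
# file's base name. With dry_run => 1 nothing is touched; the planned
# renames are only returned as [old, new] pairs.
sub rename_with_code {
    my ($code, $files, %opt) = @_;
    my $transform = eval "sub { $code; return \$_ }"
        or die "Cannot compile code: $@";
    my @plan;
    for my $old (@$files) {
        my $old_name = basename($old);
        local $_ = $old_name;
        my $new_name = $transform->();
        next if $new_name eq $old_name;          # the code left the name alone
        my $new = File::Spec->catfile(dirname($old), $new_name);
        if (-e $new) {
            warn "Skipping $old: $new already exists\n";
            next;
        }
        unless ($opt{dry_run}) {
            rename $old, $new or die "Cannot rename $old to $new: $!";
        }
        push @plan, [$old, $new];
    }
    return @plan;
}

# Dry run over whatever .avi files are in the current directory.
printf "%s -> %s\n", @$_ for rename_with_code('$_ = lc', [glob '*.avi'], dry_run => 1);
```

Refusing to overwrite an existing target is a choice made for this sketch; a real tool might offer an option to force it.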

perlmv can also save your code into scriptlets (files in ~/.perlmv/scriptlets/), so if you do:

$ perlmv -e 's/\.(jpe?g|jpe)$/.jpg/i' -W normalize-jpeg

You can later do this:

$ perlmv -v normalize-jpeg *.JPG *.jpeg

In fact, perlmv comes with several scriptlets you can use (more useful scriptlets will be added in the future):

$ perlmv -L
lc
pinyin
remove-common-prefix
remove-common-suffix
uc
with-numbers


Let me know if you have tried out the script.

Rabu, 05 Mei 2010

So is wantarray() bad or not?

The style of returning different things in list vs scalar context has been debated for a long time (for a particular example, this thread in Perlmonks).

A few months ago I made a decision that all API functions in one of my projects should return this:

return wantarray ? ($status, $errmsg, $result) : $result;

That is, we can skip error checking when we don't want to do it.

Now, in the spirit of Fatal and autodie, I am changing the above to:

return wantarray ? ($status, $errmsg, $result) :
do { die "$status - $errmsg" unless $status == SUCCESS; $result };
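
Here is a self-contained sketch of that convention in action. The SUCCESS value of 200, the get_item() function, and the %ITEMS data are assumptions made up for the example; only the return line follows what I described above.

```perl
use strict;
use warnings;
use constant SUCCESS => 200;    # assumed status code for success

my %ITEMS = (apple => 'fruit');

# List context: return ($status, $errmsg, $result) and never die.
# Scalar context: return just $result, or die on failure.
sub get_item {
    my ($name) = @_;
    my ($status, $errmsg, $result) = exists $ITEMS{$name}
        ? (SUCCESS, 'OK', $ITEMS{$name})
        : (404, "No such item: $name", undef);
    return wantarray ? ($status, $errmsg, $result) :
        do { die "$status - $errmsg\n" unless $status == SUCCESS; $result };
}

my ($status, $errmsg, $kind) = get_item('pear');   # list context: check $status yourself
my $fruit = get_item('apple');                      # scalar context: dies on failure
print "$status - $errmsg\n", "apple is a $fruit\n";

# The trap: anything that imposes list context gets all three values.
my @args = (get_item('apple'));
print scalar(@args), " values, not one\n";
```

The last two lines are exactly the kind of thing I keep tripping over: passing the call as an argument or storing it in an array silently switches to the three-value form.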


But somehow I can still see myself and others tripping over this in the future, as I have, several times so far. It's bad enough that for each API function one already has to remember the arguments and their types, and one kind of return and its type.

Maybe I should just bite the bullet and admit the misadventure into wantarray(), and that context-sensitive return should be left to @foo, localtime(), and a few other classical Perl 5 builtins that have been ingrained in every Perl programmer's mind.

Rabu, 28 April 2010

Module Wishlist: magical loading of module

This Module Wishlist series is meant to surprise me with the power of CPAN. I wish about or dream up some module without first checking on CPAN, and hopefully can be delighted when what I want is already there.

Don't you hate it when you have to do:

$ perl -MSome::Really::Long::Module -e'print Some::Really::Long::Module->foo'

The goal is to be able to say something very close to:

$ perl -e'print Some::Really::Long::Module->foo'

and my module is loaded automatically.

To die or to croak, that is the question

Lately I've been tempted to use croak() instead of die(). Somehow it seems more considerate to users. But in the end I'm sticking with die(). In fact, I think the Carp module should be, well, croaked.

The reasons:

1. Even though Carp has been included in Perl 5 since forever (Module::CoreList tells me: "5"), carp(), croak(), cluck(), and confess() are still not builtins, which means I still need an extra "use Carp".

2. Too many keywords! Most other languages only have "throw" or "raise".

3. Names are too weird! I understand the difficulty of coming up with a concise set of names that are similar but slightly different. But requiring these weird names might also indicate that there is something fishy about the concept itself.

4. The choice of showing a stack trace or not should not be in the individual functions. That burdens the programmer with too much thinking.

5. Even with Carp qw(verbose), what's to be done with code that still calls die() and warn()? (But luckily there's Carp::Always.)

6. Showing stack trace should not be this difficult. I still think there should be a command-line switch for Carp::Always (or alias it to 'oan' :-)

7. Programmers (module writers) make mistakes too. The error report should not skip their call frame.
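
For anyone who has not compared the two side by side, this small sketch shows which line each one blames. My::Lib, its two functions, and compare_reports() are made-up names for the illustration.

```perl
use strict;
use warnings;

{
    package My::Lib;            # hypothetical library package
    use Carp qw(croak);

    sub check_die   { die   "bad argument" }    # blames this line, inside the library
    sub check_croak { croak "bad argument" }    # blames the caller's line instead
}

# Call both from package main and collect the messages, plus the line
# number of the croaking call so the two locations can be compared.
sub compare_reports {
    my $die_msg   = do { eval { My::Lib::check_die() };   $@ };
    my $croak_msg = do { eval { My::Lib::check_croak() }; $@ }; my $call_line = __LINE__;
    return ($die_msg, $croak_msg, $call_line);
}

my ($die_msg, $croak_msg) = compare_reports();
print "die:   $die_msg";
print "croak: $croak_msg";
```

Note that the caller has to live in a different package from the croaking function for the short form to appear.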

In short, I think Carp makes things a little bit too complicated. But what's Perl without complication? :-)

You know you're a Perl programmer when...

You know you're a Perl programmer (or a CPAN author) when...

When you're thinking of packaging every piece of code as a CPAN module.

A couple of days ago I needed a subroutine that takes a nested data structure (e.g. {vol1 => {a=>{b=>{c=>10}}}, vol2 => {a2=>{b2=>{c2=>20}}}}) and a Unix-like path string (e.g. "vol1:/a/b/c"), and returns the branch/leaf node of the data structure at the specified path (in this example, 10).
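
A subroutine along those lines might look like the sketch below. It is written for this post and is not the code that went into the CPAN module; get_by_path() is a hypothetical name, and returning undef for a missing step is a choice made here.

```perl
use strict;
use warnings;

# Resolve a path like "vol1:/a/b/c" against a nested hash structure.
# Returns the branch or leaf found there, or undef if any step is missing.
sub get_by_path {
    my ($data, $path) = @_;
    my ($volume, $rest) = $path =~ m{\A([^:/]+):(/.*)\z}
        or die "Invalid path '$path' (expected something like 'vol:/a/b')\n";
    return undef unless exists $data->{$volume};
    my $node = $data->{$volume};
    for my $key (grep { length } split m{/}, $rest) {
        return undef unless ref($node) eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return $node;
}

my $data = {vol1 => {a => {b => {c => 10}}}, vol2 => {a2 => {b2 => {c2 => 20}}}};
print get_by_path($data, 'vol1:/a/b/c'), "\n";    # the leaf from the example above
```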

After browsing CPAN and a few minutes of reading the POD of some modules and not finding exactly what I wanted [*], that subroutine idea quickly transformed into an idea of a full-fledged CPAN module. The next day I uploaded Data::Filesystem to CPAN, which is actually yet another Data::Walker- / Data::Path- / Data::DPath-like module.

It turns out that I really don't need that module (yet, maybe someday). What I needed was just a simple Perl subroutine, and nothing more, because I will need to create JavaScript and PHP equivalents of it. Porting a whole module is not something I even want to do.

I wonder just how many CPAN authors (do not) start their modules this way: overengineering a small problem after not finding exactly what they want on CPAN.

[*] Btw, not finding what you want in one of the millions of CPAN modules has got to be one of the saddest things in the universe. :-)

Minggu, 18 April 2010

Yada Yada, what is it for?

Perl 5.12.0 has just been released, and one of its new features is the Yada Yada operator.

I am just puzzled about when, or in what kind of situation, the yada yada operator can (and should) be used. At a glance it reminds me of RSpec's pending.
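
One plausible use is as a placeholder body for code that is not written yet. A minimal sketch, assuming Perl 5.12 or newer (export_report() is a made-up name):

```perl
use 5.012;
use strict;
use warnings;

# A stub that compiles today and fails loudly if anyone actually calls it.
sub export_report { ... }

my $ok = eval { export_report(); 1 };
print "export_report is still a stub: $@" unless $ok;
```

The file compiles fine, and the stub only raises an "Unimplemented" exception at the moment it is called, which is roughly what a pending example does in a spec.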

Kamis, 15 April 2010

Tip: sprintf()

One of the lesser-known (and rarely used) features of sprintf() (and printf()) is that it supports specifying argument positions inside the format string, using a NUMBER + "$" prefix:

% perl -E'say sprintf(q[%d %d %d], 1, 2, 3)'
1 2 3

% perl -E'say sprintf(q[%2$d %3$d %1$d], 1, 2, 3)'
2 3 1

Unfortunately, sprintf() does not support binding by name, as in Python:

print 'This {food} is {adjective}.'.format(adjective='absolutely horrible', food='spam')
This spam is absolutely horrible.

Sometimes binding by name is more convenient, because when arguments are added or removed we don't have to shift the positions around. Certain applications such as translation can also sometimes be given a nicer interface by using name-based binding.

So, what is the solution in Perl? You can use a module like String::Formatter, or roll your own :-) (as I recently did in Data::Schema):

# $extra = {mverb => "must"}; # mverb could also be 'should'
# $args = [1, 10];
print stringf("Data %(mverb)s be between %(0)d and %(1)d", $args, $extra);
Data must be between 1 and 10

The stringf() function looks up binding values in the second and subsequent arguments. Each argument can be either a hashref or an arrayref, so I can bind by position as well as by name. Convenient :-)
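
A function with that behaviour can be sketched in about twenty lines. This is a reconstruction for illustration, not the code in Data::Schema; the helper name _lookup_binding() and the choice to leave unknown keys untouched are assumptions.

```perl
use strict;
use warnings;

# Look up $key in each source in turn: hashrefs by name, arrayrefs by index.
sub _lookup_binding {
    my ($key, @sources) = @_;
    for my $src (@sources) {
        if (ref($src) eq 'HASH') {
            return $src->{$key} if exists $src->{$key};
        }
        elsif (ref($src) eq 'ARRAY') {
            return $src->[$key] if $key =~ /\A\d+\z/ && $key < @$src;
        }
    }
    return;
}

# Replace each %(key)spec with sprintf("%spec", value).
# Keys that cannot be resolved are left in the output unchanged.
sub stringf {
    my ($format, @sources) = @_;
    $format =~ s{%\((\w+)\)([-+ 0#]*\d*(?:\.\d+)?[a-zA-Z])}{
        my ($key, $spec) = ($1, $2);
        my $value = _lookup_binding($key, @sources);
        defined $value ? sprintf("%$spec", $value) : "%($key)$spec";
    }ge;
    return $format;
}

print stringf("Data %(mverb)s be between %(0)d and %(1)d\n",
    [1, 10], {mverb => "must"});
```

Since everything after the closing parenthesis is handed to sprintf(), the usual flags, widths, and precisions keep working, e.g. %(price)08.2f.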

Rabu, 14 April 2010

Data::Dumper (Unfortunately, Part 2)

One of the first things a Perl programmer will notice when learning about Data::Dumper is how weird and "inside out" the OO interface is. This is, I think, another unfortunate accident in Perl history, as Data::Dumper, being the first of such modules, got into the core early in Perl 5 and remains popular to this day. But the interface and default settings apparently annoy a lot of people so much that alternatives and wrappers like Data::Dump, Data::Dumper::Again, and Data::Dumper::Concise, among others, have sprung to life.
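
To make the complaint concrete, here are the two usual ways of getting a compact dump, wrapped in two helper functions whose names I made up. The Data::Dumper calls themselves (the package globals, new(), the chained setters, and Dump()) are the module's documented interface.

```perl
use strict;
use warnings;
use Data::Dumper;

# Procedural style: behaviour is changed through package globals.
sub dump_with_globals {
    my ($data) = @_;
    local $Data::Dumper::Terse    = 1;   # no "$VAR1 = " prefix
    local $Data::Dumper::Indent   = 1;   # simpler indentation
    local $Data::Dumper::Sortkeys = 1;   # stable key order
    return Dumper($data);
}

# OO style: the values go to the constructor as an array ref, the
# configuration is chained, and Dump() finally produces the string.
sub dump_with_object {
    my ($data) = @_;
    return Data::Dumper->new([$data])->Terse(1)->Indent(1)->Sortkeys(1)->Dump;
}

my $data = {name => 'git', size => 4096};
print dump_with_globals($data);
print "both styles agree\n" if dump_with_globals($data) eq dump_with_object($data);
```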

A loose analogy would be CVS which was popular for (too long) a time, and following it the explosion of alternative version control systems. Eventually after this phase a winner will emerge or dominate. In the version control system case it appears to be git. And in the Perl case I think it will be a builtin perl() method/function, like in Perl 6. Probably in 5.14? 5.16? 5.18? Don't you think it's about time Perl can "natively" dump its own structures in Perl, just like Python, Ruby, PHP, etc have been able to for a long time?

(Btw, lest anyone thinks otherwise: I do love DD. It has lots of options and has served its purpose well over the years.)

List::Util, List::MoreUtils, Util::Any (Unfortunately, Part 1)

The dichotomy of List::Util and List::MoreUtils is one of the unfortunate annoyances in Perl. One is without s, one is with s. Which function belongs to which? And no, you can't simply say, "f*ck it, just import everything!" as List::Util doesn't provide the usual ":all" import tag (RT).

Some thoughts (from someone who is largely ignorant on the history of both modules), all IMO:

1. Since List::Util is basically a convenience library, convenience should've been its main design goal. It should've been inclusive enough. The decision to deny the inclusion of any(), all(), none() just because they are too "trivial" to implement in one line of Perl was a bit strange, since max(), min(), etc are also trivial to implement in Perl.

2. List::MoreUtils should've included all the functionalities of List::Util, so one can use it *instead of* List::Util.
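
To illustrate point 1, here are one-line pure-Perl versions of both groups. They are sketches written for this post (hence the my_ prefix) and not the implementations in either module.

```perl
use strict;
use warnings;

# One-liners for the "too trivial" functions. The block sees each element in $_.
sub my_any  (&@) { my $f = shift; $f->() && return 1 for @_; return 0 }
sub my_all  (&@) { my $f = shift; $f->() || return 0 for @_; return 1 }
sub my_none (&@) { my $f = shift; $f->() && return 0 for @_; return 1 }

# max() and min() are just as short.
sub my_max { my $m = shift; $m = $_ > $m ? $_ : $m for @_; return $m }
sub my_min { my $m = shift; $m = $_ < $m ? $_ : $m for @_; return $m }

my @sizes = (4096, 48, 63);
print "some file is tiny\n"       if my_any  { $_ < 50 } @sizes;
print "every file is non-empty\n" if my_all  { $_ > 0 }  @sizes;
print "largest: ", my_max(@sizes), "\n";
```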

But hey, what happened happened.

Btw, we also have Perl 6's junction taking the "all", "any", "none" keyword.

And we'll see whether solutions like Util::Any will catch on, as it's another syntax to learn, another module to download and install. As with many annoyances, they are actually not that big of a deal. One can just spend a few seconds looking up the documentation to find the functions he/she wants, and after about tens of uses should remember which ones are in which.

Just when I'm warming to 5.10, comes 5.12!

Perl is far from dead/dying nowadays, with 5.12 being released recently, and the yearly time-based release plan and all. In fact, just as I was getting comfortable using some of the 5.10 niceties, here comes a whole new version with even more niceties waiting to be explored!

Features in 5.10 I'm using regularly.

Defined-or (if there's only one feature I can have in 5.10, I pick this one).

State variables (love it!).

Features in 5.10 I'm starting to use.

-E switch (but my reflex still says -e all the time).

Recursive pattern in regex (e.g., via Regexp::Grammars).

say() (maybe if I say it often enough I'll start to say say more).

Features in 5.10 I rarely/never touch.

Smart match (I know it's a godsend, but strangely I never feel the need for it so far).

given/when (I'm sticking with if/elsif/else, especially since given/when cannot be used as an expression yet).

Named capture in regex (yeah, old habits die hard).
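
For readers still on 5.8, here is a small taste of three of the features listed above. The function names are made up for the example.

```perl
use strict;
use warnings;
use feature qw(say state);

# Defined-or: fall back only when the left side is undef (0 and '' are kept).
sub port_or_default { my ($port) = @_; return $port // 8080 }

# State variable: initialized once, keeps its value between calls.
sub next_id { state $id = 0; return ++$id }

# Named capture: the pieces land in %+ instead of $1, $2, $3.
sub parse_date {
    my ($text) = @_;
    return unless $text =~ /(?<year>\d{4})-(?<month>\d\d)-(?<day>\d\d)/;
    return +{ %+ };
}

say port_or_default(0), ' ', port_or_default(undef);
say next_id() for 1 .. 3;
say parse_date('released on 2010-04-14')->{year};
```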

5.10 and 5.12. IMO, 5.10 contains more "significant" visible new features for end users (i.e. Perl programmers), especially in the area of new syntax. It was five years in the making and delivers many features borrowed from Perl 6. But that is not meant to belittle 5.12, which also packs some major goodies, especially pluggable keywords. This one promises to usher us into a world of new syntaxes and mini languages, though it also confirms Perl as a language that is "impossible to parse", and it will surely pose a challenge/headache for PPI and for writers of syntax highlighters and IntelliSense-like tools. I look forward to something like better embedded SQL and templates (using pluggable keywords instead of treating everything as strings all the time).

Rabu, 07 April 2010

Data::Dump::PHP

I actually can't believe there isn't something like this on CPAN yet. Well, actually there is PHP::Var, but it has bugs, doesn't handle scalars, and doesn't do recursive structures. But then I am equally surprised to have been able to hack up Data::Dump::PHP in just a couple of hours, by blatantly copying from Gisle Aas' Data::Dump and modifying only what's necessary.

And another note, PHP's var_export() currently can't dump recursive structures, which Data::Dump::PHP can.

Kamis, 01 April 2010

Bad languages vs bad programmers

Should we mourn the fact that Perl is no longer the darling language of the Web? It seems hard to catch up with the current popularity of PHP, or of Ruby and Python, in the domain of Web programming. Back in the mid-1990s Perl was chosen because few other languages were available by default on Unix servers. The alternatives at the time were just C, shell, or Tcl. Now the competition is fierce and crowded. Perl is among the harder/slower languages to learn, and on top of that it has an "old" image (even though its age is not that different from its peers: only about 1-2 years apart from Python and only about 5 years older than Ruby; all of these languages are nearly 20 years old or more).

On the one hand, losing prestige/momentum/the top spot/whatever is of course unpleasant. But on the other hand, there is a benefit. The web "programmers" who tend to produce more bad code have left Perl. I remember that back when Perl was popular, the Perl community was considered elitist, exclusive, arrogant, haughty, and unfriendly to beginners. And other languages began to win people's hearts because they offered more beginner-friendly communities. (Later, the Perl community softened and started efforts to embrace beginners, such as creating the beginners@ mailing list, etc. But perhaps it was too late.)

One of the reasons the Perl community "hated" beginners: so many newbies turned instant/wannabe programmers flocked to learn Perl, sometimes only halfway (or a quarter of the way!), and always conflated Perl with CGI. Always wrote Perl as PERL. Always decoded CGI parameters by hand (following the instructions of low-quality Perl books), even though cgi-lib.pl already existed back in Perl 4. Always asked trivial questions that had long been in the FAQ. Always copy-pasted code and wrote horrendously ugly scripts.

Now it seems the majority of them have moved to PHP. As an administrator of Linux hosting servers, I often have to check hosting clients' problematic PHP applications. And every time I peek at the source code, sometimes I smile bitterly, sometimes I sigh, sometimes I shake my head. Bad, messy programs never go extinct, it turns out. Back then it was in Perl, now it is in PHP. Where Matt's Script Archive used to be the main source of holes, now phpBB, WordPress, and Joomla are its successors.

Will Rails or Django be immune to bad programmers? Don't underestimate the power of stupid people :)

Sorry, I don't mean to be arrogant here. There are many reasons why someone might be called a bad programmer, and often it is not because they are stupid. Deadlines that are too short force people to copy-paste code. Limited knowledge due to lack of experience leads to naive designs. The role of the language in "nudging" and in managing incentives to correct bad habits does matter, but there is always room for beginner's mistakes. And there always has to be refactoring. Bad programmers never refactor.

So, be thankful that 10 years from now the new generation of programmers will no longer curse Perl so much for having to maintain rotten old CGI code. Instead they will curse PHP for the mountain of rotten spaghetti code mixed with HTML that they inherit. Or curse Ruby for the legacy of rotten Rails code with upside-down object designs and misguided patterns.

Bad programmers exist in every era. Whatever language is widely used at the time will be the scapegoat. :-)

Rabu, 31 Maret 2010

Competition, competition

(Since I wasn't in the mood and had other work to do, I missed my Iron Man post by a week. According to the rules, my status should have gone back to paper :-( It turns out it didn't, because there was a post from zak. Thanks zak :) )

I just read Sawyer X's post about Adam Kennedy's post on his plan to pit Dancer against Mojolicious. Really exciting!

IMO, the Perl community should have been holding competitive initiatives like this all along.

What usually happens on CPAN is the birth of one new module after another, each an experiment or personal project belonging to a different person. Don't like a module? Just create its rival, its alternative, your own version. Look at how many web frameworks (more than a dozen) there have been in Perl since Catalyst, how many dozens of data validation modules there are (I contributed one too, hehe), and perhaps a hundred or so configuration modules.

But what confuses users is: how do you choose the best module? The most popular? The fastest? And so on. There are too few comparisons, benchmarks, or contests. The Perl community seems to dislike competition. Even a download counter has never materialized after all this time.

We need to remember that competition drives innovation. Lack of innovation means death.

Let's compete, let's pit ourselves against each other in a positive way.

-- sh (who is currently cooking up the next version of Data::Schema to become a DFV-killer :-)

Jumat, 26 Maret 2010

Autovivification

Based on questions from several people who are somewhat confused by the term autovivification, I will go straight to a short example of what autovivification is.
This is autovivification:
my $hash;
$hash->{aaa}{bbb}{ccc} = 'a value';

When declared, $hash has no keys or values, yet we then assign a value to a key nested under keys that do not exist yet. Perl creates the intermediate hashes automatically.

Yes, that is how short the definition of autovivification is.
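
A slightly longer sketch shows both the convenient side and the surprising side, where merely looking at a nested key creates its parents. The deep_exists() helper is a made-up name for one way to avoid that.

```perl
use strict;
use warnings;

my $hash;                                  # undef, not even a hash reference yet
$hash->{aaa}{bbb}{ccc} = 'a value';        # the intermediate hashes spring into existence
print ref($hash), ' ', ref($hash->{aaa}{bbb}), "\n";

# The surprising side: a nested exists() check creates the parent entry.
my %config;
my $found = exists $config{db}{host};      # false, but $config{db} now exists
print "db was autovivified\n" if exists $config{db};

# Walk the keys one level at a time so that nothing is created along the way.
sub deep_exists {
    my ($node, @keys) = @_;
    for my $key (@keys) {
        return 0 unless ref($node) eq 'HASH' && exists $node->{$key};
        $node = $node->{$key};
    }
    return 1;
}

my %clean;
print "clean is still empty\n" if !deep_exists(\%clean, 'db', 'host') && !%clean;
```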

Rabu, 17 Maret 2010

Modules *AND* applications

"Why choose?" -- Fatima Dinssa

In Modules vs Applications, Sawyer X noted that one of the "issues" (emphasis mine) of Perl people is the tendency to write modules instead of applications. That CPAN is great but due to the lack of end user programs there is no WOW factor. He suggested that we write programs/applications that everyone can use to attract more people to Perl.

While I agree with the last suggestion, I don't agree with the preference to modularize everything as an issue. As someone who wrote a program years ago that is comprised of many separate scripts and duplicated code (even in different languages, just for the fun of it) and still have to maintain it today, I'd say that not putting as much code as possible into reusable modules is a mistake.

I'd instead suggest that we still write modules (which is what made CPAN great anyway), but try to *also* accompany each distribution with a demo app (preferably in the App:: namespace).

I myself will try to do that from now on.

Log::Any::App (2)

Following up on my previous post, I've just uploaded Log::Any::App to CPAN.

Do you write or use modules that use Log::Any? Or, do you want to use Log::Any conveniently in an application? Now you can just do this:

% perl -MLog::Any::App -MOtherModuleThatUsesLogAny -e ...

and the logs will be displayed to screen. The default level is WARN, but if you want to debug things:

% DEBUG=1 perl -MLog::Any::App ...

or if you want to quiet things down:

% LOGLEVEL=error perl -MLog::Any::App ...

If you put 'use Log::Any::App' in your script, when run it will by default log to file too (~/prog.log or /var/log/prog.log). It can even automatically log to syslog.

Zero-conf. No more long incantation.

Please tell me what you think.

Kamis, 11 Maret 2010

Choosing test names

Which test names do you prefer?

"tong() method can connect to database"
"tong() method can disconnect from database"
"sha() method can delete an existing file"
"sha() method fails when deleting a non-existing file"

or:

"tong 1"
"tong 2"
"sha 1"
"sha 2"

They are both rather extreme, but if I had to choose, I would still rather go with the shorter ones. I tend to treat test names more like unique IDs, and when things go wrong I just look up the actual test code.

I wouldn't mind verbose test names though if they can somehow be automatically generated (a future Google Translate project, perhaps?) from code, because they are just repeating what the code says.

To repeat myself, it's the DRY principle.

Rabu, 03 Maret 2010

On reading code

Last week I started using github, forked a project, and read some of miyagawa's beautiful code. Later on the weekend, I imitated a particular style I found from his code to improve my own code.

And then I realized: during the course of many years as a programmer, I really really seldom read other people's code, especially real-world code. Sure, I do whenever I have to patch something. But other than that, practically never.

Other than code, I do read an awful lot: books, magazines, mailing lists, forums, blogs, web pages. I have never doubted the benefits of reading for improving knowledge and understanding, so why haven't I read more code? A couple of reasons I can think of:

1. We programmers are not paid to read code. We are paid to write programs, to churn out lines upon lines of code. Heck, we're not even paid to write good code, we're paid to get the job done.

2. Reading code is hard. It can really drain your brain. Imagine a recipe book where the flow of instructions is not linear (jump to line X, jump to page Y and then return here), where a misplaced character can destroy the whole recipe, and where everything is so interdependent and interlinked that you need to read everything twice first before you can begin to understand it.

3. Reading code is boring. Can you really curl up with a good program the way you can with an exciting novel?

4. Reading code is not necessary. A good reusable piece of code need not be read anyway, all you need is its API documentation. Programmers are hired regardless of their ability to read real-world code, there are no interview tests in reading code other than a few or at most a dozen lines of it. Heck, programmers are even hired despite their inability to write any code at all.

After experiencing how one session of code reading can do wonders, I'll make it a point to read more of other people's (good) code.

Jumat, 26 Februari 2010

{Logging, Messaging, Notification, Auditing, ...} frameworks

They're all the same, in many respects.

In my module code, I want to be as flexible as possible. I want to be as detached from implementation details as possible.

I just want to generate a log message (or a notification, or an audit entry). I don't want to care where it ends up, how it is sent, how it is filtered/categorized, who the recipient(s) is/are, what medium(s) is/are used, etc. Let whoever uses the module configure it all.

And I don't want to reinvent the wheel, I want to use an existing framework. A logging framework seems to be a sensible choice. Let's use Log::Any for example. This is a snippet from a module for a file-manager-type web app:

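my (@success, @failed);
for my $file (@files) {
    # unlink() returns the number of files deleted; $! is only meaningful after a failure
    if (unlink $file) { push @success, $file } else { push @failed, $file }
}
# infof() is Log::Any's sprintf-style variant of info()
$log->infof("Done deleting files. Files deleted: %s. Files not deleted: %s", \@success, \@failed);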


The message can end up:
  • in a system-wide log-file;
  • discarded (if the configured output level is less verbose than INFO);
  • in a per-user log file;
  • in a notification email to user (should the user configure it);
  • in a notification email to sysadmin (should the admin configure it);
  • into audit table in database (if the application is using a database);
  • in the console (if this module is also used in a command-line based app);
  • as a desktop notification (if this module is also used in a desktop app);
  • in an internal web-based forum;


However, some of those outputs usually need some additional metadata. In email and desktop notification we usually need separate subject and body. In the audit table in database we need to fill in who (the logged in user). In web applications we usually need to log the remote IP address (or even User-Agent string). In desktop notification or forum sometimes we would like to be able to update the message instead of creating a new one.

Sometimes I *do* care about these details and need to specify them in my module, rather than in the logging output module (the appender and formatter, in Log4perl-speak).

Logging APIs usually do not allow us to do so. Output/appender modules are only given the message as a string.

So I intend to cheat by embedding a JSON-/YAML-encoded metadata in front of the message, e.g.:
$log->info("\x00{subject: 'Progress of copying $foo to $bar', id: xasd8f7d}\x00Copying $foo to $bar. 0%");
for (1..100) {
$log->info("\x00{id: xasd8f7d}\x00$_%");
sleep 1;
}

The presence of \x00 signifies that everything up to the next \x00 is additional metadata in the form of YAML (or JSON). This way, my desktop notification and web-based forum output modules know that the 2nd through 101st messages are updates and can adjust accordingly.
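
On the receiving end, an output module would need to split such a message back apart. Here is a minimal sketch, assuming strict JSON metadata (so keys and strings are quoted, unlike the looser YAML-ish example above) and using the core JSON::PP module; pack_message() and unpack_message() are hypothetical helper names.

```perl
use strict;
use warnings;
use JSON::PP ();

my $JSON = JSON::PP->new->canonical;

# Prefix a message with NUL-delimited JSON metadata.
sub pack_message {
    my ($meta, $text) = @_;
    return "\x00" . $JSON->encode($meta) . "\x00" . $text;
}

# Split a message back into (metadata hashref, plain text).
# Messages without the NUL prefix come back with empty metadata.
sub unpack_message {
    my ($message) = @_;
    if ($message =~ /\A\x00([^\x00]*)\x00(.*)\z/s) {
        my ($json, $text) = ($1, $2);
        return ($JSON->decode($json), $text);
    }
    return ({}, $message);
}

my ($meta, $text) = unpack_message(pack_message({id => 'xasd8f7d'}, '42%'));
print "update for $meta->{id}: $text\n";
```

Output modules that know nothing about the convention would still print the raw prefix, which is part of why it feels like a hack.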

It should work, and I don't have to reinvent any framework or wrapper. But it feels so hackish and ugly.

Kamis, 25 Februari 2010

The problems with the older CPAN clients

Thank God for cpanminus. Now that I'm free from having to use them, allow me to rant, no, bitch about them.

1. Bad defaults. Some default values might have made sense 10-15 years ago, but not so much nowadays. For example, I'd argue that "follow" should now be the default. See #2.

2. Too developer-oriented. For example, I believe "notest" should be on by default. This is compounded by the fact that installing Perl modules is so damn-slow already. See #3.

3. Too slow. Startup takes around 10-30 seconds or more. Installing Moose usually takes minutes (but with cpanminus, it only takes about 1 minute with --notest on my PC). Autocomplete takes one to a couple of seconds.

4. Too interactive, too verbose. The older clients are getting better but are still not quiet enough; cpanminus is such a breath of fresh air.

5. Too bloated (which is the reason why cpanminus was developed in the first place).

The older CPAN clients are an embarrassment compared to "apt-get", "yum", and "urpmi", which are way faster, way quieter, and way less interactive. There's no reason why a CPAN client cannot be like those, and fortunately cpanminus proves it.

The shorter path to deployment heaven

Now that there is cpanminus, I've modified the handy little module CPAN::AutoINC to prefer cpanminus over CPAN.

So now when users download and run my programs/scripts for the first time, instead of failing with the dreaded message:
Can't locate Foo/Bar.pm in @INC (@INC contains: [a dozen or more of paths....]).
BEGIN failed--compilation aborted at /some/path line 123

which users or even Perl novices have no clue how to fix, the program will instead automatically download all the necessary CPAN modules into the user's home directory and run out of the box on the first try!
$ download-bca
Fetching http://search.cpan.org/CPAN/authors/id/A/AD/ADAMK/File-HomeDir-0.89.tar.gz
Building File-HomeDir-0.89 for File::HomeDir...
File::HomeDir installed successfully.
Fetching http://search.cpan.org/CPAN/authors/id/D/DR/DROLSKY/File-Slurp-9999.13.tar.gz
Building File-Slurp-9999.13 for File::Slurp...
File::Slurp installed successfully.
Fetching http://search.cpan.org/CPAN/authors/id/S/SH/SHARYANTO/Finance-Bank-ID-BCA-0.07.tar.gz
==> Found dependencies: Pod::Coverage, DateTime, Test::Pod::Coverage, Log::Any, Any::Moose, Mouse
Fetching http://search.cpan.org/CPAN/authors/id/R/RC/RCLAMP/Pod-Coverage-0.20.tar.gz
==> Found dependencies: Devel::Symdump
Fetching http://search.cpan.org/CPAN/authors/id/A/AN/ANDK/Devel-Symdump-2.08.tar.gz
Building Devel-Symdump-2.08 for Devel::Symdump...
Devel::Symdump installed successfully.
Building Pod-Coverage-0.20 for Pod::Coverage...
Pod::Coverage installed successfully.
...
...
...
Fetching http://search.cpan.org/CPAN/authors/id/S/SP/SPADKINS/App-Options-1.07.tar.gz
Building App-Options-1.07 for App::Options...
App::Options installed successfully.
...
(the program finally runs)

As the developer, you only need to put "use CPAN::AutoINC" at the top of your main script. How's that for deployment heaven? Zero installation, zero configuration (thanks to cpanminus), zero dependency outside of Perl core modules.

But there's a catch. The user (or the user's sysadmin) still needs to install CPAN::AutoINC and needs to install and bootstrap local::lib first. So it's not really an out-of-the-box experience yet.

miyagawa++ is already planning to automatically bootstrap local::lib from cpanminus, and the logic behind CPAN::AutoINC is just a couple of dozen lines that can be embedded easily into the script (this process can be automated with distribution building tools, e.g. a Dist::Zilla plugin).
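To give an idea of how small that logic is, here is a minimal sketch of an @INC hook that installs a missing module and then retries the load. It is not the actual CPAN::AutoINC source: the names file_to_module and make_autoinstall_hook are made up for illustration, and the installer is passed in as a coderef so that the cpanm call (shown commented out, and assuming cpanm is on the PATH with local::lib already set up) is only one possible choice.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "Foo/Bar.pm" into "Foo::Bar".
sub file_to_module {
    my ($file) = @_;
    (my $module = $file) =~ s{/}{::}g;
    $module =~ s/\.pm\z//;
    return $module;
}

# Build a coderef suitable for pushing onto @INC. $installer is any
# coderef that takes a module name and returns true on success.
sub make_autoinstall_hook {
    my ($installer) = @_;
    my %tried;    # remember files we already attempted, to avoid loops
    return sub {
        my (undef, $file) = @_;
        return if $tried{$file}++;
        $installer->(file_to_module($file)) or return;

        # The module should now be somewhere in the regular @INC paths.
        for my $dir (grep { !ref } @INC) {
            my $path = "$dir/$file";
            next unless -f $path;
            open my $fh, '<', $path or next;
            return $fh;    # perl compiles the module from this handle
        }
        return;            # still not found: let require() fail as usual
    };
}

# Possible usage (assumption: cpanm is installed and on the PATH):
# push @INC, make_autoinstall_hook(
#     sub { system('cpanm', '--notest', $_[0]) == 0 }
# );
```

The hook is pushed to the end of @INC on purpose, so it only fires after every normal directory has been searched.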

Since cpanminus has zero dependencies, we can simply include it too in our application.

And maybe in the future there can be GUI interfaces for cpanm, so it can display a nice dialog to ask for confirmation via the desktop.

Be prepared to be able to provide a much nicer experience for your users.

Rabu, 24 Februari 2010

Log::Any::App

This is a draft/RFC document.

Log::Any is great if you're writing modules. You only need to say:
use Log::Any qw($log);

and then you're off producing logs with:
$log->debug(...);
$log->warn(...);
# etc

But if you're writing scripts/applications (and thus need to "consume" or display the logs as well), it becomes a bit of a hassle. For example, if you want to display logs to the screen with Log::Dispatch, this is the incantation you need:
use Log::Any qw($log);
use Log::Any::Adapter;
use Log::Dispatch;
# each output is its own array reference inside the "outputs" list
my $disp = Log::Dispatch->new(outputs => [["Screen", min_level=>"debug", newline=>1]]);
Log::Any::Adapter->set("Dispatch", dispatcher=>$disp);
...
$log->warn(...);

which I'm sure I'll never ever remember and will just copy-paste every time.

The goal is to be able to write only this in your scripts:
use Log::Any::App qw($log);

and Log::Any::App (which currently does not exist) will take care of all the rest:

  • Choose the best available adapter(s);
  • Configure the adapter(s) with the best defaults, e.g. to screen, as well as to file /var/log/SCRIPTNAME.log (if running as root) or ~/SCRIPTNAME.log (if running as user). The defaults can of course be changed via configuration;
  • Pick configuration from various sources, like environment variables (e.g. turning level to debug if DEBUG is set to true), command line options (e.g. log level from --log_level/--log-level/--debug/--verbose/etc, as well as detecting result from Getopt::Long or App::Options so we can avoid parsing by ourselves)
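Since Log::Any::App does not exist yet, the following is only a sketch of how the default-picking described in the list above might look, using nothing but core Perl. The function names choose_log_level and default_log_file, the default level of "warn", and the mapping of --verbose to "info" are all assumptions of mine, not a settled design; only the --opt=value form is handled here.

```perl
use strict;
use warnings;

# Decide the log level from an environment hash and an argument list.
# Later sources override earlier ones: default, then DEBUG, then options.
sub choose_log_level {
    my ($env, $argv) = @_;
    my $level = 'warn';                   # assumed default level
    $level = 'debug' if $env->{DEBUG};    # DEBUG=1 in the environment
    for my $arg (@$argv) {
        if    ($arg =~ /\A--log[-_]level=(\w+)\z/) { $level = lc $1 }
        elsif ($arg eq '--debug')                  { $level = 'debug' }
        elsif ($arg eq '--verbose')                { $level = 'info' }
    }
    return $level;
}

# Pick the default log file: /var/log for root, the home directory otherwise.
sub default_log_file {
    my ($script, $is_root, $home) = @_;
    return $is_root ? "/var/log/$script.log" : "$home/$script.log";
}

# Possible usage inside a real script:
# my $level = choose_log_level(\%ENV, \@ARGV);
# my $file  = default_log_file('myscript', $> == 0, $ENV{HOME});
```

Passing the environment and arguments in explicitly, instead of reading %ENV and @ARGV directly, keeps the decision logic easy to test.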


I think the main challenge is arranging a set of defaults that are acceptable and comfortable for a lot of people, and working together nicely with available modules like adapter modules (Log::Any::Adapter::Dispatch, Log::Any::Adapter::Log4perl), command line parsing modules, configuration modules, etc.

And should you later refactor your script into modules, the logging part can be left untouched as it already uses the Log::Any framework.

Modules to look at: Log::Dispatchouli.

Rabu, 17 Februari 2010

A small plea to non-English bloggers

I absolutely welcome and cheer the non-English posts in the Iron Man feed (as they are also welcomed by the rules). I even try to read one or two whenever I can.

But, at the risk of sounding like a party pooper, guys, could you please not give your post an English title/subject when the content is not in English? I'm sure other people will appreciate it as it saves them time when they can't read your language anyway.

Thank you very very much, and keep blogging!

Rabu, 10 Februari 2010

The (lack of) readability of Perl

Can't believe it's almost 8 years since I wrote this Indonesian article about "8 things that make Perl relatively unreadable".

To summarize, I think the readability issue in Perl largely boils down to the heavy use of symbols/non-alphanumeric characters (e.g. regex and special variables). This admittedly won't change for years to come, as Perl 6 also offers us lots of new operators, and even allows us to define new ones, using the full range of the Unicode charset! So we will almost certainly be hearing, and needing to fight against, this meme for a long, long time.

I also believe that most of the complaints about Perl being unreadable come from non-Perl users. When they read some code, they expect to be able to read or guess its meaning based on prior knowledge of some other programming language. For example, even before I started learning French or German, I could guess the general meaning of some French and German (or some other European language) phrases based solely on my knowledge of English. This does not happen with, say, Chinese or math or regular expressions, as the symbols are just gibberish to the uninitiated.

Perhaps marketing that Perl is one of the hardest languages to master will attract more people to learning it, as that's what's happening with Cantonese and some westerners.

After 7 years...

In 2003 I wrote this article: 8 Hal Yang Membuat Perl Relatif Unreadable (8 Things That Make Perl Relatively Unreadable). Wow, one more year and it will have been 8 years. Has time really passed that quickly?

Anyway, after 7 years, of course some of the points in that article no longer apply. That is proof that Perl stays alive because it keeps changing and evolving.

1. Moose and the OO system in Perl 6. Perl users no longer need to feel inferior, because Perl now has a modern object system that is "no longer scary", and that can even compete for superiority with Ruby, Python, JavaScript, etc.

2. Several readability improvements can be found in Perl 6, among them a more consistent use of variable prefixes. Accessing array and hash elements now uses @array[n] and %array{s}, no longer $array[n] and $array{s}. In fact you could "live" knowing just one prefix, $, because arrays, hashes, and complex structures can all be held by $.

There are also grammars, which let us modularize complex regex patterns so that they become far more readable.

But... it wouldn't be Perl if it didn't let us condense programs using lots of symbols. Dozens of new operators, not to mention metaoperators and various new idioms, arrive in Perl 6. Even the full range of Unicode characters can be used!

Does this keep Perl unreadable, or make it even more so? To repeat what I said in my 2003 article, readability is subjective. Is mathematics, which is full of symbols, also unreadable? Of course it is, for those who are unfamiliar with mathematics, but certainly not for mathematicians.

Rabu, 03 Februari 2010

App::Options

Over the course of many years, I have written lots and lots of short command-line scripts in Perl (Perl's great for that, you know). Most of these scripts are small utilities, replacements for shell scripts, or automation tools.

All of these scripts invariably need some ability to take command-line options. Of course, back in the day, I used Getopt::Std and Getopt::Long. They both did their task well, but I also invariably needed to provide -h or --help for usage information. I always hated having to write this usage text manually.

Also, if you use your scripts often enough, you will end up typing the same incantation again and again, and will want to put the command-line options into a config file to avoid repeating yourself. Passwords are also not appropriate as command-line options due to security issues.

So I now use App::Options and have never looked back. You can look at it as a pretty straightforward replacement for Getopt::Long, but it gives you automatic --help and --version (--program_version actually, but --version also does something else wonderful). And it automatically enables you to read config files. I emphasize "automatically" because you absolutely do not have to do a single thing, as App::Options gives you some nice defaults on where to find the config files.

How great is that? Just write your code as if you're using Getopt::Long, but gain all these extra abilities for free!
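To illustrate the general idea of combining config-file values with command-line options, here is a core-Perl sketch. It is not App::Options' code or API: parse_config_text and merge_options are hypothetical names, the "key = value" file format is a simplification, and I am assuming the usual convention that the command line wins over the config file.

```perl
use strict;
use warnings;

# Parse "key = value" lines, skipping blank lines and "#" comments.
sub parse_config_text {
    my ($text) = @_;
    my %conf;
    for my $line (split /\n/, $text) {
        next if $line =~ /\A\s*(?:#|\z)/;
        my ($key, $value) = $line =~ /\A\s*([\w-]+)\s*=\s*(.*?)\s*\z/
            or next;    # silently ignore lines we do not understand
        $conf{$key} = $value;
    }
    return \%conf;
}

# Start from the config values, then let --opt=value arguments override them.
# A bare --opt is treated as a true flag.
sub merge_options {
    my ($config, $argv) = @_;
    my %opt = %$config;
    for my $arg (@$argv) {
        if    ($arg =~ /\A--([\w-]+)=(.*)\z/s) { $opt{$1} = $2 }
        elsif ($arg =~ /\A--([\w-]+)\z/)       { $opt{$1} = 1 }
    }
    return \%opt;
}

# Possible usage:
# my $opt = merge_options(parse_config_text($file_contents), \@ARGV);
# print "Connecting to $opt->{dbhost}\n";
```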

There are other solutions for command-line scripts, like App::Cmd, but really App::Options is the easiest way especially for old-timers like me who do not want to change their Getopt::Long-style habit.

App::Options is not perfect though. There are a couple of small annoyances I have with it, but I'm not hung up on them. The first is that "=" in command-line options is required, i.e. you have to write "--opt=args" instead of "--opt args". The second is that --version defaults to displaying Perl module versions instead of your program's version.

For larger apps, I also sometimes need hierarchical/multilevel configuration. It'd also be nice to have YAML support. These are things I hope to accomplish with Config::Tree, but I guess I still need to work on it quite a bit before I can replace my usage of App::Options with it.

Rabu, 27 Januari 2010

Four months after...

Four months ago I joined the Iron Man challenge. So what have I done and what did I get out of this?

What I have done: wrote several new Perl modules. I didn't feel like I had enough things to say (that's why I set up the blog as "Perl Indonesia" and invited others to join instead of "Steven Haryanto's Perl Adventure"), and I still don't, really. So I write more code instead, and post about it.

The modules are typically small, ones which I can finish in one or two sittings. The ideas for these modules have usually been floating in my head for some time, but since they are trivial enough I had never gotten around to writing them. The blogging challenge changed that.

Some of the other modules actually come from code I have written, in one form or another, but have not been properly packaged. Dist::Zilla changed that, as it lowers the cost of maintaining and releasing Perl distributions. Thanks, RJBS!

What I get: joy and satisfaction that I am actively doing something for Perl.

All in all, Ironman is a great idea and a good cause. I encourage everybody to keep blogging and writing.

Rabu, 20 Januari 2010

Lingua::ZH::PinyinConvert::ID

Today I released Lingua::ZH::PinyinConvert::ID. This release is dedicated to the recently deceased former President of Indonesia, Abdurrahman Wahid, who in 2001 declared Chinese New Year a national holiday and once again allowed the use of Chinese characters and the expression of Chinese culture in daily life, thus ending the 30 years of anti-Chinese laws of the Suharto regime.

Sadly, a generation of Indonesian Chinese people grew up while being cut off from their heritage, not learning Chinese languages nor a lot of the Chinese culture. This will probably be fixed in the next couple of generations as many elementary schools now start to include Mandarin in their curriculum.

Selasa, 12 Januari 2010

Log::Any

A few weeks ago I found out about Log::Any, and have since migrated many of my modules to it (usually from Log::Log4perl).

The two main advantages of Log::Any for me:
  • your users don't need to configure anything if they don't want logging. When my module, say Foo, uses Log4perl to produce logs, then when you use Foo, Log4perl will emit a warning: Log4perl: Seems like no initialization happened. Forgot to call init()?. To remove this warning you need to initialize Log4perl, e.g. using use Log::Log4perl qw(:easy); Log::Log4perl->easy_init($FATAL); This is annoying if you happen to not care about logging.

    With Log::Any, the default is null logging.

  • printf-style logging. For example, $log->debugf("data = %s", $data); It can also handle references/nested data structures, so you don't have to resort to something like $log->debug(sub { "data = " . Dumper($big_data) });. In fact, 99% of the time I use sub{} is precisely because I want to avoid the cost of dumping.
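The second point, skipping the cost of dumping when the level is disabled, can be shown with a tiny stand-in logger. To be clear, TinyLog below is a hypothetical toy written with core modules only, not Log::Any's implementation; it just demonstrates why a debugf-style method makes the sub{} trick unnecessary.

```perl
use strict;
use warnings;

{
    package TinyLog;    # hypothetical stand-in, not part of Log::Any
    use Data::Dumper;

    sub new {
        my ($class, %args) = @_;
        return bless { debug => $args{debug} ? 1 : 0, lines => [] }, $class;
    }

    # Messages recorded so far (a real logger would write them somewhere).
    sub lines { @{ $_[0]{lines} } }

    sub debugf {
        my ($self, $format, @args) = @_;
        return unless $self->{debug};    # disabled: no dumping, no formatting

        local $Data::Dumper::Indent   = 0;    # single-line output
        local $Data::Dumper::Terse    = 1;    # no "$VAR1 = " prefix
        local $Data::Dumper::Sortkeys = 1;    # stable key order
        my @flat = map { ref $_ ? Dumper($_) : defined $_ ? $_ : '<undef>' } @args;
        push @{ $self->{lines} }, sprintf($format, @flat);
        return;
    }
}

my $log = TinyLog->new(debug => 1);
$log->debugf("data = %s", { name => 'git', size => 4096 });
print "$_\n" for $log->lines;
```

Because the reference is only dumped inside debugf, after the level check, the caller can pass big structures freely without wrapping the call in a closure.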


Of course, there are some features in Log4perl that are missing in Log::Any, like logdie() and the TRACE level. But it is a small price to pay. You gain other benefits, the most important of which is compatibility with other logging frameworks. I'm sure some people prefer something else to Log4perl. Log::Any currently supports Log4perl as well as Log::Dispatch, and possibly others too in the future.

Thanks Jonathan for Log::Any!

Selasa, 05 Januari 2010

The Help MySQL petition

As much as I love(d) MySQL and am still using it a lot (mostly for PHP web applications that are married to it), there seems to be too much politics surrounding it these days.

From helpmysql.org (emphasis mine):
If those IPRs fall into the hands of MySQL's primary competitor, then MySQL immediately ceases to be an alternative to Oracle's own high-priced products. So far, customers had the choice to use MySQL in new projects instead of Oracle's products. Some large companies even migrated (switched) from Oracle to MySQL for existing software solutions. And every one could credibly threaten Oracle's salespeople with using MySQL unless a major discount was granted. If Oracle owns MySQL, it will only laugh when customers try this. Getting rid of this problem is easily worth one billion dollars a year to Oracle, if not more.

Is Oracle really MySQL's primary competitor? I thought they represent two very distinct segments?

Also, if Oracle owns MySQL, why can't I still threaten Oracle sales reps with switching to MySQL to get a deep discount on Oracle DB? Because Oracle will threaten to kill MySQL, or sue every other company that provides paid support for MySQL, or deliberately delay fixing critical bugs in MySQL? I'm really not convinced by this argument. In any case MySQL would still be a cheaper substitute for Oracle. And if I were an Oracle client wanting to get a discount (and not laughs), I think I would rather threaten to switch to SQL Server or Postgres instead.

Also, online petition immediately conjures up the image of a teenager trying to save his/her favorite cartoon TV show that is being cancelled.

Also, let's never forget the bigger "politics" before this, regarding the position of MySQL AB on the usefulness of things like transactions or foreign key constraints, depending on whether its product has support for them.

Let's take one particular example: foreign key constraints. It shows that you really can't trust the opinion of people with ulterior motives. Here's a snippet from the old MySQL 3.23.x manual, from when MySQL had no support for foreign key checking (emphasis mine):
5.4.5.1 Reasons NOT to Use Foreign Keys constraints

There are so many problems with foreign key constraints that we don't know where to start:
  • Foreign key constraints make life very complicated, because the foreign key definitions must be stored in a database and implementing them would destroy the whole ``nice approach'' of using files that can be moved, copied, and removed.
  • The speed impact is terrible for INSERT and UPDATE statements, and in this case almost all FOREIGN KEY constraint checks are useless because you usually insert records in the right tables in the right order, anyway.
  • There is also a need to hold locks on many more tables when updating one table, because the side effects can cascade through the entire database. It's MUCH faster to delete records from one table first and subsequently delete them from the other tables.
  • You can no longer restore a table by doing a full delete from the table and then restoring all records (from a new source or from a backup).
  • If you use foreign key constraints you can't dump and restore tables unless you do so in a very specific order.
  • It's very easy to do ``allowed'' circular definitions that make the tables impossible to re-create each table with a single create statement, even if the definition works and is usable.
  • It's very easy to overlook FOREIGN KEY ... ON DELETE rules when one codes an application. It's not unusual that one loses a lot of important information just because a wrong or misused ON DELETE rule.

The only nice aspect of FOREIGN KEY is that it gives ODBC and some other client programs the ability to see how a table is connected and to use this to show connection diagrams and to help in building applicatons.

And here's a snippet from the MySQL 5.1 manual, now that MySQL has support for foreign keys through InnoDB. Notice the complete change of heart (emphasis mine):
1.8.5.4. Foreign Keys

The InnoDB storage engine supports checking of foreign key constraints, including CASCADE, ON DELETE, and ON UPDATE. See Section 13.6.4.4, “FOREIGN KEY Constraints”.

[...]

Foreign key enforcement offers several benefits to database developers:
  • Assuming proper design of the relationships, foreign key constraints make it more difficult for a programmer to introduce an inconsistency into the database.
  • Centralized checking of constraints by the database server makes it unnecessary to perform these checks on the application side. This eliminates the possibility that different applications may not all check the constraints in the same way.
  • Using cascading updates and deletes can simplify the application code.
  • Properly designed foreign key rules aid in documenting relationships between tables.

Do keep in mind that these benefits come at the cost of additional overhead for the database server to perform the necessary checks. Additional checking by the server affects performance, which for some applications may be sufficiently undesirable as to be avoided if possible. (Some major commercial applications have coded the foreign key logic at the application level for this reason.)

[...]

Be aware that the use of foreign keys can sometimes lead to problems:

[...]

This means either that MySQL AB deliberately added a misleading opinion about foreign key constraints, or that MySQL AB grew up and saw the benefits of foreign key constraints during the later days of 3.23/4.x. Either way, it doesn't bode well for MySQL.

But anyway, I guess all of these so-called "politics" exist in any product advocacy. No product can support all possible features, so unsupported features sometimes get downplayed, either deliberately or innocently. Since Perl subscribes to the multiparadigm and TIMTOWTDI thinking, we suffer less from these. But haven't we all heard more than a handful of otherwise brilliant Perl programmers casting aside things like block indentation or even OOP as overrated, just because other languages support these features (better) than us?

Parsing with Perl

When it comes to text processing and manipulation, I suspect that the majority of Perl programmers depend on their regex skill to do the job. So much that it becomes their hammer. At least it is mine, because I whip up regexes for almost anything, from web scraping to converting dictionaries.

Which is a pity because there are other useful techniques, like parsing.

Admittedly, parsing is a bit harder (to do correctly), and libraries for parsing most formats out there, from XML to YAML, from CSV to PPI, already exist anyway. So ironically it's even harder to find useful, practical applications for parsing that are easy to implement. Most parser tools seem to invariably give the tired, if not overused, calculator example (i.e., parsing simple mathematical expressions like '3 * (2 + 4)').

In recent years, I have encountered only one instance where I wanted to do some parsing. Two weeks ago I thought it would be nice for my Data::Schema module to provide some shortcuts.

'int*' would be translated to [int => {set=>1}]

'int[]' to [array => {of => "int"}],

'(int*)[]' to [array => {of => [int => {set=>1}]}]

'(int[])*' to [array => {set => 1, of => "int"}]

'(int|int[])[]' to [array => {of => {either => {of => ["int", [array => {of => "int"}]]}}}]

and so on. (These shortcuts, along with some other goodies, will be in the next release of Data::Schema.)
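For readers curious how little code such a translation needs, here is a hand-written recursive-descent sketch in core Perl. It is not the Regexp::Grammars grammar used in Data::Schema; the function names are invented, and for uniformity it emits the array form [either => {...}] for alternation, where the last example above shows a hash.

```perl
use strict;
use warnings;

# Grammar handled by this sketch:
#   alt     := postfix ('|' postfix)*
#   postfix := primary ('*' | '[]')*
#   primary := NAME | '(' alt ')'
sub parse_shortcut {
    my ($text) = @_;
    (my $src = $text) =~ s/\s+//g;    # whitespace is not significant
    pos($src) = 0;
    my $schema = _parse_alt(\$src);
    die "Unexpected input at offset " . pos($src) . " in '$text'\n"
        if pos($src) < length $src;
    return $schema;
}

sub _parse_alt {
    my ($ref) = @_;
    my @choices = (_parse_postfix($ref));
    push @choices, _parse_postfix($ref) while $$ref =~ /\G\|/gc;
    return @choices == 1 ? $choices[0] : [either => {of => \@choices}];
}

sub _parse_postfix {
    my ($ref) = @_;
    my $schema = _parse_primary($ref);
    while ($$ref =~ /\G(\*|\[\])/gc) {
        if ($1 eq '*') {
            # '*' sets the "set" attribute on the type it follows
            $schema = [$schema => {}] unless ref $schema;
            $schema->[1]{set} = 1;
        }
        else {
            # '[]' wraps the type so far in an array schema
            $schema = [array => {of => $schema}];
        }
    }
    return $schema;
}

sub _parse_primary {
    my ($ref) = @_;
    if ($$ref =~ /\G\(/gc) {
        my $schema = _parse_alt($ref);
        $$ref =~ /\G\)/gc or die "Missing ')'\n";
        return $schema;
    }
    $$ref =~ /\G(\w+)/gc or die "Expected a type name\n";
    return $1;
}
```

Every match uses the /gc flags with \G, so the current position is stored on the string itself and survives a failed match; that is what lets the small functions take turns consuming input.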

I ended up using Regexp::Grammars for this, since I'm already on Perl 5.10. Regexp::Grammars is basically just syntactic sugar and a helper on top of Perl 5.10's regex engine, which is already capable of recursive matching. Parse::RecDescent is equally easy to use, but I prefer the simpler interface of Regexp::Grammars. I quickly gave up on Parse::Yapp and Parse::Eyapp though. Not that they're no good; they are just too much trouble for my particularly simple need.
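As a small illustration of the recursive matching that Perl 5.10 regexes offer on their own (without Regexp::Grammars), here is a sketch that checks whether a string is one balanced parenthesized group. The is_balanced name is just for this example.

```perl
use strict;
use warnings;
use 5.010;

my $balanced = qr{
    \A
    (                  # group 1: one parenthesized unit
        \(
        (?: [^()]++    # a run of characters that are not parentheses
          | (?1)       # or a nested unit: recurse into group 1
        )*
        \)
    )
    \z
}x;

# Returns 1 if the whole string is a single balanced group, 0 otherwise.
sub is_balanced { return $_[0] =~ $balanced ? 1 : 0 }

for my $text ('(int|int[])', '((a)(b))', '(int[]') {
    printf "%-12s %s\n", $text, is_balanced($text) ? 'balanced' : 'not balanced';
}
```

The (?1) construct re-enters the first capture group at the current position, which is the building block that grammar modules on top of the regex engine rely on.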

Anyway, parsing is fun, as long as--like anything else in life--it doesn't get too complicated :-) I just need to find other small use cases where I can do some more parsing...