Thursday, 30 September 2010

Yet another stupid mistake #1

While refactoring some data from an array @foo to a hash %foo, I used 'each' to iterate over the hash but forgot to change the 'for' statement to 'while'. So I ended up with something like:

$ perl -MData::Dump -E'%a=(a=>1, b=>2);
for (my ($k, $v) = each %a) { $_ = "$k x"; dd {k=>$k, v=>$v, "\$_"=>$_} }'

And this is nasty because for(@ary) aliases $_ to each element in @ary, and in this case it modifies $k (quiz #1: and $v too, do you know why?) right under your nose! Thus the results are really messed up:

{ "\$_" => "a x", "k" => "a x", "v" => 1 }
{ "\$_" => "a x x", "k" => "a x", "v" => "a x x" }

Not to mention that the loop stops after processing two items (quiz #2: do you know why?). You might not realize that until you add more pairs to %a and wonder why they don't get processed.

The error message Perl gives is not really helpful, to say the least :)
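For comparison, here is a minimal sketch of the loop I meant to write, with 'while' instead of 'for'. The sub name collect_pairs is made up for illustration; it only gathers the pairs so the result is easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical helper: walk a hash with each() and return "key=value" strings.
sub collect_pairs {
    my %h = @_;
    my @seen;
    # 'while' re-evaluates each() on every pass until the hash is exhausted.
    while (my ($k, $v) = each %h) {
        push @seen, "$k=$v";
    }
    # Hash order is not defined, so sort for stable output.
    my @sorted = sort @seen;
    return @sorted;
}

print join(", ", collect_pairs(a => 1, b => 2, c => 3)), "\n";   # prints a=1, b=2, c=3
```

Adding more pairs to the argument list makes them show up in the output, which is exactly what the 'for' version failed to do.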

Thursday, 23 September 2010

Comparison of Perl serialization modules

A while ago I needed a Perl data serializer with some requirements (supports circular references and Regexp objects out of the box, and produces consistent/canonical output because the output will be hashed). Here's my rundown of currently available Perl data serialization modules. A note: the labels fast/slow are relative to each other and are not the result of extensive benchmarking.

Data::Dumper. The grand-daddy of Perl serialization modules. Produces Perl code with adjustable indentation level (default is lots of indentation, so output is verbose). Slow. Available in core since the early days of Perl 5 (5.005 to be exact). To unserialize, we need to do eval(), which might not be good for security. Usually the first choice for many Perl programmers when it comes to serialization and arguably the most popular module for that purpose.
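A minimal sketch of what that looks like in practice, using the module's documented $Data::Dumper::Indent and $Data::Dumper::Sortkeys settings. The helper names dump_compact and undump are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: serialize to single-line Perl code with sorted keys.
sub dump_compact {
    my ($data) = @_;
    local $Data::Dumper::Indent   = 0;   # no newlines or indentation
    local $Data::Dumper::Sortkeys = 1;   # stable key order
    return Dumper($data);
}

# Hypothetical helper: unserialize by running the dumped text as Perl code.
# Only do this with text you trust.
sub undump {
    my ($text) = @_;
    my $VAR1;                            # Dumper output assigns to $VAR1
    eval $text;
    die $@ if $@;
    return $VAR1;
}

my $text = dump_compact({ b => [1, 2], a => 'x' });
print "$text\n";
print undump($text)->{b}[1], "\n";       # prints 2
```

The eval in undump is the security concern mentioned above: whatever is in the text gets executed.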

Storable. Fast. Produces compact, portable binary output. Also available in the core distribution. Does not support Regexp objects out of the box (though adding support for that requires only a few lines). The binary format changed several times in the past without backward compatibility in newer versions of the module, which gave people a major PITA. It has supposedly stabilized now.
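A minimal sketch of the two requirements Storable does meet, circular references and canonical output, using the documented $Storable::canonical setting. The helper name freeze_canonical is made up for illustration.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Hypothetical helper: freeze with sorted hash keys, so that equal data
# should give equal bytes (useful when the output will be hashed).
sub freeze_canonical {
    my ($data) = @_;
    local $Storable::canonical = 1;
    return freeze($data);
}

# A structure that points back to itself.
my $node = { name => 'root' };
$node->{self} = $node;

my $copy = thaw(freeze_canonical($node));
print "cycle preserved\n" if $copy->{self} == $copy;
```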

YAML::XS. Fast. Verbose YAML output (currently there doesn't seem to be an option to output inline YAML). In my personal experience, this module sometimes behaved weirdly in the past and died with cryptic errors, but I guess it's pretty stable now.

There are other YAML implementations like YAML::Syck (also pretty speedy), the old pure-Perl YAML.pm, and the partial implementation YAML::Tiny. The last two might not be a good choice for general serialization needs.

Data::Dump. Very slow. Produces nicely indented Perl output. The strength of this module is in pretty output and flexibility in customizing the formatting process. Based on Data::Dump I've hacked two other specialized modules: Data::Dump::PHP for producing PHP code, and Data::Dump::Partial to produce compact and partial Perl output for logging.

XML::Dumper. Produces *very* verbose (as is the case with all XML) XML output. Slow. Aside from the XML format, I don't think there's a reason why you should choose this over the others.

JSON::XS. Fast, outputs pretty compact but still readable code, but does not support circular references or Regexp objects.
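A minimal sketch of the canonical output and the Regexp limitation. Note the assumption: I use the pure-Perl JSON::PP here as a stand-in so the example has no non-core dependency; it offers the same new/canonical/encode methods as JSON::XS. The helper name encode_canonical is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP;   # assumption: stand-in for JSON::XS, which offers the same methods

# canonical() sorts hash keys, so equal data encodes to equal text.
my $json = JSON::PP->new->canonical;

# Hypothetical helper: encode to canonical JSON text.
sub encode_canonical {
    my ($data) = @_;
    return $json->encode($data);
}

print encode_canonical({ b => [1, 2], a => 1 }), "\n";   # prints {"a":1,"b":[1,2]}

# A Regexp object is refused rather than serialized.
my $ok = eval { encode_canonical({ re => qr/x/ }); 1 };
print "Regexp not supported\n" unless $ok;
```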

JSYNC. Slow, outputs JSON and in addition supports circular references but not yet Regexp objects.

FreezeThaw. Slow, produces compact output but not as compact as Storable. Does not support Regexp objects out of the box.

Apart from these there are many other choices too, but I personally don't think any of them is interesting enough to be a favorite. For example, last time I checked, PHP::Serialization (and all the other PHP-related modules) did not support circular references. There's also, for example, Data::Pond: cute concept but of little practical use as it is even more limited than the JSON format.

There are also numerous alternatives to Data::Dumper/Data::Dump, producing Perl or Perl-like code or indented formatted output, but they either cannot be unserialized back to data structures (so they are formatting modules rather than serialization modules) or focus on pretty printing instead of speed. In general I think most Data::Dumper-like modules are slow when it comes to serializing data.

In conclusion, choice is good but I have not found my perfect general serialization module yet. My two favorites are Storable and YAML::XS. If JSYNC were faster and supported Regexp objects, or if YAML::XS or YAML::Syck could output inline/compact YAML, that would be as near to perfect as I would like it.

Hope this comparison is useful. Corrections and additions welcome.

Perl vs PHP (a bit of credit to PHP)

Just read this blog post. Comments are disabled, so I thought I'd add a blog post.

There are endless ways we can sneer at PHP's deficiencies, but since 5.3 PHP already supports anonymous subroutines, via the function (args) { ... } syntax. So:

$longestLine = max(array_map(
    create_function('$a', 'return strlen($a);'),
    explode("\n", $str)
));

can be rewritten as:

$longestLine = max(array_map(
    function($a) { return strlen($a); },
    explode("\n", $str)
));

though the example is not a good one since it might as well be:

$longestLine = max(array_map('strlen', explode("\n", $str)));
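For the Perl side of the comparison, a minimal sketch of the same computation, using max from the core List::Util module. The sub name longest_line_length is made up for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: length of the longest line in a string.
sub longest_line_length {
    my ($str) = @_;
    return max(map { length($_) } split /\n/, $str);
}

print longest_line_length("one\nthree\nsix"), "\n";   # prints 5
```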

Wednesday, 01 September 2010

Book review: Catalyst 5.8 The Perl MVC Framework

Book information
Title: Catalyst 5.8 The Perl MVC Framework.
Subtitle: Build Scalable and extendable web applications using the Agile MVC framework.
Author: Antano Solar John.
Publisher: Packt Publishing.
Country: UK/India.
Year: 2010.

This book is a follow-up to the 2007 Catalyst book by Jonathan Rockway (a member of the Catalyst core developer team). I have no idea how much of the content has changed between the two.

About the review(er)
This is a review of the electronic (PDF) edition of the book. I am a Perl developer and a CPAN author, but have not used Catalyst (or most other recent web frameworks, for that matter) before.

About Catalyst
So far I've managed to avoid learning about web frameworks and continue to create web applications the old way (CGI/CGI::Fast, direct DBI/SQL, a homemade simple templating language, and recently lots of jQuery and CSS play). Part of this is due to laziness, and part due to lack of need. I've never needed to create complex web applications in Perl. And the apparently steep learning curve and complexities of Catalyst, Mojo, Dancer, etc. just make me say "don't bother".

But, thanks to this book, I found out that a Catalyst project is not unlike a Perl CPAN module, with files/subdirectories like Makefile.PL, Changes, README, lib/, t/, etc. You can now even manage your project with Dist::Zilla (not mentioned in the book, though, as the plugin for this is new).

The good
This book is only about 200 (instead of 500+) pages long, which I appreciate. The preface is concise, and the explanations in the chapters are straightforward enough. The author uses clear and simple English sentences instead of long complex ones. The organization of topics into chapters is quite appropriate.

Missing topics
I didn't find any mention of Strawberry Perl, only ActivePerl. The examples all use SQLite and no other databases. I wish AJAX and integration with one or more JavaScript frameworks like jQuery (and thus, CSS) were discussed more, as this is now very popular and common. But that would add significantly to the length of the book.

The first chapter on MVC also deserves to be expanded.

There is no comparison whatsoever with any other Perl web frameworks or other non-Perl frameworks like Django and Rails.

I would've liked a chapter/subchapter on performance tuning and benchmarking (there is a 'Performance considerations' section in the Deployment chapter but that only covers the choice of webserver).

Plack/PSGI is not yet covered in this edition, which is a pity.

The rather bad
The author gives CPAN links to pages of specific release versions, e.g. http://search.cpan.org/~ash/DBIx-Class-0.08013/lib/DBIx/Class/Schema/Versioned.pm which tend to break as new releases are added and old releases are removed from CPAN. But this is understandable because currently CPAN only provides http://search.cpan.org/dist/DBIx-Class/ and not something like http://search.cpan.org/dist/DBIx-Class/current/pod/Foo/Bar.pm. search.cpan.org does provide a more stable URL: http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Manual/FAQ.pod

The author also uses 2-space indent instead of 4, which I suspect is because he also uses Ruby/Rails.

The really ugly
The general editing of the book, and especially the code/output formatting, is the deal breaker here. I have not found another book that fares equally poorly in this regard.

The first paragraph of the preface already contains two very off-putting typos: "Frednic Brooks" (of Mythical Man-Month fame) and "MOOSE". Boxes drawn with ASCII characters which should align become wrapped and misaligned. When long lines of code/output are wrapped, it is not clear which lines are wrapped and which are just new lines (some visual indicator should've been added, like a + or \ sign, line numbers, or striped background/lines).

There is a plain error in YAML syntax on p. 67, and a plain wrong MySQL configuration on p. 69.

Code formatting/editing is atrocious, with __PACKAGE__ sometimes becoming PACKAGE or __Package__. Blank lines (which are significant for POD) are removed. And there are some garbage/random characters added in a few places. Totally unacceptable.

Unfortunately I cannot recommend this book due to the utterly poor code formatting. I have no major problem with the content though.

Coding Style As A Failure Of Language Design?

Read this older blog post the other day. Hilarious at best, creepy at worst.

Arbitrary limitations should not be added to a general-purpose programming language unless there is a really good reason. Do you really want to code in a language that forces you to indent with 2 spaces, never cross the 80-column line, or that requires/forbids whitespace here and there? And besides, is there any language, no matter how strict its syntax, that does not have some sort of coding style?