
Wednesday, February 27, 2013

Coming to YAPC::NA 2013 (Austin, Jun 3-5)

Hi guys,

I will be attending YAPC::NA 2013 in Austin, Texas, on June 3-5. If you are also going to be there, get in touch ...

Friday, March 30, 2012

News: changes to this blog

To avoid duplication (including duplication of my effort in posting to two blogs), from now on my Indonesian-language Perl posts will *only* be published on this Perl Indonesia blog, while my English-language posts will *only* be published on my personal blog at blogs.perl.org.

That's all for this quick announcement.

Wednesday, October 19, 2011

Installing Finance::Bank::ID::Mandiri to download Mandiri transactions

By user request, here is a guide to using the Perl module Finance::Bank::ID::Mandiri to download the transactions of your Mandiri bank account. This guide assumes you are using Linux (Debian or Ubuntu) with Perl 5.10 or later. If you are using Windows, or Linux with a Perl older than 5.10 (e.g. CentOS 5.x), please adjust accordingly (or, if anyone wants to write a tutorial for those setups, please contact me).

A similar module for BCA, Finance::Bank::ID::BCA, is also available; it is used in much the same way.

Prerequisites:

  1. A computer with an Internet connection, running Linux (Debian/Ubuntu) and Perl 5.10 or later
  2. The curl program (to download cpanminus)
  3. Root access (the application can also be installed without root access, but we use root here for simplicity)
  4. A Mandiri bank account with active internet banking access (a username and password).


Steps:

  1. Install the required Perl modules. For simplicity, we use cpanminus to install them. If you have not installed cpanminus yet, install it first as follows:

    Open a console and type:

    $ curl -L http://cpanmin.us | perl - --sudo App::cpanminus

    After that, install the Mandiri module with cpanminus:

    $ sudo cpanm -n Finance::Bank::ID::Mandiri

  2. Configure the program. Once installation finishes, you will have the download-mandiri command. Configure this command by creating a configuration file:

    $ mkdir ~/.app
    $ (create/edit the download-mandiri.conf file)

    The contents of the configuration file are as follows:

    [ALL]
    username = (your Mandiri account username)
    password = (your Mandiri account password)

    After that, simply run the download-mandiri command from the console. By default the program downloads the last month of transactions in YAML format. It can also output JSON, and the date range can be customized. Add the --debug option if you want to see debugging messages. You can also run this script from cron so it runs automatically on a schedule (e.g. once a week or once a day).
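For the cron setup, a crontab entry along these lines could work (the paths and schedule here are hypothetical; adjust them to your own setup):

```
# minute hour day month weekday  command   (hypothetical paths)
0 7 * * *  download-mandiri >>/home/you/mandiri-transactions.yaml 2>>/home/you/mandiri.log
```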



If you run into problems, please reply to this blog post.

Wednesday, July 27, 2011

App::UniqFiles (a case for building app with Dist::Zilla and Sub::Spec)

When watching videos on Tudou or Youku, both Chinese YouTube-like video sites, you'll often get one or two 15- or 30-second video ads at the beginning. Since I have been downloading lots of videos recently, my Opera browser cache contains a bunch of these video ad files, each usually ranging from around 500k to a little over 1MB. But there are also duplicates.

I thought I'd collect these ads, for learning Chinese, but I don't want the duplicates, only one file per different ad. The result: App::UniqFiles, which contains a command-line script called uniq-files. Now all I need to do is just type mkdir .save; mv `uniq-files *` .save/ and delete the duplicate videos, which are files not moved to .save/.

With the help of Dist::Zilla, Sub::Spec::CmdLine, Pod::Weaver::Plugin::SubSpec, and Log::Any::App, I managed to finish App::UniqFiles, from scribbling down the concept to uploading the first release to CPAN and github, in just under an hour (00:54 to be exact). Not super-speedy for a small script (I can probably write a one-off script version in 15-30 minutes), but for an extra 30 minutes, I get:


  • a proper Perl distribution, with tests and POD and all;
  • all the core functionality contained in subroutines (which is much more reusable than a script);
  • POD API documentation for the subroutines;
  • a command-line application with --help message, argument parsing, configurable log levels, even bash completion with just 3 lines of code.


I think developing with Dist::Zilla and Sub::Spec is great, mainly because they realize the DRY ("Don't Repeat Yourself") principle and free you from mundane tasks. Having to repeat the same stuff or do mindless/tedious tasks is indeed a significant source of frustration for programmers. It deflects us from the real, important task: writing the code that actually solves our problems.

Dist::Zilla allows you to generate the dist's README from the main module's POD instead of having to create this file manually. It inserts LICENSE, AUTHORS, VERSION sections into your POD instead of you having to insert and update them manually. It frees you from mundane tasks like creating dist tarballs, checking the ChangeLog, incrementing version numbers, uploading to CPAN, etc. Really, I wouldn't want to build dists manually ever again without tools like Dist::Zilla.

Sub::Spec allows you to specify rich metadata for your sub in one place, from which you can generate Getopt::Long options, POD documentation, the command-line --help message, etc., instead of having to maintain each of them manually. Modules like Sub::Spec::CmdLine also free you from many mundane UI issues (which, coincidentally, I hate) like parsing arguments and formatting output data for the screen.

Monday, July 25, 2011

Undocumented Getopt::Long::Configure feature

Getopt::Long has a Configure() function to let you customize its parsing behaviour, e.g. whether or not to be case-sensitive, whether or not unknown options are passed unmodified or generate an error, etc. However, this customization is global: it affects every piece of code using Getopt::Long.

Since I use Getopt::Long in a utility module, which might conflict with users of my module who also use Getopt::Long themselves, I need to localize the effect of my Configure() call. I was about to submit an RT wishlist ticket about this, but some quick checking revealed that Configure() already has this feature.

Configure() returns an arrayref containing all the current options. If you pass this arrayref to it, it will set all the options. This way, you can save and restore options.

Thanks to the Getopt::Long author, Johan Vromans, who apparently has maintained this module since 1990!

Thursday, June 16, 2011

Using Org format to document code

My most recent hacktivity includes preparing Org::Export::Pod and Org::Export::Text (both not yet ready) following Org::Export::HTML. I am planning to document source code (currently just for functions) using Org as the master format instead of POD. From Org, I'll be exporting to various target formats, including POD itself, inserted to modules' source code in the build process using a simple Dist::Zilla plugin.

Now why Org? First and foremost, obviously because I use Emacs, and the last few months I've migrated practically all of my notes/todolists/addressbooks to this format. Also, it's visually nicer to look at than POD when it comes to things like headings and lists. Org also supports tables (I understand that there's an extension to POD that supports tables too, but I imagine it will not be as easy to write?). BTW, among other lightweight markup languages, Markdown Extra also supports tables with an equally nice syntax.

A couple of concerns about Org. First, writing literal examples is a bit more cumbersome. Where in POD, Markdown, or most wiki formats you only need to indent to go verbatim, in Org you need to enclose the text in #+BEGIN_SRC ... #+END_SRC or prepend each line with ": ". But I've come to accept it.

Second is parser support in other languages. Since I envision that my function specs will ultimately be processed by other languages too, it would be nice if there were parsers for the document format in those languages, including JavaScript and PHP. In this regard, Markdown seems to be a win.

But hey, Org is still readable as-is, and currently nothing beats Org-mode for writing notes. So Org FTW!

Wednesday, March 30, 2011

Bench: a simpler benchmark module

There was a post on blogs.perl.org or Planet Perl Iron Man (sorry, I forget the exact article) that said something along the lines of: "Benchmark is a fine module, but for simplicity I'll use the time command". Which immediately hit home with me, because I too use Benchmark very seldom. I guess the problem is that I almost always have to perldoc it before using it, and there are quite a few extra characters to type.

So last weekend I wrote Bench (repo), which is hopefully simple enough to get used more.

To benchmark your program, just type: perl -MBench yourscript.pl. Sample output:

$ perl -Ilib -MBench -MMoose -e1
0.229s


Bench exports a single function, bench(), by default. To time a single sub, use: perl -MBench -e'bench sub { ... }'. By default it will call your sub at most 100 times or for at most 1 second. Here's a sample output:

100 calls (12120/s), 0.0083s (0.0825ms/call)


To benchmark several subs: perl -MBench -e'bench {a=>sub{...}, b=>sub{...}}' or perl -MBench -e'bench [sub{...}, sub{...}]'. Sample output:

a: 100 calls (12120/s), 0.0083s (0.0825ms/call)
b: 100 calls (5357/s), 0.0187s (0.187ms/call)


Bench will automatically use Dumbbench if it's already loaded, e.g.: perl -MDumbbench -MBench -e'...'. Or you can force Bench to use Dumbbench: perl -MBench -e'bench sub { ... }, {dumbbench=>1}'.

That's about it currently.

Thursday, March 17, 2011

Org::Parser

If you're like me, over the years you'll have had your todo lists scattered over multiple programs and places. First a simple text file with a homebrewed format, then various Windows programs, then various Linux GUI programs, then back to Notepad and joe/gedit/kate, then various apps on cellphones, then pencil & paper (because the cellphones kept getting lost/stolen), then some cloud apps, then todo.txt, then finally org-mode. And if you're anything like me or many others, you'll find that org-mode is *it*.

I'm now in the (long, boring) process of consolidating everything in Org: todo lists, contact lists, and even long documents and all my journals/diaries. I've written a preliminary version of Org::Parser to help automate stuff via the command line. It only supports the basics at the moment, but it has been able to parse all my *.org files.

The code is available on GitHub.

Monday, February 14, 2011

Backing up data with Perl, rsync, and git

Currently, I keep my personal data in two main directories: ~/repos and ~/media. All text files (including source code, websites, notes/writings, configuration, Emacs .org agendas) live under ~/repos in git repos, one per project (for example: ~/repos/settings, ~/repos/writings, ~/repos/perl-Git-Bunch, and so on). All other files, which are large media files, live in ~/media.

To back up the data in ~/media, I use File::RsyBak, which provides the command-line script rsybak. This script is basically just a wrapper around the rsync command; it creates backup snapshots according to the desired history retention (by default: 7 daily + 4 weekly + 3 monthly). The script is run every day via cron, and the backups are stored on a separate hard disk at /backup.

To back up the data in ~/repos, I use Git::Bunch, which provides the command-line script gitbunch. Basically, gitbunch also backs up using rsync, but without history (since git already stores the change history). In addition, only the .git/ subdirectory of each repo is backed up. This saves disk space, since I still often copy ~/repos to a flash drive with limited capacity. To restore from backup, we simply do a "git checkout" from the backed-up .git/ of each repo.

The gitbunch script can also synchronize one ~/repos directory to another. In essence, "gitbunch sync" is just a wrapper around "git pull". This way, I can easily sync work from my PC to my netbook and vice versa.

A more detailed article was once written for InfoLINUX magazine: Manajemen data pribadi dengan git (Personal data management with git).

What is your backup strategy?

Monday, February 7, 2011

The coming bloated Perl apps?

A few weeks ago, I got annoyed by the fact that one of our command line applications was getting slower and slower to start up (the delay was getting more and more noticeable), so I thought I'd do some refactoring, e.g. split large files into smaller ones and delay loading modules until necessary.

Sure enough, one of the main causes of the slow start up was preloading too many modules. Over the years I had been blindly sticking 'use' statements into our kitchen sink $Proj::Utils module, which was used by almost all scripts in the project. Loading $Proj::Utils alone pulled in over 60k lines from around 150 files!

After I split things up, it became clearer which modules are particularly heavy. This one stood out:

% time perl -MFile::ChangeNotify -e1
real 0m0.972s


% perl -MDevel::EndStats -MFile::ChangeNotify -e1
# Total number of module files loaded: 129
# Total number of modules lines loaded: 46385


So almost 130 files and a total of 45k+ lines just from loading File::ChangeNotify alone. 130 files just for a filesystem monitoring routine! Who would've thought that a filesystem monitor needs so many lines of program? Compare with, say, a recent HTTP client:

% perl -MDevel::EndStats -MHTTP::Tiny -e1
# Total number of module files loaded: 18
# Total number of modules lines loaded: 6089


I quickly switched to Linux::Inotify2 and things are much better now (but I might have to revisit this since we want to give the new Debian/kFreeBSD a Squeeze).

As I suspected (since the module is written by Dave Rolsky also), File::ChangeNotify utilizes Moose, which is not particularly lightweight either:

% time perl -MMoose -e1
real 0m0.712s


% perl -MDevel::EndStats -MMoose -e1
# Total number of module files loaded: 100
# Total number of modules lines loaded: 35760


Compare with:

% time perl -MMouse -e1
real 0m0.089s


% perl -MDevel::EndStats -MMouse -e1
# Total number of module files loaded: 20
# Total number of modules lines loaded: 6675


Come to think of it, running Dist::Zilla is also quite painfully slow these days. Just running "dzil foo" pulled in around 60k lines and took 1.7s! Of course, dzil is Moose-based.

While it is a good thing that Moose is getting more popular, it's a bit shameful to see that Ruby and Python scripts "get OO for free" while Moose scripts have to endure a 0.7s startup penalty. Mouse, Moo, and Role::Basic come to the rescue, but I wonder what Ruby/Python programmers would think (you have how many object systems?? Why can you people never agree on one thing, and why TIMTOWTDI everything?)

Disclaimer: The number of lines includes all blanks/comments/POD/DATA/etc. from all files loaded in %INC; the actual SLOC is probably significantly less. Timing was done on a puny HP Mininote netbook (Atom N450 1.66GHz) which I have been stuck with for the past few weeks. With all due respect to the authors of the modules mentioned: they all write fantastic, working code.

Friday, November 19, 2010

Outputting pretty data structures in console programs

Our application has a command-line API interface for convenient access via shell/console. It used to output API result data in YAML:



# /c/sbin/spanel api --yaml File list --account steven --volume data --dir /public
---
dir:
  atime: '1270675429'
  ctime: '1285916065'
  gid: 1023
  group: steven
  is_dir: 1
  mtime: '1285916065'
  perms: 493
  uid: 1012
  url: ~
  user: steven
entries:
  -
    atime: '1284665000'
    ctime: '1289609859'
    dir: /public
    gid: 1023
    group: steven
    icon_file: folder.gif
    inode: 1984908
    is_dir: 1
    is_link: 0
    mtime: '1289609859'
    name: git
    perms: 493
    reldir: ''
    size: 4096
    uid: 1012
    url: ~
    user: steven
  -
    atime: '1270675424'
    ctime: '1285130140'
    dir: /public
    gid: 1023
    group: steven
    icon_file: text.gif
    inode: 1976727
    is_dir: 0
    is_link: 0
    mtime: '1155368486'
    name: .htaccess
    perms: 436
    reldir: ''
    size: 48
    uid: 1012
    url: ~
    user: steven
  -
    atime: '1270675424'
    ctime: '1285130140'
    dir: /public
    gid: 1023
    group: steven
    icon_file: unknown.gif
    inode: 1976725
    is_dir: 0
    is_link: 0
    mtime: '1155368397'
    name: .htaccess~
    perms: 436
    reldir: ''
    size: 63
    uid: 1012
    url: ~
    user: steven
total_num_entries: 3
url: ~


YAML is relatively readable if you compare it to JSON or (shudder) XML, but I soon grew tired of reading YAML for data that should be tabulated and better formatted for human consumption.



Thus, Data::Format::Pretty::Console. The idea is for me, a lazy programmer, to throw data structures of various kinds at it and have it display them nicely, suitable for console viewing. The command-line API interface now shows nicely formatted text for API result data by default (but still provides --yaml and --json options). Please bear with this blog post's misformatting and assume it's all pretty:



dir:
.---------------------.
| key    | value      |
+--------+------------+
| atime  | 1270675429 |
| ctime  | 1285916065 |
| gid    | 1023       |
| group  | steven     |
| is_dir | 1          |
| mtime  | 1285916065 |
| perms  | 493        |
| uid    | 1012       |
| url    |            |
| user   | steven     |
'--------+------------'

entries:
.----------------------------------------------------------------------------------------------------------------------------------------------------------------------.
| atime | ctime | dir | gid | group | icon_file | inode | is_dir | is_link | mtime | name | perms | reldir | size | uid | url | user |
+------------+------------+---------+------+--------+-------------+---------+--------+---------+------------+------------+-------+--------+------+------+-----+--------+
| 1284665000 | 1289609859 | /public | 1023 | steven | folder.gif | 1984908 | 1 | 0 | 1289609859 | git | 493 | | 4096 | 1012 | | steven |
| 1270675424 | 1285130140 | /public | 1023 | steven | text.gif | 1976727 | 0 | 0 | 1155368486 | .htaccess | 436 | | 48 | 1012 | | steven |
| 1270675424 | 1285130140 | /public | 1023 | steven | unknown.gif | 1976725 | 0 | 0 | 1155368397 | .htaccess~ | 436 | | 63 | 1012 | | steven |
'------------+------------+---------+------+--------+-------------+---------+--------+---------+------------+------------+-------+--------+------+------+-----+--------'

total_num_entries:
3

url:


Wednesday, November 17, 2010

Comparison of INI-format modules on CPAN

I'm not terribly happy with the state of Perl/CPAN support for the INI file format.

I have a requirement to modify php.ini files programmatically from Perl: set register_globals to On/Off, add/remove an extension (via the extension=foo lines), add/remove functions from the disabled_functions list, etc. So I would like to find a CPAN module that can just set/unset a parameter and leave formatting/comments alone as much as possible.

It turns out that, among the dozen or so INI modules on CPAN, a few do not do writes at all (e.g. Config::INI::Access or Config::Format::INI). And a few that do write do so a la dump: they just rewrite the whole INI file from the in-memory structure, so all comments and formatting (even ordering, in some cases) are lost. Examples: Config::INI::Writer and Tie::Cfg. And, last time I tried, I couldn't even install Config::IniHash from the CPAN client. Not good.

So I ended up with Config::IniFiles. And I needed to patch in two features before I could even read php.ini and write to it properly. This is an old module which, although still maintained, probably needs a rewrite or modernization. One reviewer on CPAN Ratings also wrote that this module fails in edge cases and that its test suite is incomplete.

But at least this module gets the work done. It tries to maintain comments, and it even has a host of other features like delta, multiline, default section, etc. Most of these features seem rather exotic to me personally, but then none of the other INI modules on CPAN has the basic features I needed.

(Short, grossly incomplete) comparison of Perl logging frameworks

After doing this post on the comparison of Perl serialization modules, I intended to continue with other comparisons, and even thought of setting up a wiki or creating/maintaining a section on the Official Perl 5 Wiki, which already has a Recommended Modules section, although not much comparison is written for each recommendation. (Btw, I just noticed a change of domain for the Wiki, from perlfoundation.org to socialtext.net.)

But of course other tasks soon took precedence, so until the wiki idea is realized, I thought I'd just continue posting to the blog as usual.

There are **seriously** too many Perl logging frameworks out there. As the Log4perl FAQ mentions, "Writing a logging module is like a rite of passage for every Perl programmer, just like writing your own templating system."

So I'm not going to pretend that I've evaluated even half of the logging modules on CPAN. Instead I'm just going to include a few worth mentioning.

Log::Dispatch and Log::Log4perl. Two of the arguably most popular Perl logging frameworks are Log::Dispatch and Log::Log4perl. They are like the Moose of logging frameworks: mature, feature-rich, flexible, with a lot of support/extra/3rd-party modules, but... "slow". I quote the "slow" part because, first of all, speed is not an issue for the majority of applications. And second, they are not relatively slow at all compared to other modules until they actually log stuff to output. For example, doing debug() at the warn level runs at around 1.5 million/sec with Log4perl, and 3 million/sec with Log::Fast. But for actual logging, Log::Fast can be 10-45 times faster than these two.

Log::Any. For most people, Log::Dispatch and Log4perl should suffice. I personally have been unable to produce a case where I can't customize Log4perl the way I want, which shows the flexibility of the module. So the only thing left for flexibility is a thin wrapper for when you might want to switch logging frameworks (kind of like Any::Moose for logging). There are a few of these on CPAN, but I prefer Log::Any (and I've also made a thin wrapper for *that*, Log::Any::App). RJBS also made one: Log::Dispatchouli. You might be interested in using it if you are interested in using String::Flogger.

Performance-wise, as with Moose, there are other alternatives: Log::Fast, for one. There are also a few other minimalistic frameworks, but I do not recommend using them, as many of them are not flexible at all. Unless, that is, your application is really performance-critical.

I've most probably left out a lot of possibly interesting alternatives.

Friday, October 1, 2010

Sometimes you *don't* want circular checking

I use the nifty Data::Rmap to "flatten" DateTime objects into strings so they can be exported to JSON and handled outside Perl. But due to circular checking in Data::Rmap, this:

$ perl -MData::Rmap=:all -MData::Dump \
-e'$d = DateTime->now; $doc = [$d, $d];
rmap_ref { $_ = $_->ymd if UNIVERSAL::isa($_, "DateTime") } $doc;
dd $doc'


produces something like this:

["2010-10-01", ...unconverted DateTime object...]

For now I work around this by defeating Data::Rmap's circular checking, though I wonder if there's a better way.

$ perl -MData::Rmap=:all -MData::Dump \
-e'$d = DateTime->now; $doc = [$d, $d];
rmap_ref { $_[0]{seen} = {}; $_ = $_->ymd if UNIVERSAL::isa($_, "DateTime") } $doc;
dd $doc'


will correctly produce:

["2010-10-01", "2010-10-01"]

Thursday, September 30, 2010

Yet another stupid mistake #1

During a refactor of data from array @foo to hash %foo, I used 'each' to iterate over the hash, but forgot to change the 'for' statement to 'while'. So I ended up with something like:

$ perl -MData::Dump -E'%a=(a=>1, b=>2);
for (my ($k, $v) = each %a) { $_ = "$k x"; dd {k=>$k, v=>$v, "\$_"=>$_} }'


And this is nasty, because for(@ary) aliases $_ to each element in @ary, and in this case it modifies $k (quiz #1: and $v too; do you know why?) right under your nose! Thus the results are really messed up:

{ "\$_" => "a x", "k" => "a x", "v" => 1 }
{ "\$_" => "a x x", "k" => "a x", "v" => "a x x" }


Not to mention that the loop stops after processing two items (quiz #2: do you know why?). But you might not realize that until after you add some pairs to %a and wonder why they don't get processed.

The error message Perl gives is not really helpful, to say the least :)

Thursday, September 23, 2010

Comparison of Perl serialization modules

A while ago I needed a Perl data serializer with some specific requirements (support for circular references and Regexp objects out of the box, and consistent/canonical output, since the output will be hashed). Here's my rundown of the currently available data serialization Perl modules. A few notes: the labels fast/slow are relative to each other and are not the result of extensive benchmarking.

Data::Dumper. The grand-daddy of Perl serialization modules. Produces Perl code with an adjustable indentation level (the default is lots of indentation, so output is verbose). Slow. Available in core since the early days of Perl 5 (5.005 to be exact). To unserialize, we need to do eval(), which might not be good for security. Usually the first choice for many Perl programmers when it comes to serialization, and arguably the most popular module for that purpose.

Storable. Fast. Produces compact, portable binary output. Also available in the core distribution. Does not support Regexp objects out of the box (though adding support for that requires only a few lines). The binary format changed several times in the past without backward compatibility in newer versions of the module, giving people major PITA. Supposedly stabilized now.

YAML::XS. Fast. Verbose YAML output (currently there doesn't seem to be an option to output inline YAML). My personal experience in the past was that this module sometimes behaved weirdly and died with cryptic errors, but I guess it's pretty stable currently.

There are other YAML implementations, like YAML::Syck (also pretty speedy), the old pure-Perl YAML.pm, and the partial implementation YAML::Tiny. The last two might not be a good choice for general serialization needs.

Data::Dump. Very slow. Produces nicely indented Perl output. The strength of this module is in pretty output and flexibility in customizing the formatting process. Based on Data::Dump I've hacked two other specialized modules: Data::Dump::PHP for producing PHP code, and Data::Dump::Partial to produce compact and partial Perl output for logging.

XML::Dumper. Produces *very* verbose (as is the case with all XML) XML output. Slow. Aside from the XML format, I don't think there's a reason why you should choose this over the others.

JSON::XS. Fast, outputs pretty compact but still readable code, but does not support circular references or Regexp objects.

JSYNC. Slow, outputs JSON and in addition supports circular references but not yet Regexp objects.

FreezeThaw. Slow, produces compact output but not as compact as Storable. Does not support Regexp objects out of the box.

Apart from these there are many other choices too, but I personally don't think any of them is interesting enough to be a favorite. For example, last time I checked, PHP::Serialization (and all the other PHP-related modules) did not support circular references. There's also, for example, Data::Pond: a cute concept, but of little practical use, as it is even more limited than the JSON format.

There are also numerous alternatives to Data::Dumper/Data::Dump, producing Perl or Perl-like code or indented formatted output, but they are either not unserializable back into data structures (so they are more formatting modules than serialization modules) or focused on pretty printing instead of speed. In general I think most Data::Dumper-like modules are slow when it comes to serializing data.

In conclusion, choice is good, but I have not found my perfect general serialization module yet. My two favorites are Storable and YAML::XS. If JSYNC were faster and supported Regexp objects, or if YAML::XS or YAML::Syck could output inline/compact YAML, that would be as near to perfect as I would like.

Hope this comparison is useful. Corrections and additions welcome.

Perl vs PHP (a bit of credit to PHP)

I just read this blog post. Comments are disabled there, so I thought I'd add a blog post of my own.

There are endless ways we can sneer at PHP's deficiencies, but since 5.3 PHP already supports anonymous subroutines, via the function (args) { ... } syntax. So:

$longestLine = max(
array_map(
create_function('$a', 'return strlen($a);'),
explode("\n", $str)
)
);


can be rewritten as:

$longestLine = max(
array_map(
function($a) { return strlen($a); },
explode("\n", $str)
)
);


though the example is not a good one since it might as well be:

$longestLine = max(
array_map(
'strlen',
explode("\n", $str)
)
);

Wednesday, September 1, 2010

Book review: Catalyst 5.8 The Perl MVC Framework

Book information
Title: Catalyst 5.8 The Perl MVC Framework.
Subtitle: Build Scalable and extendable web applications using the Agile MVC framework.
Author: Antano Solar John.
Publisher: Packt Publishing.
Country: UK/India.
Year: 2010.

This book is a follow-up to the 2007 Catalyst book by Jonathan Rockway (a member of the Catalyst core developer team). I have no idea how much of the content changed between the two.

About the review(er)
This is a review of the electronic (PDF) edition of the book. I am a Perl developer and a CPAN author, but I have not used Catalyst (or most other recent web frameworks, for that matter) before.

About Catalyst
So far I've managed to avoid learning about web frameworks and continue to create web applications the old way (CGI/CGI::Fast, direct DBI/SQL, a homemade simple templating language, and recently lots of jQuery and CSS play). Part of this is due to laziness, and part due to lack of need: I've never needed to create complex web applications in Perl. And the apparently steep learning curve and complexity of Catalyst, Mojo, Dancer, etc. just made me say: don't bother.

But, thanks to this book, I found out that a Catalyst project is not unlike a Perl CPAN module, with files/subdirectories like Makefile.PL, Changes, README, lib/, t/, etc. You can now even manage your project with Dist::Zilla (not mentioned in the book, though, as the plugin for this is new).

The good
This book is only about 200 (instead of 500+) pages long, which I appreciate. The preface is concise, and the explanations in the chapters are straightforward enough. The author uses clear and simple English sentences instead of long complex ones. The organization of topics into chapters is quite appropriate.

Missing topics
I didn't find any mention of Strawberry Perl, only ActivePerl. The examples all use SQLite and no other databases. I wish AJAX and integration with one or more JavaScript frameworks like jQuery (and thus CSS) were discussed more, as this is now very popular and common, but that would add significantly to the length of the book.

The first chapter, on MVC, also deserves some extension.

There is no comparison whatsoever with any other Perl web frameworks or other non-Perl frameworks like Django and Rails.

I would've liked a chapter/subchapter on performance tuning and benchmarking (there is a 'Performance considerations' section in the Deployment chapter, but it only covers the choice of web server).

Plack/PSGI is not yet covered in this edition, which is a pity.

The rather bad
The author gives CPAN links to pages of specific release versions, e.g. http://search.cpan.org/~ash/DBIx-Class-0.08013/lib/DBIx/Class/Schema/Versioned.pm, which tend to break as new releases are added and old releases are removed from CPAN. But this is understandable, because currently CPAN only provides http://search.cpan.org/dist/DBIx-Class/ and not something like http://search.cpan.org/dist/DBIx-Class/current/pod/Foo/Bar.pm. search.cpan.org does provide a more stable URL form: http://search.cpan.org/dist/DBIx-Class/lib/DBIx/Class/Manual/FAQ.pod

The author also uses 2-space indent instead of 4, which I suspect is because he also uses Ruby/Rails.

The really ugly
The general editing of the book, and especially the code/output formatting, is the deal breaker here. I have not found another book that fares equally poorly in this regard.

The first paragraph of the preface already contains two very off-putting typos: "Frednic Brooks" (of Mythical Man-Month fame) and "MOOSE". Boxes drawn with ASCII characters, which should align, end up wrapped and misaligned. And when long lines of code/output are wrapped, it is not clear which lines are wrapped and which are just new lines (some visual indicator should have been added, like a + or \ sign, line numbers, or a striped background).

There is a plain error in YAML syntax on p67, and a plainly wrong MySQL configuration on p69.

Code formatting/editing is atrocious, with __PACKAGE__ sometimes becoming PACKAGE or __Package__. Blank lines (which are significant in POD) are removed. And there are garbage/random characters added in a few places. Totally unacceptable.

Verdict
Unfortunately I cannot recommend this book due to the utterly poor code formatting. I have no major problem with the content though.

Coding Style As A Failure Of Language Design?

Read this older blog post the other day. Hilarious at best, creepy at worst.

Arbitrary limitations should not be added to a general-purpose programming language unless for a really good reason. Do you really want to code in a language that forces you to indent with 2 spaces, never cross the 80-column line, or requires/forbids whitespace here and there? And besides, is there any language (no matter how strict its syntax) that does not have some sort of coding style?

Friday, August 27, 2010

Wishlist for a service framework and/or manager

I maintain code for a few daemons/services written in Perl (most of them serve requests by forking/preforking). Reading the post on Ubic, I started to feel that I am reinventing a lot of wheels. Currently I do all of the following with my own code, as much of which as possible I hope can be offloaded to CPAN in the future:

  • Autorestart when the process size becomes too big. We need to do this gracefully, meaning wait until no more clients are being serviced, unless the process size gets really too big, in which case we need to restart immediately. The checking period can be configured.

  • Autorestart if the script or its modules change on disk. This also needs to be done gracefully. It is usually recommended only for development environments, but I use it in production too, for ease of deployment. We do need to check first (e.g. via "perl -c" or "eval + require") whether the new code on disk is okay.

  • Avoid duplicate instances (currently always using Proc::PID::File, but I'm open to better mechanisms).

  • Limit client concurrency. Sometimes this is simple (a single limit for all clients) and sometimes not so much (different limits for different IPs/IP blocks/authenticated users/groups/etc.).

  • Reap dead child processes and maintain a count of child processes.

  • Handle timed-out clients. This is rather cumbersome with blocking I/O.

  • Write init scripts. This is the part I dislike the most, since there are tons of different OS flavors out there, and with more recent efforts like upstart, launchd, and systemd, sooner or later I will certainly have to write different init scripts. I wish there were something equivalent to PSGI/Plack for general services, which could plug my code into whatever service manager is out there.