Perl Indonesia: 2009

Wednesday, 30 December 2009

CPAN CPAN on the disk, ...

Interestingly, there are some ID's up there that I've never ever heard of, ever. Either I'm the hobbit, or this list is not representative at all, or both. Maybe we should rank on total number of prereq'ed modules...

$ cd /minicpan && ( for dir in */*/*; do 
echo -e `ls $dir/*.tar.gz 2>/dev/null | perl -ne's/-[^A-Za-z_].+//; 
$m{$_}++; END{print 0+keys %m}'`\\t${dir#*/*/}; done ) | sort -rn | head -10
213     ADAMK
212     ZOFFIX
206     RJBS
169     MIYAGAWA
124     SMUELLER
122     NUFFIN
117     BINGOS
111     GUGOD
109     MARCEL
100     TOKUHIROM

Keywords: CPAN authors with the most modules, most distributions

Monday, 28 December 2009

Given/when expression

Given/when in Perl 5.10 is great (and will be even greater in Perl 6, for example we can remove many of the now-required parentheses). But it is very statement-oriented. Sometimes I miss CASE expressions a la SQL:

SELECT store_name, CASE store_name
  WHEN 'Los Angeles' THEN sales*2
  WHEN 'San Diego' THEN sales*1.5
  ELSE sales
  END AS new_sales
FROM store;

So, how do you do given/when expressions in Perl? I can think of a couple of alternatives, but none is very appealing. Do you have a better one?

my @store = (
    {store_name=>'Los Angeles', sales=>100_000},
    {store_name=>'San Diego', sales=>100_000},
    {store_name=>'Anaheim', sales=>100_000}
);

1. do {} + if/elsif (con: needs do{}, needs explicit assignment to $_, no implicit smart matching)

for my $s (@store) {
    printf "%s %d %d\n", $s->{store_name}, $s->{sales}, do {
        local $_ = $s->{store_name};
        if (/Angel/) { $s->{sales}*2 }
        elsif ($_ eq 'San Diego') { $s->{sales}*1.5 }
        else { $s->{sales} }
    };
}

2. do {} + given/when (con: needs do{}, needs $tmp assignments)

for my $s (@store) {
    printf "%s %d %d\n", $s->{store_name}, $s->{sales}, do {
        my $tmp;
        given($s->{store_name}) {
            when (/Angel/) { $tmp = $s->{sales}*2 }
            when ('San Diego') { $tmp = $s->{sales}*1.5 }
            default { $tmp = $s->{sales} }
        }
        $tmp
    };
}

3. sub {} + given/when (con: needs sub{}, slower perhaps?, needs explicit return's)

for my $s (@store) {
    printf "%s %d %d\n", $s->{store_name}, $s->{sales}, sub {
        given ($s->{store_name}) {
            when (/Angel/) { return $s->{sales}*2 }
            when ('San Diego') { return $s->{sales}*1.5 }
            default { return $s->{sales} }
        }
    }->();
}
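
Another expression-style option is the plain chained ternary. This is only a sketch for comparison with the three alternatives above (con: no smart matching, and the operand has to be repeated in every condition). The new_sales() helper and the $name variable are introduced here just for illustration; the ternary could equally be inlined into the printf call.

```perl
use strict;
use warnings;

my @store = (
    {store_name=>'Los Angeles', sales=>100_000},
    {store_name=>'San Diego', sales=>100_000},
    {store_name=>'Anaheim', sales=>100_000}
);

# Hypothetical helper: the chained ternary is itself an expression,
# so it needs neither do{} nor a temporary variable for the result.
sub new_sales {
    my ($s) = @_;
    my $name = $s->{store_name};
    return $name =~ /Angel/     ? $s->{sales}*2
         : $name eq 'San Diego' ? $s->{sales}*1.5
         :                        $s->{sales};
}

for my $s (@store) {
    printf "%s %d %d\n", $s->{store_name}, $s->{sales}, new_sales($s);
}
```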

Sunday, 20 December 2009

Perl 6 Advent Calendar, RSS on the phone

What was the most interesting blog to read in 2009? Without a doubt, the Perl 6 Advent Calendar, http://perl6advent.wordpress.com/ . For the month leading up to Christmas, you are treated to interesting articles written in a casual style, each covering one new/cool feature of Perl 6. About 20 articles have been published so far. There are still a few days left, so don't miss them!

By the way, since this year I jumped on the bandwagon and bought a QWERTY phone with Internet access (though I didn't jump on the Facebook bandwagon), I've found an enjoyable pastime: reading RSS on the phone with Google Reader. It's really noticeable that the company that most gets it, the one paying the most attention to mobile browsing usability, is none other than Google. And perhaps Facebook too (but I rarely use it). That's why I can't wait for the Google-made phone and netbook next year.

Wednesday, 16 December 2009

Storable, Regexp, bugs, bugs, bugs

I spent a few hours yesterday trying to find out why some of my tests kept failing under certain conditions. It turns out I did a Storable::dclone() on an object, and that object contains a regular expression. And Storable Don't Do No Regex. Worse, Storable doesn't complain but just freezes/thaws the regexes into garbage.

$ perl -MStorable=dclone -MData::Dumper -e'
    print "Storable version = $Storable::VERSION\n";
    $re = qr/abc/;
    print Dumper $re;
    print Dumper dclone $re;'
Storable version = 2.21
$VAR1 = qr/(?-xism:abc)/;
$VAR1 = bless( do{\(my $o = undef)}, 'Regexp' );

This means:

$ perl -MStorable=freeze -E'say "yikes" if freeze(qr/a/) eq freeze(qr/b/)'
yikes

Since regexes are so common in Perl, maybe a warning in the Storable documentation is in order? I'm sure many more bums are in line waiting to be bitten by this. (Bug report filed.)

There is Regexp::Copy, which contains Regexp::Storable and is supposed to add regexp (de)serialization to Storable, but it turns out that it still has bugs. Even this very simple case yields a wrong answer:

$ perl -MData::Dumper -MStorable=dclone -MRegexp::Copy -E'say "Regexp::Copy version = $Regexp::Copy::VERSION"; say Dumper dclone([qr/a/, qr/b/])'
Regexp::Copy version = 0.06
$VAR1 = [
          qr/(?-xism:b)/,
          qr/(?-xism:b)/
        ];

(Btw, if you "use Regexp::Copy" before/without "use Storable", it will also result in an error. So that's a couple of bug reports filed.)

Sadly it's unclear whether these will be fixed soon. The bug queues for Storable and Regexp::Copy contain unresolved entries that are several years old.

I first tried to switch to Data::Compare (as I was actually just comparing the freeze() output of data structures, as well as doing some cloning). But it turns out that Data::Compare doesn't deal with recursive/circular structures yet (bug filed).

Finally I resorted to using good ol' Data::Dumper for the serializing/comparison part, and Clone for the cloning part.
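
The Data::Dumper half of that workaround could look roughly like the sketch below. The fingerprint() and same_structure() names are made up for this illustration; only Data::Dumper and its $Sortkeys, $Indent, and $Terse settings are real. Sorting the keys matters because hash ordering is otherwise not stable between two structures with the same content.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: dump a structure into a canonical string
# (sorted hash keys, no indentation, no "$VAR1 =" prefix).
sub fingerprint {
    my ($data) = @_;
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    local $Data::Dumper::Terse    = 1;
    return Dumper($data);
}

# Hypothetical helper: two structures are "the same" if their dumps match.
sub same_structure {
    my ($left, $right) = @_;
    return fingerprint($left) eq fingerprint($right) ? 1 : 0;
}

# Unlike the freeze() comparison above, different regexes dump differently.
print same_structure([qr/a/, {x => 1, y => 2}], [qr/a/, {y => 2, x => 1}])
    ? "same\n" : "different\n";
print same_structure([qr/a/], [qr/b/]) ? "same\n" : "different\n";
```

This compares the textual dumps only, so it is a sketch of the idea rather than a general-purpose deep comparison.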

Is it just me (I hope it's just me) or do other people find that serializing/deserializing modules on CPAN tend to be buggier than, say, the Ruby ones? I've never *once* encountered a problem with Ruby's yaml module, yet aside from the Storable and Regexp cases above, I have also been bitten several times by bugs in YAML.pm, YAML::Syck, and YAML::XS. The last one is this.

And with a couple of bugs in my own code unrelated to all of the above, that makes the largest number of bugs I've found/reported between yesterday and today. Not bad after all, but I'm still worried.

What's in a name

“What’s in a name?” wrote Shakespeare. “That which we call a rose by any other name would smell as sweet.” Or, loosely translated: shrimp paste called a rose still stinks.

Of course, in reality a name usually carries a lot of meaning, because we don't give names at random but with certain associations, intentions, expressions, or hopes. I remember that when a 1980s Japanese drama series was famous in Indonesia, everything from acquaintances' children to the neighbor's dog was named Oshin. The spirit of each era and culture leaves its traces in the names given in that era/culture. During China's revolution and struggle, many baby boys were named Jian Guo (build the nation) or Guo Qing (independence day). On the Internet, sites with odd names like digg, reddit, and twitter appeared, all because they wanted short names amid the scarcity of .com domains.

Recently I wrote 2 small modules in the Perl programming language: one guesses the gender of a person's name based on the first name (using statistics and a number of heuristic rules), and the other parses a name into its components. By the time you read this, both modules are probably already sitting on the Perl repository site, CPAN.

Unlike several similar modules already written for other languages such as English, which only deal with gender guessing, I equipped this Indonesian name parser module with routines to extract all sorts of aspects that are indeed indicated in a name. These include religion (from the presence of titles such as Haji/Hj, first names such as Muhammad/Muh, or baptismal name abbreviations such as FX), tribe/ethnicity (from certain naming patterns, for example in Bali with names like I Gusti Agung or Ni Made, in Java with Raden, or from very distinctive first names/clan names such as Liem for ethnic Chinese, Siregar for Batak, etc.), and even profession/education level (from academic titles). Very “SARA” already, isn't it? (SARA is the Indonesian acronym for ethnicity, religion, race, and intergroup relations.) But what is actually meant by a SARA issue?

Gender-guessing modules are usually used to pick the appropriate salutation (Bapak or Ibu, i.e. Mr. or Mrs.) when writing letters/email, because there are studies saying that a wrong salutation can reduce effectiveness/response rate/etc. (besides, of course, offending people!). But this name parser module of mine, along with other tools such as software that detects race in face photos, can facilitate further discrimination. Imagine a screening process for students/employees/officials that can now more conveniently discard unwanted candidates of a certain race, ethnicity, or religion/belief.

I hesitated for a moment and considered not releasing this module, but based on the considerations below, I finally decided to release it anyway.

First, software like this does not make possible any discrimination that was not possible before. That is, all the information used for discrimination, such as religion, ethnicity, gender, etc., is actually already contained in the name itself. The software merely encodes this information in the form of computer instructions. Whether out of pride, habit, or to continue the family line, people keep putting various indicator elements into their names, even though the consequence is that it becomes easier to discriminate against them based on their name.

Second, the “double-edged sword” argument. Just as a knife or a rifle can be used to kill or to save, to start or to stop a war, software can likewise be used by the police for racial profiling or by organizations to carry out affirmative action. The existence of the software itself does not shift the tendency toward either.

Third, with regard to SARA, the UU ITE (the Electronic Information and Transactions Law), in its chapter on prohibited acts, states that only information intended to incite hatred or hostility between individuals/groups is prohibited from being disseminated. Parser software is not seeded with any hatred/hostility information whatsoever.

What do readers think? I look forward to input from all of you. Is releasing software that parses people's names in order to determine gender, religion, ethnicity, or group a prohibited act? (PCMedia, Jan 2010 edition)

Monday, 07 December 2009

State variables in Perl 5.10

Perl gives programmers quite a few choices of variable scope. First, there are global variables (package variables, to be precise, since there are no truly global variables in Perl; tsk tsk, lying already at the start of the article? :-)).

Second, and probably the one we use most often, there are lexical variables, a.k.a. private variables, declared with my(), which can only be seen/accessed by the block, file, or eval in which the variable is declared.

Third, there are local variables for dynamic scoping, using the keyword local(). We usually use this to save/back up a variable, change it inside a block, and have its value restored automatically when we leave the block. We can localize package variables, filehandles, globs, and so on. We can even do something like this:

$config = { foo=>'ujan', bar=>'kemarau' };
...
{
    local $config->{foo} = 'blah';
    sini();
    sana();
}
print $config->{foo}; # 'ujan' again

In the example above, we localize just one pair of the hash. While inside sini() and sana(), the local value of $config->{foo} is retained. This is what dynamic scoping means: it is based not on the source code but on the program's run-time flow. Only after the block finishes is the old value of $config->{foo} restored. Neat, huh?

Fourth, there's our(), introduced in Perl 5.6. It is basically a replacement for "use vars qw($foo)". our() creates a lexical alias to a global variable, er, package variable. Of course the variable can still be accessed from other packages.

Fifth, the topic of this blog post: state variables. They were introduced in Perl 5.9.something and officially became a new feature in 5.10. To use them, at the start of the script we must do:

use feature 'state';

or:

use feature ':5.10'; # don't forget the quotes...

State variables are for creating persistent lexical variables. Previously this could be done with a closure, but frankly I was too lazy to memorize the idiom. Only after state() came along did I become more motivated to use persistent lexicals.
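
For comparison, here is a minimal sketch of both approaches side by side. The sub names next_id_closure() and next_id_state() are invented for this example; the closure version is the common bare-block idiom, shown here only as one way such a closure is usually written.

```perl
use strict;
use warnings;
use feature 'state';

# Pre-5.10 idiom: a bare block gives the sub a private lexical
# that survives between calls because the sub closes over it.
{
    my $count = 0;
    sub next_id_closure { return ++$count }
}

# 5.10 idiom: the state variable is initialized only once.
sub next_id_state {
    state $count = 0;
    return ++$count;
}

print join(",", map { next_id_closure() } 1..3), "\n";   # 1,2,3
print join(",", map { next_id_state() } 1..3), "\n";     # 1,2,3
```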

Lately I often write methods that return a constant/static value, for example:

sub config_vars {
    [qw/
     recurse_hash
     recurse_array
     parse_prefix
     ...
    /]
}

Why not just use an ordinary variable (e.g. with our())? The aim is to be able to take advantage of inheritance.

But did you know that every time it is called, the method creates a new arrayref? The proof:

$ perl -le'sub f { [1,2,3,4,5,6,7,8,9,10] } print f for 1..5'
ARRAY(0x9d1c40)
ARRAY(0xa083c8)
ARRAY(0xa083b0)
ARRAY(0xa08398)
ARRAY(0xa08380)

Wasteful, isn't it? Well, the solution is to use a state variable:

sub config_vars {
    state $a = [qw/
     recurse_hash
     recurse_array
     parse_prefix
     ...
    /];
}

This $a variable will live on even after going out of scope (of course, in the end it will be garbage collected if nothing uses it anymore).

Now:

$ perl -lE'sub f { state $f = [1,2,3,4,5,6,7,8,9,10] } print f for 1..5'
ARRAY(0x12ffc40)
ARRAY(0x12ffc40)
ARRAY(0x12ffc40)
ARRAY(0x12ffc40)
ARRAY(0x12ffc40)

Oh, and -E is like -e but enables all the latest Perl features (in this case, it is equivalent to "use feature ':5.10'").

Have fun playing with variables in Perl!

Wednesday, 02 December 2009

Requiring 5.10

So I decided to add "perl = 5.010000" in some of my dist.ini's. This is just because of one particular habit I recently acquired: writing "$a //= 1" and "$b = $a // 2". (Well, actually I've rejoiced ever since defined-or was announced for 5.10, but for some reason have only begun to really use it in the past few weeks.)

So much nicer than writing "$a = 1 unless defined($a)" and "$b = defined($a) ? $a : 2". It's the having-to-repeat-myself aspect that I find disgusting, especially if $a is an expression. Which is one of the reasons I've always hated coding in PHP, because in PHP you can't even say "$b = $a || 2".
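
A small sketch of why // is not just shorthand for ||: the two differ when the left side is defined but false. The %opt hash and its keys are invented for this example.

```perl
use strict;
use warnings;
use 5.010;

my %opt = (retries => 0);    # hypothetical option that is set, but to 0

my $with_or  = $opt{retries} || 3;   # 0 is false, so this falls through to 3
my $with_dor = $opt{retries} // 3;   # 0 is defined, so it is kept

say "|| gives $with_or, // gives $with_dor";   # || gives 3, // gives 0

my $timeout;
$timeout //= 30;             # assigns only because $timeout is undef
say "timeout is $timeout";   # timeout is 30
```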

I'm not switching, state-ing, say-ing, doing recursive patterns, or using any of the other cool stuff in 5.10. So am I being selfish for forcing 5.10 down other people's throats just for such a minor convenience? The word externality comes to mind (having finished reading Superfreakonomics 2 days ago).

But it's a positive externality, really. ;-)

Tuesday, 01 December 2009

minicpan

Last weekend I got a new toy: Asus EEE PC S101. This is actually my second netbook (and fourth laptop overall). I sort of dumped my first netbook after only a month of use because I now totally hate hard drives on netbooks: they're hot, they're loud, they're a power drain.

The EEE has an SSD, but at only 32GB, putting the whole CPAN (currently at 6.9GB) on it is a bit taxing. With the help of minicpan, I got it down to only 1.2GB. So thanks again Ricardo!

Then I immediately wondered how big BackPAN is, guessing it might be between 30-100GB (with 14 years of CPAN's history and all). But then:

$ perl -MParse::BACKPAN::Packages -e'$p = Parse::BACKPAN::Packages->new(); printf "BACKPAN is %.1fGB\n", $p->size/1024/1024/1024'
BACKPAN is 12.4GB

Much smaller than I had thought. It's just about twice the current size of CPAN. Which probably means that a lot of CPAN authors still leave much of their old stuff around and don't delete it.

Thursday, 26 November 2009

More flexible configuration merging with Data::PrefixMerge

(I'm planning a refactoring for Data::PrefixMerge and will be renaming it to Data::ModeMerge. Thought I'd post something on the blog.)

In a typical Unix program, there are three levels of configuration: a system-wide config file (/etc/myapp.conf), a per-user config file (~/.myapprc), and command-line options. It's convenient programmatically to load each of those into a hash, merge the system-wide hash with the per-user hash (e.g. using Data::Merger or Hash::Merge), and then merge the result with the command-line hash to get a single hash as the final configuration. From there on, your program can deal with just this one hash instead of three.

In a typical merging process between two hashes (left-side and right-side), when there is a conflicting key, then the right-side key will override the left-side. This is usually the desired behaviour in our said program as the system-wide config is there to provide defaults, and the per-user config (and the command-line arguments) allow a user to override those defaults.

But suppose the user wants to unset a certain configuration setting that is defined by the system-wide config. She can't do that unless she edits the system-wide config (for which she might need admin rights), or the program allows the user to disregard the system-wide config. The latter is what many Unix programs implement, e.g. the -noconfig command-line option in mplayer. But this has two drawbacks:

- slightly added complexity in the program. The program needs to provide a special, extra command-line option.
- the user loses all the default settings in the system-wide config. What she needed in the first place was to just unset a single setting (a single key-value pair of the hash).

Here's where Data::PrefixMerge comes in. It provides a so-called DELETE mode.

prefix_merge({foo=>1, bar=>2}, {"!foo"=>undef, bar=>3, baz=>1});

will result in:

{bar=>3, baz=>1}

The ! prefix tells Data::PrefixMerge to do a DELETE mode merging. So the final result will lack the foo key.

On the other hand, what if the system admin wants to protect a certain configuration setting from being overridden by the user or the command line? This is useful in a hosting or other restrictive environment where we want to limit users' freedom to some degree. This is possible via the KEEP mode (prefix ^):

prefix_merge({"^bar"=>2, "^baz"=>1}, {bar=>3, "!baz"=>0, quux=>7});

will result in:

{bar=>2, baz=>1, quux=>7}

effectively protecting bar and baz from being overridden/deleted/etc.
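
To make the two modes concrete, here is a toy re-implementation that reproduces the two results above. It is only a sketch under simplifying assumptions, not the actual Data::PrefixMerge code: toy_prefix_merge() and show() are hypothetical names, only top-level keys are handled, and only the ! and ^ prefixes are recognized.

```perl
use strict;
use warnings;

# Hypothetical toy merge, NOT the real Data::PrefixMerge implementation.
sub toy_prefix_merge {
    my ($left, $right) = @_;
    my (%result, %keep);

    # Left side: copy values and remember keys protected by the KEEP prefix.
    for my $key (keys %$left) {
        if ($key =~ /^\^(.+)/s) {
            $result{$1} = $left->{$key};
            $keep{$1}   = 1;
        } else {
            $result{$key} = $left->{$key};
        }
    }

    # Right side: DELETE or plain override, unless the key is protected.
    for my $key (keys %$right) {
        my ($mode, $name) = $key =~ /^([!^]?)(.+)/s or next;
        next if $keep{$name};
        if ($mode eq '!') { delete $result{$name} }
        else              { $result{$name} = $right->{$key} }
    }
    return \%result;
}

# Hypothetical helper to print a flat hash with sorted keys.
sub show {
    my ($h) = @_;
    return join ", ", map { sprintf("%s=>%s", $_, $h->{$_}) } sort keys %$h;
}

print show(toy_prefix_merge({foo=>1, bar=>2}, {"!foo"=>undef, bar=>3, baz=>1})), "\n";
# bar=>3, baz=>1
print show(toy_prefix_merge({"^bar"=>2, "^baz"=>1}, {bar=>3, "!baz"=>0, quux=>7})), "\n";
# bar=>2, baz=>1, quux=>7
```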

Aside from the two mentioned modes, there are also a few others available by default: ADD (prefix +), CONCAT (prefix .), SUBTRACT (prefix -), as well as the plain ol' NORMAL/override (optional prefix *).

You can add other modes by writing a mode handler module. (planned in upcoming Data::ModeMerge release)

You can change the default prefixes for each mode if you want. You can disable each mode individually. (planned in upcoming Data::ModeMerge release)

You can default to always using a certain mode, like the NORMAL mode, and ignore all the prefixes, in which case Data::ModeMerge will behave like most other merge modules.

You can change default mode, prefixes, disabling/enabling mode, etc on a per-hash basis using the so-called options key.

Tuesday, 24 November 2009

WEHT Embperl?

Speaking of templating systems in the previous post, I also got reminded of Embperl, and wondered why it didn't get more popular. I remember back in 1998-1999 enjoying working with Embperl before moving on to Mason. It has a nice syntax, and one very nice feature: (HTML- and/or URL-)escaping output by default. Say you have in $foo "<script>evil()</script>" then this template:

[+ $foo +]

will output "&lt;script&gt;evil()&lt;/script&gt;". You are protected from XSS by default. And if you want to turn off this escaping, you can set EMBPERL_ESCMODE to 0, or do this:

[+ do { local $escmode = 0; $foo } +]

But then maybe this is akin to what earlier versions of PHP attempted to do with magic_quotes_gpc and magic_quotes_runtime set to on by default. These two default configurations have helped spread the backslashitis/toothpick syndrome all over the web and are currently deprecated (and will be removed in PHP 6.0). A majority of PHP programmers apparently never understood the need for this escaping, and got confused/mad at PHP's insistence on adding those pestering backslashes. And most would turn off the configuration, or add a routine to reverse the escaping at the beginning of their programs.

So is the moral of the story: do not overprotect programmers (especially ignorant ones)? Or don't try to fix the problem the wrong way? Or both?

Perl program that generates... Perl programs

Data::Schema is a module that I wrote to do data structure validation (using another data structure acting as a schema). I had not been terribly happy with the validation speed, which was around 100 validations/sec for schemas that are only barely complex. This is not bad, actually, because Data::FormValidator performs more or less the same. But it was below my expectations.

Last week I suddenly realized that writing a data validator or a templating system is essentially writing a [mini] language, with our schema/template as the miniprogram and our [Perl] program as the interpreter. To make our miniprogram faster, we can compile it into Perl.

FastTemplate from PHP quickly comes to mind. I remember that it was advertised as being fast because it converts the template into PHP code (e.g. into a bunch of echo's, if/else's, and for loops). And I think there must be at least half a dozen Perl-based template languages on CPAN doing the same, though I'm not sure how common it is for data validators to compile schemas into Perl code.

So anyway, starting from 0.12, Data::Schema can convert a schema into Perl subroutines and use them to validate data structures. Validation speed jumped by about an order of magnitude, and I'm a happy guy again.
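
The general idea of compiling a schema into Perl can be sketched as below. This is not how Data::Schema actually does it: the mini-schema format ({min, max} on an integer) and the compile_int_schema() name are assumptions made up for this illustration. The point is only that the checks are assembled once into Perl source, eval'ed into a sub, and then reused without walking the schema again.

```perl
use strict;
use warnings;

# Hypothetical mini-schema (not Data::Schema's real syntax):
# { min => N, max => N } constraints on an integer.
sub compile_int_schema {
    my ($schema) = @_;

    # Each check is a fragment of Perl source operating on $_[0].
    my @checks = ('defined($_[0])', '$_[0] =~ /\A-?\d+\z/');
    push @checks, '$_[0] >= ' . (0 + $schema->{min}) if defined $schema->{min};
    push @checks, '$_[0] <= ' . (0 + $schema->{max}) if defined $schema->{max};

    # Assemble the source once, then compile it into a real sub.
    my $source    = 'sub { ' . join(' && ', @checks) . ' ? 1 : 0 }';
    my $validator = eval $source or die "compile failed: $@";
    return ($validator, $source);
}

my ($is_valid, $source) = compile_int_schema({min => 1, max => 10});
print "$source\n";
for my $value (5, 0, 11, "abc") {
    printf "%-3s => %d\n", $value, $is_valid->($value);
}
```

Only 5 passes here; 0 and 11 are out of range, and "abc" fails the integer check before the numeric comparisons are reached.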

Tuesday, 17 November 2009

Who's using your module?

To see who on CPAN is using your module (distribution), go to this page: http://cpants.perl.org/dist/used_by/Your-DistName

So I casually entered my distributions, and was happily surprised that there are people using them.

If you're like me, and I'm sure thousands of other CPAN authors, you're putting stuff on CPAN either for fun or for your own use. Knowing other people are using your module gives you some perspective. I was happily breaking/removing/refactoring stuff as I pleased, but now I'll need to think twice every time.

Merge key support in YAML

None of the many Perl YAML modules supports merge keys:

$ cat merge.yaml
foo: 1
bar: 2
<<:
  baz: 3

$ perl -MYAML::Tiny -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
---
<<:
  baz: 3
bar: 2
foo: 1

$ perl -MYAML -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
---
<<:
  baz: 3
bar: 2
foo: 1

$ perl -MYAML::Syck -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
---
<<:
  baz: 3
bar: 2
foo: 1

$ perl -MYAML::XS -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
---
<<:
  baz: 3
bar: 2
foo: 1

However, good ol' Ruby handles it just fine:

$ ruby -ryaml -e'print YAML::load(File.open("merge.yaml").read).to_yaml'
---
baz: 3
foo: 1
bar: 2

Kinda strange since Ruby's yaml and YAML::Syck both use why's syck library, which has not been updated for quite a while.
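
Until the loaders handle it, one workaround is to resolve the << keys after loading. The sketch below is an assumption-laden illustration, not part of any of the YAML modules above: resolve_merge_keys() and show() are hypothetical names, the input is a plain hash standing in for what the loaders return, and circular structures are not handled. Explicit keys win over merged ones, and in a list of mappings the earlier mapping wins.

```perl
use strict;
use warnings;

# Hypothetical post-processing step: fold "<<" mappings into their parent.
sub resolve_merge_keys {
    my ($data) = @_;
    if (ref $data eq 'HASH') {
        if (exists $data->{'<<'}) {
            my $merge   = delete $data->{'<<'};
            my @sources = ref $merge eq 'ARRAY' ? @$merge : ($merge);
            for my $src (@sources) {
                next unless ref $src eq 'HASH';
                resolve_merge_keys($src);
                for my $key (keys %$src) {
                    # Keys already present take precedence over merged ones.
                    $data->{$key} = $src->{$key} unless exists $data->{$key};
                }
            }
        }
        resolve_merge_keys($_) for values %$data;
    } elsif (ref $data eq 'ARRAY') {
        resolve_merge_keys($_) for @$data;
    }
    return $data;
}

# Hypothetical helper to print a flat hash with sorted keys.
sub show {
    my ($h) = @_;
    return join ", ", map { sprintf("%s: %s", $_, $h->{$_}) } sort keys %$h;
}

# Stand-in for what the loaders above return for merge.yaml.
my $loaded = { foo => 1, bar => 2, '<<' => { baz => 3 } };
print show(resolve_merge_keys($loaded)), "\n";   # bar: 2, baz: 3, foo: 1
```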

Tuesday, 10 November 2009

Dist::Zilla for Module::Starter Users: A 2-Minute Guide

So you're a module author. You typically do this when starting a new distribution:

$ module-starter --module=Foo::Bar --author="Your Name" --email="you@example.com"

and then hack away under the resulting Foo-Bar directory. Easy enough right?

Problems are:

too much generated boilerplate code and text;
lack of automation for the building and release process.

I'm sure you have experienced one or more of these:

Having to search+replace copyright year in every file;
Forgetting to "make clean" or remove backup files before creating tarball;
Forgetting to update MANIFEST;
Feeling tired and bored of all the tedious and laborious tasks;
Wondering if there is a better way.

You need a distribution builder like Dist::Zilla. It helps:

eliminate a lot of duplicate text;
automate updating MANIFEST;
automate generating README;
build tarball;
automate a lot of other stuff;
upload to CPAN;
and more.

Dist::Zilla is flexible and has a lot of plugins/plugin bundles, but it can be less than straightforward to use. Here's a simple step-by-step guide you can follow.

Install these modules from CPAN:
- Dist::Zilla
- Dist::Zilla::Plugin::PodWeaver
- Dist::Zilla::Plugin::ReadmeFromPod

Create ~/.dzil/config.ini, containing:

[!new]
author = Your Name
copyright_holder = Your Name
initial_version = 0.01

[!release]
user = YOUR-PAUSE-ID
password = YOUR-PAUSE-PASSWORD

Now, instead of using module-starter, you run dzil new to start your new distribution:

$ dzil new Foo-Bar

Instead of a bunch of files under Foo-Bar/, there's now just one file, dist.ini:

name = Foo-Bar
version = 0.01
author = youruser
license = Perl_5
copyright_holder = youruser

[@Classic]

There's currently a small bug in Dist::Zilla that keeps it from supplying the correct author/copyright_holder, so edit dist.ini to fix those and to add a few lines:

name = Foo-Bar
version = 0.01
author = Your Name <you@example.com>
license = Perl_5
copyright_holder = Your Name

[@Classic]
[PodWeaver]
[ReadmeFromPod]

Create the most basic distribution structure:

$ mkdir -p lib/Foo t

Put some test files into t/. Default tests generated from module-starter like 00-load.t might be a good start.

Create Changes file. You can copy paste from the one generated by module-starter:
```
0.01    2009-11-11
        First version, released on an unsuspecting world.
```
As for lib/Foo/Bar.pm, here's what I use for template. You can just fill out the [[...]] parts:

package Foo::Bar;
# ABSTRACT: [[Abstract of module]]
use strict;
use warnings;

[[YOUR CODE]]

1;
__END__

=head1 SYNOPSIS

[[YOUR SYNOPSIS]]

=head1 DESCRIPTION

[[YOUR DESCRIPTION]]

=cut

It's much simpler and shorter than what module-starter generates. Some POD sections like VERSION, NAME, AUTHOR, LICENSE AND COPYRIGHT are deliberately omitted. They will be generated by Dist::Zilla later when building the distro.
Hack away. Write your code in lib/Foo/Bar.pm. Add some tests in t/. Add other files when needed.
To test the distribution, run dzil test.
To build the distribution, run dzil build. This will create Foo-Bar-0.01.tar.gz which contains all the necessary goodies of a standard classic distribution, like README, LICENSE, MANIFEST, META.yml, etc.
To release the distribution, run dzil release. This will upload your module to CPAN. Sweet!
To release a new version of the distribution, just update the version number in dist.ini and $VERSION in your main module file. Don't forget to add an entry to Changes. Repeat dzil test, build, release.

In the future "dzil new" might allow creating a more complete skeleton.

There are lots of other nice things Dist::Zilla can do for you, like checking the Changes file, doing automatic version numbering, etc. Welcome to the nice world of Dist::Zilla!

Comments/corrections are welcome.

Regex Editor in Padre 0.50

Thanks to Gábor Szabó, we now have a regex editor in Padre. Yay! Sure it's really basic at the moment, but it's a start.

The Padre Features page says this about the regex editor: "A tool that allows the less experienced users as well to build and debug a regular expressions".

Well, if your inspiration is a tool to *teach* regex, then maybe. But in general I don't quite agree with the "less experienced users" emphasis. Regex is a minilanguage, and a featureful IDE/editor and debugger helps beginners as well as experienced users a lot. Would you say Padre is a tool for less experienced Perl programmers? Past a certain point, it becomes really hard/cumbersome to debug rather long and complex regular expressions.

Personally I've only used Rx Toolkit in Komodo. Its interface is simple and not as colorful as some Windows regex editors on the market, but it does the job well. If I can cite a wishlist for Rx Toolkit, it would just be to add position indicator (line number + column + offset) in each of regex/string/matches subwindow. Being able to step regex is nice too I guess, but not essential. Usually my regexes aren't *that* long/complex.

Of course the future for Padre's regex editor is to handle Perl 6 grammars, which will be a full-fledged language in itself, so there *should* be step over/trace into/watch/etc debugger features for it.

Tuesday, 03 November 2009

Creating shortcuts for long Perl module names

My main interface with the computer is the shell, the terminal, the keyboard. Thus I love shortcuts. My .bash_profile is littered with one- and two-letter shortcuts like:

alias m='mplayer -fixed-vo -osdlevel 3'
alias mn='mplayer -fixed-vo -osdlevel 3 -nosound'
alias m11='m -speed 1.1'
alias m12='m -speed 1.2'
alias m20='m -xy 2'
alias m21='m20 -speed 1.1'
...

And when aliases are not enough, because they do not work outside the shell (like in KDE's mini commander), I create scripts. My ~/bin is also littered with alias commands like:

# k
#!/bin/bash
exec konsole "$@"

Likewise, my homedir, project directories, and the whole filesystem are ornamented with short symlinks to here and there.

$ ln -s public p
$ ln -s proj/perl pp
$ ln -s sites/steven.builder.localdomain/www .
...

When I have long Perl module names like Spanel::API::Account::Shared, Spanel::API::Account::XenVPS, I would also like to create shortcuts for these.

There are several modules to do this. I ended up using Package::Alias because I like the interface. Other ones I tried include aliased and namespace::alias.

Suppose I want to create a shortcut of SAAS for Spanel::API::Account::Shared (which can export foo). All I need to do is:

# in SAAS.pm
use Spanel::API::Account::Shared;
use Package::Alias 'SAAS'=>'Spanel::API::Account::Shared';
1;

Now aside from these:

$ perl -MSpanel::API::Account::Shared -e'Spanel::API::Account::Shared::foo()'
$ perl -MSpanel::API::Account::Shared=foo -e'foo()'

All of the below work too:

$ perl -MSAAS -e'Spanel::API::Account::Shared::foo()'
$ perl -MSAAS -e'SAAS::foo()'
$ perl -MSAAS=foo -e'foo()'

My fingers thank Joshua Keroes, the author of Package::Alias.

The while(1) {...last...} construct

In the past year or two I've been comfortably using this construct in Perl as well as in PHP (and probably others too):

while (1) {
    do_stuff;
    do_some_checks;
    do { warn error 1; last } if not OK;

    do_more_stuff;
    do_some_checks;
    do { warn error 2; last } if not OK;

    do_even_more_stuff;
    do_some_checks;
    do { warn error 3; last } if not OK;

    ...

    #finally
    last;
}

I prefer it over the one below:

do_stuff;
do_some_checks;
if (!OK) {
    warn error 1;
} else {
    do_more_stuff;
    do_some_checks;
    if (!OK) {
        warn error 2;
    } else {
        do_even_more_stuff;
        do_some_checks;
        if (!OK) {
             ...
        }
        # finally
        success!
    }
}

The while (1) { ... last ... } style is clearer and avoids extraneous indentation. The only thing biting me in the past when using this construct is that sometimes I forgot to add the final last, resulting in an infinite loop. But this series-of-checks-and-early-bail pattern happens so often in my code that the construct quickly became second nature.
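
One way to sidestep the forgotten final last, offered here only as a sketch: in Perl a bare block counts as a loop that executes once, so last works inside it and nothing loops if the final last is missing. The step() and run_checks() subs and the CHECKS label are invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical step: pretend to do some work and report whether it went OK.
sub step {
    my ($name, $ok) = @_;
    return $ok;
}

sub run_checks {
    my %ok = @_;
    my $error;

    # A bare block runs exactly once, so there is no final "last" to forget.
    CHECKS: {
        step(first  => $ok{first})  or do { $error = "error 1"; last CHECKS };
        step(second => $ok{second}) or do { $error = "error 2"; last CHECKS };
        step(third  => $ok{third})  or do { $error = "error 3"; last CHECKS };
    }

    return defined $error ? $error : "success";
}

print run_checks(first => 1, second => 0, third => 1), "\n";   # error 2
print run_checks(first => 1, second => 1, third => 1), "\n";   # success
```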

Anyone got the same habit, or perhaps using some alternative (like the new given-when)?

Tuesday, 27 October 2009

Using mod_perl aside from deploying web apps?

History has it that mod_perl was (is?) marketed as a high performance alternative to Perl CGI, and that virtually all success stories mentioned on the website are about using it to deploy web apps.

This is unfortunate. First of all, mod_perl is about much more than running web apps: it is actually a tool to customize and extend Apache using Perl instead of C. There's so much you can do with it.

Second, I believe mod_perl is suboptimal for running web apps: it makes your Apache processes fat (and thus you'll often need a front-end proxy), it's tied to Apache (so you can't experiment with other webservers), it's relatively more complex to configure and tune compared to, say, FastCGI (and thus potentially more insecure), and it's just too damn powerful if you just want a fast CGI.

mod_perl is also theoretically more insecure because it bundles the webserver and the application engine together instead of cleanly separating them. Some of your Perl code might run as root too. All of these are unnecessary if you just want to run web apps.

Here at work we have been using mod_perl for years, not for deploying web apps but for creating custom Apache handlers in Perl (basically it was because my C sucks). The handlers do the following:

connect to CGI daemon (because we also write our own CGI daemon, which is much more paranoid in some respects but also more flexible in others like running PHP scripts under different configurations);
filter URLs using our own rules (this can be done using a series of regexps with mod_rewrite, but it is much more readable and comfortable when done with full-fledged Perl code);
authenticate hosting users;
do per-vhost aliases (mod_alias can also do this, but we are using mass/dynamic virtual hosts);
etc.

The servers on which we run mod_perl are shared hosting servers, and we allow users to install their own .htaccess, so we have to patch mod_perl to restrict the Perl* directives from the users. This is because mod_perl does not have something like mod_ruby's RubyRestrictDirectives. This kind of functionality is available as a build-time configuration.

So there you have it, an unusual/unpopular application of mod_perl: customizing Apache in a shared hosting environment.

Anyone else using mod_perl not to deploy web apps?

Trying out Padre 0.47

This is written after a few days of trying Padre. Normally I use emacs, joe, Komodo Edit/IDE, kate, and recently geany, so this post is basically about comparing Padre's editing features with these other editors.

Note: Yup, 0.48 was already out a couple of weeks ago, but I haven't been able to install it due to segfaults in libwx*. Incidentally my box was recently upgraded from Jaunty to Karmic so that might have caused the problem. Anyway, I did read 0.48's ChangeLog just in case.

First and foremost, I'm blown away! Between my first try a few months back and 0.47, Padre has transformed into a very usable editor/IDE, and it's pretty fast too, faster than Komodo IDE/Edit which is notoriously sluggish. Kudos to the hardworking Padre team, you guys rock!

Here are the features which I find still missing:

a button (or keyboard shortcut) to quickly toggle on/off the directory tree.
next-tab and prev-tab functionality. There is next-file and prev-file, but after you rearrange the tabs, those two become less useful. Maybe assign Ctrl-PgUp/PgDn or Alt-Left/Right for this?
justify/reflow paragraph/manual word wrap. A la emacs' M-q or joe's C-kj.
autodelete trailing whitespace when saving (and autoadd a newline at the end if missing).
remember folding state when reopening files.
(Bug?) the "autofold POD when folding is enabled" feature is very nice, but right now is behaving rather strangely. It only works after I open Preferences and hit Save. Opening or saving files does not automatically fold PODs as advertised.
maybe include something like Rx Toolkit in Komodo? Because I do write a *lot* of regexes when coding in Perl.
(Suggestion) soft characters for automatic bracket completion. I find this feature in Komodo very, very nice as I seem to have the habit of typing { or ( and then, after thinking a while, cancelling it.

I'd gladly submit these into Padre's Trac if someone would give me an account. Last time I visited #padre no one had the admin rights to do this. Signup from the website is temporarily disabled due to an overwhelming amount of spam.

Anyway, keep up the wonderful work, guys! Looking forward to using Padre more and more often in the future. Envisioning writing Padre config files in Perl, just like in emacs using Lisp... Life will be good...

Tuesday, 20 October 2009

HTTP-style sub return

In a subroutine, we often want to return the status of an operation (success/failure/error code) as well as the result of the operation.

When a function does not need to return any result, it can just return the status, usually as an integer scalar. In C and Unix the convention is to return 0 for success and non-zero for the error code. In Perl and many other languages, it is the other way around: zero/false/undef for failure and true values for success. You can also return the result as well right there and then as long as the result evaluates to true. This can be a problem if there is a possibility that the result be zero/false/undef.

Alternatively, you can return the result instead. How do we return the error code then? Usually via some global variable like $? and $! in Perl. This has drawbacks of its own, like in multithreaded/reentrant code.

It is thus safer to return both the status and the result separately and explicitly, e.g. using a 2-element list:

($status, $result) = foo(...);

For roughly a year now, I have been adopting something like the above, with what I call the HTTP-style return convention. Instead of 2 elements, I return a 3-element list in my subs:

return ($status, $extra_info, $result);

$status is a 3-digit integer value, with values taken as much as possible from the HTTP spec: 200 means success, 4xx means a generic "client side" (i.e. caller side) error like missing or invalid arguments, 404 means not found, 403 means forbidden, 5xx means an error on the "server side" (our side, the sub side), 501 means unimplemented, and so on.

$extra_info is a hashref which contains, well... extra info, like an error string, debugging messages, or intermediate results. This is the equivalent of HTTP response headers. It can also be undef if the sub does not offer extra information, which avoids creating an unnecessary hashref.

$result is the actual result.

Example code:

my @resp = process_stuff(@stuffs);
if ($resp[0] != 200) {
    die "Failed ($resp[1]{errstr})";
} else {
    print "Number of stuffs input: ",     scalar(@stuffs), "\n";
    print "Number of stuffs processed: ", $resp[2], "\n";
}

Another example:

my @resp = search_stuff($stuff);
if ($resp[0] == 404) {
    die "Stuff $stuff not found";
} elsif ($resp[0] != 200) {
    die "Failed";
} else {
    print "Stuff $stuff found in $resp[2]";
}

It is a bit confusing for readers not familiar with this style, but I find this clear as it is (i.e., without declaring/using constants for 200, 404, etc).

The advantages of this return style:

You can return the status as well as the result as well as extra info.
The convention for the status codes is already familiar to many. HTTP has been so popular for most of the lifetime of Perl that most Perl programmers who have dabbled in CGI or Internet programming should be familiar with it. Even many nonprogrammers know what 404 or 500 means since they often see these while browsing. The 2xx, 4xx, 5xx convention is also used in other protocols like SMTP. If you are not familiar with it, you can always use a comment or a constant to explain the meaning of the numbers.
You can easily wrap your sub later in a REST API or web service. Just pass $status as the HTTP status code, (selected) pairs in $extra_info as HTTP response headers, and $result (possibly encoded in JSON/YAML/whatever) as the response body.

As a bonus, because Perl has contexts, you can also do this:

if (wantarray) {
    return ($status, $extra_info, $result);
} else {
    return $result;
}

so that you can fall back to a very simple style when all the other stuff is not needed:

my $result = foo(...); # don't care about status, assume always success
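Putting the pieces together, a sub written in this style might look like the sketch below. The name find_item, its arguments, and the errstr key are made up for illustration; only the ($status, $extra_info, $result) shape and the wantarray fallback come from the convention described above.

```perl
use strict;
use warnings;

# Hypothetical sub: look up $key in the hashref $items, HTTP-style.
sub find_item {
    my ($items, $key) = @_;
    my ($status, $extra_info, $result);

    if (!defined $key) {
        # caller-side error: missing argument
        ($status, $extra_info) = (400, { errstr => "key is required" });
    } elsif (!exists $items->{$key}) {
        ($status, $extra_info) = (404, { errstr => "no such key: $key" });
    } else {
        # success: no extra info, so no hashref is created
        ($status, $result) = (200, $items->{$key});
    }

    return wantarray ? ($status, $extra_info, $result) : $result;
}

my %stock = (apple => 3, durian => 0);

# List context: full triplet
my @resp = find_item(\%stock, 'mango');
print "status=$resp[0] ($resp[1]{errstr})\n" if $resp[0] != 200;

# Scalar context: just the result
my $n = find_item(\%stock, 'apple');
print "apples: $n\n";
```

Note that durian => 0 is returned with status 200, so a false result is no longer confused with failure.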

I find this return style somewhat relevant in light of PSGI's deservedly speedy rise to popularity.
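The resemblance is easy to see: a PSGI response is also a triplet, namely an array reference holding the status code, an array reference of header pairs, and the body. A minimal sketch of the mapping follows. The helper name to_psgi, the X-Error header, and the choice to expose only errstr are my own assumptions for illustration, not part of any spec.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: turn an HTTP-style sub return into a PSGI response.
sub to_psgi {
    my ($status, $extra_info, $result) = @_;

    my @headers = ('Content-Type' => 'application/json');

    # Assumption: only the error string is passed on, as an X-Error header
    push @headers, 'X-Error' => $extra_info->{errstr}
        if $extra_info && defined $extra_info->{errstr};

    # allow_nonref lets plain scalars (and undef) be encoded as well
    my $body = JSON::PP->new->canonical->allow_nonref->encode($result);

    return [$status, \@headers, [$body]];
}

my $res = to_psgi(404, { errstr => "Stuff not found" }, undef);
print "$res->[0] @{ $res->[1] } $res->[2][0]\n";
```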

Wednesday, 14 October 2009

Dumping content to files using Log::Dispatch::Dir

Logging frameworks like Log::Log4perl and Log::Dispatch are great. They relieve you of the burden of reinventing your own (which, admit it, will probably suck more), and they decouple your code from logging details (which can be changed later independently). You just say I want to log "something something" in the code, and you can later configure whether those messages are actually written to the logs, where those logs are, how the messages are formatted, etc without changing your log-using code.

Logging can be used, to some extent, to replace debugging or to aid it. Each log message is usually only a single line, like "Starting foo ...", "Ended foo ...", "The value of foo is $val", although we can also log large data structure dumps or file contents.

When writing web robots like HTML scrapers or interfaces to online banking sites, which are particularly fragile, it is often convenient to save each server's full response into a separate file so you can easily check each step by opening the saved file in a browser.

If you want to use Log::Log4perl or Log::Dispatch for this, you can too, using Log::Dispatch::Dir. This module will write each log message to a separate file in a specified log directory.
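To make the one-file-per-message idea concrete, here is a minimal sketch using only core modules. It is not Log::Dispatch::Dir's implementation: the helper name dump_to_dir and the timestamp.pid.counter file naming are assumptions made up for this illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use POSIX qw(strftime);

my $counter = 0;

# Hypothetical helper: write one message to its own file inside $dir
# and return the path of the file written.
sub dump_to_dir {
    my ($dir, $content) = @_;

    # Timestamp + PID + counter keeps names unique within this process
    my $name = sprintf "%s.%d.%04d",
        strftime("%Y%m%d-%H%M%S", localtime), $$, ++$counter;
    my $path = File::Spec->catfile($dir, $name);

    open my $fh, '>', $path or die "Can't write $path: $!";
    print {$fh} $content;
    close $fh or die "Can't close $path: $!";

    return $path;
}

my $dir = tempdir(CLEANUP => 1);    # throwaway directory for the demo
my $p1  = dump_to_dir($dir, "<html>response 1</html>");
my $p2  = dump_to_dir($dir, "<html>response 2</html>");
print "$_\n" for $p1, $p2;
```

Going through a logging framework instead of a helper like this means the dumps can be switched off or redirected from the configuration, without touching the code.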

For example, in Finance::Bank::ID::Base I have code like this:


$self->logger_dump->trace(
    "<!-- result of mech request #$i ($method ".Dump($args)."):\n".
    $mech->response->status_line."\n".
    $mech->response->headers->as_string."\n".
    "-->\n".
    $mech->content
);

where $mech is a WWW::Mechanize object. I use $self->logger for "normal" log messages and $self->logger_dump specifically for dumping contents. Both the logger and logger_dump attributes can be supplied by the module user, e.g.:


my $ibank = Finance::Bank::ID::BCA->new(
    ...
    logger      => Log::Log4perl->get_logger("Messages"),
    logger_dump => Log::Log4perl->get_logger("Dumps"),
);

and the Log::Log4perl configuration is something like this:


log4perl.logger.Messages=TRACE, SCREEN, LOGFILE
log4perl.logger.Dumps=TRACE, LOGDIR

log4perl.appender.SCREEN=Log::Log4perl::Appender::ScreenColoredLevels
log4perl.appender.SCREEN.layout=PatternLayout
log4perl.appender.SCREEN.layout.ConversionPattern=[\%r] %m%n

log4perl.appender.LOGFILE=Log::Log4perl::Appender::File
log4perl.appender.LOGFILE.filename=/path/to/logs/main.log
log4perl.appender.LOGFILE.layout=PatternLayout
log4perl.appender.LOGFILE.layout.ConversionPattern=[\%d] %m%n

log4perl.appender.LOGDIR=Log::Dispatch::Dir
log4perl.appender.LOGDIR.dirname=/path/to/logs/dumps
log4perl.appender.LOGDIR.layout=PatternLayout
log4perl.appender.LOGDIR.layout.ConversionPattern=%m

This is convenient enough for me, but in the future I want to do some MIME checking on the log messages, so that Log::Dispatch::Dir can automatically add a suitable file extension, e.g. .html, .txt, .jpg, etc.

Checking BCA and Mandiri accounts with Perl

Finally, I got around to releasing the Finance::Bank::ID::BCA and Finance::Bank::ID::Mandiri modules. You can now check your BCA and Mandiri accounts with Perl!

Years ago I wrote a similar script for KlikBCA, but using a combination of curl/wget. After KlikBCA changed its layout/program from ASP to Java, I never updated the script again until a few months ago. And finally, last week and this week I took the time to turn the code into modules.

Oh, and actually part of the code was first written in PHP ;-p. It's just that the PHP code is for the office and isn't released (and probably never will be, because I'm too lazy to maintain PHP code for the public).

Monday, 12 October 2009

Planet CPAN

Recently I have been enjoying Iron Man blog posts that talk about some particular CPAN module like HTML::FormHandler, Finance::Quote, Term::ProgressBar, local::lib, Log::Log4perl, Dist::Zilla, even the good ol' Digest::MD5.

I'm hoping though that even more people (authors and users alike) would blog more about CPAN modules, because although we arguably have one of the richest sets of interfaces to our wonderful software library, with more than 16000 modules it's near impossible to browse them all. Feature blog posts certainly help people stumble on interesting stuff even if they don't follow Recent CPAN Uploads.

Since ~ 95% of all interesting things in the Perl world are happening inside CPAN (not to belittle the huge efforts of the p5p team or the Parrot & Perl 6 designers/implementors, of course) shouldn't we be blogging more about it?

Filtering RSS

On a somewhat unrelated note, lately I've also been tired of all the Microsoft/Windows/Vista/7/8/9 stories that are filling up the Slashdot RSS feed. Most are irrelevant to me as I use Linux. Besides, those news items are really minor/unimportant/of the marketing type, like rereading old Vista reviews, an intentionally ambiguous 128-bit version of a future Windows, or repeated news items telling me that products are being delayed yet again. Who cares?

Nowadays I'm using Google Reader on a cellphone to read most feeds, so less junk would be nice. Google Reader doesn't have filtering yet, so I ended up using Yahoo! Pipes. Filtering and doing other stuff to feeds (and other kinds of data like CSV) is surprisingly quick and easy using this web-based visual editor. A user-friendly Perl- and Unix-killer? :-) Just go to pipes.yahoo.com, click on My Pipes, create a new pipe, do some drag-and-drops and text field filling, publish your pipe, and get RSS. Perfect (here are two examples of pipes I've created: slashdot-noms and slashdot-nofb). Looking forward to more Microsoft-free news reading ahead.

CPAN download counter? +1!

From time to time people (including me once, a few years back) would ask questions like: what are the 'top' (or 'most popular' or 'widely used' or most downloaded) dists/modules on CPAN? Is there a download counter for each dist/module? Like this one from prz, a budding module author.

The answer is there isn't one, because CPAN is just a bunch of static files. The upside is that CPAN is very easily mirrored (e.g. via FTP or rsync or offline via CDs) and served (e.g. via FTP, HTTP, or local filesystem). The downside is that there isn't a place for much intelligence/logic on the serving side.

To implement this feature, we can put some stats gathering code on the client side, like what Debian has been doing for a while; in fact you can already see the list of most widely installed Perl modules from the data. Or we can add some stats to search.cpan.org like most viewed/clicked/downloaded dists and modules, and maybe top search keywords. Not representative of all mirrors, sure, but it's better than nothing.

Download counter, or at least Popular/Top Downloads, is a common feature on download/catalog/shopping/news sites, from freshmeat and Download.com, to Amazon and iTunes Store. So common that many users expect it to be there as a standard feature.

It's not hard to imagine why people like to know what's popular, what everybody else is using/doing, what's in, what's hot. It's a social side of human nature. And it's beneficial to know which modules are getting downloaded and used more, to direct development efforts to the more important stuff. Volunteers can surely take the top modules list as one consideration when picking which project to spend their valuable time on.

What I'm not very clear on though is why, aside from PHP, many programming language communities don't like this particular feature. Do we hate competition, do we hate popularity contests, or are we just plain lazy?

Anyway, efforts like CPANHQ might soon make the Top/most $foo modules, and more, possible. Yay!

Tuesday, 06 October 2009

Moving from bzr to git

As with a lot of coders out there, I've moved on from no source control, to CVS, to Subversion, and recently to one of the distributed ones. In fact, I tried both Bazaar and git roughly at the same time a couple of years back and have been quite comfortably using both to manage my projects.

Last week though, I migrated all but one of my projects to git. The only bzr repository left is the one for $work where I want to avoid retraining people to use git since they are not coders.

I had no real complaints with bzr. Sure, it's generally slower than git, the repository size is slightly larger than git's, and branching is a bit more cumbersome than in git, but all of those have never bothered me enough to part with bzr.

The clincher, however, came when I needed to write a post-commit hook to post my commit messages to an internal web-based bulletin board. In bzr you need to write a plugin in Python. Not that there's anything wrong with it, but I don't think I would even want a Perl-based SCM where I need to write a Perl plugin for everything when a two-line shell script will do. Add to that the fact that bzr doesn't provide a template/skeleton for the plugin, so I had to spend a few minutes googling around for the plugin API (in addition to refreshing my memory on Python syntax), etc.

I think I'm much more comfortable with git nowadays.

Some Indonesian-specific modules to check SIM, NPWP, NIK

Today for some reason I finally got motivated enough to write and release Business-ID-{NIK,SIM,NPWP} to CPAN. Been wanting to for years, but haven't got around to it due to lack of challenge and, more importantly, reference. For example, so far I haven't found a publicly available and authoritative document explaining the numbering scheme and the valid area codes for the SIM. I had to do some deducing (read: guessing) from some thousands of actual user-submitted SIM, NPWP, and NIK/KTP numbers (some of them most certainly bogus). Isn't it sad?

If you encounter some valid numbers being rejected by the modules, or vice versa, feel free to report them as bugs.

Monday, 28 September 2009

Perl-united we stand, Perl-divided we fall

Yup, the title is really corny, but that's all I could come up with in 10 seconds.

This blog was first created to join the Perl Iron Man Blogging Challenge. Since I'm not sure I can keep up blogging alone once a week (to be precise, 4 posts in 32 days), I named this blog to represent the Indonesian Perl community.

If you love/like Perl, want to share something about Perl, or are learning Perl, I hope you'll join in blogging here. Posts may be in Indonesian or English. They don't have to be purely about Perl; posts that are only about 10-20% related to Perl are fine too. An occasional off-topic post is no problem either. The language doesn't have to be formal, as long as it's pleasant to read and communicative. Posts don't have to be advanced; all levels are welcome.

Interested? Please contact Steven at: steven2 dash id dash perl at masterweb dot net. I'll add your Google account to this blog's team member list so you can post too. Shall we?

To our readers: enjoy, and wish us luck!