Wednesday, 27 January 2010

Four months after...

Four months ago I joined the Iron Man challenge. So what have I done and what did I get out of this?

What I have done: written several new Perl modules. I didn't feel like I had enough to say (that's why I set up the blog as "Perl Indonesia" and invited others to join, instead of calling it "Steven Haryanto's Perl Adventure"), and I still don't, really. So I write more code instead, and post about it.

The modules are typically small, ones which I can finish in one or two sittings. The ideas for these modules have usually been floating in my head for some time, but since they are trivial enough I had never gotten around to writing them. The blogging challenge changed that.

Some of the other modules actually come from code I have written, in one form or another, but have not been properly packaged. Dist::Zilla changed that, as it lowers the cost of maintaining and releasing Perl distributions. Thanks, RJBS!

What I get: joy and satisfaction that I am actively doing something for Perl.

All in all, Ironman is a great idea and a good cause. I encourage everybody to keep blogging and writing.

Wednesday, 20 January 2010

Lingua::ZH::PinyinConvert::ID

Today I released Lingua::ZH::PinyinConvert::ID. This release is dedicated to the recently deceased former President of Indonesia, Abdurrahman Wahid, who in 2001 declared Chinese New Year a national holiday and re-allowed the use of Chinese characters and the expression of Chinese culture in daily life, thus ending 30 years of anti-Chinese laws under the Suharto regime.

Sadly, a generation of Indonesian Chinese people grew up cut off from their heritage, learning neither Chinese languages nor much of Chinese culture. This will probably be fixed in the next couple of generations, as many elementary schools are now starting to include Mandarin in their curriculum.

Tuesday, 12 January 2010

Log::Any

A few weeks ago I found out about Log::Any, and have since migrated many of my modules to it (usually from Log::Log4perl).

The two main advantages of Log::Any for me:
  • your users don't need to configure anything if they don't want logging. When my module, say Foo, uses Log4perl to produce logs, then when you use Foo, Log4perl will emit a warning: Log4perl: Seems like no initialization happened. Forgot to call init()?. To remove this warning you need to initialize Log4perl, e.g. using use Log::Log4perl qw(:easy); Log::Log4perl->easy_init($FATAL); This is annoying if you happen to not care about logging.

    With Log::Any, the default is null logging.

  • printf-style logging. For example, $log->debugf("data = %s", $data); It can also handle references/nested data structures, so you don't have to resort to something like $log->debug(sub { "data = " . Dumper($big_data) });. In fact, 99% of the time I use sub{} is precisely because I want to avoid the cost of dumping.
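Both points can be seen in a minimal sketch like the one below. It assumes Log::Any is installed from CPAN; the package name Foo follows the example above, while the process() function and its data are made up for illustration.

```perl
package Foo;    # the hypothetical module from the example above
use strict;
use warnings;
use Log::Any qw($log);    # imports a $log object for this package

# Hypothetical function: logs its input, then returns the number of keys.
sub process {
    my ($data) = @_;
    # printf-style logging; a reference argument is dumped by Log::Any
    $log->debugf("data = %s", $data);
    return scalar keys %$data;
}

package main;

# No logging adapter has been configured here, so the debugf() call
# above goes to the default null logger and nothing is emitted.
my $count = Foo::process({ a => 1, b => [2, 3] });
print "processed $count keys\n";
```

An application that does want the logs would pick an adapter once, for instance with Log::Any::Adapter->set('Log4perl') after initializing Log4perl, and Foo itself would stay unchanged.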


Of course, there are some features in Log4perl that are missing in Log::Any, like logdie() and the TRACE level. But that is a small price to pay. You gain other benefits, the most important of which is compatibility with other logging frameworks. I'm sure some people prefer something other than Log4perl. Log::Any currently supports Log4perl as well as Log::Dispatch, and possibly others in the future.

Thanks Jonathan for Log::Any!

Tuesday, 05 January 2010

The Help MySQL petition

As much as I love(d) MySQL and am still using it a lot (mostly for PHP web applications that are married to it), there seems to be too much politics surrounding it these days.

From helpmysql.org (emphasis mine):
If those IPRs fall into the hands of MySQL's primary competitor, then MySQL immediately ceases to be an alternative to Oracle's own high-priced products. So far, customers had the choice to use MySQL in new projects instead of Oracle's products. Some large companies even migrated (switched) from Oracle to MySQL for existing software solutions. And every one could credibly threaten Oracle's salespeople with using MySQL unless a major discount was granted. If Oracle owns MySQL, it will only laugh when customers try this. Getting rid of this problem is easily worth one billion dollars a year to Oracle, if not more.

Is Oracle really MySQL's primary competitor? I thought they represented two very distinct segments?

Also, if Oracle owns MySQL, why can't I still threaten Oracle sales reps with a switch to MySQL to get a deep discount on Oracle DB? Because Oracle will threaten to kill MySQL, or sue every other company that provides paid support for MySQL, or deliberately delay fixing critical bugs in MySQL? I'm really not convinced by this argument. In any case MySQL would still be a cheaper substitute for Oracle. And if I were an Oracle client wanting to get a discount (and not laughs), I think I would rather threaten to switch to SQL Server or Postgres instead.

Also, an online petition immediately conjures up the image of a teenager trying to save his/her favorite cartoon TV show that is being cancelled.

Also, let's never forget the bigger "politics" before this, regarding the position of MySQL AB on the usefulness of things like transactions or foreign key constraints, depending on whether its product has support for them.

Let's take one particular example: foreign key constraints. It shows that you really can't trust the opinion of people with ulterior motives. Here's a snippet from the old MySQL 3.23.x manual, from when MySQL had no support for foreign key checking (emphasis mine):
5.4.5.1 Reasons NOT to Use Foreign Keys constraints

There are so many problems with foreign key constraints that we don't know where to start:
  • Foreign key constraints make life very complicated, because the foreign key definitions must be stored in a database and implementing them would destroy the whole ``nice approach'' of using files that can be moved, copied, and removed.
  • The speed impact is terrible for INSERT and UPDATE statements, and in this case almost all FOREIGN KEY constraint checks are useless because you usually insert records in the right tables in the right order, anyway.
  • There is also a need to hold locks on many more tables when updating one table, because the side effects can cascade through the entire database. It's MUCH faster to delete records from one table first and subsequently delete them from the other tables.
  • You can no longer restore a table by doing a full delete from the table and then restoring all records (from a new source or from a backup).
  • If you use foreign key constraints you can't dump and restore tables unless you do so in a very specific order.
  • It's very easy to do ``allowed'' circular definitions that make the tables impossible to re-create each table with a single create statement, even if the definition works and is usable.
  • It's very easy to overlook FOREIGN KEY ... ON DELETE rules when one codes an application. It's not unusual that one loses a lot of important information just because a wrong or misused ON DELETE rule.

The only nice aspect of FOREIGN KEY is that it gives ODBC and some other client programs the ability to see how a table is connected and to use this to show connection diagrams and to help in building applications.

And here's a snippet from the MySQL 5.1 manual, written after MySQL gained support for foreign keys through InnoDB. Notice the complete change of heart (emphasis mine):
1.8.5.4. Foreign Keys

The InnoDB storage engine supports checking of foreign key constraints, including CASCADE, ON DELETE, and ON UPDATE. See Section 13.6.4.4, “FOREIGN KEY Constraints”.

[...]

Foreign key enforcement offers several benefits to database developers:
  • Assuming proper design of the relationships, foreign key constraints make it more difficult for a programmer to introduce an inconsistency into the database.
  • Centralized checking of constraints by the database server makes it unnecessary to perform these checks on the application side. This eliminates the possibility that different applications may not all check the constraints in the same way.
  • Using cascading updates and deletes can simplify the application code.
  • Properly designed foreign key rules aid in documenting relationships between tables.

Do keep in mind that these benefits come at the cost of additional overhead for the database server to perform the necessary checks. Additional checking by the server affects performance, which for some applications may be sufficiently undesirable as to be avoided if possible. (Some major commercial applications have coded the foreign key logic at the application level for this reason.)

[...]

Be aware that the use of foreign keys can sometimes lead to problems:

[...]

This means either that MySQL AB deliberately added a misleading opinion about foreign key constraints, or that MySQL AB grew up and saw the benefits of foreign key constraints during the later days of 3.23/4.x. Either way, it doesn't bode well for MySQL.

But anyway, I guess all of these so-called "politics" exist in any product advocacy. No product can support all possible features, so unsupported features sometimes get downplayed, either deliberately or innocently. Since Perl subscribes to multiparadigm and TIMTOWTDI thinking, we suffer less from this. But haven't we all heard more than a handful of otherwise brilliant Perl programmers dismissing things like block indentation or even OOP as overrated, just because other languages support these features (better) than Perl does?

Parsing with Perl

When it comes to text processing and manipulation, I suspect that the majority of Perl programmers depend on their regex skills to do the job. So much so that regex becomes their hammer. At least it is mine, because I whip up regexes for almost anything, from web scraping to converting dictionaries.

Which is a pity because there are other useful techniques, like parsing.

Admittedly, parsing is a bit harder (to do correctly), and libraries for parsing most formats out there, from XML to YAML, from CSV to PPI, already exist anyway. So ironically it's even harder to find useful, practical applications for parsing that are easy to implement. Most parser tools seem to invariably give the tired, if not overused, calculator example (i.e., parsing a simple mathematical expression like '3 * (2 + 4)').

In recent years, I have encountered only one instance where I wanted to do some parsing. Two weeks ago I thought it would be nice for my Data::Schema module to provide some shortcuts.

'int*' would be translated to [int => {set=>1}]

'int[]' to [array => {of => "int"}]

'(int*)[]' to [array => {of => [int => {set=>1}]}]

'(int[])*' to [array => {set => 1, of => "int"}]

'(int|int[])[]' to [array => {of => {either => {of => ["int", [array => {of => "int"}]]}}}]

and so on. (These shortcuts, along with some other goodies, will be in the next release of Data::Schema.)

I ended up using Regexp::Grammars for this, since I'm already on Perl 5.10. Regexp::Grammars is basically just syntactic sugar and helpers on top of Perl 5.10's regex engine, which is already capable of recursive matching. Parse::RecDescent is equally easy to use, but I prefer the simpler interface of Regexp::Grammars. I quickly gave up on Parse::Yapp and Parse::Eyapp though. Not that they're no good, it's just that they are too much trouble for my particularly simple need.
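To give a feel for the size of the problem, here is a rough sketch of the same idea written as a tiny hand-rolled recursive-descent parser in core Perl. It is not the Regexp::Grammars grammar mentioned above and not Data::Schema's actual code. The function names parse_shortcut and _parse_term are made up, and the sketch only covers parentheses and the '*' and '[]' suffixes from the first four examples, leaving out the '|' alternation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Schema): expands a shortcut string
# such as '(int*)[]' into the nested array/hash form shown in the post.
sub parse_shortcut {
    my ($str) = @_;
    pos($str) = 0;                      # start scanning at the beginning
    my $node = _parse_term(\$str);
    die "Unexpected input at offset " . pos($str) . "\n"
        unless pos($str) == length $str;
    return $node;
}

# term := ( '(' term ')' | NAME ) ( '*' | '[]' )*
sub _parse_term {
    my ($ref) = @_;
    my $node;
    if ($$ref =~ /\G\(/gc) {
        $node = _parse_term($ref);      # recurse into the parenthesised part
        $$ref =~ /\G\)/gc
            or die "Expected ')' at offset " . pos($$ref) . "\n";
    }
    elsif ($$ref =~ /\G([A-Za-z_]\w*)/gc) {
        $node = $1;                     # a bare type name such as "int"
    }
    else {
        die "Expected a type name or '(' at offset " . pos($$ref) . "\n";
    }

    # Apply suffixes left to right, so '(int[])*' and '(int*)[]' differ.
    while ($$ref =~ /\G(\*|\[\])/gc) {
        if ($1 eq '*') {
            # '*' adds set => 1, promoting a bare name to [name => {...}]
            $node = [$node, {}] unless ref $node;
            $node->[1]{set} = 1;
        }
        else {
            # '[]' wraps whatever was parsed so far in an array schema
            $node = [array => { of => $node }];
        }
    }
    return $node;
}
```

Calling parse_shortcut('(int[])*') returns a structure that can be inspected with Data::Dumper. Malformed input such as 'int[' makes the function die with the offset where parsing stopped.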

Anyway, parsing is fun, as long as--like anything else in life--it doesn't get too complicated :-) I just need to find other small use cases where I can do some more parsing...