Tuesday, 05 January 2010

Parsing with Perl

When it comes to text processing and manipulation, I suspect that most Perl programmers rely on their regex skills to do the job, so much so that regexes become their hammer. They are certainly mine: I whip up regexes for almost anything, from web scraping to converting dictionaries.

That is a pity, because there are other useful techniques, such as parsing.

Admittedly, parsing is a bit harder to do correctly, and libraries already exist for parsing most formats out there, from XML to YAML and from CSV to Perl source code itself (PPI). So, ironically, it is even harder to find practical uses for parsing that are also easy to implement. Most parser tools seem invariably to offer the tired, if not overused, calculator example (i.e., parsing a simple mathematical expression like '3 * (2 + 4)').

In recent years, I have encountered only one situation where I wanted to do some parsing. Two weeks ago, I thought it would be nice for my Data::Schema module to provide some shortcuts:

'int*' would be translated to [int => {set=>1}]

'int[]' to [array => {of => "int"}]

'(int*)[]' to [array => {of => [int => {set=>1}]}]

'(int[])*' to [array => {set => 1, of => "int"}]

'(int|int[])[]' to [array => {of => {either => {of => ["int", [array => {of => "int"}]]}}}]

and so on. (These shortcuts, along with some other goodies, will be in the next release of Data::Schema.)
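To give a feel for what such a translator involves, here is a minimal recursive-descent sketch in plain core Perl. It is not the code that will ship in Data::Schema: the grammar is an assumption inferred from the five examples (the postfix '*' and '[]' bind tighter than '|', and parentheses group), and parse_shortcut and the _parse_* helpers are hypothetical names. It uses the classic \G ... /gc idiom, where pos() remembers how far each successful match got and the /c flag keeps that position when a match fails. Alternatives come back in the {either => {of => [...]}} shape exactly as listed above; applying '*' to such a node is not covered by the examples, so the sketch simply dies there.

```perl
use strict;
use warnings;
use 5.010;          # for the defined-or operator //
use Data::Dumper;

# Hypothetical sketch: parse a shortcut string such as '(int|int[])[]'
# into the nested schema structure listed above.
# Assumed grammar (inferred from the examples, not taken from Data::Schema):
#   alt     := postfix ( '|' postfix )*
#   postfix := atom ( '*' | '[]' )*
#   atom    := TYPE | '(' alt ')'
sub parse_shortcut {
    my ($text) = @_;
    my $schema = _parse_alt(\$text);          # helpers advance pos($text)
    my $pos    = pos($text) // 0;
    die "Unexpected input at position $pos in '$text'\n"
        unless substr($text, $pos) =~ /\A\s*\z/;   # only whitespace may remain
    return $schema;
}

# alt := postfix ( '|' postfix )*
sub _parse_alt {
    my ($s) = @_;
    my @alts = (_parse_postfix($s));
    push @alts, _parse_postfix($s) while $$s =~ /\G\s*\|/gc;
    # Same {either => {of => [...]}} shape as the last example in the post.
    return @alts == 1 ? $alts[0] : { either => { of => \@alts } };
}

# postfix := atom ( '*' | '[]' )*
sub _parse_postfix {
    my ($s) = @_;
    my $schema = _parse_atom($s);
    while (1) {
        if    ($$s =~ /\G\s*\*/gc)       { $schema = _add_set($schema) }
        elsif ($$s =~ /\G\s*\[\s*\]/gc)  { $schema = [ array => { of => $schema } ] }
        else                             { last }
    }
    return $schema;
}

# atom := TYPE | '(' alt ')'
sub _parse_atom {
    my ($s) = @_;
    return $1 if $$s =~ /\G\s*(\w+)/gc;       # a bare type name such as 'int'
    if ($$s =~ /\G\s*\(/gc) {
        my $schema = _parse_alt($s);
        $$s =~ /\G\s*\)/gc
            or die "Expected ')' at position " . (pos($$s) // 0) . "\n";
        return $schema;
    }
    die "Expected a type name or '(' at position " . (pos($$s) // 0) . "\n";
}

# '*' marks a schema as a set: 'int' becomes [int => {set => 1}], while an
# existing [type => {...}] gets set => 1 merged into a copy of its clauses.
sub _add_set {
    my ($schema) = @_;
    return [ $schema => { set => 1 } ] unless ref $schema;
    die "Don't know how to apply '*' here\n" unless ref $schema eq 'ARRAY';
    my ($type, $clauses) = @$schema;
    return [ $type => { %{ $clauses || {} }, set => 1 } ];
}

$Data::Dumper::Indent   = 0;   # print on one line
$Data::Dumper::Terse    = 1;   # omit the '$VAR1 =' prefix
$Data::Dumper::Sortkeys = 1;   # stable hash key order
print Dumper(parse_shortcut('(int[])*')), "\n";   # [array => {of => "int", set => 1}]
```

Because each grammar rule maps to one subroutine, supporting another postfix operator would mostly mean adding one branch to _parse_postfix.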

I ended up using Regexp::Grammars to parse these shortcuts, since I'm already on Perl 5.10. Regexp::Grammars is basically syntactic sugar and helpers on top of Perl 5.10's regex engine, which is already capable of recursive matching. Parse::RecDescent is equally easy to use, but I prefer the simpler interface of Regexp::Grammars. I quickly gave up on Parse::Yapp and Parse::Eyapp, though. Not that they're bad; they're just too much trouble for my particularly simple needs.
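The recursive matching that Perl 5.10 brought can be seen without any module. The sketch below, assuming a grammar inferred from the shortcut examples in this post (spelled out in the comments), uses named groups inside (?(DEFINE)...) and (?&name) calls to accept or reject shortcut strings. It only answers yes or no and builds no data structure; turning matches into nested results is the kind of help the post credits Regexp::Grammars with. The variable name $shortcut_re is illustrative.

```perl
use strict;
use warnings;
use 5.010;   # (?&name) recursion and (?(DEFINE)...) arrived in Perl 5.10

# Assumed grammar, inferred from the shortcut examples in this post:
#   alt     := postfix ( '|' postfix )*
#   postfix := atom ( '*' | '[]' )*
#   atom    := TYPE | '(' alt ')'
# Each rule becomes a named group; (?&name) calls it, recursively if needed.
my $shortcut_re = qr{
    \A \s* (?&alt) \s* \z
    (?(DEFINE)
        (?<alt>     (?&postfix) (?: \s* \| \s* (?&postfix) )* )
        (?<postfix> (?&atom) (?: \s* (?: \* | \[ \s* \] ) )* )
        (?<atom>    \w+ | \( \s* (?&alt) \s* \) )
    )
}x;

for my $s ('int*', '(int|int[])[]', '(int', 'int[') {
    printf "%-15s %s\n", $s, ($s =~ $shortcut_re ? 'valid' : 'invalid');
}
# The first two strings are reported valid, the last two invalid.
```

A recognizer like this is the easy part. As perlre notes, capture groups matched inside a recursion are not accessible after it returns, which is one reason building a parse tree with bare regexes gets awkward quickly.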

Anyway, parsing is fun as long as, like anything else in life, it doesn't get too complicated :-) I just need to find other small use cases where I can do some more parsing...
