Thursday, 26 November 2009

More flexible configuration merging with Data::PrefixMerge

(I'm planning a refactoring for Data::PrefixMerge and will be renaming it to Data::ModeMerge. Thought I'd post something on the blog.)

In a typical Unix program, there are three levels of configuration: a system-wide config file (/etc/myapp.conf), a per-user config file (~/.myapprc), and command-line options. It's convenient programmatically to load each of those into a hash, merge the system-wide hash with the per-user hash (e.g. using Data::Merger or Hash::Merge), and then merge the result with the command-line hash to get a single hash as the final configuration. From there on, your program can deal with just this one hash instead of three.

In a typical merge of two hashes (left-side and right-side), when a key conflicts, the right-side value overrides the left-side one. This is usually the desired behaviour in such a program, as the system-wide config is there to provide defaults, and the per-user config (and the command-line arguments) allow a user to override those defaults.
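For flat hashes, this plain override merge needs no module at all. Here's a minimal sketch; the setting names and values are made up for illustration, and in a real program each hash would be loaded from its own source.

```perl
use strict;
use warnings;

# Hypothetical settings; a real program would load these from
# /etc/myapp.conf, ~/.myapprc and the command line respectively.
my %system = (color => 'auto', pager => 'less', verbose => 0);
my %user   = (pager => 'most');
my %cli    = (verbose => 1);

# In a list assignment, later pairs win, so the right-side hashes
# override the left-side ones.
my %config = (%system, %user, %cli);

print "$_=$config{$_}\n" for sort keys %config;
```

Note that this only works for one level of keys; nested structures are where the merge modules earn their keep.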

But suppose the user wants to unset a certain configuration setting that is defined by the system-wide config. She can't do that unless she edits the system-wide config (for which she might need admin rights), or the program allows the user to disregard the system-wide config. The latter is what many Unix programs implement, e.g. the -noconfig command-line option in mplayer. But this has two drawbacks:

  1. slightly added complexity in the program. The program needs to provide a special, extra command-line option.
  2. the user loses all the default settings in the system-wide config. What she needed in the first place was to just unset a single setting (a single key-value pair of the hash).

Here's where Data::PrefixMerge comes in. It provides a so-called DELETE mode.

prefix_merge({foo=>1, bar=>2}, {"!foo"=>undef, bar=>3, baz=>1});

will result in:

{bar=>3, baz=>1}

The ! prefix tells Data::ModeMerge to do a DELETE mode merging. So the final result will lack the foo key.

On the other hand, what if the system admin wants to protect a certain configuration setting from being overridden by the user or the command line? This is useful in a hosting or other restrictive environment where we want to limit users' freedom to some degree. This is possible via the KEEP mode (prefix ^):

prefix_merge({"^bar"=>2, "^baz"=>1}, {bar=>3, "!baz"=>0, quux=>7});

will result in:

{bar=>2, baz=>1, quux=>7}

effectively protecting bar and baz from being overridden/deleted/etc.
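Here's a rough sketch of how the two modes above could work on flat hashes. It is not the Data::PrefixMerge implementation: sketch_merge is a hypothetical name, and it only understands ^ on the left side and ! on the right side.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's real code. Flat hashes only.
sub sketch_merge {
    my ($left, $right) = @_;
    my (%result, %keep);

    # Left side: a "^" prefix marks the key as protected (KEEP mode).
    for my $key (keys %$left) {
        if ($key =~ /^\^(.*)/s) {
            $result{$1} = $left->{$key};
            $keep{$1}   = 1;
        } else {
            $result{$key} = $left->{$key};
        }
    }

    # Right side: a "!" prefix deletes the key (DELETE mode); anything
    # else overrides. Protected keys are left untouched either way.
    for my $key (keys %$right) {
        if ($key =~ /^!(.*)/s) {
            delete $result{$1} unless $keep{$1};
        } else {
            $result{$key} = $right->{$key} unless $keep{$key};
        }
    }
    return \%result;
}

my $deleted = sketch_merge({foo=>1, bar=>2}, {"!foo"=>undef, bar=>3, baz=>1});
my $kept    = sketch_merge({"^bar"=>2, "^baz"=>1}, {bar=>3, "!baz"=>0, quux=>7});
print join(", ", map { "$_=>$deleted->{$_}" } sort keys %$deleted), "\n";
print join(", ", map { "$_=>$kept->{$_}" } sort keys %$kept), "\n";
```

The two calls at the bottom repeat the two examples from this post, so the printed hashes can be compared against the results shown above.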

Aside from the two mentioned modes, there are also a few others available by default: ADD (prefix +), CONCAT (prefix .), SUBTRACT (prefix -), as well as the plain ol' NORMAL/override (optional prefix *).

You can add other modes by writing a mode handler module. (planned in upcoming Data::ModeMerge release)

You can change the default prefixes for each mode if you want. You can disable each mode individually. (planned in upcoming Data::ModeMerge release)

You can default to always using a certain mode, like the NORMAL mode, and ignore all the prefixes, in which case Data::ModeMerge will behave like most other merge modules.

You can change the default mode, the prefixes, which modes are enabled, etc. on a per-hash basis using the so-called options key.

Tuesday, 24 November 2009

WEHT Embperl?

Speaking of templating systems in the previous post, I also got reminded of Embperl, and wondered why it didn't get more popular. I remember back in 1998-1999 enjoying working with Embperl before moving on to Mason. It has a nice syntax, and one very nice feature: (HTML- and/or URL-)escaping output by default. Say you have in $foo "<script>evil()</script>" then this template:

[+ $foo +]

will output "&lt;script&gt;evil()&lt;/script&gt;". You are protected from XSS by default. And if you want to turn off this escaping, you can set EMBPERL_ESCMODE to 0, or, do this:

[+ do { local $escmode = 0; $foo } +]
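For the curious, escape-by-default can be sketched in plain Perl like this. This is not how Embperl does it internally; escape_html and render are hypothetical names, the raw flag is my own invention, and the entity table only covers the four usual characters.

```perl
use strict;
use warnings;

my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

# Escape unless the caller explicitly opts out (hypothetical "raw" flag).
sub render {
    my ($value, %opt) = @_;
    return $opt{raw} ? $value : escape_html($value);
}

my $foo = '<script>evil()</script>';
print render($foo), "\n";            # escaped
print render($foo, raw => 1), "\n";  # passed through as-is
```

The point of the design is that the safe path is the one that takes no extra typing.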

But then maybe this is akin to what earlier versions of PHP attempted to do by turning on magic_quotes_gpc and magic_quotes_runtime by default. These two default settings have helped spread the backslashitis/toothpick syndrome all over the web and are currently deprecated (and will be removed in PHP 6.0). A majority of PHP programmers apparently never understood the need for this escaping, and got confused/mad at PHP's insistence on adding those pestering backslashes. And most would turn off the settings, or add a routine to reverse the escaping at the beginning of their programs.

So is the moral of the story: do not overprotect programmers (especially ignorant ones)? Or don't try to fix the problem the wrong way? Or both?

Perl program that generates... Perl programs

Data::Schema is a module that I wrote to do data structure validation (using another data structure acting as a schema). I had not been terribly happy with the validation speed, which is around 100 validations/sec for schemas that are only slightly complex. This is not bad, actually, because Data::FormValidator performs more or less the same. But it was below my expectations.

Last week I suddenly realized that writing a data validator or a templating system is essentially writing a [mini] language, with our schema/template as the miniprogram and our [Perl] program as the interpreter. To make our miniprogram faster, we can compile it into Perl.

FastTemplate from PHP quickly comes to mind. I remember that it was advertised as being fast because it converts the template into PHP code (e.g. into a bunch of echo's, if/else's, and for loops). And I think there must be at least half a dozen Perl-based template languages on CPAN doing the same, although I'm not sure how common it is for data validators to compile schemas into Perl code.

So anyway, starting from 0.12, Data::Schema can convert a schema into Perl subroutines and use them to validate data structures. Validation speed jumps by about an order of magnitude, and I'm a happy guy again.
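To give a feel for the technique, here's a tiny sketch that turns a schema into Perl source and evals it into a subroutine. The schema format (type/min/max) and the compile_schema name are made up for this example; this is not Data::Schema's schema language or its generated code.

```perl
use strict;
use warnings;

# Hypothetical mini-schema compiler, for illustration only.
sub compile_schema {
    my ($schema) = @_;

    # Each schema clause becomes one Perl expression in the generated code.
    my @checks = ('defined($_[0])');
    push @checks, '$_[0] =~ /\A-?\d+\z/'
        if ($schema->{type} // '') eq 'int';
    push @checks, sprintf('$_[0] >= %d', $schema->{min})
        if defined $schema->{min};
    push @checks, sprintf('$_[0] <= %d', $schema->{max})
        if defined $schema->{max};

    # The schema is walked once, here; the resulting sub is just a
    # chain of && tests with no schema lookups at validation time.
    my $source = 'sub { ' . join(' && ', @checks) . ' ? 1 : 0 }';
    my $validator = eval $source or die "compile failed: $@";
    return $validator;
}

my $is_valid = compile_schema({ type => 'int', min => 1, max => 10 });
print "$_: ", ($is_valid->($_) ? "ok" : "not ok"), "\n" for 5, 11, "abc";
```

The speedup comes from paying the cost of interpreting the schema once at compile time instead of on every validation.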

Tuesday, 17 November 2009

Who's using your module?

To see who on CPAN is using your module (distribution), go to this page: http://cpants.perl.org/dist/used_by/Your-DistName

So I casually entered my distributions, and was happily surprised that there are people using them.

If you're like me, and I'm sure thousands of other CPAN authors are, you're putting stuff on CPAN either for fun or for your own use. Knowing that other people are using your module gives you some perspective. I was happily breaking/removing/refactoring stuff as I pleased, but now I'll need to think twice every time.

Merge key support in YAML

None of the many Perl YAML modules supports the merge key:

$ cat merge.yaml
foo: 1
bar: 2
baz: 3

$ perl -MYAML::Tiny -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
baz: 3
bar: 2
foo: 1

$ perl -MYAML -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
baz: 3
bar: 2
foo: 1

$ perl -MYAML::Syck -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
baz: 3
bar: 2
foo: 1

$ perl -MYAML::XS -MFile::Slurp -e'print Dump Load(scalar read_file "merge.yaml")'
baz: 3
bar: 2
foo: 1

However, good ol' Ruby handles it just fine:

$ ruby -ryaml -e'print YAML::load(File.open("merge.yaml").read).to_yaml'
baz: 3
foo: 1
bar: 2

Kinda strange since Ruby's yaml and YAML::Syck both use why's syck library, which has not been updated for quite a while.

Tuesday, 10 November 2009

Dist::Zilla for Module::Starter Users: A 2-Minute Guide

So you're a module author. You typically do this when starting a new distribution:

$ module-starter --module=Foo::Bar --author="Your Name" --email="you@example.com"

and then hack away under the resulting Foo-Bar directory. Easy enough, right?

The problems are:

  • too much generated boilerplate code and text;
  • lack of automation for the building and release process.

I'm sure you have experienced one or more of these:

  • Having to search+replace copyright year in every file;
  • Forgetting to "make clean" or remove backup files before creating tarball;
  • Forgetting to update MANIFEST;
  • Feeling tired and bored of all the tedious and laborious tasks;
  • Wondering if there is a better way.

You need a distribution builder like Dist::Zilla. It helps:

  • eliminate a lot of duplicate text;
  • automate updating MANIFEST;
  • automate generating README;
  • build tarball;
  • automate a lot of other stuff;
  • upload to CPAN;
  • and more.

Dist::Zilla is flexible and has a lot of plugins and plugin bundles, but it can be less than straightforward to use. Here's a simple step-by-step guide you can follow.

  1. Install these modules from CPAN:

    • Dist::Zilla
    • Dist::Zilla::Plugin::PodWeaver
    • Dist::Zilla::Plugin::ReadmeFromPod

  2. Create ~/.dzil/config.ini, containing:

    author = Your Name
    copyright_holder = Your Name
    initial_version = 0.01

    user = YOUR-PAUSE-ID
    password = YOUR-PAUSE-PASSWORD

  3. Now, instead of using module-starter, you run dzil new to start your new distribution:

    $ dzil new Foo-Bar

    Instead of a bunch of files under Foo-Bar/, there's now just one file, dist.ini:

    name = Foo-Bar
    version = 0.01
    author = youruser
    license = Perl_5
    copyright_holder = youruser


    There's currently a small bug in Dist::Zilla that keeps it from supplying the correct author/copyright_holder, so edit dist.ini to fix them, as well as to add some lines:

    name = Foo-Bar
    version = 0.01
    author = Your Name <you@example.com>
    license = Perl_5
    copyright_holder = Your Name


  4. Create the most basic distribution structure:

    $ mkdir -p lib/Foo t

    Put some test files into t/. Default tests generated by module-starter, like 00-load.t, might be a good start.

    Create a Changes file. You can copy and paste from the one generated by module-starter:

    0.01    2009-11-11
    First version, released on an unsuspecting world.

    As for lib/Foo/Bar.pm, here's what I use as a template. You can just fill out the [[...]] parts:

    package Foo::Bar;
    # ABSTRACT: [[Abstract of module]]

    use strict;
    use warnings;

    [[YOUR CODE]]

    =head1 SYNOPSIS

    =head1 DESCRIPTION



    It's much simpler and shorter than what module-starter generates. Some POD sections like VERSION, NAME, AUTHOR, LICENSE AND COPYRIGHT are deliberately omitted. They will be generated by Dist::Zilla later when building the distro.

  5. Hack away. Write your code in lib/Foo/Bar.pm. Add some tests in t/. Add other files when needed.

  6. To test the distribution, run dzil test.

  7. To build the distribution, run dzil build. This will create Foo-Bar-0.01.tar.gz which contains all the necessary goodies of a standard classic distribution, like README, LICENSE, MANIFEST, META.yml, etc.

  8. To release the distribution, run dzil release. This will upload your module to CPAN. Sweet!

  9. To release a new version, just update the version number in dist.ini and $VERSION in your main module file. Don't forget to add an entry to Changes. Repeat dzil test, build, release.

In the future "dzil new" might allow creating a more complete skeleton.

There are lots of other nice things Dist::Zilla can do for you, like checking the Changes file, doing automatic version numbering, etc. Welcome to the nice world of Dist::Zilla!

Comments/corrections are welcome.

Regex Editor in Padre 0.50

Thanks to Gábor Szabó, we now have a regex editor in Padre. Yay! Sure it's really basic at the moment, but it's a start.

The Padre Features page says this about the regex editor: "A tool that allows the less experienced users as well to build and debug a regular expressions".

Well, if your inspiration is a tool to *teach* regex, then maybe. But in general I don't quite agree with the "less experienced users" emphasis. Regex is a minilanguage, and a featureful IDE/editor and debugger helps beginners as well as experienced users a lot. Would you say Padre is a tool for the less experienced Perl programmers? Beyond a certain point, it becomes really hard/cumbersome to debug rather long and complex regular expressions.

Personally I've only used Rx Toolkit in Komodo. Its interface is simple and not as colorful as some Windows regex editors on the market, but it does the job well. If I could offer a wishlist for Rx Toolkit, it would just be to add a position indicator (line number + column + offset) in each of the regex/string/matches subwindows. Being able to step through a regex is nice too I guess, but not essential. Usually my regexes aren't *that* long/complex.

Of course the future for Padre's regex editor is to handle Perl 6 grammars, which will be a full-fledged language in itself, so there *should* be step over/trace into/watch/etc debugger features for it.

Tuesday, 03 November 2009

Creating shortcuts for long Perl module names

My main interface with the computer is the shell, the terminal, the keyboard. Thus I love shortcuts. My .bash_profile is littered with one- and two-letter shortcuts like:

alias m='mplayer -fixed-vo -osdlevel 3'
alias mn='mplayer -fixed-vo -osdlevel 3 -nosound'
alias m11='m -speed 1.1'
alias m12='m -speed 1.2'
alias m20='m -xy 2'
alias m21='m20 -speed 1.1'

And when aliases are not enough, because they do not work outside the shell (like in KDE's mini commander), I create scripts. My ~/bin is also littered with alias commands like:

# k
exec konsole "$@"

My homedir, project directories, and the whole filesystem are likewise ornamented with short symlinks to here and there:

$ ln -s public p
$ ln -s proj/perl pp
$ ln -s sites/steven.builder.localdomain/www .

When I have long Perl module names like Spanel::API::Account::Shared, Spanel::API::Account::XenVPS, I would also like to create shortcuts for these.

There are several modules to do this. I ended up using Package::Alias because I like the interface. Other ones I tried include aliased and namespace::alias.

Suppose I want to create a shortcut of SAAS for Spanel::API::Account::Shared (which can export foo). All I need to do is:

# in SAAS.pm
use Spanel::API::Account::Shared;
use Package::Alias 'SAAS'=>'Spanel::API::Account::Shared';

Now aside from these:

$ perl -MSpanel::API::Account::Shared -e'Spanel::API::Account::Shared::foo()'
$ perl -MSpanel::API::Account::Shared=foo -e'foo()'

All of the below work too:

$ perl -MSAAS -e'Spanel::API::Account::Shared::foo()'
$ perl -MSAAS -e'SAAS::foo()'
$ perl -MSAAS=foo -e'foo()'
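As far as I understand it, the trick underneath this kind of aliasing is pointing one symbol table (stash) at another. Here's a self-contained sketch of the idea using made-up package names (Long::Module::Name and LMN); it is my own illustration of the technique, not code taken from Package::Alias.

```perl
use strict;
use warnings;

package Long::Module::Name;
sub foo { return "foo called" }

package main;

# Make the LMN:: stash an alias of Long::Module::Name::. This has to
# happen at compile time (hence BEGIN), before Perl parses any call
# that goes through the short name.
BEGIN {
    no strict 'refs';
    *{"LMN::"} = \*{"Long::Module::Name::"};
}

print LMN::foo(), "\n";
print Long::Module::Name::foo(), "\n";
```

Both names end up looking at the same subroutines, which is why either spelling works on the command line above.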

My fingers thank Joshua Keroes, the author of Package::Alias.

The while(1) {...last...} construct

In the past year or two I've been comfortably using this construct in Perl as well as in PHP (and probably others too):

while (1) {
    do { warn "error 1"; last } if not OK;

    do { warn "error 2"; last } if not OK;

    do { warn "error 3"; last } if not OK;

    # finally
    last;
}

I prefer it over the one below:

if (!OK) {
    warn "error 1";
} else {
    if (!OK) {
        warn "error 2";
    } else {
        if (!OK) {
            warn "error 3";
        } else {
            # finally
        }
    }
}

The while (1) { ... last ... } style is clearer and avoids extraneous indentation. The only thing that has bitten me when using this construct is that I sometimes forgot to add the final last, resulting in an infinite loop. But this series-of-checks-and-early-bail pattern happens so often in my code that the construct quickly became second nature.
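Here's a runnable version of the pattern, as a minimal sketch. The check_user function and its three checks are hypothetical, just something concrete to bail out of.

```perl
use strict;
use warnings;

# Hypothetical validator: records the first problem found, then bails.
sub check_user {
    my ($user) = @_;
    my @errors;
    while (1) {
        do { push @errors, "no name";  last } if !defined $user->{name};
        do { push @errors, "no email"; last } if !defined $user->{email};
        do { push @errors, "bad age";  last } if ($user->{age} // -1) < 0;

        # finally: every check passed
        last;    # without this, the loop never ends
    }
    return @errors;
}

my @problems = check_user({ name => 'budi' });
print "problems: @problems\n";
```

In Perl specifically, a bare block { ... } is documented as a loop that executes once, so last works inside it too and the final last is not needed. That variant doesn't carry over to PHP, though.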

Anyone got the same habit, or perhaps using some alternative (like the new given-when)?