Senin, 14 Februari 2011

Backup data dengan dengan Perl, rsync, dan git

Saat ini, saya menyimpan data pribadi di 2 direktori utama: ~/repos dan ~/media. Semua file-file teks (termasuk source code, website, catatan/tulisan, konfigurasi, agenda .org Emacs) ditaruh di bawah ~/repos di dalam repo-repo git, per proyek (Contoh: ada ~/repos/settings, ~/repos/writings, ~/repos/perl-Git-Bunch, dsb). Semua file lain yang berupa file media besar-besar ditaruh di ~/media.

Untuk membackup data di ~/media, saya menggunakan File::RsyBak, yang menyediakan skrip command-line rsybak. Skrip ini pada dasarnya hanyalah wrapper untuk perintah rsync dan membuat snapshot-snapshot backup sesuai jangka waktu histori yang diinginkan (defaultnya: 7 harian + 4 mingguan + 3 bulanan). Skrip ini dijalankan tiap hari lewat cron dan backupnya disimpan di harddisk terpisah /backup.

Untuk membackup data di ~/repos, saya menggunakan Git::Bunch, yang menyediakan skrip command-line gitbunch. Pada dasarnya, gitbunch membackup menggunakan rsync juga, tapi tanpa histori (karena git sudah menyimpan sejarah perubahan). Selain itu, yang dibackup juga hanya subdirektori .git/ dari tiap repo. Ini mengirit ruang disk, karena ~/repos masih sering saya kopi ke flashdisk yang kapasitasnya terbatas. Untuk merestore dari backup, kita tinggal melakukan "git checkout" dari hasil backup .git/ tiap repo ini.

Skrip gitbunch juga dapat melakukan sinkronisasi dari satu direktori ~/repos ke direktori ~/repos lainnya. Pada intinya, "gitbunch sync" hanyalah wrapper untuk "git pull". Dengan cara ini, saya bisa mensinkronkan pekerjaan PC ke netbook atau sebaliknya dengan mudah.

Artikel yang lebih mendetil, pernah ditulis untuk majalah InfoLINUX: Manajemen data pribadi dengan git.

Bagaimana strategi backup Anda?

Senin, 07 Februari 2011

The coming bloated Perl apps?

A few weeks ago, I got annoyed by the fact that one of our command line applications was getting slower and slower to start up (the delay was getting more and more noticable), so I thought I'd do some refactoring, e.g. split large files into smaller ones and delay loading modules until necessary.

Sure enough, one of the main causes of the slow start up was preloading too many modules. Over the years I had been blindly sticking 'use' statements into our kitchen sink $Proj::Utils module, which was used by almost all scripts in the project. Loading $Proj::Utils alone pulled in over 60k lines from around 150 files!

After I split things up, it became clearer which modules are particularly heavy. This one stood out:

% time perl -MFile::ChangeNotify -e1
real 0m0.972s


% perl -MDevel::EndStats -e1
# Total number of module files loaded: 129
# Total number of modules lines loaded: 46385


So almost 130 files and a total of 45k+ lines just from loading File::ChangeNotify alone. 130 files just for a filesystem monitoring routine! Who would've thought that a filesystem monitor needs so many lines of program? Compare with, say, a recent HTTP client:

% perl -MHTTP::Tiny -e1
# Total number of module files loaded: 18
# Total number of modules lines loaded: 6089


I quickly switched to Linux::Inotify2 and things are much better now (but I might have to revisit this since we want to give the new Debian/kFreeBSD a Squeeze).

As I suspected (since the module is written by Dave Rolsky also), File::ChangeNotify utilizes Moose, which is not particularly lightweight either:

% time perl -MMoose -e1
real 0m0.712s


% perl -MDevel::EndStats -MMoose -e1
# Total number of module files loaded: 100
# Total number of modules lines loaded: 35760


Compare with:

% time perl -MMouse -e1
real 0m0.089s


% perl -MDevel::EndStats -MMouse -e1
# Total number of module files loaded: 20
# Total number of modules lines loaded: 6675


Come to think of it, running Dist::Zilla is also quite painfully slow these days. Just running "dzil foo" pulled in around 60k lines and took 1.7s! Of course, dzil is Moose-based.

While it is a good thing that Moose is getting more popular, it's a bit shameful to see that Ruby and Python scripts "get OO for free" while Moose scripts have to endure a 0.7s startup penalty. Mouse, Moo, Role::Basic come to the rescue but I wonder what would Ruby/Python programmers think (you have how many object systems?? Why do you people can never agree on one thing and TIMTOWTDI everything?)

Disclaimer: Number of lines includes all blanks/comment/POD/DATA/etc from all files loaded in %INC, actual SLOC is probably significantly less. Timing is done on a puny HP Mininote netbook (Atom N450 1.66GHz) which I'm currently stuck with in the past few weeks. With all due respects to all authors of modules mentioned. They all write fantastic, working code.