HTML::TableExtract is beautiful

And, it will help you save time and make money ;-)

I was motivated to post this by another one of those Stack Overflow questions. I decided at the outset not to answer it because the poster basically wants a job done for him for free:

I need the script to get the HTML, parse the table then to save the content (User + Online time), I would also want it to run every 15 mins and to make a report in the end of the day.

However, a so-called answer stated:

in my opinion perl can get a little ugly.

does it need to be perl….if it does ot i would recommend python.

Of course, I am kinda used to people proclaiming Perl sucks, but the supreme irony of the ugliness of the post asserting Perl’s ugliness motivated me.

HTML::TableExtract is beautiful. Over the years, it has saved me a lot of time, and even helped me make some money.

So, consider the Personal Income table available from the Bureau of Economic Analysis.

Let’s say I want to get the Unemployment Insurance row out of that table. Here’s how you do it using HTML::TableExtract:

#!/usr/bin/env perl

use strict; use warnings;
use HTML::TableExtract;

my $te = HTML::TableExtract->new(
    attribs => { id => 'tbl' },
);

# local copy of
# http://bea.gov/iTable/iTableHtml.cfm?reqid=9&step=3&isuri=1&903=58

$te->parse_file('personal-income.html');

my ($table) = $te->tables;

for my $row ($table->rows) {
    my (undef, $label, @row) = @$row;   # skip the first column
    next unless defined $label;
    if ($label eq 'Unemployment insurance') {
        print "$label\t@row\n";
    }
}

And, here is the output:

C:\temp> uu
Unemployment insurance	101.1 127.9 144.8 148.7 152.8 137.4 135.8 128.7 117.5 108.8 103.0 100.1

Of course, things can be refined, but this is pretty beautiful.
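As one possible refinement, here is a minimal sketch that collects every labeled row into a hash, so any series can be looked up by name afterwards. To keep it runnable without the local copy of the BEA page, it parses an inline HTML string instead of a file. That string is a hypothetical stand-in: it assumes the same id and the same "line number, label, values" column layout as above, it keeps only the first three unemployment insurance values from the output shown earlier, and the second row is a made-up placeholder.

```perl
#!/usr/bin/env perl

use strict; use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the BEA page (an assumption, not the real markup).
# Only the 'Unemployment insurance' values come from the output shown above;
# the 'Placeholder row' and its numbers are invented for illustration.
my $html = <<'END_HTML';
<table id="tbl">
<tr><td>1</td><td>Placeholder row</td><td>1.0</td><td>2.0</td><td>3.0</td></tr>
<tr><td>2</td><td>Unemployment insurance</td><td>101.1</td><td>127.9</td><td>144.8</td></tr>
</table>
END_HTML

my $te = HTML::TableExtract->new(
    attribs => { id => 'tbl' },
);

# parse() and eof() are inherited from HTML::Parser
$te->parse($html);
$te->eof;

my ($table) = $te->tables;
die "No table with id 'tbl' found\n" unless $table;

# Map each row label to an array ref holding that row's values
my %series;
for my $row ($table->rows) {
    my (undef, $label, @values) = @$row;
    next unless defined $label;
    $label =~ s/^\s+|\s+$//g;    # trim stray whitespace around the label
    # replace undefined (empty) cells with '' to avoid warnings when printing
    $series{$label} = [ map { defined $_ ? $_ : '' } @values ];
}

my $wanted = 'Unemployment insurance';
die "Row '$wanted' not found\n" unless exists $series{$wanted};
print join("\t", $wanted, @{ $series{$wanted} }), "\n";
```

Run against the stand-in, this should print the label followed by its three values, separated by tabs. Against the real page, swapping parse and eof back to parse_file('personal-income.html') should be the only change needed, assuming the layout matches.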


A. Sinan Unur

April 16, 2012