Some background
The first programming language I truly loved was C. It took a while to get there: As a child, I started with Z80 assembly on the venerable ZX Spectrum.
Those were the days when you actually owned your computer, and you did not need permission from Apple, Google, Microsoft, or anyone else to write a program. I learned about paging memory blocks in and out of the CPU’s address space on the beautiful 128K ZX Spectrum +2. It was only in college that I was able to get my hands on an IBM PC. I played around with spreadsheets, tried to fix existing Fortran programs, patched up keyboard drivers with Turkish characters, and fiddled with Pascal a bit. After that, during my stint at the Central Bank of Turkey, I was introduced to SQL and APL.
I kept hearing about C, but I did not have access to a C compiler. I would have to wait until I arrived at Cornell to have access to a Unix account, and compile my first hello.c. A little later, I had my first PC … I installed DJGPP on the DOS partition, built my first Linux kernel (my first distro was Debian), and started learning C. Plauger’s “Standard C Library” was my favorite book.
By the time I became comfortable with C, C++ had been around for more than 10 years. So, it seemed like it would be a natural next step.
Except … Well, except that C++ was a mess. This was a time when everyone was enamoured with inheritance hierarchies, and they were all writing elaborately designed string classes. Most hard drives were not fast enough to compile reasonable C++ libraries in finite time (OK, I do exaggerate a bit), most CPUs melted trying to instantiate templates, and most people who pretended to be C++ programmers were C programmers who liked to cast the return value of malloc.
At the time, I was busy trying to build networked custom economics experiments, and Java seemed to have the edge. At least, it did not take an act of Congress to slap together a dialog, make a few socket connections, and have your application build and run on a variety of systems. Of course, both AWT and Swing were ugly and cumbersome, but, for my purposes, it didn’t matter.
Still, I just could not run my experiments outside the lab where I had configured all the computers so the Java applications would run without issues. So, I slapped FreeBSD on one of the 100 MHz Pentium machines with 16 MB of memory that were collecting dust in the corner, built Apache with mod_perl, and went to work. That’s when I fell in love with Perl.
That love springs from entirely pragmatic reasons. It is not that I believe Perl is particularly beautiful, but, then, I don’t think many languages are. Every single one of them has its warts.
Perl has always minimized the amount of work I have to do to solve a particular problem. Some of this is due to language features, but most of it is due to CPAN.
For instance, for a Perl programmer, parsing HTML as HTML is a solved problem. All I have to do is decide whether I want to construct the whole tree, or whether I am OK with a streaming approach. The former is advantageous in some circumstances, but the latter has the benefit of keeping memory requirements to a minimum, which, even in this day and age, can still matter if you are dealing with HTML documents that run to megabytes. Either way, these tools do not choke on invalid HTML, or on valid HTML that is not XML.
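To make that concrete, here is a minimal sketch of the tree approach, using HTML::TreeBuilder from CPAN (the streaming alternative would be HTML::Parser); it is only an illustration, not code from any of my projects:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use HTML::TreeBuilder;

    # Build the whole tree in memory, even from messy, real-world HTML
    my $tree = HTML::TreeBuilder->new_from_file( $ARGV[0] );

    # Print the target of every link in the document
    for my $link ( $tree->look_down( _tag => 'a' ) ) {
        my $href = $link->attr('href');
        print "$href\n" if defined $href;
    }

    $tree->delete;  # free the tree explicitly when done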
And, Perl offers portability. So long as I don’t need operating system specific features, my code can run without any modification wherever Perl can run.
When I write classes, I write them for encapsulation, not to build elaborate architecture.
C++ renaissance
For the past few years, C++ has been going through a renaissance. A lot of smart people have focused on providing C++ programmers with building blocks, both as part of the work of the ISO committee and as part of the growing influence of Boost.
In the real world, it is still the case that 90% of the mortals who claim to be C++ programmers are C programmers who don’t realize new is still a valid identifier in C. In that regard, C++ is very similar to Perl: Most people who type up Perl programs also don’t realize Perl is not C, or Java, or Python, or shell, or Awk, or, take your pick.
But, when you read about the new features going into the recent C++ standards, and see the pace at which compilers are actually implementing them, it is hard not to be intrigued by the possibilities they represent.
A word counting exercise
This is a simple exercise that does not depend on external libraries in C++ or Perl, so it is a good place to start.
Here is the Perl version for your reference:
#!/usr/bin/env perl

use strict;
use warnings;

run(\@ARGV);

sub run {
    my $argv = shift;
    my @counts;

    for my $file ( @$argv ) {
        my $count = -1;
        eval {
            $count = word_count($file);
            1;
        } or warn "$@";

        push @counts, {
            file       => $file,
            word_count => $count,
        };
    }

    for my $result (@counts) {
        printf "%s: %d words\n", $result->{file}, $result->{word_count};
    }
}

sub word_count {
    my $file = shift;
    my %words;

    open my $fh, '<', $file
        or die "Cannot open '$file': $!";

    while (my $line = <$fh>) {
        my @words = split ' ', $line;
        $words{ $_ } += 1 for @words;
    }

    close $fh;

    my $word_count;
    $word_count += $_ for values %words;

    return $word_count;
}
And, here is my best effort at translating this to modern-looking C++. I did not try to write particularly efficient code: Just like with Perl, I put the emphasis on writing code that feels most natural to me, while making sure both programs do roughly the same thing.
#include <cerrno>
#include <cstring>
#include <fstream>
#include <iostream>
#include <numeric>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using std::accumulate;
using std::cerr;
using std::cout;
using std::endl;
using std::ifstream;
using std::make_pair;
using std::pair;
using std::strerror;
using std::string;
using std::unordered_map;
using std::vector;
int word_count(const char *const file) noexcept(false);
int main(int argc, char *argv[]) {
    vector< pair<string, int> > counts {};

    for (auto i = 1; i < argc; i += 1) {
        try {
            counts.push_back(make_pair(argv[i], word_count(argv[i])));
        } catch (const string& e) {
            cerr << e << endl;
            counts.push_back(make_pair(argv[i], -1));
        }
    }

    for (auto& result : counts) {
        cout << result.first << ": " << result.second << " words" << endl;
    }

    return 0;
}

int
word_count(const char *const file) noexcept(false) {
    errno = 0;
    ifstream fp(file);
    {
        // Does fp.fail() preserve errno?
        int save_errno = errno;
        if (fp.fail()) {
            throw("Cannot open '" + string(file) + "': " + strerror(save_errno));
        }
    }

    unordered_map<string, int> word_count {};
    string word;

    while (fp >> word) {
        word_count[word] += 1;
    }

    fp.close();

    return accumulate(
        word_count.cbegin(),
        word_count.cend(),
        0,
        [](int sum, auto& el) { return sum += el.second; }
    );
}
20 lines dedicated to #include and using statements might look excessive, but I hate using namespace std, and I hate typing std:: constantly … Mostly because I like shorter lines.
The first thing to notice is that there is no explicit memory allocation in sight. Containers manage their own memory.
Second, and this is a biggie: We have autovivification!

    unordered_map<string, int> word_count {};
    string word;

    while (fp >> word) {
        word_count[word] += 1;
    }
And, third, we have lambdas:

    return accumulate(
        word_count.cbegin(),
        word_count.cend(),
        0,
        [](int sum, auto& el) { return sum += el.second; }
    );

Behind the scenes, accumulate initializes an internal accumulator with 0, calls the anonymous function passed as its last argument with the current accumulator value and the next element of word_count, and stores the result back in the accumulator; the final value is what the call returns.
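For comparison, the same fold in the Perl version could have gone through List::Util’s reduce; this is just a sketch of the parallel, not how the program above is written:

    use List::Util qw(reduce);

    # 0 seeds the accumulator, just like the third argument to accumulate
    my $total = reduce { $a + $b } 0, values %words;

Seeding with 0 mirrors accumulate’s initial value, and also keeps reduce from returning undef when the list is empty.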
Now, I have to admit, I don’t know exactly which feature appeared when, but this all works with Microsoft Visual C++ 2015 RC: Microsoft seems to be finally catching up with the latest developments in the field.