Back in May, starting with "UTF-8 output from Perl and C programs in cmd.exe on Windows 8", I wrote a few posts summarizing my bewilderment with extra bytes appearing in UTF-8 output from perl when the cmd.exe code page was set to UTF-8 via chcp 65001.
The same problem still exists with the perl 5.20.1 I built recently.
So, I decided to see what perl6, using MoarVM, can give me.
In the same cmd.exe window, I typed:
C:\Temp> perl6 -e "Buf.new(0xce, 0xb1, 0xce, 0xb2, 0xce, 0xb3, 0x31).decode('UTF-8').say"
αβγ1
or, in a script:
use v6;
'αβγ1'.say;
which gave me the output:
αβγ1
It may not seem like much, but remember that the Perl script:
use utf8;
use strict;
use warnings;
use warnings qw(FATAL utf8);
binmode STDOUT, ':utf8';
print 'αβγ1', "\n";
outputs
αβγ1
1
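For reference, here is a minimal Perl 5 sketch of what the string should encode to. It prints the UTF-8 bytes in hex, so only ASCII reaches the console and the code page does not come into play. The helper name `utf8_hex` is made up for this illustration; `encode` comes from the core Encode module.

```perl
use utf8;
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string as UTF-8 and
# return its bytes as space-separated hex pairs.
sub utf8_hex {
    my ($text) = @_;
    return join ' ', unpack '(H2)*', encode('UTF-8', $text);
}

# Only ASCII is printed here, so the console code page is irrelevant.
print utf8_hex('αβγ1'), "\n";   # ce b1 ce b2 ce b3 31
```

Those are the same seven bytes passed to Buf.new in the perl6 one-liner, so anything beyond them (plus the line ending) in the console output is extra.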
Let’s see now:
.say for (
"Hava karlı",
"Bu iş kârlı",
"İstanbul",
"Yağ yağ yağmur");
Hava karlı
Bu iş kârlı
İstanbul
Yağ yağ yağmur
And, then,
say "karlı" eq "kârlı";
False
which means perl6 understands the difference between snowy (karlı) and profitable (kârlı).
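The two words differ only in their second character. A small Perl 5 sketch that lists the code points makes this visible without printing any non-ASCII text. The helper name `code_points` is made up for this illustration, and the sketch assumes the source file stores â as the single precomposed character U+00E2 rather than a plus a combining circumflex.

```perl
use utf8;
use strict;
use warnings;

# Hypothetical helper: list the code points of a string as U+XXXX labels.
sub code_points {
    my ($word) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $word;
}

my @words = ("karlı", "kârlı");

# Output is kept ASCII-only to sidestep the console printing problem.
print code_points($_), "\n" for @words;
print $words[0] eq $words[1] ? "equal\n" : "different\n";
```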
Of course, so does perl5 … but, when it comes to printing, it has issues:
use 5.020;
use utf8;
binmode STDOUT, ":utf8";
print for "karlı", "kârlı";
gives the output:
karlıkârlılı�