At first glance, the spurious trailing output perl produces in a cmd.exe window set to code page 65001 seems to be trivially explainable by the WriteFile bug brought to my attention by Tony Cook.
After all, look at this pattern:
C:\> chcp 65001
C:\> perl -e "print qq{\xce\xb1a}" 3 bytes, 2 characters
αaa
C:\> perl -e "print qq{\xce\xb1\xce\xb1a}" 5 bytes, 3 characters
ααa�a
C:\> perl -e "print qq{\xce\xb1\xce\xb1aa}" 6 bytes, 4 characters
ααaaaa
Then, you hit this one:
C:\> perl -e "print qq{\xce\xb1\xce\xb1\xce\xb1a}" 7 bytes, 4 characters
αααaαaa
If WriteFile reporting the number of characters written instead of the number of bytes were the sole culprit, one would expect to see the original string “αααa” followed by its last three bytes, i.e. “αa”. Instead, the extra output consists of “αaa”. Why is that? Probably because the sequence of events goes:
- Send seven bytes (representing the four character string “αααa”)
- All seven bytes are output, but we are told of only four
- Send three bytes (representing the two character string “αa”)
- All three bytes are output, but we are told of only two
- Send one more byte (representing the single character “a”)
- We are told one byte was written, and it indeed was
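The sequence above can be checked with a small simulation. The following Python sketch is a toy model, not perl's actual I/O code. It assumes a caller that resends whatever it believes was left unwritten, and a writer that displays every byte it receives but reports the number of characters. The function name is hypothetical, and counting characters with Python's UTF-8 decoder (each undecodable byte counts as one character) is an assumption made for illustration.

```python
def simulate_short_write_loop(payload: bytes) -> str:
    """Toy model of a caller that retries after an under-reported write.

    Assumptions (not taken from perl's source):
      * the writer displays every byte it is handed;
      * the writer reports the number of characters, not bytes;
      * the caller skips that many bytes and resends the rest.
    """
    shown = bytearray()   # everything that ends up on the console
    pending = payload     # what the caller still believes is unwritten
    while pending:
        shown += pending  # all pending bytes are actually displayed
        # Characters in this chunk; a stray byte decodes to U+FFFD and
        # therefore counts as one character.
        reported = len(pending.decode("utf-8", errors="replace"))
        # The caller treats the character count as a byte count.
        pending = pending[reported:]
    return shown.decode("utf-8", errors="replace")
```

Under these assumptions, calling simulate_short_write_loop("αααa".encode("utf-8")) returns "αααaαaa", and the three shorter strings give "αaa", "ααa�a" and "ααaaaa", the same as the transcripts above.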
So, what’s the problem?!
I got this output from a perl in which I modified PerlIOWin32_write to ignore what WriteFile says, and always return the count argument it was passed!
Yes, I searched: PerlIOWin32_write seems to be the only relevant place from which WriteFile is called.
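To make the expectation behind that patch concrete, here is a minimal Python sketch of a naive resend loop with a pluggable report function. All names (run_write_loop, report_chars, report_count) are hypothetical, and the model says nothing about what perl's layers really do. It only shows that, if a resend loop driven by the reported count were the whole story, a writer that always reports the full count would leave nothing to resend.

```python
from typing import Callable


def run_write_loop(payload: bytes, report: Callable[[bytes], int]) -> bytes:
    """Return every byte displayed by a caller that resends unwritten data.

    `report` stands in for whatever the low-level write claims to have
    written; the chunk itself is always displayed in full (an assumption).
    """
    shown = bytearray()
    pending = payload
    while pending:
        shown += pending
        pending = pending[report(pending):]
    return bytes(shown)


def report_chars(chunk: bytes) -> int:
    # Models the buggy behaviour: characters reported instead of bytes.
    return len(chunk.decode("utf-8", errors="replace"))


def report_count(chunk: bytes) -> int:
    # Models the patched behaviour: always report the full count.
    return len(chunk)
```

With report_count the loop ends after a single pass and "αααa" comes out once; with report_chars the same loop yields "αααaαaa". The patched perl still showing the extra output is what makes this simple model insufficient.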
As the short example at the end of my previous post shows, it is not WriteFile that keeps looping in response to the confusion between the numbers of bytes and characters written. If it were, then pushing extra layers onto STDOUT would not eliminate the problem of extraneous output:
#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
use PerlIO::Layers qw( get_layers );
use YAML::XS;
binmode STDOUT, ':unix:encoding(utf8):crlf';
print qq{αβγabc};
print Dump get_layers(\*STDOUT);
C:\> perl g.pl
αβγabc--- ← Note correct output
- unix
- ~
- - CANWRITE
- OPEN
- TRUNCATE
- CRLF
---
- crlf
- ~
- - CANWRITE
- LINEBUF
- TRUNCATE
- FASTGETS
- CRLF
---
- unix
- ~
- - CANWRITE
- OPEN
- TRUNCATE
---
- encoding
- utf8
- - CANWRITE
- LINEBUF
- UTF8
- TRUNCATE
- FASTGETS
---
- crlf
- ~
- - CANWRITE
- LINEBUF
- UTF8
- TRUNCATE
- FASTGETS
- CRLF
- WRBUF
Don’t get me wrong: as I showed before, the WriteFile bug is real.
But the fact that the spurious output persists when perl is compiled to ignore what WriteFile reports, and disappears when I push extra layers onto STDOUT, suggests something else might be in play as well.
That CRLF flag on the bottom-most Unix layer keeps bothering me, too.