Previously, I discovered something that seemed odd to me. Running the following script in a cmd.exe window on Windows told me that PERLIO_F_CRLF was set on the bottom-most ‘unix’ layer:
#!/usr/bin/env perl
use strict;
use warnings;
use YAML::XS;

# Dump the (name, args, flags) triplet of each layer, showing flags in hex.
print Dump [
    map {
        my $x = defined($_) ? $_ : '';
        $x =~ s/\A([0-9]+)\z/sprintf '0x%08x', $1/eg;
        $x;
    } PerlIO::get_layers(STDOUT, details => 1)
];
Output:
---
- unix
- ''
- 0x01205200
- crlf
- ''
- 0x00c85200
You can find the flag values in my previous post.
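To make those hex values easier to read, here is a minimal sketch that turns a flags value into flag names. The bit values are the PERLIO_F_* constants as I understand them from perliol.h, so treat the table as an assumption and check it against your own perl source. decode_flags is a hypothetical helper name, not part of any module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# PERLIO_F_* bits, assumed from perliol.h (verify against your perl source).
my @FLAGS = (
    [ EOF      => 0x00000100 ],
    [ CANWRITE => 0x00000200 ],
    [ CANREAD  => 0x00000400 ],
    [ ERROR    => 0x00000800 ],
    [ TRUNCATE => 0x00001000 ],
    [ APPEND   => 0x00002000 ],
    [ CRLF     => 0x00004000 ],
    [ UTF8     => 0x00008000 ],
    [ UNBUF    => 0x00010000 ],
    [ WRBUF    => 0x00020000 ],
    [ RDBUF    => 0x00040000 ],
    [ LINEBUF  => 0x00080000 ],
    [ TEMP     => 0x00100000 ],
    [ OPEN     => 0x00200000 ],
    [ FASTGETS => 0x00400000 ],
    [ TTY      => 0x00800000 ],
    [ NOTREG   => 0x01000000 ],
);

# Hypothetical helper: join the names of all bits set in $value.
sub decode_flags {
    my ($value) = @_;
    return join '|', map { $_->[0] } grep { $value & $_->[1] } @FLAGS;
}

# With details => 1, get_layers returns (name, args, flags) triplets.
my @details = PerlIO::get_layers(STDOUT, details => 1);
while (my ($name, $args, $flags) = splice @details, 0, 3) {
    printf "%-6s 0x%08x %s\n", $name, $flags, decode_flags($flags);
}
```

With this table, the 0x01205200 reported for the ‘unix’ layer of STDOUT decodes to CANWRITE|TRUNCATE|CRLF|OPEN|NOTREG.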
At the time, I wasn’t sure what to make of this. However, while trying to figure out where this flag gets set, I noticed something. perliol seems to be very clear:
“unix”
A basic non-buffered layer which calls Unix/POSIX read(), write(), lseek(), close(). No buffering. Even on platforms that distinguish between O_TEXT and O_BINARY this layer is always O_BINARY. (emphasis mine)
This statement seems to imply that PERLIO_F_CRLF should never be set on a ‘unix’ layer. Not even on Windows.
Note that if I open a plain file and check that filehandle, the CRLF flag is not set on the bottom-most ‘unix’ layer:
open my $fh, '>', 'test' or die "Cannot open 'test': $!";
print Dump [
    map {
        my $x = defined($_) ? $_ : '';
        $x =~ s/\A([0-9]+)\z/sprintf '0x%08x', $1/eg;
        $x;
    } PerlIO::get_layers($fh, details => 1)
];
Output:
---
- unix
- ''
- 0x00201200
- crlf
- ''
- 0x00405200
Now, I do not know whether changing this would fix the UTF-8 display problem in cmd.exe, but it seems to me that it should, given that the problem only shows up when cmd.exe is set to code page 65001.
I just need to figure out where this flag gets set. Any pointers?
Update
The standard streams are initialized in PerlIO_stdstreams:
void
PerlIO_stdstreams(pTHX)
{
    dVAR;
    if (!PL_perlio) {
        PerlIO_init_table(aTHX);
        PerlIO_fdopen(0, "Ir" PERLIO_STDTEXT);
        PerlIO_fdopen(1, "Iw" PERLIO_STDTEXT);
        PerlIO_fdopen(2, "Iw" PERLIO_STDTEXT);
    }
}
Note:
#ifdef PERLIO_USING_CRLF
#define PERLIO_STDTEXT "t"
#else
#define PERLIO_STDTEXT ""
#endif
PerlIO_fdopen calls PerlIO_openn, which somehow gets to apply layers. Eventually, we find ourselves in PerlIOUnix_pushed, which leads to PerlIOBase_pushed, where we have:
while (*mode) {
    switch (*mode++) {
    case '+':
        l->flags |= PERLIO_F_CANREAD | PERLIO_F_CANWRITE;
        break;
    case 'b':
        l->flags &= ~PERLIO_F_CRLF;
        break;
    case 't':
        l->flags |= PERLIO_F_CRLF;
        break;
So, PERLIO_STDTEXT is "t" on Windows, and this leads to l->flags |= PERLIO_F_CRLF, but that doesn’t explain why the bottom-most ‘unix’ layer on STDOUT in cmd.exe has this flag set whereas the same type of layer on a plain filehandle does not.
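To see what the switch quoted above does to a mode string such as "Iw" followed by "t", here is a small Perl model of it. It is only a sketch, not the real implementation. flags_for_mode is a hypothetical name, the bit values are assumed from perliol.h, and the handling of the leading 'I' marker and of the first r/a/w letter is my recollection of perlio.c rather than something shown in the excerpt.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# PERLIO_F_* bits, assumed from perliol.h (verify against your perl source).
use constant {
    F_CANWRITE => 0x0200,
    F_CANREAD  => 0x0400,
    F_TRUNCATE => 0x1000,
    F_APPEND   => 0x2000,
    F_CRLF     => 0x4000,
};

# Hypothetical Perl model of the mode handling in PerlIOBase_pushed.
sub flags_for_mode {
    my ($mode) = @_;
    my $flags = 0;

    # Assumption: a leading 'I' (implicit) or '#' (numeric) marker is skipped.
    (my $rest = $mode) =~ s/\A[I#]//;
    my ($first, @chars) = split //, $rest;
    $first = '' unless defined $first;

    # Assumption: the first letter selects read, append, or write.
    if    ($first eq 'r') { $flags |= F_CANREAD }
    elsif ($first eq 'a') { $flags |= F_APPEND | F_CANWRITE }
    elsif ($first eq 'w') { $flags |= F_TRUNCATE | F_CANWRITE }
    else                  { die "invalid mode '$mode'\n" }

    # This loop mirrors the three cases of the quoted switch.
    for my $c (@chars) {
        if    ($c eq '+') { $flags |= F_CANREAD | F_CANWRITE }
        elsif ($c eq 'b') { $flags &= ~F_CRLF }
        elsif ($c eq 't') { $flags |= F_CRLF }
        else              { die "invalid mode '$mode'\n" }
    }
    return $flags;
}

printf "%-4s 0x%04x\n", $_, flags_for_mode($_) for qw(Iwt Iw Ir);
```

Under these assumptions "Iwt" yields 0x5200 and "Iw" yields 0x1200. Those are the low 16 bits of the 0x01205200 seen on STDOUT and the 0x00201200 seen on the plain filehandle. That is consistent with the 't' reaching the ‘unix’ layer for the standard streams but not for the plain open, though the model does not show why.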
PerlIOUnix_pushed ends with:
PerlIOBase(f)->flags |= PERLIO_F_OPEN;
Given the guarantee expressed in the documentation, maybe there should also be a:
PerlIOBase(f)->flags &= ~PERLIO_F_CRLF;
Hmmmmm …
Another update
I guess that was a red herring. I built a perl with that line. The flags value is “fixed”, but the output problem remains.