Tony Cook provided some insight about what might be going on regarding extraneous trailing output when cmd.exe is set to code page 65001.
It turns out there is a bug in Windows’ WriteFile function: when the console is set to code page 65001, it reports the number of characters written instead of the number of bytes. So, for example:
perl -e "print qq{\xce\xb1\xce\xb2\xce\xb3123}"
sends 9 bytes to output. However, WriteFile reports that it wrote only 6 bytes (which is the number of characters in the string "αβγ123") and therefore it goes back and outputs the last three bytes again:
C:\> perl -e "print qq{\xce\xb1\xce\xb2\xce\xb3123}"
αβγ123123
The relevant code is in win32io.c:
SSize_t
PerlIOWin32_write(pTHX_ PerlIO *f, const void *vbuf, Size_t count)
{
    PerlIOWin32 *s = PerlIOSelf(f,PerlIOWin32);
    DWORD len;
    if (WriteFile(s->h,vbuf,count,&len,NULL))
    {
        return len;
    }
    else
    {
        PerlIOBase(f)->flags |= PERLIO_F_ERROR;
        return -1;
    }
}
Now, looking at the documentation for WriteFile, we have:
The WriteFile function returns when one of the following conditions occur:
- The number of bytes requested is written.
- A read operation releases buffer space on the read end of the pipe (if the write was blocked). For more information, see the Pipes section.
- An asynchronous handle is being used and the write is occurring asynchronously.
- An error occurs.
Further reading suggests to me that if WriteFile is being used for synchronous IO, as PerlIO seems to be doing, then either count bytes will be successfully written, or the function will return with an error (broken pipes also result in error returns).
So, might the issue be fixed simply by ensuring PerlIOWin32_write always returns count, disregarding what WriteFile returns?
To investigate this, I recompiled my brand new perl 5.20.0 after changing PerlIOWin32_write to:
SSize_t
PerlIOWin32_write(pTHX_ PerlIO *f, const void *vbuf, Size_t count)
{
    PerlIOWin32 *s = PerlIOSelf(f,PerlIOWin32);
    DWORD len;
    if (WriteFile(s->h,vbuf,count,&len,NULL))
    {
        return count;
    }
    else
    {
        PerlIOBase(f)->flags |= PERLIO_F_ERROR;
        return -1;
    }
}
Well, long story short, it doesn’t work. In fact, the erroneous output above was generated with that binary.
Incidentally, for some reason the CRLF translation flag is still being set for the bottom-most unix layer on STDOUT:
C:\> perl -MYAML::XS -MPerlIO::Layers=get_layers -e "print Dump get_layers(\*STDOUT)"
---
- unix
- ~
- - OPEN
  - TRUNCATE
  - CRLF
  - CANWRITE
---
- crlf
- ~
- - FASTGETS
  - TRUNCATE
  - LINEBUF
  - CRLF
  - CANWRITE
Given the WriteFile bug, this seems to be a red herring, but it still bugs me.
Coming back to the topic at hand, note the following simple C program:
C:\> type tt.c
#include <stdio.h>
int main(void) {
    char x[] = {
        0xce, 0xb1, /* α */
        0xce, 0xb2, /* β */
        0xce, 0xb3, /* γ */
        49, 50, 51, 0 /* 123 */
    };
    printf("\n%d\n", printf("%s", x));
    return 0;
}
C:\> chcp
Active code page: 65001
C:\> tt
αβγ123
9
Compare that to the following version:
#define WIN32_LEAN_AND_MEAN
#include <stdio.h>
#include <string.h>
#include <windows.h>
int main(void) {
    DWORD n;
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    char x[] = {
        0xce, 0xb1, /* α */
        0xce, 0xb2, /* β */
        0xce, 0xb3, /* γ */
        49, 50, 51, 0 /* 123 */
    };
    /* Write the 9 bytes directly; n receives the count WriteFile reports. */
    WriteFile(out, x, (DWORD)strlen(x), &n, NULL);
    /* DWORD is an unsigned long, hence %lu. */
    printf("\n%lu\n", n);
    return 0;
}
Output:
C:\> tx
αβγ123
6
So, while WriteFile cannot count, it is not responsible for the repeated output. Why, then, is ignoring the number of bytes reported in PerlIOWin32_write not solving the problem?