UTF-8 Everywhere is a good idea.
In particular, see their advice on how to do text on Windows. It is possible to follow their advice manually.
This morning, I thought of a utility I could write very easily in any scripting language, but decided I would implement it in modernish C++. In writing the utility, I thought I should take advantage of the standalone version of boost::nowide so as to minimize the amount of code I’d need to write to make sure it could handle command line arguments including fancy characters in both Windows and *nixy environments.
One of the facilities this library provides is nowide::args. It “temporarily replaces standard main() function arguments with their equal, but UTF-8 encoded values under Microsoft Windows for the lifetime of the instance.”
The class uses GetCommandLineW(), CommandLineToArgvW() and GetEnvironmentStringsW() in order to obtain Unicode-encoded values. It does not relate to actual values of argc, argv and env under Windows.
This is not wrong per se, but it interacts badly with another dimension of handling command line arguments on Windows: cmd.exe does not do glob expansion. Instead, if you want prog *.txt to give you file1.txt, file2.txt, etc. in argv, you need to explicitly link with setargv.obj or wsetargv.obj. That way, the runtime sets up an expanded argv using either the OEM charset or the “Unicode” charset depending on whether the program has a main or wmain.
Since boost::nowide::args bypasses the actual argv and instead reparses the “Unicode” version of the command line as originally given, it is oblivious to the now expanded arguments. Since there is no Win32 API function you can call to perform filename expansion on the result of CommandLineToArgvW() (at least, I could not find one), the Windows version of my utility will need to have a wmain instead of main.
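For what it is worth, the expansion could in principle be done by hand on each argument. The following is only a minimal sketch of that idea, not what setargv.obj actually does. The helpers wildcard_match and expand_argument are hypothetical names, the list of candidate file names is assumed to come from a directory listing obtained elsewhere (for example with FindFirstFileW/FindNextFileW, not shown), and matching is byte-wise and case-sensitive, which differs from how Windows file systems usually compare names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns true if `name` matches `pattern`, where '*'
// matches any run of bytes (including none) and '?' matches exactly one byte.
// Comparison is case-sensitive; a real Windows tool would likely fold case.
bool wildcard_match(const std::string& pattern, const std::string& name)
{
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos; // index of the most recent '*'
    std::size_t resume = 0;               // where in `name` to retry from

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;                   // remember the '*', match nothing yet
            resume = n;
        } else if (p < pattern.size() &&
                   (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1;                 // backtrack: let '*' absorb one more byte
            n = ++resume;
        } else {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*') {
        ++p;                              // trailing '*' can match the empty string
    }
    return p == pattern.size();
}

// Hypothetical helper: expands one argument against a list of candidate names.
// Arguments without wildcards, and patterns that match nothing, are passed
// through unchanged (a design choice made for this sketch).
std::vector<std::string> expand_argument(const std::string& arg,
                                         const std::vector<std::string>& names)
{
    std::vector<std::string> matches;
    if (arg.find_first_of("*?") != std::string::npos) {
        for (const std::string& name : names) {
            if (wildcard_match(arg, name)) {
                matches.push_back(name);
            }
        }
    }
    if (matches.empty()) {
        matches.push_back(arg);
    }
    return matches;
}
```

Given the names t.c, t.c.swp, t.exe and t.obj, expand_argument("t.*", names) would return all four. Quoting rules, directory components in the pattern, and case folding are all left out, which is part of why relying on wsetargv.obj and wmain is the less painful route.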
I wrote about fixing this in MoarVM a few years ago and submitted a PR. When I first read about boost::nowide::args, I thought it was going to help me avoid the need to engage in various contortions. Unfortunately, it seems that if you do want file name expansion in command line arguments, you cannot use boost::nowide::args (or its standalone equivalent).
It sure is not rocket surgery, but disappointing nevertheless.
I am going to include a few examples to illustrate the problems I mentioned here.
No filename expansion in cmd
Consider the following C program:
    #include <stdio.h>

    /* Print each command line argument on its own line. */
    int main(int argc, char *argv[])
    {
        for (int i = 1; i < argc; ++i)
        {
            puts(argv[i]);
        }
        return 0;
    }
Compile it using:
C:\Temp> cl t.c
and now run it in cmd:
C:\Temp> t t.*
t.*
Now, open a Cygwin or Git Bash shell and try again without re-compiling:
$ ./t t.*
t.c
t.c.swp
t.exe
t.obj
Link with setargv.obj for filename expansion
Now, let’s recompile:
C:\Temp> cl t.c /link setargv.obj
and try again in cmd.exe:
C:\Temp> t t.*
t.c
t.c.swp
t.exe
t.obj
Can’t handle “funny” characters
In cmd:
C:\Temp> dir /b k*
kârlı.txt
C:\Temp> t k*
kΓrli.txt
No file name expansion with nowide::args
Let’s try this minimal program:
    #include <nowide/args.hpp>
    #include <nowide/iostream.hpp>

    int
    main(int argc, char* argv[])
    {
        // Replace argv with UTF-8 encoded values for the lifetime of `a`.
        nowide::args a(argc, argv);
        nowide::cout << "With 'nowide::args'\n";
        for (int i = 1; i < argc; ++i) {
            nowide::cout << argv[i] << '\n';
        }
        return 0;
    }
Compile using:
cl /EHsc /DUNICODE /D_UNICODE /MD /Ic:\...\opt\include t.cpp /link setargv.obj c:\...\opt\lib\nowide.lib Shell32.lib
In cmd:
C:\Temp> t k*
After 'nowide::args'
k*
In bash:
$ ./t k*
After 'nowide::args'
kârlı.txt
Let’s make a simple modification by deleting the instantiation of the nowide::args object:
    #include <nowide/args.hpp>
    #include <nowide/iostream.hpp>

    int
    main(int argc, char* argv[])
    {
        // No nowide::args here: argv is used exactly as the runtime set it up.
        nowide::cout << "Without 'nowide::args'\n";
        for (int i = 1; i < argc; ++i) {
            nowide::cout << argv[i] << '\n';
        }
        return 0;
    }
Compile using the same command line and run in cmd:
C:\Temp> t k*
Without 'nowide::args'
k�li.txt
So, why do we want to use nowide::args anyway? Simple:
C:\Temp> t kârlı.txt
Without 'nowide::args'
k�li.txt
whereas:
C:\Temp> t kârlı.txt
With 'nowide::args'
kârlı.txt
Conclusion
I want the utility I am writing to both handle filenames containing non-OEM characters and have the benefit of file name expansion in command line arguments. Therefore, I can’t take advantage of nowide::args. Instead, I will need to ensure the entry point for the Windows version is wmain and handle the UTF-8 encoding of argv myself.
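To make that last step concrete, here is a rough sketch of what handling the encoding myself might look like. It is an illustration under assumptions rather than the code of the actual utility: to_utf8 is a hypothetical helper written by hand so the example stays portable (on Windows one could call WideCharToMultiByte with CP_UTF8 instead), and the wmain entry point is only compiled on Windows, where it is assumed the program is linked with wsetargv.obj so that argv arrives already expanded.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: encodes a wide string as UTF-8. Surrogate pairs are
// combined (wchar_t is 16 bits wide on Windows); unpaired surrogates and
// out-of-range values are replaced with U+FFFD.
std::string to_utf8(const std::wstring& wide)
{
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            unsigned long low = static_cast<unsigned long>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i; // the low surrogate has been consumed
            }
        }
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD; // not a valid code point on its own
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

#ifdef _WIN32
// Windows-only entry point (assumption: linked with wsetargv.obj, so the
// runtime has already expanded wildcards into the wide argv).
int wmain(int argc, wchar_t* argv[])
{
    std::vector<std::string> args;
    for (int i = 0; i < argc; ++i) {
        args.push_back(to_utf8(argv[i]));
    }
    // From here on the program only deals with UTF-8 strings. Writing them
    // with puts shows the raw bytes; the console needs a UTF-8 code page
    // (or a wide-character write) to render them correctly.
    for (std::size_t i = 1; i < args.size(); ++i) {
        std::puts(args[i].c_str());
    }
    return 0;
}
#endif
```

On other platforms a plain main would presumably pass argv through unchanged, since the shell has already expanded the wildcards and the bytes are typically UTF-8 to begin with.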