Having grown up in a society that had evolved beyond the confines of 7-bit ASCII, having lived through the nightmare of codepages, and having cursed the illiterate who have so little to say that they can manage with 26 letters in their alphabet, I was pleased when I read Joel Spolsky’s tutorial on Unicode. It was a relief – finally somebody understood.
Years passed, and I thought I now knew how to do things right. Then I had to write Windows C in anger, got lost in the jungle of wchar_t and TCHAR, and didn’t know where to turn.
Finally, I found this resource:
http://utf8everywhere.org/
The strategies it outlines for dealing with UTF-8 on Windows are clear:
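- Define UNICODE and _UNICODE, so that the generic Win32 names (e.g. CreateFile) and the C runtime’s TCHAR mappings resolve to their wide-character versions.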
- Don’t use wchar_t or TCHAR or any of the associated macros, except right at the call to a wide API. Always assume std::string and char * hold UTF-8. Call the wide (W) Windows APIs, and use Boost.Nowide or similar to widen strings going in and narrow them coming out.
- Never produce any text that isn’t UTF-8.
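The widen-going-in, narrow-coming-out rule can be sketched in portable C++. This is a minimal sketch under stated assumptions, not Boost.Nowide’s API: `widen` and `narrow` are hypothetical helpers (Boost.Nowide has its own functions with these names), and `std::u16string` stands in for the UTF-16 buffer a wide Windows API expects. On Windows itself you would more likely use `std::wstring` (wchar_t is 16 bits there), Boost.Nowide, or `MultiByteToWideChar`/`WideCharToMultiByte` with `CP_UTF8`.

```cpp
// Minimal sketch of the "UTF-8 inside, UTF-16 at the API boundary" rule.
// Assumptions: std::string always holds UTF-8; std::u16string stands in for
// the UTF-16 buffer a wide (W) Windows API expects. widen()/narrow() are
// hypothetical helpers, not the Boost.Nowide functions of the same names.
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// UTF-8 -> UTF-16: decode each code point, emit a surrogate pair above U+FFFF.
std::u16string widen(const std::string& in) {
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (c < 0x80)              { cp = c;        len = 1; }
        else if ((c >> 5) == 0x06) { cp = c & 0x1F; len = 2; }
        else if ((c >> 4) == 0x0E) { cp = c & 0x0F; len = 3; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc >> 6) != 0x02) throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogate values and anything past U+10FFFF.
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // 20 bits, split across the high and low surrogate
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// UTF-16 -> UTF-8: recombine surrogate pairs, then encode in 1 to 4 bytes.
std::string narrow(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
            assert(cp <= 0x10FFFF);  // a valid pair can never exceed U+10FFFF
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

Throwing on malformed input is a choice made for this sketch; a converter could instead substitute U+FFFD (REPLACEMENT CHARACTER) for each malformed sequence.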
Do note, however, that the site’s observation that Windows does not support true UTF-16 (that is, characters outside the Basic Multilingual Plane, encoded as surrogate pairs) is incorrect, as this was fixed before Windows 7.
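To make “true UTF-16” concrete: a character above U+FFFF does not fit in one 16-bit unit, so it is stored as a high surrogate followed by a low surrogate. Using the standard UTF-16 formulas, with U+1F600 picked purely as an example code point:

```latex
\begin{aligned}
c' &= \mathtt{0x1F600} - \mathtt{0x10000} = \mathtt{0xF600} \\
\text{high} &= \mathtt{0xD800} + \lfloor c' / 2^{10} \rfloor = \mathtt{0xD800} + \mathtt{0x3D} = \mathtt{0xD83D} \\
\text{low} &= \mathtt{0xDC00} + (c' \bmod 2^{10}) = \mathtt{0xDC00} + \mathtt{0x200} = \mathtt{0xDE00}
\end{aligned}
```

Software that treats each 16-bit unit as a whole character (the older UCS-2 view) would count this single character as two.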