[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Wed Nov 26 19:54:54 CET 2014

Michael Schnell wrote:
> On 11/26/2014 11:40 AM, Mattias Gaertner wrote:
>> Ansistring supports only one byte per character code pages. 
> 
> Even more confused. Am I wrong thinking that with code aware Strings,  
> for Delphi XE compatibility, in Windows CP_ACP needs to be UTF16 (if not 
> right, than due later) ?

Delphi XE does not properly support UTF-8. CP_ACP seems to depend on 
western/far-eastern versions: the western version assumes and allows 
any SBCS; I don't know how the far-eastern versions behave.
The SBCS restriction simplifies standard string handling and 
conversions, because every character (= one byte) can be replaced in 
place. UTF-8 doesn't fit into this picture, because it is an MBCS.
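
To make the SBCS/MBCS difference concrete, here is a small sketch in 
Python (not Delphi/FPC code, and not taken from either RTL). It assumes 
Windows-1252 as an example of a single-byte code page; the helper names 
replace_in_place and is_valid_utf8 are made up for this illustration.

```python
# Illustration only: in-place byte replacement in an SBCS vs. UTF-8.
# Windows-1252 is an assumed example of a single-byte code page.

text = "Gr\u00fc\u00dfe"       # 5 characters: G r u-umlaut sharp-s e

sbcs = text.encode("cp1252")   # one byte per character
utf8 = text.encode("utf-8")    # one to four bytes per character


def replace_in_place(data: bytes, index: int, new_byte: int) -> bytes:
    """Overwrite a single byte; the buffer length never changes."""
    buf = bytearray(data)
    buf[index] = new_byte
    return bytes(buf)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# SBCS: character index equals byte index, so swapping the third
# character for an uppercase U-umlaut is a plain one-byte overwrite.
upper_u = "\u00dc".encode("cp1252")[0]
sbcs_result = replace_in_place(sbcs, 2, upper_u).decode("cp1252")

# UTF-8: the third character occupies two bytes. Overwriting only the
# byte at index 2 leaves a stray continuation byte behind.
utf8_broken = replace_in_place(utf8, 2, ord("u"))
```

With the single-byte encoding the result is still a well-formed string 
of the same length; with UTF-8 the same byte-level operation yields an 
invalid byte sequence, so the code would have to work on whole 
characters and possibly resize the buffer.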

UTF-16 is not a valid value for CP_ACP in Delphi, because it is a 
2-byte encoding. Even if the Delphi architects may have thought about a 
common string type with a variable element size (1, 2, 4), this 
certainly soon turned out to be a stupid idea, so AnsiString and 
WideString/UnicodeString are still strictly distinct types. WideString 
and UnicodeString imply UTF-16 with platform-specific byte order 
(endianness). Byte order matters almost exclusively to compiler and 
library coders, in host/network byte order conversions. For the sake of 
completeness: PDP-11 processors use yet another byte order, and maybe 
other word-based processors (DG...) do as well.
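
The byte order point can be shown with a short Python sketch (again 
only an illustration, not Delphi/FPC code). The helper swap16 is a 
hypothetical name; it stands in for the kind of host/network conversion 
a library coder would have to write.

```python
# Illustration only: the same UTF-16 code unit in both byte orders.
import sys

ch = "\u20ac"                     # EURO SIGN, one code unit: 0x20AC

le = ch.encode("utf-16-le")       # low byte first:  AC 20
be = ch.encode("utf-16-be")       # high byte first: 20 AC

# What a "platform-specific" UTF-16 string would hold on this machine.
native = le if sys.byteorder == "little" else be


def swap16(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must have an even byte count")
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)
```

Application code that stays on one machine never sees the difference; 
it only matters when the raw bytes are written to a file or sent over 
the network and read on a platform with the other byte order.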

DoDi