[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Michael Schnell mschnell at lumino.de
Fri Nov 28 14:43:34 CET 2014


On 11/27/2014 07:29 PM, Hans-Peter Diettrich wrote:
> Michael Schnell schrieb:
>>  E.g. there are (at least) two "Code pages" for UTF-16 ("LE" and 
>> "BE") that would be worth supporting.
>
> You are confusing codepages and encodings :-(
That is why I put quotation marks around "Code pages". I used this wording 
because fpc (and Delphi?) abbreviates it as "CP" in the constant 
names "CP_UTF8", "CP_UTF16" and "CP_UTF16BE" [see Jonas' post: 
"CP_UTF16 and CP_UTF16BE can be returned by StringCodePage() when called 
on a unicodestring, and that's it."].
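
To illustrate why UTF-16 "LE" and "BE" are one character set in two byte orders and yet carry separate "CP" numbers, here is a minimal Python sketch. It assumes that the Windows code page identifiers 1200, 1201 and 65001 correspond to CP_UTF16, CP_UTF16BE and CP_UTF8; the helper name encode_all is hypothetical and used only for this illustration.

```python
# Windows-style code page numbers, assumed here to match the
# CP_UTF16, CP_UTF16BE and CP_UTF8 constants named above.
CODE_PAGES = {"utf-16-le": 1200, "utf-16-be": 1201, "utf-8": 65001}


def encode_all(text):
    """Return the byte sequence of `text` for each encoding in CODE_PAGES."""
    return {name: text.encode(name) for name in CODE_PAGES}


# 'A' (U+0041) followed by the euro sign (U+20AC)
encoded = encode_all("A\u20ac")

for name, raw in encoded.items():
    # Same code points in every case; only the byte layout differs.
    print(name, CODE_PAGES[name], raw.hex())

# Decoding with the matching encoding always recovers the same text.
decoded = {name: raw.decode(name) for name, raw in encoded.items()}
```

The two UTF-16 results contain the same 16-bit units with the bytes of each unit swapped, which is why "encoding" (or byte order) describes the difference better than "code page", even though the constants use the "CP" prefix.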


>
> See it as a multi-level protocol for text processing. ....
Yep. I see that it is workable and I understand the (supposedly mostly 
historical) reasons. But IMHO it is not a good (i.e. crafted from the ground up) 
concept.

>
> It's known that the Delphi AnsiString implementation is flawed,...
And hence it's frustrating to see that fpc needs to follow it for 
compatibility reasons. That is why I suggested an improved 
implementation (see 
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support). 
The seriously flawed Delphi-compatible use of the dynamic 
encoding-brand (and bytes-per-element) information (only implemented 
with RawByteString) can be left as it is, while a decent implementation 
with a new DynamicString type (CP_ANY) should be crafted.

>
> I see no problem in using the same names and values. Delphi documents 
> clearly state: ...
I fear that there will be code that relies on the "flawed" behavior of 
RawByteString ("it's a feature, not a bug"), and using the same name with 
different behavior would break it. And a really usable DynamicString 
would not adhere to that description.

-Michael


