[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
Michael Schnell
mschnell at lumino.de
Fri Nov 28 14:43:34 CET 2014
On 11/27/2014 07:29 PM, Hans-Peter Diettrich wrote:
> Michael Schnell schrieb:
>> E.g. there are at least two "Code pages" for UTF-16 ("LE" and
>> "BE") that would be worth supporting.
>
> You are confusing codepages and encodings :-(
That is why I put quotation marks around "Code pages". I used this wording
because fpc (and Delphi ?) abbreviates it as "CP" in the constant names
"CP_UTF8", "CP_UTF16" and "CP_UTF16BE" [see Jonas' post: "CP_UTF16 and
CP_UTF16BE can be returned by StringCodePage() when called on a
unicodestring, and that's it."].
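
For illustration of the "LE"/"BE" distinction discussed here, a minimal sketch in Python (not FPC code). It assumes only Python's built-in "utf-16-le" and "utf-16-be" codecs. The helper swap_pairs is a hypothetical name introduced for this example. The two identifiers correspond, as far as I know, to the Windows code page numbers 1200 (UTF-16LE) and 1201 (UTF-16BE), but the sketch does not depend on that.

```python
# Sketch: UTF-16LE and UTF-16BE carry the same 16-bit code units;
# only the byte order inside each unit differs.
text = "A\u20ac"  # 'A' (U+0041) and the euro sign (U+20AC)

le = text.encode("utf-16-le")  # low byte first
be = text.encode("utf-16-be")  # high byte first


def swap_pairs(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit unit (hypothetical helper)."""
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)


print(le.hex())               # 4100ac20
print(be.hex())               # 004120ac
print(swap_pairs(le) == be)   # True
```

Both byte strings decode to the same text, which is why the difference is better described as a byte-order variant of one encoding than as two separate character sets.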
>
> See it as a multi-level protocol for text processing. ....
Yep. I see that it is workable and I understand the (supposedly mostly
historical) reasons. But IMHO it is not a good (i.e. crafted from the
ground up) concept.
>
> It's known that the Delphi AnsiString implementation is flawed,...
And hence it's frustrating to see that fpc needs to follow it for
compatibility reasons. That is why I suggested an improved
implementation (see ->
http://wiki.freepascal.org/not_Delphi_compatible_enhancement_for_Unicode_Support).
The seriously flawed Delphi-compatible use of the dynamic
encoding-brand (and bytes-per-element) information (only implemented
with RawByteString) can be left as it is, while a decent implementation
with a new DynmicString type (CP_ANY) should be crafted.
>
> I see no problem in using the same names and values. Delphi documents
> clearly state: ...
I fear that there will be code that relies on the "flawed" behavior of
RawByteString ("it's a feature, not a bug"), and using the same name with
different behavior would break such code. And a really usable DynmicString
would not adhere to that description.
-Michael