[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
DrDiettrich1 at aol.com
Fri Nov 28 21:15:10 CET 2014
Michael Schnell wrote:
> On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:
>> An *efficient* implementation would be based on a single program-wide
>> string representation, with different encodings being handled only in
>> an exchange with external data sources.
> Yep. But it would result in severe user code portability issues (see
> above). IMHO using DynamicString at the correct locations would not be
> (noticeably) less efficient but a lot more versatile.
You suggested using "string" as UTF-16 on Windows, and UTF-8 on Linux.
That is what I mean by a single program-wide string representation
(per program as *compiled*, not per source code). Then I cannot see
any need or use for another DynamicString type.
> I also don't think we will ever see a fix for the poor implementation of
> RawByteString (avoiding the word flaw and the suggestion of a bad
> purpose), because it would break existing user code.
Nothing can be broken, as long as the Delphi behaviour is undefined.
Code relying on specific compiler/library bugs is bound to that
compiler, not portable in any way.
> Regarding fpc, "correcting the flaws" and keeping the name RawByteString
> would result in incompatibility issues vs Delphi and breaking code that
> will be ported from Delphi.
Same as above. When application code works properly with strings of
*sometimes* different static and dynamic encoding, it will not stop
working with strings of *never* different encodings.
Of course the opposite is not true. When some code works properly (only)
with strings of the same static and dynamic encoding, it will stop
working when compiled with Delphi. Then the coder has to insert explicit
checks for the dynamic encoding of *all* strings, all over his code.
Applied to FPC/Lazarus code (compiler, libraries, IDE...) this means
that it's obviously easier to *prevent* possibly different
static/dynamic encodings than to *check for and react to* such
flaws throughout the entire codebase. Apart from that, all
encoding-tolerant code will execute much slower than code that needs
no checks and conversions everywhere.
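To illustrate the difference, here is a minimal language-neutral sketch in
Python (not FPC code; all function names are hypothetical and chosen only for
this example). The first half converts once at the program boundary and then
works on one internal representation; the second half carries an encoding tag
with every string and has to check it in every operation.

```python
from typing import Tuple

# --- Approach 1: one internal representation, conversion only at the boundary ---

def read_external(raw: bytes, encoding: str) -> str:
    """Convert external data to the single internal string type, once."""
    return raw.decode(encoding)

def write_external(text: str, encoding: str) -> bytes:
    """Convert the internal string type back to an external encoding, once."""
    return text.encode(encoding)

def concat_uniform(a: str, b: str) -> str:
    """Interior code: no encoding checks are needed at all."""
    return a + b

# --- Approach 2: every string carries its own (dynamic) encoding tag ---

TaggedString = Tuple[bytes, str]  # (payload bytes, encoding name)

def concat_tolerant(a: TaggedString, b: TaggedString) -> TaggedString:
    """Interior code must compare the tags and convert on mismatch."""
    a_bytes, a_enc = a
    b_bytes, b_enc = b
    if a_enc != b_enc:
        # Re-encode the second operand into the first operand's encoding.
        b_bytes = b_bytes.decode(b_enc).encode(a_enc)
    return a_bytes + b_bytes, a_enc
```

In the second approach the check-and-convert step has to be repeated in every
routine that combines or compares strings, which is the cost described above.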
I seriously doubt that the FPC developers ever realized these
consequences, and the amount of time required for finding, reporting and
fixing the bugs in all affected pieces of their code :-(
> That is why fpc would need to define an additional type name (e.g.
> "DynamicString") and encoding brand number (e.g. "CP_ANY" = $FF00) for a
> decently usable type for intermediately holding a String content.
This again would make *FPC* programs incompatible with Delphi. While
fixing the RawByteString flaw would at least allow FPC code to be
*compiled* with Delphi, the use of a different encoding value would definitely
prevent compilation of such code with Delphi. What's the more serious
> RawXxxString can be used for really "uncoded" data as done with
> old-style strings in a lot of applications.
Such a feature would be appreciated by many users, indeed :-)