[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Hans-Peter Diettrich DrDiettrich1 at aol.com
Fri Nov 28 21:15:10 CET 2014


Michael Schnell wrote:
> On 11/27/2014 03:44 PM, Hans-Peter Diettrich wrote:

>> An *efficient* implementation would be based on a single program-wide 
>> string representation, with different encodings being handled only in 
>> an exchange with external data sources.
> Yep. But it would result in severe user code portability issues (see 
> above). IMHO using DynamicString at the correct locations would not be 
> (noticeably) less efficient but a lot more versatile.

You suggested using "string" as UTF-16 on Windows and UTF-8 on Linux.
That's what I understand as a single program-wide string representation
(not source-code-wide, but per program as *compiled*). Then I see no
need or use for an additional DynamicString type.


> I also don't think we will ever see a fix for the poor implementation of 
> RawByteString (avoiding the word flaw and the suggestion of a bad 
> purpose), because it would break existing user code.

Nothing can break as long as the Delphi behaviour is undefined.
Code relying on specific compiler/library bugs is bound to that
compiler and is not portable in any way.

> Regarding fpc, "correcting the flaws" and keeping the name RawByteString 
> would result in incompatibility issues vs Delphi and breaking code that 
> will be ported from Delphi.

Same as above. When application code works properly with strings whose
static and dynamic encodings *sometimes* differ, it will not stop
working with strings whose encodings *never* differ.

Of course the opposite is not true. When some code works properly (only)
with strings of the same static and dynamic encoding, it will stop
working when compiled with Delphi. Then the coder has to insert explicit
checks of the dynamic encoding of *all* strings, all over the code.
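The following is a minimal sketch of what such a check looks like. It assumes Free Pascal 3.0 or later, where AnsiString carries a code page. `StringCodePage`, `SetCodePage`, `CP_UTF8` and `DefaultSystemCodePage` come from FPC's System unit. The program name and the `EncodingDiffers` helper are hypothetical and exist only for illustration. The mismatch is produced artificially here with `SetCodePage(..., False)`, which relabels the bytes without converting them. The printed numbers depend on the platform.

```pascal
program StaticVsDynamicCP;
{$mode objfpc}{$H+}
{$assertions on}

// Hypothetical helper: True when the dynamic code page stored in the
// string's header differs from the code page the caller expects.
function EncodingDiffers(const s: RawByteString;
  expected: TSystemCodePage): Boolean;
begin
  Result := (s <> '') and (StringCodePage(s) <> expected);
end;

var
  a: AnsiString;  // static (declared) code page: CP_ACP
begin
  a := 'abc';
  // Relabel the bytes of a without converting them: the declared type
  // still says CP_ACP, but the dynamic code page now claims UTF-8.
  SetCodePage(RawByteString(a), CP_UTF8, False);
  WriteLn('dynamic code page of a: ', StringCodePage(a));
  WriteLn('DefaultSystemCodePage:  ', DefaultSystemCodePage);
  // Code that cannot rule out such mismatches has to test every string.
  if EncodingDiffers(a, DefaultSystemCodePage) then
    WriteLn('a: static and dynamic encoding differ');
  Assert(StringCodePage(a) = CP_UTF8);
end.
```

The last message is printed only where `DefaultSystemCodePage` is not `CP_UTF8`. The same source therefore behaves differently from one platform to another unless every string is checked.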

Applied to FPC/Lazarus code (compiler, libraries, IDE...) this means
that it's obviously easier to *prevent* differing static and dynamic
encodings than to *check for and react to* such flaws throughout the
entire codebase. Apart from that, any encoding-tolerant code will run
much slower than code that needs no checks and conversions everywhere.

I seriously doubt that the FPC developers ever realized these
consequences, or the amount of time required to find, report and
fix the bugs in all affected parts of their code :-(

> That is why fpc would need to define an additional type name (e.g.
> "DynamicString") and encoding brand number (e.g. "CP_ANY" = $FF00) for a
> decently usable type for intermediately holding a String content.

This again would make *FPC* programs incompatible with Delphi. While
fixing the RawByteString flaw would at least allow FPC code to be
*compiled* with Delphi, the use of a different encoding value would
definitely prevent compilation of such code with Delphi. Which is the
more serious incompatibility?


> RawXxxString can be used for really "uncoded" data as done with 
> old-style strings in a lot of applications.

Such a feature would be appreciated by many users, indeed :-)

DoDi
