[fpc-devel] Unicode proceedings

Thu Nov 17 02:43:01 CET 2011

Marco van de Voort wrote:

> Note that the Delphi2009 definition is theoretically capable of combining one and
> two bytes in one type (like Yury's). Afaik there is no consensus why
> Embarcadero kept the two types separate, though I can think of several
> reasons:
> 
> - performance
> - backwards compatibility (and thus the hurdle to upgrade)
> - While normal code would probably work with a unified type, the big
>   amounts of code that typecast strings, mess with temps or use strings as
>   buffers would cause problems. Maybe they tried, and it was problematic.

The major point is performance. With separate single-byte and
double-byte character/string types, the compiler can insert the
required conversions before every call. When a subroutine instead has
to deal with mixed arguments, or adds strings from other places (such
as literals), any number of implicit conversions have to be added to
the subroutine code, or all combinations must be handled in explicit
code.
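
To illustrate the difference, here is a rough sketch in Python (not
Delphi). `str` stands in for a UnicodeString, and the `AnsiStr` class
and all function names are hypothetical, made up only for this
illustration. It is not how any compiler actually implements this.

```python
from dataclasses import dataclass


@dataclass
class AnsiStr:
    """Hypothetical stand-in for an AnsiString: raw bytes plus a codepage."""
    data: bytes
    codepage: str


def to_unicode(s):
    """The kind of conversion a compiler could insert at the call site."""
    return s.data.decode(s.codepage) if isinstance(s, AnsiStr) else s


def join_fixed(a: str, b: str) -> str:
    """Callee with one fixed string type: no conversion code inside."""
    return a + b


def join_mixed(a, b):
    """Callee accepting mixed arguments: combinations are handled here."""
    if (isinstance(a, AnsiStr) and isinstance(b, AnsiStr)
            and a.codepage == b.codepage):
        # Same codepage on both sides: bytes can be joined directly.
        return AnsiStr(a.data + b.data, a.codepage)
    # Any other combination: fall back to converting both operands.
    return to_unicode(a) + to_unicode(b)
```

With `join_fixed` the caller does the conversion once, e.g.
`join_fixed(to_unicode(x), " ok")`, while `join_mixed` carries the
type checks and conversions in its own body on every call.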

It also turned out that the best performance (and the least code) is
achieved with only one string type and encoding, so new (Delphi) code
should use only UnicodeStrings. AnsiStrings are supported merely for
backwards compatibility (legacy code), and should be used only with
the native encoding; everything else will not be supported by the
Delphi RTL.

In detail, Delphi does not support a direct conversion between Ansi
codepages; instead, a conversion first produces UTF-16, which is then
converted again into the target codepage. This makes such conversions
very expensive. The only possible optimization (at the expression
level) is again based on UTF-16: all sub-expressions are converted
into UTF-16, so that only one further conversion is required when the
result is stored.
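
The two-step conversion and the expression-level optimization can be
sketched as follows, again in Python rather than Delphi. The function
names are hypothetical, and the explicit UTF-16 byte strings are there
only to make the pivot visible; this is an illustration of the idea,
not the Delphi RTL code.

```python
def ansi_to_ansi(data: bytes, src_cp: str, dst_cp: str) -> bytes:
    """Convert between two codepages via a UTF-16 pivot (two conversions)."""
    utf16 = data.decode(src_cp).encode("utf-16-le")   # source -> UTF-16
    return utf16.decode("utf-16-le").encode(dst_cp)   # UTF-16 -> target


def concat_to(dst_cp: str, *parts) -> bytes:
    """Concatenate (bytes, codepage) operands.

    Each operand is converted to UTF-16 once, the pieces are joined in
    UTF-16, and the result is converted to the target codepage once.
    """
    utf16 = b"".join(
        data.decode(cp).encode("utf-16-le") for data, cp in parts
    )
    return utf16.decode("utf-16-le").encode(dst_cp)


# Usage: one Cyrillic operand (cp1251) and one Western operand (cp1252),
# stored as UTF-8.
hello = "Привет".encode("cp1251")
accent = "é".encode("cp1252")
result = concat_to("utf-8", (hello, "cp1251"), (accent, "cp1252"))
```

For n operands this needs n conversions into UTF-16 plus one
conversion out, instead of a full two-step conversion for every
intermediate result.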

DoDi