[fpc-devel] Unicode proceedings
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Thu Nov 17 02:43:01 CET 2011
Marco van de Voort wrote:
> Note that the Delphi2009 definition is theoretically capable of combining one-byte and
> two-byte strings in one type (like Yury's). AFAIK there is no consensus why
> Embarcadero kept the two types separate, though I can think of several
> reasons:
>
> - performance
> - backwards compatibility (and thus the hurdle to upgrade)
> - While normal code would probably work with a unified type, the big
> amounts of code that typecast strings, mess with temps or use strings as
> buffers would cause problems. Maybe they tried, and it was problematic.
The major point is performance. With separate single-byte and double-byte
character/string types, the compiler can insert the required conversions
at each call site. When a subroutine instead has to deal with mixed
arguments, or combines them with strings from other places (literals...),
any number of implicit conversions have to be added to the subroutine
code, or all combinations must be handled in explicit code.
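A minimal Python sketch of the call-site argument. It uses hypothetical names (`AnsiString`, `to_unicode`, `greet_unicode` and `greet_mixed` are not Delphi or FPC identifiers) and Python codec names in place of Windows codepage numbers. With a dedicated parameter type, the caller converts the argument once. A routine that accepts either representation has to carry the conversion in its own body.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class AnsiString:
    """Hypothetical stand-in for a codepage-tagged single-byte string."""
    data: bytes
    codepage: str  # a Python codec name such as "cp1252"


def to_unicode(s: Union[AnsiString, str]) -> str:
    """The conversion a compiler would insert for an AnsiString value."""
    if isinstance(s, AnsiString):
        return s.data.decode(s.codepage)
    return s


# Separate types: the parameter is declared as str, so the conversion happens
# once at the call site, where the argument's static type is known.
def greet_unicode(name: str) -> str:
    return "Hello, " + name  # literal and argument already share one encoding


# Unified type: the routine may receive either representation, so the
# check and conversion move into the body and run on every call.
def greet_mixed(name: Union[AnsiString, str]) -> str:
    return "Hello, " + to_unicode(name)


arg = AnsiString("Jürgen".encode("cp1252"), "cp1252")
via_caller = greet_unicode(to_unicode(arg))  # conversion at the call site
via_callee = greet_mixed(arg)                # conversion inside the routine
assert via_caller == via_callee == "Hello, Jürgen"
```

In this model, the caller-side conversion is only needed where the argument's type differs from the declared parameter type. The callee-side check, by contrast, is part of every routine that accepts the unified type.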
It also turned out that the best performance (and least code) can be
achieved with only one string type and encoding, which is why new
(Delphi) code should use only UnicodeString. AnsiString is supported
merely for backward compatibility (legacy code) and should be used
only with the native encoding; anything else will not be supported by
the Delphi RTL.
In particular, Delphi does not support direct conversion between Ansi
codepages. Instead, a conversion first produces UTF-16, which is then
re-converted into the target codepage. This makes such conversions very
expensive. The only possible optimization (of expressions) is again based
on UTF-16: all sub-expressions are converted into UTF-16, so that only
one more re-conversion is required when the result is stored.
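The two-step conversion and the expression optimization described above can be modelled with Python's codecs. This is a hypothetical sketch, not Delphi's RTL. The `Transcoder` class, the `(bytes, codepage)` representation and the pairwise baseline are assumptions made for illustration, and Python codec names stand in for Windows codepage numbers. The sketch counts transcoding steps to compare two strategies: storing every intermediate result in the target codepage, and evaluating the whole concatenation in UTF-16.

```python
from typing import List, Tuple

# Hypothetical model: an Ansi string is (bytes, codepage), where the
# codepage is a Python codec name such as "cp1252" or "cp1250".
AnsiStr = Tuple[bytes, str]


class Transcoder:
    """Performs conversions through a UTF-16 pivot and counts each step."""

    def __init__(self) -> None:
        self.steps = 0

    def to_utf16(self, s: AnsiStr) -> bytes:
        self.steps += 1
        data, cp = s
        return data.decode(cp).encode("utf-16-le")

    def from_utf16(self, u: bytes, cp: str) -> AnsiStr:
        self.steps += 1
        return (u.decode("utf-16-le").encode(cp), cp)

    def ansi_to_ansi(self, s: AnsiStr, dst_cp: str) -> AnsiStr:
        # No direct codepage-to-codepage path: always go through UTF-16.
        return self.from_utf16(self.to_utf16(s), dst_cp)


def concat_pairwise(t: Transcoder, parts: List[AnsiStr], dst_cp: str) -> AnsiStr:
    # Assumed unoptimized baseline: every intermediate result is stored back
    # in dst_cp, so each '+' costs two conversions to UTF-16 and one back.
    result: AnsiStr = (b"", dst_cp)
    for part in parts:
        result = t.from_utf16(t.to_utf16(result) + t.to_utf16(part), dst_cp)
    return result


def concat_via_utf16(t: Transcoder, parts: List[AnsiStr], dst_cp: str) -> AnsiStr:
    # Optimized: each operand is converted to UTF-16 once, the expression is
    # evaluated in UTF-16, and a single conversion stores the result.
    u = b"".join(t.to_utf16(p) for p in parts)
    return t.from_utf16(u, dst_cp)


parts = [("Grüße, ".encode("cp1252"), "cp1252"), ("Łódź".encode("cp1250"), "cp1250")]
slow, fast = Transcoder(), Transcoder()
a = concat_pairwise(slow, parts, "cp1250")
b = concat_via_utf16(fast, parts, "cp1250")
assert a == b and a[0].decode("cp1250") == "Grüße, Łódź"
print(slow.steps, fast.steps)  # 6 3
```

Both strategies produce the same bytes. With n operands, the pairwise baseline needs 3n conversions while the UTF-16 evaluation needs n + 1, which is the saving the expression optimization is after.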
DoDi