[fpc-devel] String handling in trunk (was utf8 in 2.6.0)

Jonas Maebe jonas.maebe at elis.ugent.be
Sat Jan 5 13:26:51 CET 2013


On 05 Jan 2013, at 13:10, Michael Van Canneyt wrote:

> On Sat, 5 Jan 2013, Jonas Maebe wrote:
> 
>> 
>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>> 
>>> ResourceStrings are stored as AnsiString type with codepage 0 (as I remember). Delphi now stores ResourceStrings as UnicodeString. I think FPC will follow this in the m_default_unicodestring modeswitch.
>> 
>> It would probably even be better to always do that. At least I don't see a
>> downside, other than slightly larger binaries (and that's not an issue in
>> this case as far as I'm concerned; maintaining two separate resourcestring
>> systems/handlers is just not worth the trouble).
> 
> But it means that for
> 
> Resourcestring
>  AString = 'Something';
> 
> Var
>  S : Ansistring;
> 
> begin
>  S:=AString;
> end.
> 
> a conversion will always happen.
> 
> I do not think this is a good idea given that currently, String = Ansistring.

String will always be shortstring or ansistring in the syntax modes in which that is currently the case. And yes, that case will involve a conversion, just like every constant string assigned to an ansistring in 2.6.x when the constant contains non-ASCII characters and appears in a file with a {$codepage xxx} directive (because all such strings are stored as unicodestring in the program).

Then again, it will also involve a conversion if the ansistring-based implementation is fixed to support non-ASCII resourcestrings and the system code page differs from the code page in which the compiler stored the resourcestring. In fact, on most systems it will then cause two conversions: few systems can transcode directly from an arbitrary code page X to an arbitrary code page Y; most use UTF-16 as the intermediate format, although some can probably also use UTF-8.

Yes, the exception is probably UTF-8 on Unix systems, but is that really worth complicating the compiler and RTL for? I'd assume resourcestrings are generally not used in performance-critical code. Always using UTF-8 would also be fine with me, btw. I just don't believe it is worth the trouble to support both unicodestring and ansistring resourcestrings.


Jonas

