[fpc-devel] String handling in trunk (was utf8 in 2.6.0)

Michael Van Canneyt michael at freepascal.org
Sat Jan 5 15:01:20 CET 2013



On Sat, 5 Jan 2013, Sven Barth wrote:

> On 05.01.2013 14:16, Michael Van Canneyt wrote:
>> 
>> 
>> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>> 
>>> 
>>> On 05 Jan 2013, at 13:10, Michael Van Canneyt wrote:
>>> 
>>>> On Sat, 5 Jan 2013, Jonas Maebe wrote:
>>>> 
>>>>> 
>>>>> On 05 Jan 2013, at 12:53, Paul Ishenin wrote:
>>>>> 
>>>>>> ResourceStrings are stored as AnsiString type with 0 codepage (as I
>>>>>> remember). Delphi now stores ResourceStrings as UnicodeString type.
>>>>>> I think FPC will follow this in m_default_unicodestring modeswitch.
>>>>> 
>>>>> It would probably even be better to always do that. At least I don't
>>>>> see a downside, other than slightly larger binaries (and that's not an
>>>>> issue in this case as far as I'm concerned; maintaining two separate
>>>>> resourcestring systems/handlers is just not worth the trouble).
>>>> 
>>>> But it means that for
>>>> 
>>>> Resourcestring
>>>>  AString = 'Something';
>>>> 
>>>> Var
>>>>  S : Ansistring;
>>>> 
>>>> begin
>>>>  S:=AString;
>>>> end.
>>>> 
>>>> Always a conversion will happen.
>>>> 
>>>> I do not think this is a good idea given that currently, String =
>>>> Ansistring.
>>> 
>>> String will always be shortstring or ansistring in the syntax modes in
>>> which that is currently the case. And yes, it will involve a
>>> conversion in that case. Just like every single constant string
>>> assignment to an ansistring in 2.6.x in case the constant string
>>> contains non-ASCII characters and is part of a {$codepage xxx} file
>>> (because those strings are all stored as unicodestring in the program
>>> there).
>> 
>> Judging by all the code that I have written during 14 years, there would
>> never be a single conversion necessary.
>> This system would force them on me for every single use.
>> 
>> I do not think that the support of both ansi/unicode string resources is
>> such a burden that it justifies that.
>> 
>> I admittedly have limited knowledge of compiler internals, but I cannot
>> imagine that being able to store them in 2 formats (ansi and some form
>> of unicode) is more than a matter of maintaining 1 flag per string, and
>> writing a word instead of a byte.
>> 
>> All the other code, needed for conversions depending on codepage and
>> whatnot settings, is necessary anyway.
>
> You forget also the code necessary to translate resourcestrings (at runtime). 
> Currently the ResourceString related code inside rtl/objpas/objpas.pp only 
> handles AnsiString and then this would need to be adjusted so that 
> UnicodeString can also be handled. For example there will be the need for a 
> "SetResourceStrings" overload with a UnicodeString based TResourceIterator.

No, I had thought of that.
It will need to be changed anyhow, and it fell under "is necessary anyway",
since we'll need some kind of backwards-compatibility mechanism.

Michael.
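
For reference, a minimal sketch of the example quoted earlier in the thread,
assuming FPC in objfpc mode with {$H+} (so String = AnsiString). The program
name and the variable U are made up for illustration. Which assignment
involves a conversion depends on how the compiler stores the resourcestring:
with AnsiString storage, as the thread describes the current behaviour, the
first assignment is a plain copy and the second converts; with UnicodeString
storage it would be the other way around.

```pascal
program rsdemo; // hypothetical name, for illustration only
{$mode objfpc}{$H+}

resourcestring
  AString = 'Something';

var
  S: AnsiString;
  U: UnicodeString; // added for comparison, not part of the original example

begin
  // No conversion if the resourcestring is stored as an AnsiString;
  // a UTF-16 to ansi conversion if it is stored as a UnicodeString.
  S := AString;

  // The reverse case: converts if the storage is AnsiString,
  // plain copy if the storage is UnicodeString.
  U := AString;

  WriteLn(S);
  WriteLn(Length(U));
end.
```

Either way the source compiles unchanged; the disagreement in the thread is
only about which of the two assignments should pay for the conversion.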


