[fpc-devel] new string - question on usage

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Oct 11 08:11:47 CEST 2011


Jonas Maebe wrote:
> On 10 Oct 2011, at 22:11, Luiz Americo Pereira Camara wrote:
> 
>> 1- Most of LCL must be code page agnostic, so not use
>> UTF8String/AnsiString directly (keep String)
> 
> There is no difference between ansistring and string in {$mode
> delphi} and {$mode objfpc}.

You obviously missed that the new AnsiString type has an encoding, with
implicit conversions when strings of different codepages are passed to
subroutines or stored in variables. The default encoding of an AnsiString
depends on the user locale, so it may differ from one machine to another.
When a string contains UTF-8, its encoding must be set to UTF-8 as well,
otherwise implicit conversions will result in garbage - as observed by
the OP.
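
To make the "garbage" concrete, here is a minimal sketch of what happens
when UTF-8 bytes carry the wrong encoding label and are then converted.
It is written in Python only because the effect is language neutral; it
is not FPC code, and the choice of Windows-1252 as the "wrong" code page
is an assumption made for illustration.

```python
# "é" encoded as UTF-8 is the two bytes C3 A9.
utf8_bytes = "é".encode("utf-8")

# Correct label: decoding as UTF-8 recovers the original character.
correct = utf8_bytes.decode("utf-8")

# Wrong label (assumed here: Windows-1252): each byte is treated as a
# separate character, so one character becomes two.
mislabelled = utf8_bytes.decode("cp1252")

# A conversion "from CP1252 to UTF-8" then re-encodes that garbage,
# producing four bytes instead of the original two.
double_encoded = mislabelled.encode("utf-8")

print(repr(correct), repr(mislabelled), double_encoded)
```

The bytes never changed until the conversion ran; only the label was
wrong, which is why the encoding of a string holding UTF-8 must be set
to UTF-8.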

> In a future delphiunicode mode or
> something like that string will be unicodestring, but that's not
> "code-page agnostic" either. The only somewhat code page agnostic
> string type is RawByteString.

RawByteString can be used only for pass-through strings, i.e. in
subroutines that take string arguments but do not manipulate them
themselves. We could start by identifying all subroutines and methods
of that type...

All "const" and "var" parameters may also deserve special
consideration. When the encoding of a "const" string cannot be changed
as required, local copies must be used. I'm not sure about the
implementation of "var" parameters right now, because a user may not
be happy when a called subroutine changes the encoding of his string
variable. More probably, implicit conversions may be inserted before
and after the subroutine call, resulting in very inefficient code.


For all these reasons Delphi has chosen UTF-16 for the new generic
string type, so that encoding conversions are *really* required only when
explicit AnsiStrings are used, e.g. in records or legacy code. IMO
FPC and Lazarus should take the same step, sooner or later. The few
situations where an OS or library (widgetset...) API requires UTF-8
encoded strings should have no noticeable runtime impact.
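
As a rough sketch of that boundary conversion, assuming a UTF-16 string
held internally and an API that expects UTF-8: the helper names
to_api_utf8 and from_api_utf8 are hypothetical, and Python is used only
as a neutral illustration, not as FPC or LCL code. It shows that the
round trip is lossless; it does not measure the runtime cost.

```python
def to_api_utf8(utf16_bytes: bytes) -> bytes:
    """Hypothetical helper: convert internal UTF-16 data to UTF-8 for an API call."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


def from_api_utf8(utf8_bytes: bytes) -> bytes:
    """Hypothetical helper: convert UTF-8 returned by an API back to UTF-16."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


# Stand-in for a UTF-16 string in memory (little-endian, no BOM).
internal = "Grüße, 世界".encode("utf-16-le")

# Convert only at the API boundary, then back again.
for_api = to_api_utf8(internal)
round_trip = from_api_utf8(for_api)

print(round_trip == internal)
```

Both encodings cover the same Unicode repertoire, so nothing is lost in
either direction; the conversion happens only at the call site.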

DoDi




