[fpc-devel] new string - question on usage

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Oct 11 08:52:33 CEST 2011


Martin wrote:

> Just to know how to do it:
> 
> procedure foo(x: utf8string); begin end;
> 
> var a: string; // AnsiString, but already contains UTF-8

The encoding will be stored, or the string converted, when a string is 
assigned to that variable. Once the FPC implementation is finished, it 
should be impossible for a string to be stored with the wrong encoding.

> foo(a); // do not convert

Why not?


>>> And what happens if an app reads data from some external source 
>>> (serial port) and then wants to declare what encoding it is?
>> http://docwiki.embarcadero.com/VCL/en/System.SetCodePage
>>
>>
> I hadn't seen that.
> 
> That may help. Though not the best solution...

It does *not* help, because SetCodePage performs a string *conversion* 
when it actually changes the encoding. Delphi once even allowed 
conversions between UTF-16 (CP 1200) and other (byte-oriented) 
encodings, but later disallowed such in-place conversions again. Now a 
UTF-16 string (the Delphi default) is *always* converted when it is 
passed to a subroutine expecting a RawByteString argument.

> I can call it before calling the "foo" proc. But I must revert it 
> afterwards, or at some later time the string will be translated when it 
> is used as a normal string again (while it is expected to stay UTF-8)...

IMO the only way to fix a wrong encoding is a TBytes (or similar) 
buffer: copy the string content into it (without translation), then 
read it back, specifying the correct encoding.
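A minimal sketch of that approach, assuming FPC 3.0 or later (where code-page-aware AnsiStrings were eventually implemented). The helper name ReinterpretAsUtf8 and the type TCp1252String are hypothetical; the two bytes simulate UTF-8 data for 'é' that ended up in a variable labelled as CP 1252. Here the "correct encoding" is specified through the target type UTF8String: the bytes go through a TBytes buffer unchanged, so only the code page label differs afterwards.

```pascal
program RelabelAsUtf8;
{$mode objfpc}{$H+}

uses
  SysUtils;  // TBytes

type
  // Hypothetical type standing in for a string that carries the wrong label.
  TCp1252String = type AnsiString(1252);

{ Hypothetical helper: copies the bytes of S unchanged into a TBytes
  buffer and then into a UTF8String. Only the code page label changes;
  no character conversion is performed. }
function ReinterpretAsUtf8(const S: RawByteString): UTF8String;
var
  Buf: TBytes;
begin
  Buf := nil;
  SetLength(Buf, Length(S));
  if Length(S) > 0 then
    Move(S[1], Buf[0], Length(S));         // raw byte copy, no translation
  Result := '';
  SetLength(Result, Length(Buf));          // new string takes CP_UTF8 from its type
  if Length(Buf) > 0 then
    Move(Buf[0], Result[1], Length(Buf));
end;

var
  Raw: TCp1252String;
  U: UTF8String;
begin
  // Simulate bytes read from an external source: UTF-8 for 'é' (C3 A9),
  // but held in a variable whose type claims CP 1252.
  SetLength(Raw, 2);
  Raw[1] := #$C3;
  Raw[2] := #$A9;

  // A RawByteString parameter accepts any code page without converting.
  U := ReinterpretAsUtf8(Raw);
  WriteLn('Raw code page: ', StringCodePage(Raw));
  WriteLn('U code page:   ', StringCodePage(U));
  WriteLn('U bytes:       ', Length(U));
end.
```

Because the helper only copies bytes, the caller must already know the data really is UTF-8; otherwise the result is a UTF8String holding invalid UTF-8.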

> Yes, I know what I want to do is not what it was designed for. 
> Ultimately a huge update to the entire source will be needed... but 
> I need a temporary solution until then.

Until the new strings are fully implemented in FPC, you don't need a 
temporary solution. Afterwards you only have to take care when reading 
strings from *external* sources, where you have to specify the correct 
external encoding; see e.g. 
http://docwiki.embarcadero.com/VCL/en/Classes.TStrings.LoadFromStream
with its added Encoding argument.

When you want a variable to contain strings of a specific encoding, e.g. 
UTF-8, you simply give it the appropriate type. I assume that a 
UTF8String type will be declared like AnsiString<cpUTF8>, with 
appropriate constants declared for the standard codepages.
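For comparison, a sketch in the syntax FPC 3.0 and later actually adopted, which postdates this discussion: a code page is attached to a type with `type AnsiString(<codepage>)`, and as far as I know the RTL declares UTF8String along the same lines. TUtf8Text and Foo are hypothetical names. If the variable is declared with such a type from the start, a call like Martin's foo(a) has nothing to convert.

```pascal
program CodePageTypes;
{$mode objfpc}{$H+}

type
  // Hypothetical type statically tagged with the UTF-8 code page.
  // FPC 3.x spells this "type AnsiString(CP_UTF8)" rather than
  // AnsiString<cpUTF8>.
  TUtf8Text = type AnsiString(CP_UTF8);

// Hypothetical counterpart of Martin's foo, taking the UTF-8 typed string.
procedure Foo(const X: TUtf8Text);
begin
  WriteLn('Foo got ', Length(X), ' bytes in code page ', StringCodePage(X));
end;

var
  A: TUtf8Text;
begin
  SetLength(A, 3);   // new string takes CP_UTF8 from the declared type
  A[1] := 'a';
  A[2] := 'b';
  A[3] := 'c';
  Foo(A);            // same static code page on both sides: no conversion
end.
```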

DoDi



