[fpc-devel] Unicode support (again)

Vincent Snijders vsnijders at vodafonevast.nl
Tue Nov 11 15:26:30 CET 2008


Jonas Maebe wrote:
> 
> On 10 Nov 2008, at 17:00, Vincent Snijders wrote:
> 
>> procedure TForm1.Button1Click(Sender: TObject);
>> var
>>  w: widestring;
>>  i: integer;
>> begin
>>  w := UTF8Decode('hallo äöü');
>>  Edit1.Caption := UTF8Encode(w);
> 
> Note that if the file has been saved using an UTF-8 BOM, then the 
> compiler will at compile time create a widestring containing the UTF-16 
> version of 'hallo äöü'. If you then pass this to a function expecting an 
> ansistring (such as UTF8Decode), then the widestring manager will be 
> used to decode that string and this decoded string will be passed to 
> UTF8Decode. So then you'll pass an ansi-encoded string to UTF8Decode 
> rather than an UTF-8-encoded string (unless ansi = utf-8 for the current 
> execution).
> 

Yes, that might be confusing. Therefore I don't recommend saving it with a 
UTF-8 BOM or compiling with -Fcutf-8.

> It seems much more advisable to me to save the file with an UTF-8 BOM, 
> or even better to add {$encoding utf-8} (and/or to pass -Fcutf-8 to the 
> compiler) and then just use
> 
> Edit1.Caption := UTF8Encode('hallo äöü');

An extra bonus of not adding the UTF-8 BOM is that you don't need any conversion to 
assign the UTF-8 string in the source to a UTF-8 encoded ansistring. With the BOM, 
the compiler translates the constant to a UTF-16 string, which then has to be 
converted back. Leaving the BOM out saves a conversion at compile time and a 
conversion at run time:

Edit1.Caption := 'hallo äöü';
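To make the difference concrete, here is a minimal console sketch of the no-BOM 
case. It rests on assumptions that are not from the thread: the file is saved as 
UTF-8 without a BOM, no -Fcutf-8 or encoding directive is in effect, and a plain 
ansistring variable stands in for Edit1.Caption. Compiler versions with 
codepage-aware strings may behave differently.

```pascal
program Utf8Literal;
{$mode objfpc}{$H+}

{ Assumption: this file is saved as UTF-8 WITHOUT a BOM and compiled
  without -Fcutf-8, so the bytes of the constant are copied unchanged. }

var
  s: AnsiString;   // hypothetical stand-in for Edit1.Caption (expects UTF-8)
  w: WideString;
begin
  // No conversion needed: the constant's bytes are already UTF-8.
  s := 'hallo äöü';

  // Explicit conversions, only needed when UTF-16 is actually wanted.
  w := UTF8Decode(s);   // UTF-8 -> UTF-16
  s := UTF8Encode(w);   // UTF-16 -> UTF-8

  // Byte count of s, then UTF-16 code unit count of w.
  WriteLn(Length(s), ' ', Length(w));
end.
```

The first assignment is the whole point: nothing is converted at compile time or 
at run time.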

Is there an explicit way to tell the compiler not to convert string constants to 
widestring, even if the file contains a UTF-8 BOM? The UTF-8 BOM might be useful 
if you want to edit the file with another text editor.

Vincent


