[fpc-devel] Unicode support (yet again)

Luiz Americo Pereira Camara luizmed at oi.com.br
Fri Sep 16 04:49:45 CEST 2011


On 15/9/2011 23:11, Luiz Americo Pereira Camara wrote:
> On 15/9/2011 12:21, Felipe Monteiro de Carvalho wrote:
>>
>> Lazarus is literally being forced to implement its own RTL...
>>
>> With the currently planned Unicode RTL it will just get worse, we will
>> then need to either migrate to UnicodeString
>
> No.
>
> Lazarus can continue to use UTF-8.
>
> There will just be an implicit conversion when calling those functions.
> The overhead is minimal.
>
> Lazarus already does the (explicit) conversion for some functions
> with no problems at all. In fact, the code will be clearer.

Take the example of FileExists:

The current LCL implementation: the UTF-8 -> UTF-16 conversion is done
explicitly and needs auxiliary code:


FileGetAttr_: function (const FileName: String): Longint = @FileGetAttrWide;

function FileGetAttrWide(const FileName: String): Longint;
begin
   Result:=Integer(Windows.GetFileAttributesW(PWideChar(UTF8Decode(FileName))));
end;

function FileExistsUTF8(const Filename: string): boolean;
var
   Attr: Longint;
begin
   Attr:=FileGetAttrUTF8(FileName);
   if Attr <> -1 then
     Result:= (Attr and FILE_ATTRIBUTE_DIRECTORY) = 0
   else
     Result:=False;
end;

function FileGetAttrUTF8(const FileName: String): Longint;
begin
   Result:=FileGetAttr_(FileName);
end;


The pure UTF-16 RTL: a single implicit conversion is done at the call,
and the code is clean and direct:


function FileExists(const Filename: unicodestring): boolean;
var
   Attr: Longint;
begin
   Attr:=FileGetAttr(FileName);
   if Attr <> -1 then
     Result:= (Attr and FILE_ATTRIBUTE_DIRECTORY) = 0
   else
     Result:=False;
end;

function FileGetAttr(const FileName: unicodeString): Longint;
begin
   Result:=Integer(Windows.GetFileAttributesW(PWideChar(FileName)));
end;



The duplicated UTF-8/UTF-16 RTL: a conversion is still necessary, but it
is done internally and explicitly:

function FileExists(const Filename: utf8string): boolean;
var
   Attr: Longint;
begin
   Attr:=FileGetAttr(FileName);
   if Attr <> -1 then
     Result:= (Attr and FILE_ATTRIBUTE_DIRECTORY) = 0
   else
     Result:=False;
end;

function FileGetAttr(const FileName: utf8String): Longint;
begin
   Result:=Integer(Windows.GetFileAttributesW(PWideChar(UTF8Decode(FileName))));
end;


So:

- There will be a conversion anyway, as there is currently. In fact, the
current code is worse because of the temporary variables created by the
chained typecasts/function calls.
- The duplicated UTF-8/UTF-16 RTL only adds extra code, with no
performance gain.
- The fpc core/Marco proposal will help Lazarus have clearer/smaller
code (at least regarding those RTL functions; TStrings etc. is another
story, and that is where the real problem resides).
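
To make the conversion step in the points above concrete, here is a
minimal sketch (not LCL or RTL code) of what the UTF-8 -> UTF-16 step
does to a string. The program and variable names are made up for
illustration; UTF8Decode is the same RTL function used in the snippets
above. The UTF-8 bytes are written by hand so the result does not depend
on the source file encoding.

```pascal
program Utf8DecodeSketch;
{$mode objfpc}{$H+}

var
  U8: UTF8String;      // illustrative name: holds raw UTF-8 bytes
  U16: UnicodeString;  // illustrative name: holds the UTF-16 result
begin
  // U+00E9 (e with acute accent) is the two bytes C3 A9 in UTF-8
  SetLength(U8, 2);
  U8[1] := #$C3;
  U8[2] := #$A9;

  // The explicit conversion that the current LCL code performs
  // before calling the wide Windows API
  U16 := UTF8Decode(U8);

  WriteLn(Length(U8));   // 2 (bytes)
  WriteLn(Length(U16));  // 1 (UTF-16 code unit)
  WriteLn(Ord(U16[1]));  // 233 (= $E9)
end.
```

Whether this step is written explicitly in the caller, hidden inside a
UTF-8 flavoured RTL function, or inserted implicitly by the compiler,
the work done is the same; only where it appears in the code changes.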

Luiz

>
>
> A conversion when calling those functions is necessary anyway because:
>
> - if LCL changes to UnicodeString the conversion will be needed in 
> unix (UTF-16 -> UTF-8)
> - creating own UTF8 RTL functions will do the conversion internally 
> anyway
>
> So no problem at all with the proposition of Marco
>
> Luiz
