[fpc-devel] lazarus bug report + fix: Utf8ToUnicode doesn't work correctly

Jonas Maebe jonas at zeus.ugent.be
Wed May 4 12:07:41 CEST 2005


On 4 mei 2005, at 12:04, Michael Van Canneyt wrote:

>> It contains a fixed version of the Utf8ToUnicode function. Since it is
>> part of the rtl, I close this lazarus issue and send you this message.
>> I did not test the fixed version.
>
> The files in the zip file are not usable; They're in some unicode format,
> which I can't use nor check on Linux.

They're plain UTF-8. I know for a fact there are editors under Linux
which support that (at least emacs does, and it would surprise me
immensely if vim doesn't). Anyway, here's the plain ASCII version of
the "-fixed" file.


Jonas

function Utf8ToUnicode(Dest: PWideChar; MaxDestChars: SizeUInt;
  Source: PChar; SourceBytes: SizeUInt): SizeUInt;
   var
     i,j : SizeUInt;
     w: SizeUInt;
     b : byte;
   begin
     if not assigned(Source) then
     begin
       result:=0;
       exit;
     end;
     // default result: returned by every early exit below, i.e. for
     // malformed or truncated input
     result:=SizeUInt(-1);
     i:=0;
     j:=0;
     if assigned(Dest) then
       begin
         while (j<MaxDestChars) and (i<SourceBytes) do
           begin
             b:=byte(Source[i]);
             w:=b;
             inc(i);
             // 2 or 3 bytes?
             if b>=$80 then
               begin
                 w:=b and $3f;
                 if i>=SourceBytes then
                   exit;
                 // 3 bytes?
                 if (b and $20)<>0 then
                   begin
                     b:=byte(Source[i]);
                     inc(i);
                     if i>=SourceBytes then
                       exit;
                     if (b and $c0)<>$80 then
                       exit;
                     w:=(w shl 6) or (b and $3f);
                   end;
                 b:=byte(Source[i]);
                 w:=(w shl 6) or (b and $3f);
                 if (b and $c0)<>$80 then
                   exit;
                 inc(i);
               end;
             Dest[j]:=WideChar(w);
             inc(j);
           end;
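         // terminate the output; skip this if there is no room at all,
         // since MaxDestChars-1 would wrap around for MaxDestChars=0
         if MaxDestChars>0 then
           begin
             if j>=MaxDestChars then j:=MaxDestChars-1;
             Dest[j]:=#0;
           end;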
       end
     else
       begin
         while i<SourceBytes do
           begin
             b:=byte(Source[i]);
             inc(i);
             // 2 or 3 bytes?
             if b>=$80 then
               begin
                 if i>=SourceBytes then
                   exit;
                 // 3 bytes?
                 b := b and $3f;
                 if (b and $20)<>0 then
                   begin
                     b:=byte(Source[i]);
                     inc(i);
                     if i>=SourceBytes then
                       exit;
                     if (b and $c0)<>$80 then
                       exit;
                   end;
                 if (byte(Source[i]) and $c0)<>$80 then
                   exit;
                 inc(i);
               end;
             inc(j);
           end;
       end;
     // wide chars written (or needed when Dest is nil), counting the
     // terminating #0
     result:=j+1;
   end;
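
For what it's worth, a minimal sketch of how the function might be called, using the nil-Dest branch to size the buffer first. The program name, the sample bytes and the variable names are made up for illustration, and it assumes a Utf8ToUnicode with the signature above is in scope (either the version posted here or the one in the rtl).

```pascal
program utf8demo;
{$mode objfpc}

const
  // 'a', U+00E9 (2 bytes), U+20AC (3 bytes); sample input chosen for this sketch
  Src: array[0..5] of Char = ('a', #$C3, #$A9, #$E2, #$82, #$AC);

var
  buf: array of WideChar;
  needed, written: SizeUInt;
begin
  // Pass 1: with Dest = nil nothing is written, the characters are only counted.
  needed := Utf8ToUnicode(nil, 0, @Src[0], Length(Src));
  if needed = SizeUInt(-1) then
    begin
      writeln('malformed or truncated UTF-8');
      halt(1);
    end;

  // Pass 2: the count includes the terminating #0, so the buffer fits exactly.
  SetLength(buf, needed);
  written := Utf8ToUnicode(@buf[0], needed, @Src[0], Length(Src));
  writeln('needed=', needed, ' written=', written,
          ' buf[1]=$', HexStr(Ord(buf[1]), 4), ' buf[2]=$', HexStr(Ord(buf[2]), 4));
end.
```

With the function as posted, both calls should return 4 (three characters plus the terminator), and buf[1] and buf[2] should hold $00E9 and $20AC.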




