[fpc-devel] lazarus bug report + fix: Utf8ToUnicode doesn't work correctly
Jonas Maebe
jonas at zeus.ugent.be
Wed May 4 12:07:41 CEST 2005
On 4 May 2005, at 12:04, Michael Van Canneyt wrote:
>> It contains a fixed version of the Utf8ToUnicode function. Since it is
>> part of the rtl, I close this lazarus issue and send you this message.
>> I did not test the fixed version.
>
> The files in the zip file are not usable; they're in some unicode format,
> which I can't use nor check on Linux.
They're plain UTF-8. I know for a fact there are editors under Linux
which support that (at least Emacs does, and it would surprise me
immensely if Vim doesn't). Anyway, here's the plain ASCII version of
the "-fixed" file.
Jonas
function Utf8ToUnicode(Dest: PWideChar; MaxDestChars: SizeUInt;
  Source: PChar; SourceBytes: SizeUInt): SizeUInt;
  var
    i,j : SizeUInt;
    w: SizeUInt;
    b : byte;
  begin
    if not assigned(Source) then
      begin
        result:=0;
        exit;
      end;
    // returned on every early exit below (truncated or malformed input)
    result:=SizeUInt(-1);
    i:=0;
    j:=0;
    if assigned(Dest) then
      begin
        // no room even for the terminating #0: write nothing
        if MaxDestChars=0 then
          begin
            result:=0;
            exit;
          end;
        while (j<MaxDestChars) and (i<SourceBytes) do
          begin
            b:=byte(Source[i]);
            w:=b;
            inc(i);
            // 2 or 3 bytes?
            if b>=$80 then
              begin
                w:=b and $3f;
                if i>=SourceBytes then
                  exit;
                // 3 bytes?
                if (b and $20)<>0 then
                  begin
                    // a 3-byte lead byte (1110xxxx) carries only 4 payload bits
                    w:=w and $0f;
                    b:=byte(Source[i]);
                    inc(i);
                    if i>=SourceBytes then
                      exit;
                    if (b and $c0)<>$80 then
                      exit;
                    w:=(w shl 6) or (b and $3f);
                  end;
                // last continuation byte
                b:=byte(Source[i]);
                w:=(w shl 6) or (b and $3f);
                if (b and $c0)<>$80 then
                  exit;
                inc(i);
              end;
            Dest[j]:=WideChar(w);
            inc(j);
          end;
        // always terminate, overwriting the last character if Dest is full
        if j>=MaxDestChars then
          j:=MaxDestChars-1;
        Dest[j]:=#0;
      end
    else
      begin
        // Dest is nil: only count the characters
        while i<SourceBytes do
          begin
            b:=byte(Source[i]);
            inc(i);
            // 2 or 3 bytes?
            if b>=$80 then
              begin
                if i>=SourceBytes then
                  exit;
                // 3 bytes?
                b := b and $3f;
                if (b and $20)<>0 then
                  begin
                    b:=byte(Source[i]);
                    inc(i);
                    if i>=SourceBytes then
                      exit;
                    if (b and $c0)<>$80 then
                      exit;
                  end;
                if (byte(Source[i]) and $c0)<>$80 then
                  exit;
                inc(i);
              end;
            inc(j);
          end;
      end;
    // number of wide characters, including the terminating #0
    result:=j+1;
  end;