[fpc-pascal] Case insensitive comparison of strings with non-ascii characters
theo
xpde at theo.ch
Sat Jul 25 17:46:39 CEST 2009
@Luiz Americo
Your code
WideCompareText(UTF8Decode(Key), UTF8Decode(Str))
will work, but it is rather slow if speed matters.
I've tried to make a faster function for UTF-8:
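uses unicodeinfo, LCLProc;

// Case-insensitive comparison of two UTF-8 strings.
// Returns 0 if they match, otherwise a negative or positive value.
function UTF8CompareText(s1, s2: UTF8String): Integer;
var
  u1, u2: Ucs4Char;
  u1l, u2l: longint;
  BytePos1, Len1, SLen1: integer;
  BytePos2, Len2, SLen2: integer;
begin
  Result := 0;
  BytePos1 := 1;
  BytePos2 := 1;
  SLen1 := System.Length(s1);
  SLen2 := System.Length(s2);
  // Assuming lower/uppercase representations have the same byte length
  if SLen1 <> SLen2 then
  begin
    if SLen1 > SLen2 then Result := 1 else Result := -1;
    exit;
  end;
  // Both strings are empty (lengths are equal here): nothing to decode
  if SLen1 = 0 then exit;
  repeat
    // Decode one code point from each string and advance by its byte length
    u1 := UTF8CharacterToUnicode(@s1[BytePos1], Len1);
    inc(BytePos1, Len1);
    u2 := UTF8CharacterToUnicode(@s2[BytePos2], Len2);
    inc(BytePos2, Len2);
    // Only do the case mapping when the raw code points differ
    if u1 <> u2 then
    begin
      {$IFDEF useunicodinfo}
      // Lowercase mapping from the unicodeinfo tables; -1 means "no mapping"
      u1l := unicodeinfo.utf8proc_get_property(u1)^.lowercase_mapping;
      if u1l <> -1 then u1 := u1l;
      u2l := unicodeinfo.utf8proc_get_property(u2)^.lowercase_mapping;
      if u2l <> -1 then u2 := u2l;
      {$ELSE}
      // Note: the WideChar cast truncates code points above $FFFF
      u1 := UCS4Char(WideUpperCase(WideChar(u1))[1]);
      u2 := UCS4Char(WideUpperCase(WideChar(u2))[1]);
      {$ENDIF}
      if u1 <> u2 then
      begin
        // Only the sign of the result is meaningful
        Result := Integer(u1) - Integer(u2);
        exit;
      end;
    end;
  until (BytePos1 > SLen1) or (BytePos2 > SLen2);
end;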
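
A note on the early exit for differing byte lengths: the assumption holds for most letters, but not for all of Unicode. The sketch below only illustrates that; UTF8EncodedLength is a hypothetical helper of mine, not part of LCLProc or unicodeinfo, and the three case pairs come from the Unicode character data.

```pascal
program ByteLengthAssumption;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of the function above): number of bytes
// UTF-8 needs to encode a single Unicode code point.
function UTF8EncodedLength(cp: Cardinal): Integer;
begin
  if cp < $80 then Result := 1
  else if cp < $800 then Result := 2
  else if cp < $10000 then Result := 3
  else Result := 4;
end;

begin
  // U+212A KELVIN SIGN lowercases to U+006B 'k'
  WriteLn(UTF8EncodedLength($212A), ' vs ', UTF8EncodedLength($006B)); // 3 vs 1
  // U+017F LATIN SMALL LETTER LONG S uppercases to U+0053 'S'
  WriteLn(UTF8EncodedLength($017F), ' vs ', UTF8EncodedLength($0053)); // 2 vs 1
  // U+023A LATIN CAPITAL LETTER A WITH STROKE lowercases to U+2C65
  WriteLn(UTF8EncodedLength($023A), ' vs ', UTF8EncodedLength($2C65)); // 2 vs 3
end.
```

Strings that differ only by such a pair are reported as unequal by the length check before any character is compared. For text where these characters cannot occur, the shortcut is harmless and saves the whole loop.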
Some numbers from my system (Linux). WideCompareText is the function
you use now, WideUpperCase is the above function, and unicodeinfo is
the above function with useunicodinfo defined. See here:
http://wiki.lazarus.freepascal.org/Theodp
Comparing identical strings of 322 chars, 10000 times:
  WideCompareText: 785 ms
  unicodeinfo:      75 ms
  WideUpperCase:    74 ms

Comparing strings of 322 chars, 10000 times, where the 3rd char differs:
  WideCompareText: 268 ms
  unicodeinfo:       3 ms
  WideUpperCase:     8 ms

Comparing identical text of 322 chars, 10000 times, where one text is all
uppercase:
  WideCompareText: 810 ms
  unicodeinfo:     121 ms
  WideUpperCase:  1076 ms
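
For anyone who wants to repeat this kind of measurement, here is a minimal timing sketch. It is not the harness that produced the numbers above; TCompareFunc, Baseline and TimeCompare are names made up for illustration, the test strings are plain ASCII, and GetTickCount64 assumes a reasonably recent FPC. Only the WideCompareText baseline is timed; any other function with the same signature can be passed in the same way.

```pascal
program CompareBench;
{$mode objfpc}{$H+}
uses
  {$IFDEF UNIX}cwstring,{$ENDIF} // widestring manager for non-ASCII case mapping on Unix
  SysUtils;

type
  // Hypothetical signature shared by the functions being timed
  TCompareFunc = function(const a, b: UTF8String): Integer;

// The baseline approach: decode both strings to wide strings, then compare
function Baseline(const a, b: UTF8String): Integer;
begin
  Result := WideCompareText(UTF8Decode(a), UTF8Decode(b));
end;

// Calls f(a, b) Runs times and returns the elapsed milliseconds
function TimeCompare(f: TCompareFunc; const a, b: UTF8String;
  Runs: Integer): QWord;
var
  i, Sink: Integer;
  t0: QWord;
begin
  Sink := 0;
  t0 := GetTickCount64;
  for i := 1 to Runs do
    Sink := f(a, b); // result is stored only so the call is a valid statement
  Result := GetTickCount64 - t0;
end;

var
  s1, s2: UTF8String;
begin
  // 322 characters, one string lowercase and one uppercase
  s1 := StringOfChar('a', 322);
  s2 := StringOfChar('A', 322);
  WriteLn('WideCompareText: ', TimeCompare(@Baseline, s1, s2, 10000), ' ms');
end.
```

Absolute timings depend on the machine, the FPC version and the installed widestring manager, so only the ratios between functions measured in the same run are comparable.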
Regards Theo