[fpc-pascal] Case insensitive comparison of strings with non-ascii characters

JoshyFun joshyfun at gmail.com
Sun Jul 26 16:23:10 CEST 2009


Hello FPC-Pascal,

Sunday, July 26, 2009, 9:43:06 AM, you wrote:

t> In your strict sense, AnsiCompareText didn't work either.

Yes, the Ansi functions also fail for some common languages. But we
are used to "simulating" a CompareText with lowercase(a)=lowercase(b),
and that breaks down when the same character can be represented by
more than one folding.
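
A minimal sketch of that pitfall, written in Python purely for
illustration (this is not FPC code). The sample strings are assumptions
chosen for the example, not the ones discussed in this thread:

```python
# Assumed sample strings: German sharp s versus its uppercase spelling.
a = "Straße"
b = "STRASSE"

# The "simulated" CompareText: lowercase both sides and compare.
lower_equal = a.lower() == b.lower()

# Full Unicode case folding, where "ß" folds to "ss".
fold_equal = a.casefold() == b.casefold()

print(lower_equal, fold_equal)
```

The lowercase comparison reports the two strings as different, while the
case-folded comparison reports them as equal.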

t> Does AnsiCompareText report these strings as equal? No. Same with 'ö' ->
t> 'oe' or 'à' -> 'A'

Yeah, you are completely right.

>> Write unicode functions in UTF8 is almost non-sense, most unicode
>> operations are not like we are used in the ANSI world
t> The latter is certainly true, but I don't understand what it has to do
t> with UTF-8 or UTF-16.

Because Unicode operations often need to scan forward and backward,
rescan, make several passes, etc., processing the string natively in
UTF-8 wastes CPU rather than saving it, except for some trivial
operations. It is usually faster to convert the string to UTF-16 or
UTF-32 once and then perform all the operations than to encode and
decode UTF-8 constantly.
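
A rough sketch of the difference, again in Python for illustration only.
The helper name prev_char_start and the sample text are hypothetical,
and a plain list of code points stands in for a UTF-32 buffer:

```python
def prev_char_start(buf: bytes, i: int) -> int:
    """Return the byte index where the code point before index i starts."""
    i -= 1
    # UTF-8 continuation bytes look like 10xxxxxx; skip back over them.
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i

text = "aöz"                     # assumed sample text
utf8 = text.encode("utf-8")      # variable-width: "ö" takes two bytes

# Stepping backward in UTF-8 needs a small scan for every step.
last = prev_char_start(utf8, len(utf8))
before_last = prev_char_start(utf8, last)
char_from_utf8 = utf8[before_last:last].decode("utf-8")

# Converted once to fixed-width code points, stepping back is plain indexing.
code_points = [ord(c) for c in text]
char_from_fixed = chr(code_points[-2])

print(char_from_utf8, char_from_fixed)
```

Both paths find the same character; the fixed-width form gets there
without inspecting individual bytes, which is the saving when an
algorithm moves back and forth many times.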

>> The code is not optimized but if somebody wants to use them please ask
t> Yes please!

I'll try to make it compile :) The code is mostly experimental, so
some functions are written simply to work and return a value, not
to return it quickly.

I'll post a link to the list as soon as I can confirm that it at least
compiles.

-- 
Best regards,
 JoshyFun



