[fpc-pascal] UTF-8 versions of Copy() and Length()

Daniël Mantione daniel.mantione at freepascal.org
Sat May 19 12:51:46 CEST 2007



On Sat, 19 May 2007, Felipe Monteiro de Carvalho wrote:

> On 5/19/07, Rimgaudas Laucius <rimga at ktl.mii.lt> wrote:
> > It is not useful to have functions for both encodings, because these
> > encodings are interconvertible and it is more efficient to use UTF-16 for
> > data processing.
> 
> I disagree. The conversion impacts performance heavily. It will also
> require memory to store the converted string, and after you perform an
> operation you need to convert back.
>
> Further, UTF-16 contains both 2-byte characters and 4-byte characters,
> so I don't see how it would be any faster to process than a UTF-8
> string.

For most operations, it is not necessary to process characters outside
the BMP separately, e.g.:

for i:=1 to length(s) do
  s[i]:=upcase(s[i]);

... is valid UTF-16 code, and much faster than the same operation in 
UTF-8.
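
To make the point concrete, here is a minimal sketch of why the loop can
ignore characters outside the BMP: such a character is stored as two
surrogate code units in the range $D800..$DFFF, and a per-unit case
conversion simply leaves those units alone. The helper UpcaseUnit is
hypothetical and handles only ASCII letters, to keep the example
self-contained; it is a stand-in for a real case mapping, not a claim
about what the RTL provides.

```pascal
program Utf16UpcaseSketch;
{$mode objfpc}{$H+}

{ Hypothetical helper: upper-cases a single UTF-16 code unit.
  Only ASCII a..z is mapped here; every other unit, including
  surrogates ($D800..$DFFF), is returned unchanged. }
function UpcaseUnit(c: WideChar): WideChar;
begin
  if (Ord(c) >= Ord('a')) and (Ord(c) <= Ord('z')) then
    Result := WideChar(Ord(c) - 32)
  else
    Result := c;
end;

var
  s: WideString;
  i: Integer;
begin
  { Build "ab", one non-BMP character (a surrogate pair), then "c". }
  SetLength(s, 5);
  s[1] := 'a';
  s[2] := 'b';
  s[3] := WideChar($D835);  { high surrogate }
  s[4] := WideChar($DC00);  { low surrogate }
  s[5] := 'c';

  { One pass over the code units, no decoding of the pair needed. }
  for i := 1 to Length(s) do
    s[i] := UpcaseUnit(s[i]);

  { Print the code units so the result does not depend on the console. }
  for i := 1 to Length(s) do
    WriteLn(Ord(s[i]));
end.
```

The letters are upper-cased and the two surrogate units pass through
unchanged, so the string stays valid UTF-16. In UTF-8 the same operation
has to decode variable-length sequences for every non-ASCII character,
not only for the rare ones outside the BMP.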

Daniël

