[fpc-pascal] UTF-8 versions of Copy() and Length()
Daniël Mantione
daniel.mantione at freepascal.org
Sat May 19 12:51:46 CEST 2007
On Sat, 19 May 2007, Felipe Monteiro de Carvalho wrote:
> On 5/19/07, Rimgaudas Laucius <rimga at ktl.mii.lt> wrote:
> > It is not useful to have functions for both encodings, because these
> > encodings are interconvertible and it is more efficient to use UTF-16 for
> > data processing
>
> I disagree. The conversion impacts performance heavily. It will also
> require memory to store the converted string, and after you perform an
> operation you need to convert back.
>
> Further, UTF-16 contains both 2-byte characters and 4-byte characters,
> so I don't see how it would be any faster to process it in comparison
> to processing a UTF-8 string.
For most operations, it is not necessary to process characters outside
the BMP separately. For example:
  for i:=1 to length(s) do
    s[i]:=upcase(s[i]);
... is valid UTF-16 code, and much faster than the same operation in
UTF-8.
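
A minimal sketch of why the per-index loop is cheap in UTF-16 and not in
UTF-8, assuming Free Pascal in objfpc mode. The helpers Utf8Length and
UpcaseAsciiUtf16 are hypothetical names, not RTL routines, and the
upper-casing is deliberately limited to a..z so that the sketch does not
depend on any particular UpCase overload:

```pascal
program Utf16VersusUtf8;
{$mode objfpc}{$H+}

{ Hypothetical helper: number of code points in a well-formed UTF-8
  string. Every byte has to be inspected, because continuation bytes
  (bit pattern 10xxxxxx) do not start a new character. }
function Utf8Length(const s: AnsiString): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(s) do
    if (Ord(s[i]) and $C0) <> $80 then
      Inc(Result);
end;

{ Hypothetical helper: upper-cases a..z one UTF-16 code unit at a time.
  Surrogate halves ($D800..$DFFF) never fall in that range, so
  characters outside the BMP pass through unchanged. }
procedure UpcaseAsciiUtf16(var s: WideString);
var
  i: Integer;
begin
  for i := 1 to Length(s) do
    if (Ord(s[i]) >= $61) and (Ord(s[i]) <= $7A) then
      s[i] := WideChar(Ord(s[i]) - $20);
end;

var
  u8: AnsiString;
  w: WideString;
begin
  u8 := 'na'#$C3#$AF've';   { "naive" with i-diaeresis, as UTF-8 bytes }
  w := WideString('na') + WideChar($00EF) + WideString('ve');
  UpcaseAsciiUtf16(w);
  WriteLn(Length(u8));      { size in bytes }
  WriteLn(Utf8Length(u8));  { size in code points }
  WriteLn(Length(w));       { size in UTF-16 code units }
end.
```

For this input the UTF-8 string occupies 6 bytes but holds 5 code points,
so s[i] does not address the i-th character there, while in the UTF-16
string each index is one BMP character.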
Daniël