[fpc-pascal] UnicodeString and Length() function
Graeme Geldenhuys
mailinglists at geldenhuys.co.uk
Fri Mar 25 20:21:22 CET 2016
I never really used UnicodeString (or WideString for that matter);
I've always used AnsiString with UTF-8 content. I also have my own UTF-8
versions of Copy(), Length(), etc.
Looking at UnicodeString with FPC 2.6.4, I am a bit confused. :-/
Take the following code:
============================
{$mode objfpc}{$h+}
{--- $codepage utf8} // disabled
var
  S: UTF8String; // for FPC 2.6.4 this is an alias for AnsiString
  U: UnicodeString;
begin
  S := 'Tiburón';
  WriteLn(Length(S));
  U := 'Tiburón';
  WriteLn(Length(U));
end.
============================
On my 64-bit FreeBSD system that outputs the following:
==========
10
8
==========
Length() returns the number of bytes, correct?
So why isn't the result 8 and 14? The letter o with acute (ó, U+00F3) is
2 bytes in UTF-8 ($C3 $B3), so the UTF-8 string should be 8 bytes. In
UTF-16, where a "character" is word-sized (2 bytes), I would expect
2 bytes * 7 characters = 14 bytes. But Length() returns totally different
values from what I expected.
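For reference, the expected figures can be checked with a small sketch that
sidesteps the source-file encoding entirely by building both strings from
explicit byte and code-unit values. It relies on Length() counting string
elements (1-byte elements for AnsiString, 2-byte code units for
UnicodeString) rather than bytes in both cases. The program name and the
way the strings are assembled are illustrative assumptions, not part of the
original test.

```pascal
program LengthUnits;
{$mode objfpc}{$h+}

var
  S: AnsiString;     // holds raw UTF-8 bytes
  U: UnicodeString;  // holds UTF-16 code units
begin
  // 'Tiburón' as UTF-8: ó (U+00F3) is encoded as the two bytes $C3 $B3.
  S := 'Tibur' + #$C3 + #$B3 + 'n';

  // 'Tiburón' as UTF-16: ó is the single code unit $00F3.
  U := 'Tibur';
  U := U + WideChar($00F3);
  U := U + 'n';

  WriteLn(Length(S));                     // elements in S (bytes): expected 8
  WriteLn(Length(U));                     // elements in U (code units): expected 7
  WriteLn(Length(U) * SizeOf(WideChar));  // bytes of character data in U: expected 14
end.
```

If this prints 8, 7 and 14, then the 10 and 8 shown above would point at how
the compiler interpreted the string literals in the source file, not at
Length() itself.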
Enabling the {$codepage utf8} directive made no difference to the results shown above.
Could anybody explain this please?
Regards,
- Graeme -
--
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/
My public PGP key: http://tinyurl.com/graeme-pgp