[fpc-pascal] UnicodeString and Length() function

Graeme Geldenhuys mailinglists at geldenhuys.co.uk
Fri Mar 25 20:21:22 CET 2016


I have never really used UnicodeString (or WideString for that matter) -
I've always used AnsiString with UTF-8 content. I also have my own UTF-8
functions: Copy(), Length() etc.

Looking at UnicodeString with FPC 2.6.4, I am a bit confused. :-/

Take the following code:

============================
{$mode objfpc}{$h+}
{--- $codepage utf8}  // disabled

var
  S: UTF8String; // for FPC 2.6.4 this is an alias for AnsiString
  U: UnicodeString;
begin
  S := 'Tiburón';
  WriteLn(Length(S));
  U := 'Tiburón';
  WriteLn(Length(U));
end.
============================

On my 64-bit FreeBSD system that outputs the following:

==========
10
8
==========

Length() returns the number of bytes, correct?

So why isn't the result 8 and 14?  The letter o with acute is 2 bytes in
UTF-8 ($C3 $B3), so the UTF-8 string should be 8 bytes. In Unicode
(UTF-16) a "character" is word sized (2 bytes), so 2 bytes * 7 characters
= 14 bytes. But Length() returns totally different values from what I
expected.
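
To take the source file encoding out of the picture, the expected sizes
can be checked with strings built from explicit code values. This is only
a minimal sketch: it assumes Length() counts elements of the string type
(bytes for an AnsiString, 16-bit code units for a UnicodeString), and it
is not the program quoted above.

```pascal
{$mode objfpc}{$h+}

var
  S: AnsiString;
  U: UnicodeString;
begin
  // UTF-8: build the bytes explicitly; U+00F3 (o with acute) is $C3 $B3.
  S := 'Tibur';
  S := S + AnsiChar($C3) + AnsiChar($B3) + 'n';
  WriteLn('AnsiString elements (bytes): ', Length(S));

  // UTF-16: U+00F3 fits in a single 16-bit code unit.
  U := 'Tibur';
  U := U + WideChar($00F3) + 'n';
  WriteLn('UnicodeString elements:      ', Length(U));
  WriteLn('UnicodeString size in bytes: ', Length(U) * SizeOf(WideChar));
end.
```

Under those assumptions this should print 8, 7 and 14. The 14 only appears
when the element count is multiplied by SizeOf(WideChar).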

Enabling {$codepage utf8} made no difference to the results shown above.

Could anybody explain this please?

Regards,
  - Graeme -

-- 
fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal
http://fpgui.sourceforge.net/

My public PGP key:  http://tinyurl.com/graeme-pgp


