[fpc-pascal] UnicodeString and Length() function

Graeme Geldenhuys mailinglists at geldenhuys.co.uk
Fri Mar 25 20:21:22 CET 2016

I have never really used UnicodeString (or WideString, for that matter);
I've always used AnsiString with UTF-8 content. I also have my own UTF-8
versions of Copy(), Length(), etc.

Looking at UnicodeString in FPC 2.6.4, I am a bit confused. :-/

Take the following code:

{$mode objfpc}{$h+}
{--- $codepage utf8}  // disabled

var
  S: UTF8String; // for FPC 2.6.4 this is an alias for AnsiString
  U: UnicodeString;
begin
  S := 'Tiburón';
  U := 'Tiburón';
  writeln(Length(S));
  writeln(Length(U));
end.

On my 64-bit FreeBSD system that outputs the following:


Length() returns the number of bytes, correct?

So why isn't the result 8 and 14? The letter o with acute takes 2 bytes in
UTF-8 ($C3 $B3). In Unicode (UTF-16), a "character" is word-sized
(2 bytes), so 2 bytes * 7 characters = 14 bytes. But Length() returns
totally different values from what I expected.
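
For reference, the arithmetic above can be checked with a minimal sketch.
It is an illustration only, not part of the original test program. It
assumes that the strings are built from explicit byte and code unit values
(so the result does not depend on the source file encoding or on a
$codepage directive), and the variable names are hypothetical. Length()
counts elements of the string type: bytes for an AnsiString, and 2-byte
WideChar code units for a UnicodeString. The size in bytes of a
UnicodeString is therefore Length() * SizeOf(WideChar).

```pascal
{$mode objfpc}{$h+}

var
  S: AnsiString;     // holds the raw UTF-8 bytes
  U: UnicodeString;  // holds UTF-16 code units
begin
  // Assumption: explicit code values instead of a literal 'ó', so the
  // compiler does no code page interpretation of the source text.
  S := 'Tibur' + #$C3 + #$B3 + 'n';
  U := UnicodeString('Tibur') + WideChar($00F3) + UnicodeString('n');

  WriteLn(Length(S));                     // 8:  one per byte
  WriteLn(Length(U));                     // 7:  one per UTF-16 code unit
  WriteLn(Length(U) * SizeOf(WideChar));  // 14: size in bytes
end.
```

With a literal 'Tiburón' in the source instead, the values can differ,
because the compiler then has to decide how to interpret the bytes of the
source file.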

Enabling {$codepage utf8} made no difference to the results shown above.

Could anybody explain this please?

  - Graeme -

fpGUI Toolkit - a cross-platform GUI toolkit using Free Pascal

My public PGP key:  http://tinyurl.com/graeme-pgp
