[fpc-devel] Unit for handling UTF-8 strings

Sun Apr 7 17:59:14 CEST 2013

On Sun, 7 Apr 2013 13:35:40 +0200
Kostas Michalopoulos <badsectoracula at gmail.com> wrote:

> [...] I looked around in FPC 2.6.2's units and found nothing beyond
> utf8encode/decode (which on Linux requires a C widestring manager that I'd
> like to avoid... and doesn't really help in all cases since Unicode can
> exceed the 16-bit range).

UTF8Encode/UTF8Decode do not require a widestring manager.

> Searching on Google I found a discussion from 2007 which basically
> concluded "yeah, it is a nice feature, has some warts, but people need
> it" but didn't go anywhere:
> http://free-pascal-general.1045716.n5.nabble.com/UTF-8-versions-of-Copy-and-Length-td2814536.html
> and the apparent lack of a UTF-8 unit in FPC six years later (even for
> basic stuff like copy, length, etc.) means that that unit never came to
> exist.
> 
> So, what is going on with that? Graeme mentioned that he already had some
> code and knew some other library that provided a more complete solution

See for example the Lazarus lazutf8 unit.
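
As a rough illustration of what the basic Length case involves (a minimal
sketch, not the lazutf8 implementation): in well-formed UTF-8 every
continuation byte has the bit pattern 10xxxxxx, so counting code points
amounts to counting the bytes that are not continuation bytes. The names
Utf8Sketch and Utf8CodePointCount are hypothetical, and the input is assumed
to be valid UTF-8.

```pascal
program Utf8Sketch;

{$mode objfpc}{$H+}

// Hypothetical helper, not an RTL or lazutf8 routine.
// Counts code points by skipping continuation bytes (10xxxxxx).
// Assumes S holds well-formed UTF-8; it does no validation.
function Utf8CodePointCount(const S: AnsiString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  Sample: AnsiString;
begin
  // 'a' (1 byte), Greek alpha U+03B1 (2 bytes) and U+1F600 (4 bytes),
  // written as raw bytes so the source file encoding does not matter.
  Sample := 'a'#$CE#$B1#$F0#$9F#$98#$80;
  WriteLn('bytes:       ', Length(Sample));
  WriteLn('code points: ', Utf8CodePointCount(Sample));
end.
```

For this sample the byte length is 7 and the code point count is 3. A Copy
equivalent follows the same idea: walk to the byte offset of the Nth lead
byte, then copy bytes up to the lead byte that ends the range.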

> that could be imported into FPC, and even another guy had yet another
> library. But still no UTF-8 in FPC, despite all the different
> implementations floating around out there and despite UTF-8 being the most
> important Unicode encoding (being used by practically anything that doesn't
> falsely believe that 16-bit integers are enough).

AFAIK there are no UTF-16 functions in FPC either.
Lazarus provides a lazutf16 unit too.
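
The same kind of counting is needed for UTF-16, since code points above
$FFFF take two 16-bit code units (a surrogate pair). A minimal sketch,
assuming well-formed UTF-16; Utf16Sketch and Utf16CodePointCount are
hypothetical names, not lazutf16 or RTL routines:

```pascal
program Utf16Sketch;

{$mode objfpc}{$H+}

// Hypothetical helper, not an RTL or lazutf16 routine.
// Counts code points by skipping low surrogates ($DC00..$DFFF),
// which are the second half of each surrogate pair.
// Assumes S holds well-formed UTF-16; it does no validation.
function Utf16CodePointCount(const S: UnicodeString): SizeInt;
var
  I: SizeInt;
  U: Word;
begin
  Result := 0;
  for I := 1 to Length(S) do
  begin
    U := Ord(S[I]);
    if (U < $DC00) or (U > $DFFF) then
      Inc(Result);
  end;
end;

var
  Sample: UnicodeString;
begin
  // 'a' followed by U+1F600 stored as the surrogate pair $D83D $DE00
  SetLength(Sample, 3);
  Sample[1] := WideChar($0061);
  Sample[2] := WideChar($D83D);
  Sample[3] := WideChar($DE00);
  WriteLn('code units:  ', Length(Sample));
  WriteLn('code points: ', Utf16CodePointCount(Sample));
end.
```

For this sample there are 3 code units but only 2 code points.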

> Personally I coded yet another unit, which you can find here:
> http://pastebin.com/cJ2TvRdZ
> 
> Of course my code is most likely slow and there might be some bugs there -
> I only did some testing with Greek characters, which seem to work fine, but
> nothing like Chinese or the new emoticon stuff which is regularly added to
> Unicode.

I agree, your unit is most likely slow.

>[...]

Mattias