[fpc-devel] Unit for handling UTF-8 strings
Kostas Michalopoulos
badsectoracula at gmail.com
Sun Apr 7 13:35:40 CEST 2013
Hi all,
Yesterday I ran into the need for some UTF-8 handling (substrings, length,
etc.). I looked around in FPC 2.6.2's units and found nothing beyond
utf8encode/decode (which on Linux requires a C widestring manager that I'd
like to avoid... and doesn't really help in all cases, since Unicode code
points can exceed the 16-bit range).
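For anyone wondering what the "basic stuff" amounts to: a minimal sketch, assuming well-formed UTF-8 stored byte-for-byte in a plain AnsiString, is to treat every byte that is not a continuation byte (bit pattern 10xxxxxx) as the start of a code point. The names UTF8CodePointCount, UTF8BytePos and UTF8CopyCP are hypothetical helpers for illustration only. They are not FPC RTL routines and not the code from the unit linked below, and they do no validation of malformed input.

```pascal
program utf8sketch;

{$mode objfpc}{$H+}
{$ASSERTIONS ON}

// Hypothetical helpers (not part of FPC's RTL): count and slice UTF-8
// text by code point instead of by byte. Assumes well-formed UTF-8.

function IsContinuation(B: Byte): Boolean; inline;
begin
  // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
  Result := (B and $C0) = $80;
end;

function UTF8CodePointCount(const S: AnsiString): SizeInt;
var
  I: SizeInt;
begin
  // Every non-continuation byte starts exactly one code point.
  Result := 0;
  for I := 1 to Length(S) do
    if not IsContinuation(Byte(S[I])) then
      Inc(Result);
end;

// Returns the byte index where code point number CP (1-based) starts,
// or Length(S) + 1 if S has fewer than CP code points.
function UTF8BytePos(const S: AnsiString; CP: SizeInt): SizeInt;
var
  Seen: SizeInt;
begin
  Result := 1;
  Seen := 0;
  while Result <= Length(S) do
  begin
    if not IsContinuation(Byte(S[Result])) then
    begin
      Inc(Seen);
      if Seen = CP then
        Exit;
    end;
    Inc(Result);
  end;
end;

// Code-point version of Copy: Count code points starting at Index.
function UTF8CopyCP(const S: AnsiString; Index, Count: SizeInt): AnsiString;
var
  StartB, EndB, Taken: SizeInt;
begin
  Result := '';
  if (Index < 1) or (Count <= 0) then
    Exit;
  StartB := UTF8BytePos(S, Index);
  EndB := StartB;
  Taken := 0;
  // Step over one lead byte, then over its continuation bytes.
  while (EndB <= Length(S)) and (Taken < Count) do
  begin
    Inc(EndB);
    while (EndB <= Length(S)) and IsContinuation(Byte(S[EndB])) do
      Inc(EndB);
    Inc(Taken);
  end;
  Result := Copy(S, StartB, EndB - StartB);
end;

var
  S: AnsiString;
begin
  // 'a', U+03B1 (Greek alpha), U+20AC (euro sign) and U+1F600 (an emoji
  // outside the 16-bit range), written out as raw UTF-8 bytes.
  S := 'a'#$CE#$B1#$E2#$82#$AC#$F0#$9F#$98#$80;
  WriteLn('bytes: ', Length(S));                         // bytes: 10
  WriteLn('code points: ', UTF8CodePointCount(S));       // code points: 4
  WriteLn('slice bytes: ', Length(UTF8CopyCP(S, 2, 2))); // slice bytes: 5
end.
```

Because only lead bytes are counted, the 4-byte emoji is one code point here, which is exactly the case where a 16-bit WideChar would need a surrogate pair.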
Searching Google, I found a discussion from 2007 which basically
concluded "yeah, it is a nice feature, it has some warts, but people need
it", but it didn't go anywhere:
http://free-pascal-general.1045716.n5.nabble.com/UTF-8-versions-of-Copy-and-Length-td2814536.html
Six years later FPC still apparently lacks a UTF-8 unit (even for basic
stuff like copy, length, etc.), so it seems that unit never came to exist.
So, what is going on with that? Graeme mentioned that he already had some
code and knew of another library that provided a more complete solution
that could be imported into FPC, and someone else had yet another library.
But there is still no UTF-8 in FPC, despite all the different
implementations floating around out there and despite UTF-8 being the most
important Unicode encoding (used by practically anything that doesn't
falsely believe that 16-bit integers are enough).
Personally, I coded yet another unit, which you can find here:
http://pastebin.com/cJ2TvRdZ
Of course my code is most likely slow and there might be some bugs in it;
I only did some testing with Greek characters, which seem to work fine, but
nothing with Chinese or the new emoticon characters that are regularly
added to Unicode.
I spent more time than I'd have liked (~1h; I wasn't exactly proficient
with UTF-8 and at some point I got those loops wrong) making a unit that
should have been part of Free Pascal since the 90s, and my unit might not
be the best code for the task out there, so I'd like to see this problem
solved once and for all. Even if the solution doesn't provide UpCase/LoCase
or whatever else is the sticking point, a partial solution is better than
no solution at all, because many people are only going to need the partial
solution anyway.
So, is there any progress on that? If not, feel free to use the above
unit if no other is found suitable (which I doubt, since this seems to be a
problem that people solve over and over again because FPC lacks an
official UTF-8 unit). If any modification, relicensing or whatever is
needed, I can do it.
Just put some UTF-8 support in there so that I (and others) won't have to
do it again in the future when we want to write code that compiles out of
the box with FPC :-P
Kostas "Bad Sector" Michalopoulos