<div dir="ltr">Hi all,<div><br></div><div style>Yesterday I ran into the need for some UTF-8 handling (substrings, length, etc). I looked around in FPC 2.6.2's units and found nothing beyond utf8encode/decode (which on Linux require a C widestring manager that I'd like to avoid... and which don't really help in all cases anyway, since Unicode can exceed the 16-bit range).</div>
<div style><br></div><div style>Searching on Google I found a discussion from 2007 which basically concluded with "yeah, it is a nice feature, has some warts, but people need it" but didn't go anywhere: <a href="http://free-pascal-general.1045716.n5.nabble.com/UTF-8-versions-of-Copy-and-Length-td2814536.html">http://free-pascal-general.1045716.n5.nabble.com/UTF-8-versions-of-Copy-and-Length-td2814536.html</a>. The apparent lack of a UTF-8 unit in FPC six years later (even for basic stuff like copy, length, etc) means that the unit never came to exist.</div>
<div style><br></div><div style>So, what is going on with that? Graeme mentioned that he already had some code and knew of another library that provided a more complete solution and could be imported into FPC, and another guy had yet another library. But there is still no UTF-8 unit in FPC, despite all the different implementations floating around out there and despite UTF-8 being the most important Unicode encoding (being used by practically anything that doesn't falsely believe that 16-bit integers are enough).</div>
<div style><br></div><div style>Personally I coded yet another unit, which you can find here: <a href="http://pastebin.com/cJ2TvRdZ">http://pastebin.com/cJ2TvRdZ</a></div><div style><br></div><div style>Of course my code is most likely slow and there might be some bugs in it - I only did some testing with Greek characters, which seem to work fine, but nothing like Chinese or the new emoticon stuff that is regularly added to Unicode.</div>
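<div style><br></div><div style>As a rough illustration of what "basic stuff like copy, length" involves, here is a minimal sketch. It is not the code from the pastebin link; the names IsContinuation, UTF8Length and UTF8Copy are made up for this example. It assumes the input is valid UTF-8 and works on code points, so it does not validate anything and does not treat combining sequences as a single character.</div>
<pre>
program utf8sketch;
{$mode objfpc}{$H+}

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
function IsContinuation(C: Char): Boolean;
begin
  Result := (Ord(C) and $C0) = $80;
end;

// Number of code points: every byte that is not a continuation byte starts one.
function UTF8Length(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if not IsContinuation(S[I]) then
      Inc(Result);
end;

// Returns Count code points starting at the 1-based code point Index.
// Assumes Index is at least 1 and Count is non-negative (hypothetical contract).
function UTF8Copy(const S: AnsiString; Index, Count: Integer): AnsiString;
var
  I, CP, First, Last: Integer;
begin
  Result := '';
  CP := 0;
  First := 0;          // byte offset of the first wanted code point, 0 = not found
  Last := Length(S);   // byte offset of the last wanted byte
  for I := 1 to Length(S) do
    if not IsContinuation(S[I]) then
    begin
      Inc(CP);
      if CP = Index then
        First := I;
      if CP = Index + Count then
      begin
        Last := I - 1; // the wanted range ends just before this code point
        Break;
      end;
    end;
  if First = 0 then
    Exit;              // Index lies past the end of the string
  Result := Copy(S, First, Last - First + 1);
end;

var
  S: AnsiString;
begin
  // 'a', Greek alpha, Greek beta and U+1F600, written as raw UTF-8 bytes
  S := 'a' + #$CE#$B1 + #$CE#$B2 + #$F0#$9F#$98#$80;
  WriteLn(Length(S));      // byte count: 9
  WriteLn(UTF8Length(S));  // code point count: 4
  WriteLn(UTF8Copy(S, 2, 2) = #$CE#$B1#$CE#$B2);  // TRUE
end.
</pre>
<div style>The test string uses byte escapes so the result does not depend on the source file's encoding, and it includes a four-byte sequence (U+1F600), which is exactly the kind of character that falls outside the 16-bit range.</div>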
<div style><br></div><div style>And since I spent more time than I'd like (~1h; I wasn't exactly proficient with UTF-8 and at some point I got those loops wrong) making a unit that should have been part of Free Pascal since the 90s, and my unit might not be exactly the best code for the task out there, I'd like to see this problem solved once and for all. It doesn't matter if it lacks UpCase/LoCase or whatever else might be the sticking point: in this case a partial solution is better than no solution at all, because many people are only going to need the partial solution anyway.</div>
<div style><br></div><div style>So, is there any progress with that? If not, feel free to use the above unit if no other is found suitable (which I doubt, since it seems to be a problem that people re-solve all the time because of FPC's lack of an official UTF-8 unit). If any modification, relicensing or whatever is needed, I can do it.</div>
<div style><br></div><div style>Just put some UTF-8 in there so I (and others) won't have to do it again in the future when we want to write code that compiles out of the box in FPC :-P</div><div style><br></div><div style>
Kostas "Bad Sector" Michalopoulos</div></div>