[fpc-devel] Unicode conversion routines
joshyfun at gmail.com
Sun Nov 23 16:34:28 CET 2008
Sunday, November 23, 2008, 9:21:09 AM, you wrote:
GG> So UTF8Decode only supports UCS2 output! Now this is why I think
GG> supporting UTF-8 in fpGUI and Lazarus LCL was a good idea. By design
GG> (utf-8), you have to support the whole unicode range. With UTF-16,
GG> many people take shortcuts and actually only support UCS2 - and it
GG> goes unnoticed like this case for many years!
It is not so bad; technically, all of those issues are easy to solve.
As you have seen in the bug report, Delphi has the same problems, and it
has not been fixed in order to keep Delphi compatibility. My UTF8ToUnicode
takes care of all those problems, including surrogate pairs, with a 25%
speed penalty (this is not the version posted in the bug report, but
another one that I have optimized a bit).
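To illustrate what handling surrogate pairs means here, the following is a minimal sketch in Python. It is not the UTF8ToUnicode routine mentioned above and not FPC code. The function name `utf8_to_utf16_units` is hypothetical, and the choice to raise an error on malformed input is an assumption made for this sketch. It decodes UTF-8 bytes into UTF-16 code units and splits any code point above U+FFFF into a high and a low surrogate instead of truncating it to UCS2.

```python
def utf8_to_utf16_units(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of UTF-16 code units (hypothetical helper)."""
    # Smallest code point allowed for each sequence length (rejects overlong forms).
    minimum = [0x0, 0x80, 0x800, 0x10000]
    units = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            cp, extra = b0, 0
        elif 0xC2 <= b0 <= 0xDF:
            cp, extra = b0 & 0x1F, 1
        elif 0xE0 <= b0 <= 0xEF:
            cp, extra = b0 & 0x0F, 2
        elif 0xF0 <= b0 <= 0xF4:
            cp, extra = b0 & 0x07, 3
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= n:
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, extra + 1):
            b = data[i + k]
            if (b & 0xC0) != 0x80:
                raise ValueError("invalid continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (b & 0x3F)
        if cp < minimum[extra] or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError("invalid code point at offset %d" % i)
        if cp < 0x10000:
            # Fits in a single 16-bit unit; this is all a UCS2-only decoder handles.
            units.append(cp)
        else:
            # Outside the BMP: emit a high and a low surrogate.
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        i += extra + 1
    return units
```

For example, U+1D11E (a character outside the BMP) is encoded in UTF-8 as the four bytes F0 9D 84 9E, and the sketch returns the two code units D834 and DD1E for it. A UCS2-only decoder has no way to represent that character in one 16-bit unit.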
GG> I'm busy writing unit tests for all the conversion functions and
GG> implementing some new helper functions as well. Hopefully this will
GG> highlight all the UCS2 shortcuts in UTF-16 implementation and other
GG> possible conversion issues.
There are many test case files on unicode.org, but most of them are
quite complex to code as test cases :(
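As a rough idea of what driving a test from one of those files could look like, here is a small Python sketch for one of the simpler formats, the semicolon-separated lines of NormalizationTest.txt (source; NFC; NFD; NFKC; NFKD, each a list of hex code points). The helper names `parse_field` and `check_normalization_line` are hypothetical, and Python's `unicodedata.normalize` only stands in for whatever routine would actually be under test. The more complex files would need considerably more work than this.

```python
import unicodedata


def parse_field(field):
    """Turn a field such as '0044 0307' into the string it describes."""
    return "".join(chr(int(h, 16)) for h in field.split())


def check_normalization_line(line):
    """Check one 'source;NFC;NFD;NFKC;NFKD;' line against unicodedata.normalize."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    fields = data.split(";")
    source, nfc, nfd, nfkc, nfkd = (parse_field(f) for f in fields[:5])
    return (
        unicodedata.normalize("NFC", source) == nfc
        and unicodedata.normalize("NFD", source) == nfd
        and unicodedata.normalize("NFKC", source) == nfkc
        and unicodedata.normalize("NFKD", source) == nfkd
    )


# One line in that format: U+1E0A decomposes to U+0044 U+0307.
sample = "1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE"
result = check_normalization_line(sample)
```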
I would also like to know which basic Unicode functions will be supported
by FPC: only upper/lower, or maybe some more, such as decompose,
normalize, and char/word/line/paragraph iterators... I have some of them
written if the FPC team wants them.