[fpc-devel] Unicode conversion routines
Florian Klaempfl
florian at freepascal.org
Sun Nov 23 17:21:29 CET 2008
JoshyFun wrote:
> Hello Graeme,
>
> Sunday, November 23, 2008, 9:21:09 AM, you wrote:
>
> GG> So UTF8Decode only supports UCS2 output! Now this is why I think
> GG> supporting UTF-8 in fpGUI and Lazarus LCL was a good idea. By design
> GG> (utf-8), you have to support the whole unicode range. With UTF-16,
> GG> many people take shortcuts and actually only support UCS2 - and it
> GG> goes unnoticed like this case for many years!
>
> It is not so bad; technically it is easy to solve all of those issues.
> As you have seen in the bug report, Delphi has the same problems, and it
> has not been fixed in order to keep Delphi compatibility.
AFAIK we decided to apply the patch; however, I have not had time to do so yet.
> My UTF8ToUnicode
> takes care of all those problems and of surrogate pairs, with a 25%
> speed penalty (not the same version posted in the bug report, but another
> one that I have optimized a bit).
>
> GG> I'm busy writing unit tests for all the conversion functions and
> GG> implementing some new helper functions as well. Hopefully this will
> GG> highlight all the UCS2 shortcuts in the UTF-16 implementation and other
> GG> possible conversion issues.
>
> There are many test case files on unicode.org, but most of them are
> quite complex to code as test cases :(
>
> Also, I would like to know which basic Unicode functions will be supported
> by FPC: only upper/lower, or maybe some more, like decompose,
> normalize, char-word-line-paragraph iterators... I have some of them
> written if the FPC team wants them.
>
It mainly depends on whether it needs external libs or huge tables.
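
To illustrate the UCS2 shortcut discussed in this thread: a code point above U+FFFF does not fit in a single 16-bit unit, so a correct UTF-16 encoder has to emit a surrogate pair, while a UCS2-only routine cannot represent it at all. The following is only a minimal sketch in Python, not the FPC or Delphi implementation; the function names to_utf16_units and utf8_to_utf16_units are hypothetical and chosen for this example.

```python
def to_utf16_units(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point.

    Hypothetical helper for illustration; not part of any FPC/Delphi API.
    """
    # Reject values outside the Unicode range and lone surrogate values.
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    # Code points in the Basic Multilingual Plane fit in one unit (the UCS2 case).
    if code_point < 0x10000:
        return [code_point]
    # Everything else needs a surrogate pair: 20 bits split into 10 + 10.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)
    low = 0xDC00 | (offset & 0x3FF)
    return [high, low]


def utf8_to_utf16_units(data):
    """Decode UTF-8 bytes and return the equivalent UTF-16 code units."""
    units = []
    # Python's decoder yields full code points, so each one is re-encoded here.
    for ch in data.decode("utf-8"):
        units.extend(to_utf16_units(ord(ch)))
    return units
```

A unit test in the spirit of the ones mentioned above could feed a 4-byte UTF-8 sequence such as F0 9D 84 9E (U+1D11E) through the conversion and check that two code units come back rather than one.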