[fpc-devel] Unicode conversion routines

Florian Klaempfl florian at freepascal.org
Sun Nov 23 17:21:29 CET 2008


JoshyFun wrote:
> Hello Graeme,
> 
> Sunday, November 23, 2008, 9:21:09 AM, you wrote:
> 
> GG> So UTF8Decode only supports UCS2 output!  Now this is why I think
> GG> supporting UTF-8 in fpGUI and Lazarus LCL was a good idea. By design
> GG> (utf-8), you have to support the whole unicode range. With UTF-16,
> GG> many people take shortcuts and actually only support UCS2 - and it
> GG> goes unnoticed like this case for many years!
> 
> It is not so bad; technically it is easy to solve all of those issues.
> As you have seen in the bug report, Delphi has the same problems, and
> it has not been fixed in order to keep Delphi compatibility.

AFAIK we decided to apply the patch; however, I have not had time to do so yet.

> My UTF8ToUnicode
> takes care of all those problems and of surrogate pairs, with a 25%
> speed penalty (not the same version posted in the bug report, but
> another one that I have optimized a bit).
> 
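
For readers unfamiliar with the UCS2 vs. UTF-16 gap discussed above, here
is a minimal sketch of a UTF-8 to UTF-16 conversion that handles surrogate
pairs. It is written in Python purely for illustration; the function name
utf8_to_utf16_units is hypothetical, and this is neither the FPC
UTF8Decode code nor the UTF8ToUnicode routine from the bug report.

```python
def utf8_to_utf16_units(data: bytes) -> list:
    """Decode UTF-8 bytes into a list of UTF-16 code units (illustrative)."""
    # Smallest code point allowed for each sequence length (rejects overlongs).
    min_cp = (0x0, 0x80, 0x800, 0x10000)
    units = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            cp, extra = b0, 0                # 1-byte sequence (ASCII)
        elif 0xC2 <= b0 <= 0xDF:
            cp, extra = b0 & 0x1F, 1         # 2-byte sequence
        elif 0xE0 <= b0 <= 0xEF:
            cp, extra = b0 & 0x0F, 2         # 3-byte sequence
        elif 0xF0 <= b0 <= 0xF4:
            cp, extra = b0 & 0x07, 3         # 4-byte sequence (beyond the BMP)
        else:
            raise ValueError("invalid lead byte at offset %d" % i)
        if i + extra >= n:
            raise ValueError("truncated sequence at offset %d" % i)
        for k in range(1, extra + 1):
            b = data[i + k]
            if (b & 0xC0) != 0x80:
                raise ValueError("invalid continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (b & 0x3F)
        if cp < min_cp[extra] or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError("invalid code point at offset %d" % i)
        if cp < 0x10000:
            units.append(cp)                 # BMP: one code unit, same as UCS2
        else:
            v = cp - 0x10000                 # 20 bits, split into two halves
            units.append(0xD800 | (v >> 10))     # high (lead) surrogate
            units.append(0xDC00 | (v & 0x3FF))   # low (trail) surrogate
        i += extra + 1
    return units
```

A UCS2-only decoder corresponds to the branch that stops at 3-byte
sequences; the 4-byte branch and the surrogate split are the parts that
such shortcuts leave out. For example, the bytes F0 9D 84 9E (U+1D11E)
come out as the two code units D834 DD1E.
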
> GG> I'm busy writing unit tests for all the conversion functions and
> GG> implementing some new helper functions as well. Hopefully this will
> GG> highlight all the UCS2 shortcuts in UTF-16 implementation and other
> GG> possible conversion issues.
> 
> There are many test case files on unicode.org, but most of them are
> quite complex to code as a test case :(
> 
> Also, I wish to know which basic Unicode functions will be supported
> by FPC, only upper/lower, or maybe some more like decompose,
> normalize, char-word-line-paragraph iterators... I have some of them
> written if the FPC team wants them.
> 

It mainly depends on whether it needs external libs or huge tables.


