[fpc-pascal] Unicode chars losing information

Graeme Geldenhuys mailinglists at geldenhuys.co.uk
Tue Mar 9 01:18:26 CET 2021


On 08/03/2021 7:49 pm, Jonas Maebe via fpc-pascal wrote:
> It's not possible to safely use unicodestring without
> knowing how 16bit unicode works. The compiler can't solve that.

I disagree. Java does just that! The issue is the assumption that you
can use array indexing into a string. I guess developers should stop
doing that.

The important point is this: a developer should be able to use Unicode
strings without needing to know the ins and outs of Unicode and UTF-16
encoding. At least that's what's possible with Java and other
languages.

FPC needs to introduce class helpers or something with methods like
MyUnicodeString.CharAt(x), and if the char at position x is part of a
surrogate pair, then return the whole pair. Implicitly include whatever
is needed to make that work. Other helper methods could return
the Byte or CodePoint at position x - depending on what the developer
wants. Naming these methods in a logical way is key, as they become
self-documenting. No need for 10 web pages explaining how to work with
a [unicode] string.
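
As a rough illustration of the kind of accessors described above, here
is a minimal sketch in Java. The String methods it calls (codePointAt,
offsetByCodePoints, codePointCount) are standard Java API; the class
name and the two static helpers charAt and codePointAt are hypothetical
names chosen for this example, not anything FPC or Delphi provides. It
treats "position x" as a code point position, which is an assumption:
a single visible character can still consist of several code points.

```java
public class CodePointDemo {
    // Hypothetical helper: returns the whole character (one code point)
    // at logical position charIndex, as a String. If that code point
    // needs a surrogate pair in UTF-16, both halves are returned.
    static String charAt(String s, int charIndex) {
        int start = s.offsetByCodePoints(0, charIndex);
        return new String(Character.toChars(s.codePointAt(start)));
    }

    // Hypothetical helper: returns the numeric code point at logical
    // position charIndex.
    static int codePointAt(String s, int charIndex) {
        return s.codePointAt(s.offsetByCodePoints(0, charIndex));
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 (stored as a surrogate pair), then 'b'
        String s = "a\uD83D\uDE00b";

        // Plain indexing counts UTF-16 code units, not characters.
        System.out.println(s.length());                      // 4
        System.out.println(s.codePointCount(0, s.length())); // 3

        // Indexing by code unit yields only half of the pair.
        System.out.println(Character.isHighSurrogate(s.charAt(1))); // true

        // The helpers index by code point instead.
        System.out.println(Integer.toHexString(codePointAt(s, 1))); // 1f600
        System.out.println(charAt(s, 2));                           // b
    }
}
```

Note that each helper call walks the string from the start, so indexed
access this way is linear rather than constant time. That cost is part
of the trade-off any such helper would have to make.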

FPC (and Delphi) really need to get with the times.


Regards,
  Graeme

