[fpc-devel] Unicode support (yet again)

Jonas Maebe jonas.maebe at elis.ugent.be
Mon Sep 19 10:42:30 CEST 2011


On 19 Sep 2011, at 10:27, Flávio Etrusco wrote:

> I partly agree it's PEBKAC, but why make it easy to get wrong when you
> can avoid it? Isn't that the point of Pascal? Isn't that the point of
> AnsiStrings? Isn't that the point of strong typed languages in
> general?

Yes, but it is not possible to support Unicode processing in a way that spares the user from having to know about Unicode. Even if everything were UTF-32, you could still have strings in which diacritics are stored as separate code points from the characters they belong to (or should the iteration also temporarily normalize the string?).
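To make the point about separated diacritics concrete, here is a minimal sketch. It uses Python rather than Pascal purely for illustration, and the helper name `code_points` is made up for this example. It shows that "é" written as a base letter plus a combining accent is two code points (and so two UTF-32 units), and that only normalization merges them:

```python
import unicodedata

def code_points(s):
    """Return the code points of s as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in s]

# 'e' followed by COMBINING ACUTE ACCENT: one user-perceived character.
decomposed = "e\u0301"

# NFC normalization composes the pair into the precomposed U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(code_points(decomposed))                   # ['U+0065', 'U+0301']
print(code_points(composed))                     # ['U+00E9']

# Even in UTF-32 the decomposed form occupies two 32-bit units.
print(len(decomposed.encode("utf-32-le")) // 4)  # 2
```

Indexing by code point would therefore still hand back the bare "e" for the first element of the decomposed string.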

Adding band-aids that make plain indexing extremely slow in order to solve some problems, while still requiring people to write different code to get things working right in general, is not the point of Pascal or of strongly typed languages. Generally, there is a quick and easy way that is fragile, and a somewhat slower and more difficult way that is correct. Something that is slower and still fragile does not belong in this picture. Especially not since indexing strings has always accessed the individual bytes/widechars, so it would also make things more confusing for people who have been using Pascal for a long time.
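As a rough illustration of why code-point indexing is slow on a variable-width encoding, the sketch below (Python again, with a hypothetical helper `nth_code_point_offset`, and assuming the input is valid UTF-8) has to walk the bytes from the start to find where the n-th code point begins, whereas plain byte indexing is a single lookup:

```python
def nth_code_point_offset(data: bytes, n: int) -> int:
    """Return the byte offset at which the n-th (0-based) code point
    starts in valid UTF-8 data. Needs a linear scan from the start."""
    seen = 0
    for offset, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; any other byte
        # starts a new code point.
        if (byte & 0xC0) != 0x80:
            if seen == n:
                return offset
            seen += 1
    raise IndexError("code point index out of range")

# 'a' is 1 byte, 'é' is 2 bytes, '€' is 3 bytes, 'b' is 1 byte.
data = "aé€b".encode("utf-8")

print(len(data))                       # 7
print(nth_code_point_offset(data, 2))  # 3 (start of '€')
print(nth_code_point_offset(data, 3))  # 6 (start of 'b')
print(hex(data[3]))                    # 0xe2, first byte of '€'
```

A loop that indexes every code point this way rescans the string on each access, which makes it quadratic in the string length unless extra bookkeeping is added.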

Furthermore, it would be Delphi-incompatible (yet more confusion), and it would require "char" to become a 32-bit type, since otherwise it would not be possible to represent every indexed string code point using a char. If any sort of consistency is desired, it should be possible to assign a string element to a char variable without data loss, since a basic Pascal convention that has held forever is that a string is conceptually a packed array of char.
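The size argument can be checked with a small sketch (Python for illustration only; `utf16_units` is a name invented here). A code point above U+FFFF needs two 16-bit UTF-16 units, so a single 16-bit char could not hold the result of indexing by code point:

```python
def utf16_units(ch: str) -> list:
    """Return the 16-bit UTF-16 code units used to encode ch."""
    raw = ch.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]

clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the 16-bit range

print([hex(u) for u in utf16_units("A")])   # ['0x41']
print([hex(u) for u in utf16_units(clef)])  # ['0xd834', '0xdd1e']
print(ord(clef) > 0xFFFF)                   # True
```

The two units for the clef are a surrogate pair; neither half is a character on its own.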


Jonas

