[fpc-pascal] Yet another thread on Unicode Strings
nc-gaertnma at netcologne.de
Wed Oct 4 14:52:08 CEST 2017
On Wed, 4 Oct 2017 13:10:02 +0100
Tony Whyman <tony.whyman at mccallumwhyman.com> wrote:
> Unicode Character String handling is a question that keeps coming up on
> the Free Pascal Mailing lists and, empirically, it is hard to avoid the
> conclusion that there is something wrong with the way these character
> string types are handled. Otherwise, why does this issue keep arising?
Mixing string types, mixing encodings, mixing legacy code, confusing
UCS-2 with UTF-16, ....
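
To make the UCS-2 versus UTF-16 confusion concrete, here is a minimal sketch. It is written in Python rather than Pascal only because the standard codecs make the code unit counts easy to check. The helper name utf16_units is made up for this illustration. UCS-2 can only represent code points up to U+FFFF, one 16-bit unit each; UTF-16 encodes anything above that as a surrogate pair, so code that assumes "one 16-bit unit = one character" breaks on such input.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers.

    Hypothetical helper for illustration; not part of any library.
    """
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


bmp = "\u00e4"         # U+00E4, inside the Basic Multilingual Plane
astral = "\U0001F600"  # U+1F600, outside the BMP

bmp_units = utf16_units(bmp)        # one unit, also valid UCS-2
astral_units = utf16_units(astral)  # two units: a surrogate pair

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])
```

The second string is a single code point but two 16-bit units, which is exactly the case a UCS-2 mindset gets wrong.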
> Another problem is that there is no character type for a Unicode
I'm curious: What languages have such a type?
> The built-in type “WideChar” is only two bytes and cannot
> hold a UTF-16 code point comprising a surrogate pair (two code units). There is no
> char type for a UTF-8 character and, while UCS4Char exists, the Lazarus
> UTF-8 utilities use “cardinal” as the type for a code point (not exactly
> strong typing).
Should be remedied.
> Let the programmer worry about the algorithm and the compiler worry about the
A UTF-32 string type is seldom the best choice for memory
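
The memory side of this can be checked with a rough sketch like the following. It is Python, not Pascal, and the function name encoded_sizes and the sample strings are assumptions made up for the illustration. It only counts payload bytes and ignores any string header or reference-count overhead a real RTL would add.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")


def encoded_sizes(text):
    """Return the payload size in bytes of text in each encoding.

    Hypothetical helper for illustration; counts encoded bytes only.
    """
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


samples = {
    "ASCII": "Free Pascal",
    "German": "Gr\u00fc\u00dfe aus K\u00f6ln",  # "Grüße aus Köln"
    "emoji": "\U0001F600",
}

for label, text in samples.items():
    # len(text) is the number of code points in Python 3
    print(label, len(text), encoded_sizes(text))
```

For mostly Latin text, UTF-32 takes four times the space of UTF-8 and twice that of UTF-16. The three encodings only come out equal for code points outside the BMP.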
> I want to propose a new character type called “UniChar” - short for
> Unicode Character, along with a new string type “UniString” and a new
> collection “TUniStrings”. I have presented my thoughts here in a
> detailed paper
> see https://mwasoftware.co.uk/docs/unistringproposal.pdf
> This is intended to be a fully worked proposal and I have circulated it
> to provoke discussion and in the hope that it may be useful.
Adding another string type without disabling some old string types will
increase the confusion. Please provide a proposal for disabling old
Also keep in mind that there is still no UTF-16 RTL, even though
many people need one for Delphi compatibility. Starting yet another
RTL for UTF-32 would need some very dedicated programmers.