[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
DrDiettrich1 at aol.com
Wed Nov 26 19:13:54 CET 2014
Mattias Gaertner wrote:
> On Wed, 26 Nov 2014 11:23:17 +0100
> Michael Schnell <mschnell at lumino.de> wrote:
>> Seemingly the "bytes per character" setting is implicitly thought
>> of here as a part of the "code-page" definition. Correct?
> Code pages define bytes per character.
Not all codepages have a fixed number of bytes per character.
The string preamble contains the *element size* (1 for AnsiString), just
like with every dynamic array.
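
To illustrate the point about variable-width codepages, here is a minimal sketch in Python (chosen only because its standard codecs make the byte counts easy to inspect; it is not FPC code). The helper name `byte_len` is made up for this example. It compares Windows-1252, a single-byte codepage, with CP932 (Shift-JIS), a multi-byte one:

```python
def byte_len(ch: str, codec: str) -> int:
    """Return how many bytes one character occupies in the given codepage."""
    return len(ch.encode(codec))

# Windows-1252: every encodable character takes exactly one byte.
latin_a = byte_len("A", "cp1252")
latin_e_acute = byte_len("é", "cp1252")

# CP932 (Shift-JIS): ASCII letters take one byte, kanji take two.
sjis_a = byte_len("A", "cp932")
sjis_kanji = byte_len("日", "cp932")

print(latin_a, latin_e_acute, sjis_a, sjis_kanji)
```

In both cases the element size of the stored string is one byte; only the number of elements per character differs.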
> As you know: Don't confuse character with glyph and codepoint.
Right, but what is what?
I feel a need for an exact (official) definition of such (and more)
terms, in order to prevent further misunderstandings of the
documentation and in discussions.
E.g. "code page" has different meanings when used with ANSI/ISO and
Unicode character sets.
While ANSI/ISO codepages describe different mappings of bytes into
characters, Unicode codepages define subsets of the whole Unicode range.
My understanding of "character" is a *logical* unit (letter), with
possibly different encodings, values and sizes in different codepages.
What's the term for the *physical* unit (AnsiChar, WideChar)?
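
For reference, the Unicode standard calls the fixed-size storage item of an encoding form a "code unit" (8 bits in UTF-8, 16 bits in UTF-16), as opposed to a "code point", which is the number assigned to an abstract character. Whether that maps one-to-one onto AnsiChar and WideChar is an assumption here, not something settled in this thread. A small Python sketch of the distinction (the function name `count_units` is invented for this example):

```python
import unicodedata

def count_units(text: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 code units, UTF-16 code units) for text."""
    code_points = len(text)                          # Python 3 str is indexed by code point
    utf8_units = len(text.encode("utf-8"))           # one code unit = 1 byte
    utf16_units = len(text.encode("utf-16-le")) // 2 # one code unit = 2 bytes
    return code_points, utf8_units, utf16_units

# "e" followed by COMBINING ACUTE ACCENT: one letter for the reader,
# but two code points.
decomposed = "e\u0301"
# The same letter as a single precomposed code point (U+00E9).
composed = unicodedata.normalize("NFC", decomposed)
# A code point outside the Basic Multilingual Plane needs a surrogate
# pair, i.e. two code units, in UTF-16.
emoji = "\U0001F600"

print(count_units(decomposed), count_units(composed), count_units(emoji))
```

The same logical letter thus has different counts depending on whether one counts letters, code points or code units.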
> AnsiString supports only one-byte-per-character code pages.
What's your definition of "character"?
AnsiString supports MBCS codepages as well. The restriction is the
physical storage unit (1 byte per string item), as imposed by AnsiChar.
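
The difference between the number of storage items and the number of characters can be sketched as follows. This is Python, not FPC, and a `bytes` object is used only as a rough stand-in for a string with 1-byte elements; the names `storage_length` and `character_length` are hypothetical:

```python
def storage_length(raw: bytes) -> int:
    """Number of 1-byte storage items, independent of any codepage."""
    return len(raw)

def character_length(raw: bytes, codec: str) -> int:
    """Number of code points once the bytes are interpreted in a codepage."""
    return len(raw.decode(codec))

# Three kanji stored in the multi-byte codepage CP932 (Shift-JIS).
raw = "日本語".encode("cp932")

items = storage_length(raw)             # counts bytes
chars = character_length(raw, "cp932")  # counts characters

print(items, chars)
```

Indexing such a string by item yields single bytes, which for an MBCS codepage may be only one half of a character.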