[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Hans-Peter Diettrich DrDiettrich1 at aol.com
Wed Nov 26 19:13:54 CET 2014

Mattias Gaertner wrote:
> On Wed, 26 Nov 2014 11:23:17 +0100
> Michael Schnell <mschnell at lumino.de> wrote:

>> Seemingly the "bytes per character" setting is implicitly thought 
>> of here as a part of the "code-page" definition. Correct?
> Code pages define bytes per character.


Not all codepages have a fixed number of bytes per character.
The string preamble contains the *element size* (1 for AnsiString), just 
like with every dynamic array.
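
A minimal sketch of the first point, written in Python rather than Pascal only so that it can be run anywhere. The sample characters and the three codecs (cp1252, UTF-8 and cp932, the Windows Shift-JIS codepage) are my own illustrative choices, and bytes_per_char is a hypothetical helper, not part of any library:

```python
# Count how many bytes each character occupies in a given codepage.
# bytes_per_char is a hypothetical helper for illustration only.
def bytes_per_char(text, codec):
    return [len(ch.encode(codec)) for ch in text]

single_byte = bytes_per_char("Aé", "cp1252")  # single-byte codepage
utf8 = bytes_per_char("Aé€", "utf-8")         # variable width, 1 to 4 bytes
sjis = bytes_per_char("Aあ", "cp932")         # MBCS codepage, 1 or 2 bytes

print(single_byte, utf8, sjis)  # [1, 1] [1, 2, 3] [1, 2]
```

In every case the storage element is one byte; only the number of elements per character differs.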

> As you know: Don't confuse character with glyph and codepoint.

Right, but what is what?

I feel a need for an exact (official) definition of such (and more) 
terms, in order to prevent further misunderstandings of the 
documentation and in discussions.

E.g. "code page" has different meanings when used with ANSI/ISO and 
Unicode character sets.
While ANSI/ISO codepages describe different mappings of bytes into 
characters, Unicode codepages define subsets of the whole Unicode range.

My understanding of "character" is a *logical* unit (letter), with 
possibly different encodings, values and sizes in different codepages 
(character sets).
What's the term for the *physical* unit (AnsiChar, WideChar)?
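
For what it is worth, the Unicode standard calls the physical storage unit a "code unit" (8 bits in UTF-8, 16 bits in UTF-16) and the abstract number assigned to a character a "code point". Whether the FPC documentation should adopt exactly these terms for AnsiChar and WideChar is an open question here, not something this sketch settles. The Python example below is only an illustration; the sample string is my own choice:

```python
import unicodedata

# One logical character written as two code points:
# LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT.
text = "e\u0301"

code_points = len(text)                            # Python counts code points
utf8_units = len(text.encode("utf-8"))             # 8-bit code units
utf16_units = len(text.encode("utf-16-le")) // 2   # 16-bit code units

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", text)

print(code_points, utf8_units, utf16_units, len(composed))  # 2 3 2 1
```

The same logical character therefore has three different "lengths", depending on which unit is counted.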

> AnsiString supports only one-byte-per-character code pages.


What's your definition of "character"?

AnsiString supports MBCS codepages as well. The restriction is the 
physical storage unit (1 byte per string item), as imposed by AnsiChar.
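
As a rough analogy (Python bytes standing in for an AnsiString with an MBCS codepage, which is my assumption for illustration and not how FPC implements anything), the length of such a string counts one-byte items, not characters:

```python
# Three Japanese characters stored in cp932 (Windows Shift-JIS).
# The bytes object plays the role of the byte-per-item string storage.
data = "日本語".encode("cp932")

length_in_items = len(data)                 # number of one-byte items
length_in_chars = len(data.decode("cp932")) # number of decoded characters

print(length_in_items, length_in_chars)  # 6 3
```

Indexing such a string likewise yields a single byte, which may be only half of a character.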

