[fpc-devel] ansistrings and widestrings

DrDiettrich drdiettrich at compuserve.de
Sun Jan 9 06:07:06 CET 2005


peter green wrote:
 
> it should be noted that pascal classes are really not suited to doing
> strings.

IMO we should distinguish Strings, as containers, from Text as an
interpretation of data as, ahem, text of some language, in some
encoding, possibly with attributes...

> to do strings with classes you really need language features which fpc
> doesn't have.

Please explain?

> doing strings with non garbage collected heap based classes would make
> something that was as painful to work with as pchars and that was totally
> different from any string handling pascal has seen before.

FPC has reference counted string and array types, so that a form of GC
(automatic release of memory that is no longer referenced) is available.

> just as pascal doesn't consider two strings with different cases to be equal
> it should probably not consider two strings of unicode code points to be
> equal unless they are binary equivalent.

That's one of the differences between strings and text. All comparable
data types must have associated comparison functions. For numbers and
strings the standard comparison functions are part of the language
(operators), and usually do a simple binary compare. For other data
types such operators can be defined as appropriate. It should be noted
that a comparison for anything but (strict) equality requires
interpretation rules for the data types. E.g. even comparing ordinal
numbers depends on the byte order of the machine, and comparing strings
depends on many more attributes, like mappings for upper/lower case.
That's why a programming language, by itself, will supply only
"primitive" string comparisons, with restrictions reasonable enough
that an implementation is possible on any platform.
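
As a rough illustration of the point about interpretation rules, here is
a small sketch in Python (used only as a neutral notation, this is not
FPC code). The 16-bit values and the sample strings are assumptions
chosen for the example. It shows that ordering the raw bytes of two
numbers gives different answers for different byte orders, and that a
binary string compare differs from a compare that applies a case mapping.

```python
import struct

# Two ordinal values, stored as unsigned 16-bit integers (assumed width).
lo, hi = 1, 256

# The same values as raw bytes in little-endian and big-endian layout.
le_lo, le_hi = struct.pack("<H", lo), struct.pack("<H", hi)
be_lo, be_hi = struct.pack(">H", lo), struct.pack(">H", hi)

# Ordering the raw bytes only matches the numeric order (1 < 256)
# when the byte order happens to be big-endian.
bytewise_le = le_lo < le_hi   # b'\x01\x00' < b'\x00\x01' is False
bytewise_be = be_lo < be_hi   # b'\x00\x01' < b'\x01\x00' is True

# A "primitive" string compare looks at the stored code units only.
binary_equal = "Pascal" == "PASCAL"

# An interpreted compare needs a rule, here a case mapping.
caseless_equal = "Pascal".casefold() == "PASCAL".casefold()
```

Only the last comparison involves a rule about what the data means, and
a different rule (another language, another mapping) could give another
result for other inputs.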

> conversion between ansistring and widestring should be done by functions
> that take one and return the other (use a const param to avoid the implicit
> try-finally) so that no limitations are put on how the conversion is done.

This applies to all string handling procedures. A modification of
non-const string parameters opens a can of worms (aliasing...)!
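
The aliasing problem can be sketched as follows. This is Python, not
Pascal, so a mutable list of characters stands in for a non-const
string parameter; the procedure names are hypothetical. The procedure
works as long as source and destination are different objects, and
silently gives a different result when the caller passes the same
object for both.

```python
def append_twice(dst: list, src: list) -> None:
    """Append the contents of src to dst two times, modifying dst in place."""
    dst.extend(src)
    dst.extend(src)   # if src is dst, src has already grown here


def append_twice_safe(dst: list, src: list) -> None:
    """Same idea, but src is treated as read-only by copying it first."""
    snapshot = list(src)
    dst.extend(snapshot)
    dst.extend(snapshot)


# Distinct arguments: the result is what the name promises.
a, b = list("ab"), list("cd")
append_twice(a, b)
distinct_result = "".join(a)

# Aliased arguments: dst and src are the same object.
alias = list("ab")
append_twice(alias, alias)
aliased_result = "".join(alias)

# With a snapshot of the source the aliased call behaves as expected.
safe = list("ab")
append_twice_safe(safe, safe)
safe_result = "".join(safe)
```

Here `distinct_result` is "abcdcd", `aliased_result` is "abababab"
instead of "ababab", and `safe_result` is "ababab".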

> These functions should be indirected through procvars so that the default
> fallback versions can be replaced by versions supplied by a unit which
> provides proper internationalisation.

(Inter)nationalization goes far beyond any "standard" features. Dealing
with natural languages IMO requires more than just dictionaries and
hard-coded translation rules. Every natural language can have its own
rules for how e.g. the words in a message must be modified or rearranged
when message arguments are inserted into the text.

IMO we must distinguish between the handling of Characters, Strings and
Text. For the alphabets (character sets) of natural languages it should
be possible to implement functions to compare and convert characters;
such support is often built into the OS, for selected languages. This is
the level where multibyte characters come in, so that even a single
Character can differ from any fixed-size data type, and the same
Character can have multiple representations - remember your umlaut
example? Nonetheless the rules on the Character level are at least quite
well defined, so that it's possible to implement the corresponding
standard procedures for comparison and conversion. Of course these
procedures require parameters like the language and the encoding of the
characters, so that IMO exchangeable and configurable classes are the
best containers for characters.
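
A sketch of the "multiple representations" case, again in Python rather
than FPC. The character "ä" is an assumption (the umlaut example itself
is not quoted in this message), and the function name same_character is
hypothetical. The same Character exists as one precomposed code point
and as a base letter plus a combining mark; a binary compare sees two
different strings, while a compare after Unicode normalization sees one
Character. The encoded size also depends on the encoding.

```python
import unicodedata

precomposed = "\u00e4"   # LATIN SMALL LETTER A WITH DIAERESIS
decomposed = "a\u0308"   # "a" followed by COMBINING DIAERESIS

# Binary comparison: the code point sequences differ.
binary_same = precomposed == decomposed


def same_character(x: str, y: str) -> bool:
    """Compare after bringing both operands to normalization form NFC."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)


normalized_same = same_character(precomposed, decomposed)

# The size of one Character is not fixed: it depends on the
# representation and on the encoding.
size_latin1 = len(precomposed.encode("latin-1"))
size_utf8_precomposed = len(precomposed.encode("utf-8"))
size_utf8_decomposed = len(decomposed.encode("utf-8"))
```

The one Character occupies 1, 2 or 3 bytes here, which is why a
fixed-size data type is not a sufficient description of a Character.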

Strings can be considered as arrays of Characters, so that the string
handling procedures can use the character handling procedures.
Everything else that requires more than processing a stream of
individual characters is beyond the scope of standard procedures. Here
it can become problematic when a string contains words from different
languages, because then automatic detection of the language and of the
corresponding rules cannot be guaranteed. That's why I hold the
programmer liable for the correct description of whatever he puts into a
string object.
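
The layering described above could look roughly like the following
sketch (Python, not FPC). All names here (CharRules, compare_chars,
compare_strings, BINARY, CASELESS) are hypothetical. The string compare
does nothing but walk over the characters and delegate to the character
compare, and the rules object is the description that the programmer
has to supply.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CharRules:
    """Hypothetical description of how characters are to be interpreted."""
    language: str                 # label supplied by the programmer
    key: Callable[[str], str]     # maps a character to its comparison key


def compare_chars(rules: CharRules, a: str, b: str) -> int:
    """Return -1, 0 or 1 for two characters under the given rules."""
    ka, kb = rules.key(a), rules.key(b)
    return (ka > kb) - (ka < kb)


def compare_strings(rules: CharRules, s: str, t: str) -> int:
    """String compare built only from the character compare."""
    for a, b in zip(s, t):
        result = compare_chars(rules, a, b)
        if result != 0:
            return result
    # All shared positions are equal: the shorter string sorts first.
    return (len(s) > len(t)) - (len(s) < len(t))


# Two exchangeable rule sets: plain binary, and a simple case mapping.
BINARY = CharRules(language="none", key=lambda ch: ch)
CASELESS = CharRules(language="en", key=str.casefold)

binary_result = compare_strings(BINARY, "abc", "ABC")
caseless_result = compare_strings(CASELESS, "abc", "ABC")
prefix_result = compare_strings(CASELESS, "ab", "ABC")
```

Anything that needs to look at more than one character at a time, such
as language-specific sorting of letter groups, does not fit into this
scheme and would need more than the per-character procedures.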

DoDi
