[fpc-devel] ansistrings and widestrings

Fri Jan 7 17:24:48 CET 2005

it should be noted that pascal classes are really not suited to doing
strings.

to do strings with classes you really need language features which fpc
doesn't have.

doing strings with non-garbage-collected, heap-based classes would make
something that was as painful to work with as pchars and that was totally
different from any string handling pascal has seen before.

just as pascal doesn't consider two strings that differ only in case to be
equal, it should probably not consider two strings of unicode code points to
be equal unless they are binary equivalent.
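
to make that concrete, here is a rough sketch using the umlaut case from the
quoted message below. SameCodeUnits is a hypothetical helper written for this
mail, not an RTL routine; it only shows what "binary equivalent" would mean
for two widestrings that a reader would see as the same text.

```pascal
program BinaryEqual;
{$mode objfpc}{$H+}

{ Hypothetical helper: true only if both strings hold exactly the same
  sequence of 16-bit code units. No normalisation is attempted. }
function SameCodeUnits(const a, b: WideString): Boolean;
var
  i: Integer;
begin
  Result := Length(a) = Length(b);
  if not Result then
    Exit;
  for i := 1 to Length(a) do
    if a[i] <> b[i] then
    begin
      Result := False;
      Exit;
    end;
end;

var
  precomposed, decomposed: WideString;
begin
  { a-umlaut as a single code point, U+00E4 }
  SetLength(precomposed, 1);
  precomposed[1] := WideChar($00E4);

  { 'a' (U+0061) followed by combining diaeresis (U+0308) }
  SetLength(decomposed, 2);
  decomposed[1] := WideChar($0061);
  decomposed[2] := WideChar($0308);

  WriteLn(SameCodeUnits(precomposed, precomposed));
  WriteLn(SameCodeUnits(precomposed, decomposed));
end.
```

the first comparison should report equal and the second not equal, even
though both strings would normally display as the same character. treating
them as equal would be a job for a separate, explicit normalisation step.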

conversion between ansistring and widestring should be done by functions
that take one and return the other (use a const param to avoid the implicit
try-finally) so that no limitations are put on how the conversion is done.
These functions should be indirected through procvars so that the default
fallback versions can be replaced by versions supplied by a unit which
provides proper internationalisation.
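
a minimal sketch of that indirection, to show the shape of the idea. every
name here (TWideToAnsiFunc, AnsiToWide, WideToAnsi, the Default* fallbacks
and EuroAwareWideToAnsi) is made up for illustration and is not the RTL's
actual interface. the fallbacks assume Latin-1, which is only one possible
default.

```pascal
program ConvHooks;
{$mode objfpc}{$H+}

type
  { Hypothetical procvar types; const params as suggested above. }
  TAnsiToWideFunc = function(const s: AnsiString): WideString;
  TWideToAnsiFunc = function(const s: WideString): AnsiString;

{ Fallback: widen each byte, treating the input as Latin-1 (assumption). }
function DefaultAnsiToWide(const s: AnsiString): WideString;
var
  i: Integer;
begin
  SetLength(Result, Length(s));
  for i := 1 to Length(s) do
    Result[i] := WideChar(Ord(s[i]));
end;

{ Fallback: narrow each code unit, using '?' for anything above $FF. }
function DefaultWideToAnsi(const s: WideString): AnsiString;
var
  i: Integer;
begin
  SetLength(Result, Length(s));
  for i := 1 to Length(s) do
    if Ord(s[i]) <= $FF then
      Result[i] := Chr(Ord(s[i]))
    else
      Result[i] := '?';
end;

{ Stand-in for what an internationalisation unit might supply: it spells
  out the euro sign, so the result can differ in length from the input. }
function EuroAwareWideToAnsi(const s: WideString): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(s) do
    if Ord(s[i]) = $20AC then
      Result := Result + 'EUR'
    else if Ord(s[i]) <= $FF then
      Result := Result + Chr(Ord(s[i]))
    else
      Result := Result + '?';
end;

var
  { The hooks that all conversions would go through. }
  AnsiToWide: TAnsiToWideFunc;
  WideToAnsi: TWideToAnsiFunc;
  w: WideString;
begin
  { Install the fallbacks. }
  AnsiToWide := @DefaultAnsiToWide;
  WideToAnsi := @DefaultWideToAnsi;

  w := AnsiToWide('A');
  SetLength(w, 2);
  w[2] := WideChar($20AC);        { append a euro sign }
  WriteLn(WideToAnsi(w));         { fallback conversion }

  { An i18n unit would do this swap in its initialization section. }
  WideToAnsi := @EuroAwareWideToAnsi;
  WriteLn(WideToAnsi(w));         { replaced conversion }
end.
```

callers only ever go through the AnsiToWide and WideToAnsi variables, so
swapping the implementation needs no change at the call sites, and since
each function just takes one string type and returns the other, the
replacement is free to change the length or use any tables it likes.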

> -----Original Message-----
> From: fpc-devel-bounces at lists.freepascal.org
> [mailto:fpc-devel-bounces at lists.freepascal.org]On Behalf Of DrDiettrich
> Sent: 07 January 2005 15:06
> To: FPC developers' list
> Subject: Re: [fpc-devel] ansistrings and widestrings
>
>
> Florian Klaempfl wrote:
>
> > > The only universal international representation for strings is Unicode
> > > (currently 32 bit), that doesn't require any conversions.
> >
> > That's not true. E.g. the german umlauts can be represented by 2 chars
> > when using UTF-32 (the char and the two dots), same apply to a lot of
> > other languages.
>
> Okay, this is where I didn't understand the difference between code
> points and whatsoever. Doesn't in the umlaut and accented case exist a
> unique glyph and according code, that could be used in the first place?
> In other languages (Arabic...) the glyph may vary with the context, here
> I have no idea how to compare such text, but the native writers
> (speakers) of such glyphs should know ;-)
>
> > Encoding isn't the main problem, you need dedicated procedures and
> > functions for unicode comparison, upper/lower conversion etc.
>
> Agreed, these will become the string class methods. It may be necessary
> to partition Unicode into code pages, with different methods for
> comparison etc.
>
> In the worst case, if we cannot find or agree about a so-far unique
> representation for text, an "uncomparable" value has to become a valid
> result of a comparison.
>
>
> > To achieve this platform-independently is very hard ...
>
> How that? I agree that here the existence of definitely
> compatible/portable OS services is not guaranteed. But when the methods
> have to be implemented for platforms that do not have such services at
> all, then these implementations can be used on all other platforms as
> well.
>
>
> All in all I'd say that we do not intend to implement a text processing
> or translation system. What we can do is to define a string or text
> class, that contains text in a well defined form, for processing with
> all specified methods. The key point is the import of text into an
> object of any such class. If no appropriate class has been implemented,
> the import is simply impossible. Inside, i.e. between these classes, all
> the methods should work. Perhaps with graceful "uncomparable" or
> "unconvertible" results, when somebody insists on using incompletely
> implemented classes.
> We don't want the impossible, the doable will be sufficient ;-)
>
> DoDi