[fpc-devel] Re: dominant short strings in compiler source

Flávio Etrusco flavio.etrusco at gmail.com
Fri May 19 19:20:16 CEST 2006


On 5/19/06, Пётр Косаревский <ppkk at mail.ru> wrote:
> Sorry, these two messages were accidentally sent privately instead of to the list.
>
> To Felipe Monteiro de Carvalho:
>
> > > Windows will probably become fully utf16 (not really unicode, but
> > > at least utf16) very soon (at least in newer versions, in a way
> > > incompatible with current ones).
> >
> > A small correction: utf16 is a type of unicode.
>
> Well, Unicode is a large standard with lots of features.
>
> UTF.. is "Unicode transformation format".
>
> So, utf16 is a way of representing unicode characters.
>
> And, as I understand it, basic utf16 support does not include representing a single symbol with several utf16 units (unicode is meant to have room for about 1,000,000 symbols, neither much more nor significantly fewer).
>
> Also, ligatures and other such features are unfamiliar to programmers who are not specialists.
>
> I don't think Windows will support every unicode feature in its implementation of utf16 filenames.
>
> But Windows, MS Office, MS Internet Explorer and several other M$ products are listed on the unicode page as programs that support unicode.

As the name implies, UTF is a format that can represent each and
every Unicode symbol, using different block/character sizes.
UCS-2 and UCS-4 are fixed-width encodings that use a single 16-bit or
32-bit unit per character, respectively, so UCS-2 only covers the
subset of Unicode that fits in 16 bits. UTF-8 and UTF-16 (there is
also UTF-32, which is essentially UCS-4) can represent all of Unicode
by encoding some ranges of symbols as multiple units.



> To Flávio Etrusco:
>
> > copy of the string) you should access it as a PChar.
> >
> > as 'const' and 'var', and maybe using PChar in some few places, or can
> >
> > Cheers,
> > Flávio
>
> The PChar type is no less ugly in Pascal than a dynamic array.

Why are dynamic arrays ugly? Is TList (and TStringList, for that
matter) ugly, too?
PChar is as ugly as any other Pointer, but in low-level programming it
can be necessary.
Also, some compiler trickery could be used to optimize the use of
AnsiStrings so that we can avoid PChars, but of course this will
have to wait a bit.

> My strings sometimes contain null characters. If a char can have code zero, a string should be able to contain it at any position.

Just get a PChar to the end of the string and compare your running
pointer to it, instead of searching for #0 ;-)
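The end-pointer idea can be sketched in C, where char * plays the role of PChar. In FPC the same loop would presumably run from PChar(S) to PChar(S) + Length(S), since an AnsiString stores its length separately from its contents. The function count_char is a hypothetical example, not part of any library.

```c
#include <stddef.h>

/* Hypothetical example: count occurrences of ch in a buffer of known
   length. The loop ends when the running pointer reaches end (one past
   the last byte), not at the first '\0', so embedded NUL bytes are
   treated like any other byte. */
size_t count_char(const char *s, size_t len, char ch)
{
    const char *p = s;
    const char *end = s + len;
    size_t n = 0;

    for (; p != end; ++p)
        if (*p == ch)
            ++n;
    return n;
}
```

For a buffer like {'a', '\0', 'b', '\0', 'a'}, strlen would report a length of 1, while this length-bounded loop visits all five bytes.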

Cheers,
Flávio

