[fpc-pascal] Unicode file routines proposal

Marco van de Voort marcov at stack.nl
Tue Jul 1 09:58:53 CEST 2008


> En/na Marco van de Voort ha escrit:
> >> with wide unicode support, and in that case every character will use 4
> >> bytes.
> >>
> > That's IMHO a faulty system. It requires you to choose between an incomplete
> > solution or making strings a horrible memory hog.
> 
> OTOH using variable length characters will make string operations 
> expensive (since you can't just multiply the index by 2 or 4 but you 
> have to examine the string from the beginning, and the length in bytes 
> isn't the same as the length in characters).

Yes, but only in the routines that do random access on elements. Half of
that can be gained back, since most string routines iterate from start to
end anyway.
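
To make the trade-off concrete, here is a minimal sketch (in Python, purely
for illustration, operating on raw bytes). The function names are made up
for this example and are not part of any FPC or RTL API. It assumes valid
UTF-8 input and does no error checking.

```python
def utf8_index(buf: bytes, n: int) -> int:
    """Byte offset of the n-th code point (0-based) in valid UTF-8.

    Needs a scan from the start of the buffer, so it is O(n).
    """
    seen = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            if seen == n:
                return offset
            seen += 1
    raise IndexError(n)


def utf32_index(n: int) -> int:
    """Byte offset of the n-th code point in UTF-32: a plain multiply, O(1)."""
    return 4 * n


def utf8_length(buf: bytes) -> int:
    """Length in code points: one start-to-end pass over the bytes."""
    return sum(1 for byte in buf if (byte & 0xC0) != 0x80)


text = "a\u00e9\u20acb"          # 'a', e-acute, euro sign, 'b'
u8 = text.encode("utf-8")        # 1 + 2 + 3 + 1 bytes
u32 = text.encode("utf-32-le")   # 4 bytes per code point, no BOM
```

Only utf8_index pays for the variable width. A routine like utf8_length
walks the buffer from start to end in either encoding, and the UTF-8 buffer
it walks is smaller for this sample.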
 
> > But maybe that doesn't
> > matter for mere scripting languages (though I wonder then why they didn't
> > choose UTF-32 directly)
> > 
> > Surrogates are not nice, but they were invented for a reason.
> 
> Well, yes, they're a trade-off between performance and memory 
> consumption, but I fear we're losing one of the advantages that pascal 
> has over C: fast and simple string handling.

We also don't want to slip off the other end and turn into a scripting
language unsuitable for major programming.


