[fpc-devel] Unicode resource strings

Mattias Gaertner nc-gaertnma at netcologne.de
Tue Aug 21 17:16:52 CEST 2012


On Tue, 21 Aug 2012 15:38:31 +0200
"Ludo Brands" <ludo.brands at free.fr> wrote:

>  
> > > There is the large category of network apps. Most protocols are utf8
> > > or have a clear preference for utf8 (json for example). Databases are
> > > an extension of that and have the additional complication that they
> > > can mix codepages at any level. These apps can be quite sensitive to
> > > conversion overhead.
> > 
> > Well, without more details the advice is probably to use UTF8String.
> > 
> 
> A more detailed example then. A web application that fills in HTML templates
> with variable data coming from e.g. a database or whatever. HTML markup is all
> ASCII. So parsing an iso-8859-1 or UTF8 template and making ASCII tag
> substitutions is exactly the same in both code pages. ASCII uppercasing works
> nicely in both, and tags are case insensitive at virtually no cost. The problem
> starts when a string is supposed to have a code page and conversions are made
> before operations like concatenating strings, uppercase, pos, etc. See
> http://bugs.freepascal.org/view.php?id=22501.
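
A rough illustration of the quoted point, written in Python rather than
Pascal only to keep it short and runnable. The names fill_template and
find_tag and the {{NAME}} placeholder syntax are invented for this sketch
and do not come from the thread or from the FPC RTL. It assumes the
template is kept as raw bytes and that all tags are pure ASCII, so the
same code works for iso-8859-1 and UTF-8 without any conversion.

```python
def fill_template(template: bytes, values: dict) -> bytes:
    """Replace ASCII placeholders like b"{{NAME}}" without decoding the template."""
    out = template
    for tag, value in values.items():
        # Byte-level search is safe here: in UTF-8 an ASCII byte never occurs
        # inside a multi-byte sequence, and iso-8859-1 is single-byte.
        out = out.replace(b"{{" + tag + b"}}", value)
    return out


def find_tag(html: bytes, tag: bytes) -> int:
    """Case-insensitive search for an ASCII HTML tag; returns its offset or -1."""
    # bytes.upper() only touches ASCII letters, so non-ASCII bytes pass through.
    return html.upper().find(b"<" + tag.upper())


if __name__ == "__main__":
    text = "<p>Gr\u00fc\u00dfe, {{NAME}}!</p>"
    for codec in ("iso-8859-1", "utf-8"):
        template = text.encode(codec)          # stands in for a template file
        value = "Zo\u00eb".encode(codec)       # data already in the same encoding
        filled = fill_template(template, {b"NAME": value})
        print(codec, find_tag(filled, b"P"), filled.decode(codec))
```

The only requirement is that the substituted data is already in the same
encoding as the template; the template itself is never inspected beyond
its ASCII bytes.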

Bug 22501 is about string constants and mismatch of CPs.
Note that this is a different beast than dynamic data coming
from files, sockets or db.


> Detecting code page of the
> template and setting the string cp accordingly? Detecting code pages can be
> quite expensive.

And it is often impossible.

> Even in the utf8 only case, converting everything to utf16 to do some basic
> string manipulations, as suggested, can quickly lead to bottlenecks in high
> volume web servers. I understand one cannot make an rtl for every code page,
> but the question was to list application areas where string conversions
> could be important or critical. I'm not pushing one or the other solution ;)

It's about string conversions that are critical and hard to fix.
I have no doubt that changing the string type means that some
functions will become slow. But that does not mean they are hard to fix.
For example, you could change the type of the time-critical
strings to UTF8String to make sure that the big strings are never
converted.


Mattias


