[fpc-devel] Unicode resource strings

Ludo Brands ludo.brands at free.fr
Tue Aug 21 15:38:31 CEST 2012


 
> > There is the large category of network apps. Most protocols are utf8
> > or have a clear preference for utf8 (json for example). Databases are
> > an extension of that and have the additional complication that they
> > can mix codepages at any level. These apps can be quite sensitive to
> > conversion overhead.
> 
> Well, without more details the advice is probably to use UTF8String.
> 

A more detailed example, then: a web application that fills in HTML templates
with variable data coming from, e.g., a database. HTML markup is all ASCII,
so parsing an ISO-8859-1 or a UTF-8 template and making ASCII tag
substitutions works exactly the same in both code pages. ASCII uppercasing
works fine in both, so tags can be matched case-insensitively at virtually
no cost. The problem starts when a string is supposed to carry a code page
and conversions are made before operations such as string concatenation,
uppercase, pos, etc. See http://bugs.freepascal.org/view.php?id=22501.
Should one detect the code page of the template and set the string's code
page accordingly? Detecting code pages can be quite expensive.
Even in the UTF-8-only case, converting everything to UTF-16 just to do some
basic string manipulations, as suggested, can quickly lead to bottlenecks in
high-volume web servers. I understand one cannot make an RTL for every code
page, but the question was to list application areas where string
conversions could be important or critical. I'm not pushing one or the
other solution ;)
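
To make the template case concrete, here is a minimal sketch in Python (not
FPC code, and not how any RTL does it). It works on raw bytes and never
decodes the template. The tag syntax `<!--#NAME-->`, the function name
`fill_template` and the sample values are assumptions made up for the
illustration. It relies on the fact that both ISO-8859-1 and UTF-8 encode
every non-ASCII character using only bytes >= 0x80, so an ASCII tag can
never be matched inside such a character.

```python
import re

# Hypothetical tag syntax for this sketch: <!--#NAME-->, ASCII letters only.
TAG = re.compile(rb"<!--#([A-Za-z_][A-Za-z0-9_]*)-->")


def fill_template(template: bytes, values: dict) -> bytes:
    """Substitute tags in a byte template without decoding or converting it."""

    def replace(match):
        # bytes.upper() only changes ASCII a-z, so it is safe in both
        # ISO-8859-1 and UTF-8 and gives case-insensitive tag names.
        name = match.group(1).upper()
        # Unknown tags are left in place unchanged.
        return values.get(name, match.group(0))

    return TAG.sub(replace, template)


text = "<p>Café: <!--#price--></p>"
values = {b"PRICE": b"3.50"}  # the value must already be in the template's encoding

latin1_page = fill_template(text.encode("iso-8859-1"), values)
utf8_page = fill_template(text.encode("utf-8"), values)
```

The same function handles both templates without knowing which code page
they use. The caveat is the inserted data: it has to be in the template's
encoding already (pure ASCII in this sketch), which is exactly where the
database case gets harder.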

Ludo  



