[fpc-devel] XML Parser problems with C-Data and Character Encoding

Andrew Brunner atbrunner at aurawin.com
Thu Sep 27 15:31:35 CEST 2012


On 09/27/2012 04:19 AM, Mattias Gaertner wrote:
> Have you tried setting the right encoding in the xml?
>
> http://mantis.freepascal.org/view.php?id=22990

I have, and it did work (thanks, Sergei :-)
> Maybe you can explain what you are trying to do?

I have a cloud-based social virtual operating system in which every 
read/write operation is done via XML.  Adding a content-encoding step 
that examines every byte is therefore extremely costly from a 
client/server standpoint.  Just imagine 1M+ users, with every XML 
fragment encoded/decoded just so the parser can parse the data: that is 
unwanted latency.
> AFAIK such statements have seldom a positive effect on volunteer 
> projects.

My frustration is not with the FPC team, since they are drawing code 
from projects like Firefox.  I am extremely sensitive to wasted CPU 
cycles, because efficiency at scale is maximized by eliminating overhead 
such as byte encoding.  Some of my stream fragments are as large as 
1.4 MB deflated from 8 MB.  Multiply that by, say, 100 concurrent users 
on a single node and you'll begin to see my frustration.

To me, an XML parser is just looking for "<", ">" and so on.  Once it 
hits a CDATA section it should only look for the closing "]]>".  So I 
was surprised to learn that it required the encoding declaration (which 
by itself increases the average network packet size) that I must 
transmit from the client to my server nodes.
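For illustration only, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, an expat-based parser standing in for FPC's own XML reader (which may behave differently in detail). It shows both halves of the issue: inside a CDATA section `<` and `&` are taken literally and only `]]>` ends the section, yet the bytes inside the section are still decoded according to the document's character encoding. An undeclared ISO-8859-1 byte is rejected, while the same bytes preceded by an `encoding="ISO-8859-1"` declaration parse. The helper `try_parse` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Markup characters inside CDATA are literal; only "]]>" closes the section.
cdata_doc = "<msg><![CDATA[if (a < b && c > d) {}]]></msg>"
print(ET.fromstring(cdata_doc).text)  # if (a < b && c > d) {}

# 0xE9 is "é" in ISO-8859-1 but is not a valid UTF-8 sequence here.
raw = b"<msg><![CDATA[caf\xe9]]></msg>"

def try_parse(data: bytes):
    """Hypothetical helper: return the element text, or None if rejected."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None

# With no declaration the parser assumes UTF-8 and fails, even inside CDATA.
print(try_parse(raw))  # None

# Declaring the actual encoding lets the same payload bytes parse.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + raw
print(try_parse(declared))  # café
```

The CDATA section only relaxes markup recognition; it does not exempt its contents from character decoding, which is why the declaration (or UTF-8 content) is needed.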

A good XML parser would not care what lies between the core markup. 
Content encoding is for human readability, not for computers, and it 
should never be a concern for something as low-level as an XML parser.  
This is what is best for the Internet and its servers; it is not merely 
my opinion.

> Well, whatever you try, you must at least encode all < > and & 
> characters.
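A minimal sketch, again in Python purely for illustration (the FPC side is not shown): `xml.sax.saxutils.escape` replaces `&`, `<` and `>` with entity references, which is the minimum escaping described above for text outside a CDATA section, and the parser turns them back into the original characters. The helper name `wrap_text` is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def wrap_text(tag: str, text: str) -> str:
    """Hypothetical helper: put arbitrary text in an element, escaping & < >."""
    return f"<{tag}>{escape(text)}</{tag}>"

payload = "price < 5 & qty > 2"
doc = wrap_text("item", payload)
print(doc)  # <item>price &lt; 5 &amp; qty &gt; 2</item>

# The parser decodes the entity references back to the original characters.
assert ET.fromstring(doc).text == payload
```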

Right.  I guess what I am looking for, regarding my assertions in all 
this, is an "ahh, OK" explanation of why my servers have to decode, 
encode and stamp each XML fragment just so a parser (a machine that does 
not care what a euro sign looks like) can process it correctly.

I suppose we could talk about interoperability here, and of course that 
seems like the "a-ha" moment.  But these are XML fragments that were 
created in my own system.  I think that, for the sake of application 
developers, there should be tweaks for efficiency beyond the content 
encoding du jour (UTF-8).

I realize my bluntness may come across as harsh, but this is an 
expensive problem for me: each server can cost as much as US$4,000.

-- 
Andrew Brunner

Aurawin LLC
512.574.6298
http://aurawin.com/

Aurawin is a great new place to store, share, and enjoy your
photos, videos, music and more.