[fpc-devel] XML Parser problems with C-Data and Character Encoding
Andrew Brunner
atbrunner at aurawin.com
Thu Sep 27 15:31:35 CEST 2012
On 09/27/2012 04:19 AM, Mattias Gaertner wrote:
> Have you tried setting the right encoding in the xml?
> http://mantis.freepascal.org/view.php?id=22990
I have, and it did work (thanks Sergei :-)
> Maybe you can explain what you are trying to do?
I have a cloud social virtual operating system and each read/write
operation is done via XML. So adding a content encoding mechanism that
examines each byte is extremely costly from a client/server
standpoint. Just imagine 1M+ users and having to encode/decode each XML
fragment just to get the parser to parse the data - unwanted latency.
> AFAIK such statements have seldom a positive effect on volunteer
> projects.
My frustration is not with the FPC team, because they are drawing code
from projects like Firefox. I am extremely sensitive to wasted CPU
cycles, as efficiency at scale is maximized by reducing things like byte
encoding. Some of my stream fragments can be as large as 1.4MB deflated
from 8MB. Multiply that number by, say, 100 concurrent users on that one
node and you'll begin to see my frustration.
To me, an XML parser is just looking for "<>" etc. Once it hits a CDATA
section it should only look for the closing "]]>". Therefore I was
surprised to learn that it required the encoding declaration (which in
itself just increases the average network packet size) that I must
transmit from client to my server nodes.
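
To make the encoding point concrete, here is a minimal sketch in Python
(not FPC; the standard-library parser there is expat-based, so its
behaviour is only an illustration and not a statement about fcl-xml).
The fragment and the names BODY, DECL and parse_text are made up for
the example. It shows that a non-UTF-8 byte inside CDATA is rejected
unless the document declares its encoding:

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one Latin-1 byte (0xE9, "é") inside a CDATA section.
BODY = b"<msg><![CDATA[caf\xe9 <b>&</b>]]></msg>"
DECL = b'<?xml version="1.0" encoding="ISO-8859-1"?>'


def parse_text(raw: bytes):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(raw).text
    except ET.ParseError:
        return None


# No declaration: the parser assumes UTF-8 and 0xE9 is not valid there.
without_decl = parse_text(BODY)

# With a declaration the same bytes are accepted and decoded as Latin-1.
with_decl = parse_text(DECL + BODY)

print(without_decl)
print(with_decl)
```

The markup characters inside the CDATA section pass through untouched
in the second case; only the byte-to-character mapping depends on the
declaration.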
A good XML parser would not care what is between the core essentials.
Content encoding is for human readability; it is not for computers, nor
should it ever be a concern to something as low level as an XML parser.
This is best for the Internet and its servers - not my opinion.
> Well, whatever you try, you must at least encode all < > and &
> characters.
Right. I guess when it comes to my assertions regarding all this - I am
looking for an AHH OK notion as to why my servers have to
decode/encode/stamp each XML fragment just so a parser - that is a
machine that does not care what a euro looks like - can process it
correctly.
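
For reference, the minimum escaping mentioned in the quoted remark can
be sketched as follows. This is Python, purely as an illustration; the
sample string and the helper cdata_wrap are hypothetical. Outside CDATA
the characters < > and & are escaped; inside CDATA they can stay as
they are, but any literal "]]>" in the payload has to be split so it
does not end the section early:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape


def cdata_wrap(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical payload containing all three special characters and a "]]>".
raw = "if a < b && b > c: x = arr[i[0]]>1"

# Option 1: entity-escape the payload as ordinary character data.
escaped = escape(raw)

# Option 2: carry the payload in CDATA and only guard against "]]>".
wrapped = cdata_wrap(raw)

# Both forms parse back to the original string.
from_escaped = ET.fromstring("<m>" + escaped + "</m>").text
from_cdata = ET.fromstring("<m>" + wrapped + "</m>").text

print(escaped)
print(wrapped)
```

Either way the work is a scan over the payload for a handful of byte
patterns; it is separate from the question of which character encoding
the bytes are in.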
I suppose we could talk about interoperability here. And of course that
seems like an a-ha moment. But these are XML fragments that were created
in my system. And I think, for the sake of the developer of an app, the
parser should have tweaks for efficiency aside from the content encoding
du jour (UTF-8).
I realize my bluntness could come across as harsh, but this is an
expensive problem for me. Each server can cost as much as $4,000 US.
--
Andrew Brunner
Aurawin LLC
512.574.6298
http://aurawin.com/
Aurawin is a great new place to store, share, and enjoy your
photos, videos, music and more.