[fpc-devel] Want to remove AVL_Tree from DOM
Sergei Gorelkin
sergei_gorelkin at mail.ru
Sat Oct 24 00:14:26 CEST 2009
Hello,
I want to remove the avl_tree-related stuff from DOM unit. The reasons are:
1) It is targeted to optimize one particular type of documents, "configs",
that require all node names within a single parent to be unique (strictly
speaking, this isn't xml, it is more like ini-files or Windows registry).
For all other xml documents (which have repeating names) it simply doesn't work.
2) It is a severe performance and memory bottleneck. This maybe wasn't too noticeable
in the past, but things had improved over time. Here are some numbers:
- Speed: here is the sample calltree of TDOMNode_WithChildren.InsertBefore
(the main DOM tree building procedure):
650,963 * DOM_TDOMNODE_WITHCHILDREN_$__INSERTBEFORE$TDOMNODE$TDOMNODE$$TDOMNODE
25,890 > DOM_TDOMNODE_WITHCHILDREN_$__GETFIRSTCHILD$$TDOMNODE (2589x)
7,671,657 > DOM_TDOMNODE_WITHCHILDREN_$__ADDTOCHILDNODETREE$TDOMNODE (8471x)
76,239 > DOM_TDOMNODE_$__CHANGING (8471x)
8 > DOM_TDOMDOCUMENT_$__GETNODETYPE$$LONGINT (1x)
51,616 > DOM_TDOMELEMENT_$__GETNODETYPE$$LONGINT (6452x)
47,056 > DOM_TDOMTEXT_$__GETNODETYPE$$LONGINT (5882x)
36,856 > DOM_TDOMATTR_$__GETNODETYPE$$LONGINT (4607x)
As one may see, the avl_tree consumes more than 90% of total CPU time.
- Memory (numbers are for i386-linux):
TDOMAttr.InstanceSize = 56
TDOMElement.InstanceSize = 60
TDOMText.InstanceSize = 32
TAVLTree.InstanceSize = 20
TAVLTreeNode.InstanceSize = 24
Since each textnode has an associated TAVLTreeNode, and each Element/Attribute has both TAVLTreeNode
and TAVLTree, the overall memory usage grows by about 75%.
3) It is known to have yet unresolved multithreading-related problems (Mantis 12984).
Given the above, I see no sense in further keeping this feature. However, the configs likely
need some sort of optimization anyway. I see the following options:
1) Since all names in DOM document are now stored in a hashtable, it is easy to modify
FindNode() so it compares pointers rather than strings.
2) The new xmlconf unit has the OpenKey/CloseKey methods which improve performance by limiting
the search range to a specific element. Utilizing this feature, however, need modifications to
the user code so it actually calls OpenKey/CloseKey (note that is works perfectly without the
modifications, it's only the matter of performance).
3) It is possible to create even faster xmlconfig class, consisting of TXMLDocument + THashTable,
offering O(1) access to any element. The problem is that it is somewhat mutually exclusive
to the previous solution (OpenKey/CloseKey).
4) Finally, the xpath, being sufficiently mature, can supersede it all and provide
a much more powerful solution.
Well... Thank you for reading this long message :-). I'd be happy to hear other's comment on it.
Regards,
Sergei
More information about the fpc-devel
mailing list