[fpc-devel] Want to remove AVL_Tree from DOM

Sergei Gorelkin sergei_gorelkin at mail.ru
Sat Oct 24 00:14:26 CEST 2009


Hello,

I want to remove the avl_tree-related stuff from DOM unit. The reasons are:

1) It is targeted to optimize one particular type of documents, "configs",
    that require all node names within a single parent to be unique (strictly
    speaking, this isn't xml, it is more like ini-files or Windows registry).

    For all other xml documents (which have repeating names) it simply doesn't work.

2) It is a severe performance and memory bottleneck. This maybe wasn't too noticeable
    in the past, but things had improved over time. Here are some numbers:

  - Speed:  here is the sample calltree of TDOMNode_WithChildren.InsertBefore
   (the main DOM tree building procedure):

   650,963  * DOM_TDOMNODE_WITHCHILDREN_$__INSERTBEFORE$TDOMNODE$TDOMNODE$$TDOMNODE
    25,890  >   DOM_TDOMNODE_WITHCHILDREN_$__GETFIRSTCHILD$$TDOMNODE (2589x)
7,671,657  >   DOM_TDOMNODE_WITHCHILDREN_$__ADDTOCHILDNODETREE$TDOMNODE (8471x)
    76,239  >   DOM_TDOMNODE_$__CHANGING (8471x)
         8  >   DOM_TDOMDOCUMENT_$__GETNODETYPE$$LONGINT (1x)
    51,616  >   DOM_TDOMELEMENT_$__GETNODETYPE$$LONGINT (6452x)
    47,056  >   DOM_TDOMTEXT_$__GETNODETYPE$$LONGINT (5882x)
    36,856  >   DOM_TDOMATTR_$__GETNODETYPE$$LONGINT (4607x)

As one may see, the avl_tree consumes more than 90% of total CPU time.

  - Memory (numbers are for i386-linux):

TDOMAttr.InstanceSize = 56
TDOMElement.InstanceSize = 60
TDOMText.InstanceSize = 32

TAVLTree.InstanceSize = 20
TAVLTreeNode.InstanceSize = 24

Since each textnode has an associated TAVLTreeNode, and each Element/Attribute has both TAVLTreeNode 
and TAVLTree, the overall memory usage grows by about 75%.

3) It is known to have yet unresolved multithreading-related problems (Mantis 12984).


Given the above, I see no sense in further keeping this feature. However, the configs likely
need some sort of optimization anyway. I see the following options:

1) Since all names in DOM document are now stored in a hashtable, it is easy to modify
    FindNode() so it compares pointers rather than strings.

2) The new xmlconf unit has the OpenKey/CloseKey methods which improve performance by limiting
    the search range to a specific element. Utilizing this feature, however, need modifications to
    the user code so it actually calls OpenKey/CloseKey (note that is works perfectly without the
    modifications, it's only the matter of performance).

3) It is possible to create even faster xmlconfig class, consisting of TXMLDocument + THashTable,
    offering O(1) access to any element. The problem is that it is somewhat mutually exclusive
    to the previous solution (OpenKey/CloseKey).

4) Finally, the xpath, being sufficiently mature, can supersede it all and provide
    a much more powerful solution.

Well... Thank you for reading this long message :-). I'd be happy to hear other's comment on it.


Regards,
Sergei



More information about the fpc-devel mailing list