Tuesday 21 May 2013

O(n) Iterative Depth-First Post-Order Tree Traversal

The title really is already a mouth full. But every part is important and no strict subset of descriptive words is quite enough to grasp the whole range of the problem. Especially when turning to Google looking for information about this "thing", I quickly realized that although the answer may be out there, I would not find it quickly. In the end, I'd rather spend my time dipping my hands into algorithms again, developing my own solution and writing up a blog entry.

Properties

Let's go through those points in a meaningful order:
  1. Tree: nodes, edges, root, free of cycles, yadayada.
    • I expect trees to have a root node (root).
    • I expect nodes to have a container full of sub nodes (subs).
    • I assume the order of sub nodes to be of no importance. (See at the end for details and remedies.)
    It is important to note that the algorithm places no further restrictions on the structure of the trees.
    • It works for unidirectional trees, although bi-directionality certainly doesn't hinder it.
    • It makes no assumptions on the arity of the trees. Every node may have variable and potentially unlimited number of sub nodes.
    • It does not use any information in or for the nodes, like flags, tags or additional pointers to pre- or post-order nodes.
  2. Traversal: The algorithm visits each node in those trees once. Visiting as in processing, not as in touching, because nodes may need to be touched multiple times.
  3. Depth-First: Each visiting branch is examined to the deepest nodes before backtracking up the tree.
  4. Post-Order: The algorithm visits all of a node's sub nodes (and their sub nodes, ...) before visiting the node itself.
  5. Iterative: As opposed to recursive. The algorithm doesn't need recursion but a loop and data structures.
  6. O(n): The algorithm takes linear time to complete with regards to the number of nodes. This also requires the helping data structures to be sufficiently fast. Especially when called inside the main loop of the algorithm, the data structures need to provide access in O(1).

Algorithm

Here goes the algorithm. It is easy enough to understand, once it is written in plain sight. Generally speaking, the algorithm touches every node twice:
  1. Once on the way down to examine it and push its sub nodes onto the stack.
  2. Once on the way up to visit it.
The following implementation contains one small optimization in that it visits nodes without sub nodes immediately upon examination, thus touching them only once.

   traverse(tree, visit)
      todo : stack
     
      if tree.root
null
          todo.push([tree.root, down])
          while todo.empty = false
              node, direction = todo.pop()
              if node
.subs.empty = true or direction = up
                  visit(node)

              else
                  todo.push([node, up])
                  foreach sub in node.subs
                      todo.push([sub, down])

That's it. If the programming language of your choice doesn't support pairs or makes them difficult to handle, here is another implementation that uses two stacks which, combined, can be interpreted as those up and down flags. But it is not as plainly to understand.

   traverse(tree, visit)
      depth : stack
      todo : stack
     
      if tree.root
null 
          todo.push(tree.root)
          while todo.empty = false
              node = todo.top
              if node
.subs.empty = true or (depth.empty = false and depth.top = node)
                  visit(node)

                  if depth.empty = false and depth.top = node
                      depth.pop()
                  todo
.pop()
              else
                  depth.push(node)
                  foreach sub in node.subs
                      todo.push(sub)

Note

Trees assign no importance at all to the order of sub nodes inside a node, but that is just a mathematical theory. In the practical world of programming languages, nodes have a container of sub nodes that is iterable and iteration necessarily sequentializes the sub nodes thereby giving order where none is required. The algorithm as written above pushes the sub nodes onto the nodes stack in interation order of the sub nodes container. When poping the elements from that stack, the sub nodes are visited in reverse order. If that is undesirable, the sub nodes should be pushed onto the stack in reverse iteration order. Most programming languages allow reverse iteration or use a container that provides reverse iteration, but if yours fails to offer that convenience, just use another stack, push the sub nodes into it and pop them again. Now you have the sub nodes in reverse iteration order, ready to push onto the todo stack. It's not beautiful, but it doesn't change the complexity of the algorithm.