CommentAlgorithm

From GeeklogWiki
Revision as of 01:33, 13 May 2011 by Vinny (talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Representing Geeklog's Comments in the Database

Since Geeklog 1.3.10, Geeklog's comments have been stored in the database with columns (lft, rht) representing a (modified) pre-order traversal ordering. This method of storage was chosen to improve comment retrieval performance, especially when viewing a subset of the comments for a given story.

Details

The idea behind pre-order traversal ordering, is that each node is ordered (or visited) before any of its children. In the modified preorder traversal algorithm implemented by Geeklog, the lft value of a comment is always less than the lft and rht values of all of the children of that node. Likewise the rht value of a node is always greater that the lft and rht values of all of the children of that node. (And trivially for any node, lft is always less than rht.) A node without any children has the property rht is exactly one more than lft.

Example: Comment A is a top level comment to a story. Comments A-A and A-B are children of A and comments A-A-A, A-A-B, and A-A-C are children of A-A. The table below shows the values for each comment:

comment parent lft rht
A N/A 1 12
A-A A 2 9
A-B A 10 11
A-A-A A-A 3 4
A-A-B A-A 5 6
A-A-C A-A 7 8

The resulting database structure has many advantages. Selecting a subtree from the database is as easy as specifying the lft and rht numbers of the root node of the subtree. For instance, to get the subtree with root comment "A-A":

SELECT * FROM comments WHERE lft >= 2 AND rht <= 9

Or selecting all the children of a give node:

SELECT * FROM comments WHERE lft > 2 AND rht < 9

To select comments ordered for display in a "nested" format:

SELECT * FROM comments ORDER BY lft

Storing the preorder traversal information also allows us to easily find the path to a node (i.e. a nodes parents, grandparents, etc).

SELECT * FROM comments WHERE lft < 7 AND rht > 8

Counting the number of children of a comment is equally easy: ('rht' - 'lft' - 1)/2

Previously these operations required a (very slow) recursive algorithm. The way Geeklog had implemented this algorithm required one database query for every comment in the result set. Another way would have been to do one query to fetch all comments, and then use a recursive function to choose the desired comments. Both methods were extremely inefficient and caused load problems on frequently viewed Geeklog sites.

The downside to this method is that inserting and deleting comments takes more work (three queries instead of one) and must be done in a single transaction (or a write lock on the table). Inserts:

START TRANSACTION
UPDATE comments SET rht = rht + 2 WHERE rht >= parent_comment.rht
UPDATE comments SET lft = lft + 2 WHERE lft >= parent_comment.rht
INSERT INTO comments SET lft = parent_comment.rht, rht = parent_comment.rht + 1
COMMIT

Deletions:

START TRANSACTION
UPDATE comments SET rht = rht - 2 WHERE rht > comment.rht
UPDATE comments SET lft = lft - 2 WHERE lft > comment.rht
DELETE FROM comments WHERE id = comment.id
COMMIT

However, since fetching (reading) comments is a much more frequent action then adding or deleting comments and the increase in efficiency for fetching comments is much greater than the loss of efficiency for adding and deleting comments, the trade off is worthwhile.

Please note that in the SQL examples above, parts of the query were left out to make the algorithm more clear. For more detail, see the Geeklog source code (specifically lib-comment.php).

External related articles