# Difference between revisions of "CommentAlgorithm"

(New page: =Representing Geeklog's Comments in the Database= Since Geeklog 1.4.x, Geeklog's comments have been stored in the database with columns (''lft'', ''rht'') representing a (modified) pre-or...) |
(→Representing Geeklog's Comments in the Database) |
||

Line 1: | Line 1: | ||

=Representing Geeklog's Comments in the Database= | =Representing Geeklog's Comments in the Database= | ||

− | Since Geeklog 1. | + | Since Geeklog 1.3.10, Geeklog's comments have been stored in the database with columns (''lft'', ''rht'') representing a (modified) pre-order traversal ordering. This method of storage was chosen to improve comment retrieval performance, especially when viewing a subset of the comments for a given story. |

The idea behind pre-order traversal ordering, is that each node is ordered (or visited) before any of its children. In the modified preorder traversal algorithm implemented by Geeklog, the ''lft'' value of a comment is always less than the ''lft'' and ''rht'' values of all of the children of that node. Likewise the ''rht'' value of a node is always greater that the ''lft'' and ''rht'' values of all of the children of that node. (And trivially for any node, ''lft'' is always less than ''rht''.) A node without any children has the property ''rht'' is exactly one more than ''lft''. | The idea behind pre-order traversal ordering, is that each node is ordered (or visited) before any of its children. In the modified preorder traversal algorithm implemented by Geeklog, the ''lft'' value of a comment is always less than the ''lft'' and ''rht'' values of all of the children of that node. Likewise the ''rht'' value of a node is always greater that the ''lft'' and ''rht'' values of all of the children of that node. (And trivially for any node, ''lft'' is always less than ''rht''.) A node without any children has the property ''rht'' is exactly one more than ''lft''. |

## Revision as of 22:36, 23 July 2009

# Representing Geeklog's Comments in the Database

Since Geeklog 1.3.10, Geeklog's comments have been stored in the database with columns (*lft*, *rht*) representing a (modified) pre-order traversal ordering. This method of storage was chosen to improve comment retrieval performance, especially when viewing a subset of the comments for a given story.

The idea behind pre-order traversal ordering, is that each node is ordered (or visited) before any of its children. In the modified preorder traversal algorithm implemented by Geeklog, the *lft* value of a comment is always less than the *lft* and *rht* values of all of the children of that node. Likewise the *rht* value of a node is always greater that the *lft* and *rht* values of all of the children of that node. (And trivially for any node, *lft* is always less than *rht*.) A node without any children has the property *rht* is exactly one more than *lft*.

Example: Comment A is a top level comment to a story. Comments A-A and A-B are children of A and comments A-A-A, A-A-B, and A-A-C are children of A-A. The table below shows the values for each comment:

comment | parent | lft | rht |
---|---|---|---|

A | N/A | 1 | 12 |

A-A | A | 2 | 9 |

A-B | A | 10 | 11 |

A-A-A | A-A | 3 | 4 |

A-A-B | A-A | 5 | 6 |

A-A-C | A-A | 7 | 8 |

The resulting database structure has many advantages. Selecting a subtree from the database is as easy as specifying the *lft* and *rht* numbers of the root node of the subtree. For instance, to get the subtree with root comment "A-A":

`SELECT * FROM comments WHERE lft >= 2 AND rht <= 9`

Or selecting all the children of a give node:

`SELECT * FROM comments WHERE lft > 2 AND rht < 9`

To select comments ordered for display in a "nested" format:

`SELECT * FROM comments ORDER BY lft`

Storing the preorder traversal information also allows us to easily find the path to a node (i.e. a nodes parents, grandparents, etc).

```
SELECT * FROM comments WHERE lft < 7 AND rht > 8</tt>
```

`Counting the number of children of a comment is equally easy: <code>('rht' - 'lft' - 1)/2`

Previously these operations required a (very slow) recursive algorithm. The way Geeklog had implemented this algorithm required one database query for every comment in the result set. Another way would have been to do one query to fetch all comments, and then use a recursive function to choose the desired comments. Both methods were extremely inefficient and caused load problems on frequently viewed Geeklog sites.

The downside to this method is that inserting and deleting comments takes more work (three queries instead of one) and must be done in a single transaction (or a write lock on the table). Inserts:

`START TRANSACTION`

UPDATE comments SET rht = rht + 2 WHERE rht >= comment.rht

UPDATE comments SET lft = lft + 2 WHERE lft >= comment.rht

INSERT INTO comments SET lft = comment.rht, rht = comment.rht + 1

COMMIT

Deletions:

`START TRANSACTION`

UPDATE comments SET rht = rht - 2 WHERE rht > comment.rht

UPDATE comments SET lft = lft - 2 WHERE lft > comment.rht

DELETE FROM comments WHERE id = comment.id

COMMIT

However, since fetching (reading) comments is a much more frequent action then adding or deleting comments and the increase in efficiency for fetching comments is much greater than the loss of efficiency for adding and deleting comments, the trade off is worth while.

Please note that in the SQL examples above, parts of the query were left out to make the algorithm more clear. For more detail, see the Geeklog source code (specifically lib-comment.php).

Related Articles: