Trees in SQL : an approach based on materialized paths and normalization for MySQL
I’m working since a few months on how storing trees in MySQL. It started like “hey, let’s make a database of every genre of heavy metal music!”. So I began with the easy way : a tree structure in a table genre with id and parent_id columns. And then I came to the point where I wanted to display the entire tree. This is just impossible with a reasonable number of queries, as there is no recursive syntax in standard SQL nor in MySQL (Oracle has the CONNECT BY extension). So I started researching the web.
There are basically three ways to store a tree in a relationnal databases : adjacency list, nested set and materialized path, plus a few more like nested intervals. Let’s put it clear : they all suck.
- Adjacency list model is the one I used the first time (with
parent_idcolumn). It’s easy to create but impossible to query deeply without recursivity. - Nested set model is quite easy to query, but is indeed pretty fucked up (tree is limited in size, very hard to read without maths, almost all the lines of the table need to be updated each time a node is added, moved or deleted!). Nested intervals model tries to remove the size limitation of nested sets model, but involves way to much maths.
- Materialized path is great and easy to understand, but very slow because it involves heavy use of the “LIKE” operator.
More on that here, here and here.
Then I found two very interesting articles. One is a reply on MySQL Forums about materialized path performance problem, recommending an approach based on normalized schema to store paths. The other is an implementation using PostgreSQL by depesz that also use a normalized schema. The point is to create a table dedicated to store all the paths from every nodes to every nodes. This looks to me like a very good approach, as it’s easy to understand, easy to maintain, easy to query and performance-ok (with appropriate indexes). Unfortunalty, the implementation I found relies to much on PostgreSQL and doesn’t work out-of-the-box with MySQL (because of the use of triggers and of some queries that needs to be rewriten). So I reworked it to work with MySQL 5, and changed a few things. I strongly recommend that you read depesz’s post for a complete understanding of the approach before you continue.
