Trees in SQL : an approach based on materialized paths and normalization for MySQL

I’m working since a few months on how storing trees in MySQL. It started like “hey, let’s make a database of every genre of heavy metal music!”. So I began with the easy way : a tree structure in a table genre with id and parent_id columns. And then I came to the point where I wanted to display the entire tree. This is just impossible with a reasonable number of queries, as there is no recursive syntax in standard SQL nor in MySQL (Oracle has the CONNECT BY extension). So I started researching the web.

There are basically three ways to store a tree in a relationnal databases : adjacency list, nested set and materialized path, plus a few more like nested intervals. Let’s put it clear : they all suck.

  • Adjacency list model is the one I used the first time (with parent_id column). It’s easy to create but impossible to query deeply without recursivity.
  • Nested set model is quite easy to query, but is indeed pretty fucked up (tree is limited in size, very hard to read without maths, almost all the lines of the table need to be updated each time a node is added, moved or deleted!). Nested intervals model tries to remove the size limitation of nested sets model, but involves way to much maths.
  • Materialized path is great and easy to understand, but very slow because it involves heavy use of the “LIKE” operator.

More on that here, here and here.

Then I found two very interesting articles. One is a reply on MySQL Forums about materialized path performance problem, recommending an approach based on normalized schema to store paths. The other is an implementation using PostgreSQL by depesz that also use a normalized schema. The point is to create a table dedicated to store all the paths from every nodes to every nodes. This looks to me like a very good approach, as it’s easy to understand, easy to maintain, easy to query and performance-ok (with appropriate indexes). Unfortunalty, the implementation I found relies to much on PostgreSQL and doesn’t work out-of-the-box with MySQL (because of the use of triggers and of some queries that needs to be rewriten). So I reworked it to work with MySQL 5, and changed a few things. I strongly recommend that you read depesz’s post for a complete understanding of the approach before you continue.

Data Model

First, we create a table containing the data, let’s name it data. The parent_id column stores the parent’s ID, like in the traditionnal adjacency list approach, but we’re not going to use it for querying the tree. The value column is whatever you need to store as value for the node, and we’re going to use it for ordering. We use InnoDB as storage engine.

CREATE TABLE data (
    id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
    parent_id INTEGER UNSIGNED null,
    value VARCHAR(255) NOT NULL,

    PRIMARY KEY (id),

    INDEX data_data_fk (parent_id),
    FOREIGN KEY data_data_fk (parent_id) REFERENCES data (id) ON DELETE CASCADE
) ENGINE = InnoDB;

Then we create the table that will store every paths. A path contains a parent node, a child node, and the number of hops between the two nodes (depth). Each node has at least one entry on that table: the path to itself with a depth egal 0. The path column will hold the materialized path, that will be used for ordering. If you’re absolutely sure that you will never ever need to display your tree in alphabetical order, then you can safely remove this column and remove related queries from triggers.

CREATE TABLE data_tree (
    parent_id INTEGER UNSIGNED NOT NULL,
    child_id INTEGER UNSIGNED NOT NULL,
    depth INTEGER UNSIGNED NOT NULL default 0,
    path TEXT NOT NULL,

    PRIMARY KEY (parent_id, child_id),
    INDEX data_tree_child_depth_idx (child_id, depth),

    INDEX data_tree_parent_data_fk (parent_id),
    FOREIGN KEY data_tree_parent_data_fk (parent_id) REFERENCES data (id) ON DELETE CASCADE,
    INDEX data_tree_child_data_fk (child_id),
    FOREIGN KEY data_tree_child_data_fk (child_id) REFERENCES data (id) ON DELETE CASCADE,
);

Creating a new node

When a node is inserted in the table data, then every path leading to this node has to be inserted in the table data_tree. To achieve this, I use a trigger (MySQL 5 or higher) because it’s safer for the DB’s integrity, but a client side implementation (in PHP or any other langage) will do, as long as you use a transaction.

DELIMITER |
CREATE TRIGGER insert_data
AFTER INSERT ON data
FOR EACH ROW BEGIN
    INSERT INTO data_tree (parent_id, child_id, depth, path)
    VALUES (NEW.id, NEW.id, 0, CONCAT(NEW.value,"/"));

    INSERT INTO data_tree (parent_id, child_id, depth, path)
        SELECT parent_id, NEW.id, depth + 1, CONCAT(path, NEW.value, "/")
        FROM data_tree
        WHERE child_id = NEW.parent_id;
END; |
DELIMITER ;

Updating the tree

First possible update : moving a node. To move a node, we basically have to delete every path between the node’s ancestors (if any) to the node’s childs (if any), and then insert all the new paths. This can be done easily with two queries in a trigger, or again, if you prefer, client side. No big changes from depesz’s post, I only had to rewrite queries for MySQL.

Second possible update : changing a node’s value. This step is optionnal if you decided to drop the path column. When a node’s value is updated, for exemple from “Apinat” to “Banaanit”, then every path including that node value has to be updated as well, otherwise the tree won’t be correctly ordered. This step has to be completly rewriten for MySQL, has the original depesz’s implementation for PostgreSQL uses regexp replacements, which are not supported. Hence I came with a workaround, using only pure string functions to make the replacement.

DELIMITER |
CREATE TRIGGER update_data
AFTER UPDATE ON data
FOR EACH ROW BEGIN
    DECLARE old_path_len INT;
    IF NEW.parent_id != OLD.parent_id OR NEW.parent_id IS NULL OR OLD.parent_id IS NULL THEN
        if OLD.parent_id IS NOT NULL THEN
            DELETE t2
            FROM data_tree t1
            JOIN data_tree t2 ON t1.child_id = t2.child_id
            WHERE t1.parent_id = OLD.id
            AND t2.depth > t1.depth;
        END IF;

        IF NEW.parent_id IS NOT NULL THEN
            INSERT INTO data_tree (parent_id, child_id, depth, path)
                SELECT t1.parent_id, t2.child_id, t1.depth + t2.depth + 1, CONCAT(t1.path, t2.path)
                FROM data_tree t1, data_tree t2
                WHERE t1.child_id = NEW.parent_id
                AND t2.parent_id = OLD.id;
        END IF;
    END IF;

    IF NEW.value != OLD.value THEN
        SELECT CHAR_LENGTH(path) INTO old_path_len
        FROM data_tree
        WHERE child_id = OLD.id
        AND DEPTH = 0;

        IF old_path_len > 0 THEN
            UPDATE data_tree t1
            JOIN data_tree t2 ON t1.child_id = t2.child_id
            SET t2.path = CONCAT(
                SUBSTRING(t2.path, 1, CHAR_LENGTH(t2.path) - CHAR_LENGTH(t1.path)),
                CONCAT(NEW.value, SUBSTRING(t1.path, old_path_len))
            )
            WHERE t1.parent_id = OLD.id
            AND t2.depth >= t1.depth;
        END IF;
    END IF;
END; |
DELIMITER ;

Deleting a node

Thanks to the ON DELETE CASCADE clause, we don’t need to do anything when deleting a node. Beware that every childs will also be deleted from the tree and from the data table.

Examples

Let’s play a little with this implementation.

Populating the tree

mysql> INSERT INTO data (parent_id, value) VALUES (null, 'Heavy metal');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM data;
+----+-----------+-------------+
| id | parent_id | value       |
+----+-----------+-------------+
|  1 |      NULL | Heavy metal |
+----+-----------+-------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM data_tree;
+-----------+----------+-------+--------------+
| parent_id | child_id | depth | path         |
+-----------+----------+-------+--------------+
|         1 |        1 |     0 | Heavy metal/ |
+-----------+----------+-------+--------------+
1 row in set (0.00 sec)

mysql> INSERT INTO data (parent_id, value) VALUES (1, 'Power metal'), (1, 'Trash metal'), (1, 'Death metal');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM data;
+----+-----------+-------------+
| id | parent_id | value       |
+----+-----------+-------------+
|  1 |      NULL | Heavy metal |
|  2 |         1 | Power metal |
|  3 |         1 | Trash metal |
|  4 |         1 | Death metal |
+----+-----------+-------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM data_tree;
+-----------+----------+-------+--------------------------+
| parent_id | child_id | depth | path                     |
+-----------+----------+-------+--------------------------+
|         1 |        1 |     0 | Heavy metal/             |
|         1 |        2 |     1 | Heavy metal/Power metal/ |
|         1 |        3 |     1 | Heavy metal/Trash metal/ |
|         1 |        4 |     1 | Heavy metal/Death metal/ |
|         2 |        2 |     0 | Power metal/             |
|         3 |        3 |     0 | Trash metal/             |
|         4 |        4 |     0 | Death metal/             |
+-----------+----------+-------+--------------------------+
7 rows in set (0.00 sec)

mysql> INSERT INTO data (parent_id, value) VALUES (2, 'Symphonic power metal'), (2, 'Extreme power metal'), (4, 'Brutal death metal');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> SELECT * FROM data_tree;
+-----------+----------+-------+------------------------------------------------+
| parent_id | child_id | depth | path                                           |
+-----------+----------+-------+------------------------------------------------+
|         1 |        1 |     0 | Heavy metal/                                   |
|         1 |        2 |     1 | Heavy metal/Power metal/                       |
|         1 |        3 |     1 | Heavy metal/Trash metal/                       |
|         1 |        4 |     1 | Heavy metal/Death metal/                       |
|         1 |        5 |     2 | Heavy metal/Power metal/Symphonic power metal/ |
|         1 |        6 |     2 | Heavy metal/Power metal/Extreme power metal/   |
|         1 |        7 |     2 | Heavy metal/Death metal/Brutal death metal/    |
|         2 |        2 |     0 | Power metal/                                   |
|         2 |        5 |     1 | Power metal/Symphonic power metal/             |
|         2 |        6 |     1 | Power metal/Extreme power metal/               |
|         3 |        3 |     0 | Trash metal/                                   |
|         4 |        4 |     0 | Death metal/                                   |
|         4 |        7 |     1 | Death metal/Brutal death metal/                |
|         5 |        5 |     0 | Symphonic power metal/                         |
|         6 |        6 |     0 | Extreme power metal/                           |
|         7 |        7 |     0 | Brutal death metal/                            |
+-----------+----------+-------+------------------------------------------------+
16 rows in set (0.00 sec)

mysql> SELECT * FROM data;
+----+-----------+-----------------------+
| id | parent_id | value                 |
+----+-----------+-----------------------+
|  1 |      NULL | Heavy metal           |
|  2 |         1 | Power metal           |
|  3 |         1 | Trash metal           |
|  4 |         1 | Death metal           |
|  5 |         2 | Symphonic power metal |
|  6 |         2 | Extreme power metal   |
|  7 |         4 | Brutal death metal    |
+----+-----------+-----------------------+
7 rows in set (0.00 sec)

Requesting the tree

First generation of childs FROM “Heavy metal”

mysql> SELECT d.id, d.value
       FROM data_tree t
       JOIN data d ON d.id = t.child_id
       WHERE t.parent_id = 1
       AND depth = 1;
+----+-------------+
| id | value       |
+----+-------------+
|  2 | Power metal |
|  3 | Trash metal |
|  4 | Death metal |
+----+-------------+
3 rows in set (0.00 sec)

Printing the whole tree, ordered

mysql> SELECT CONCAT(REPEAT('-',t.depth), d.value) AS value
       FROM data_tree t
       JOIN data d ON d.id = t.child_id
       WHERE t.parent_id = 1
       ORDER by t.path;
+-------------------------+
| value                   |
+-------------------------+
| Heavy metal             |
| -Death metal            |
| --Brutal death metal    |
| -Power metal            |
| --Extreme power metal   |
| --Symphonic power metal |
| -Trash metal            |
+-------------------------+
7 rows in set (0.00 sec)

Ancestors of “Symphonic power metal”, starting from the root

mysql> SELECT d.id, d.value
       FROM data_tree t
       JOIN data d ON d.id = t.parent_id
       WHERE t.child_id = 5
       ORDER BY depth DESC;
+----+-----------------------+
| id | value                 |
+----+-----------------------+
|  1 | Heavy metal           |
|  2 | Power metal           |
|  5 | Symphonic power metal |
+----+-----------------------+
3 rows in set (0.00 sec)

Updating the tree

Moving a node to another place

Let’s add a Hard rock in our tree, and move Heavy metal as a child of Hard rock.

mysql> INSERT INTO data (parent_id, value) VALUES (null, 'Hard rock');
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM data;
+----+-----------+-----------------------+
| id | parent_id | value                 |
+----+-----------+-----------------------+
|  1 |      NULL | Heavy metal           |
|  2 |         1 | Power metal           |
|  3 |         1 | Trash metal           |
|  4 |         1 | Death metal           |
|  5 |         2 | Symphonic power metal |
|  6 |         2 | Extreme power metal   |
|  7 |         4 | Brutal death metal    |
|  8 |      NULL | Hard rock             |
+----+-----------+-----------------------+
8 rows in set (0.00 sec)

mysql> UPDATE data SET parent_id = 8 WHERE id = 1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM data;
+----+-----------+-----------------------+
| id | parent_id | value                 |
+----+-----------+-----------------------+
|  1 |         8 | Heavy metal           |
|  2 |         1 | Power metal           |
|  3 |         1 | Trash metal           |
|  4 |         1 | Death metal           |
|  5 |         2 | Symphonic power metal |
|  6 |         2 | Extreme power metal   |
|  7 |         4 | Brutal death metal    |
|  8 |      NULL | Hard rock             |
+----+-----------+-----------------------+
8 rows in set (0.00 sec)

mysql> SELECT * FROM data_tree;
+-----------+----------+-------+----------------------------------------------------------+
| parent_id | child_id | depth | path                                                     |
+-----------+----------+-------+----------------------------------------------------------+
|         1 |        1 |     0 | Heavy metal/                                             |
|         1 |        2 |     1 | Heavy metal/Power metal/                                 |
|         1 |        3 |     1 | Heavy metal/Trash metal/                                 |
|         1 |        4 |     1 | Heavy metal/Death metal/                                 |
|         1 |        5 |     2 | Heavy metal/Power metal/Symphonic power metal/           |
|         1 |        6 |     2 | Heavy metal/Power metal/Extreme power metal/             |
|         1 |        7 |     2 | Heavy metal/Death metal/Brutal death metal/              |
|         2 |        2 |     0 | Power metal/                                             |
|         2 |        5 |     1 | Power metal/Symphonic power metal/                       |
|         2 |        6 |     1 | Power metal/Extreme power metal/                         |
|         3 |        3 |     0 | Trash metal/                                             |
|         4 |        4 |     0 | Death metal/                                             |
|         4 |        7 |     1 | Death metal/Brutal death metal/                          |
|         5 |        5 |     0 | Symphonic power metal/                                   |
|         6 |        6 |     0 | Extreme power metal/                                     |
|         7 |        7 |     0 | Brutal death metal/                                      |
|         8 |        1 |     1 | Hard rock/Heavy metal/                                   |
|         8 |        2 |     2 | Hard rock/Heavy metal/Power metal/                       |
|         8 |        3 |     2 | Hard rock/Heavy metal/Trash metal/                       |
|         8 |        4 |     2 | Hard rock/Heavy metal/Death metal/                       |
|         8 |        5 |     3 | Hard rock/Heavy metal/Power metal/Symphonic power metal/ |
|         8 |        6 |     3 | Hard rock/Heavy metal/Power metal/Extreme power metal/   |
|         8 |        7 |     3 | Hard rock/Heavy metal/Death metal/Brutal death metal/    |
|         8 |        8 |     0 | Hard rock/                                               |
+-----------+----------+-------+----------------------------------------------------------+
24 rows in set (0.00 sec)

mysql> SELECT CONCAT(REPEAT('-',t.depth), d.value) AS value
       FROM data_tree t
       JOIN data d ON d.id = t.child_id
       WHERE t.parent_id = 8
       ORDER BY t.path;
+--------------------------+
| value                    |
+--------------------------+
| Hard rock                |
| -Heavy metal             |
| --Death metal            |
| ---Brutal death metal    |
| --Power metal            |
| ---Extreme power metal   |
| ---Symphonic power metal |
| --Trash metal            |
+--------------------------+
8 rows in set (0.00 sec)

Changing a node’s value

Ok, I don’t know why we would do that, but let’s just rename Death metal to ZDeath metal and see what happens.

mysql> UPDATE data SET value = 'ZDeath metal' WHERE id = 4;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> SELECT * FROM data_tree;
+-----------+----------+-------+----------------------------------------------------------+
| parent_id | child_id | depth | path                                                     |
+-----------+----------+-------+----------------------------------------------------------+
|         1 |        1 |     0 | Heavy metal/                                             |
|         1 |        2 |     1 | Heavy metal/Power metal/                                 |
|         1 |        3 |     1 | Heavy metal/Trash metal/                                 |
|         1 |        4 |     1 | Heavy metal/ZDeath metal/                                |
|         1 |        5 |     2 | Heavy metal/Power metal/Symphonic power metal/           |
|         1 |        6 |     2 | Heavy metal/Power metal/Extreme power metal/             |
|         1 |        7 |     2 | Heavy metal/ZDeath metal/Brutal death metal/             |
|         2 |        2 |     0 | Power metal/                                             |
|         2 |        5 |     1 | Power metal/Symphonic power metal/                       |
|         2 |        6 |     1 | Power metal/Extreme power metal/                         |
|         3 |        3 |     0 | Trash metal/                                             |
|         4 |        4 |     0 | ZDeath metal/                                            |
|         4 |        7 |     1 | ZDeath metal/Brutal death metal/                         |
|         5 |        5 |     0 | Symphonic power metal/                                   |
|         6 |        6 |     0 | Extreme power metal/                                     |
|         7 |        7 |     0 | Brutal death metal/                                      |
|         8 |        1 |     1 | Hard rock/Heavy metal/                                   |
|         8 |        2 |     2 | Hard rock/Heavy metal/Power metal/                       |
|         8 |        3 |     2 | Hard rock/Heavy metal/Trash metal/                       |
|         8 |        4 |     2 | Hard rock/Heavy metal/ZDeath metal/                      |
|         8 |        5 |     3 | Hard rock/Heavy metal/Power metal/Symphonic power metal/ |
|         8 |        6 |     3 | Hard rock/Heavy metal/Power metal/Extreme power metal/   |
|         8 |        7 |     3 | Hard rock/Heavy metal/ZDeath metal/Brutal death metal/   |
|         8 |        8 |     0 | Hard rock/                                               |
+-----------+----------+-------+----------------------------------------------------------+
24 rows in set (0.00 sec)

mysql> SELECT CONCAT(REPEAT('-',t.depth), d.value) AS value
       FROM data_tree t
       JOIN data d ON d.id = t.child_id
       WHERE t.parent_id = 8
       ORDER by t.path;
+--------------------------+
| value                    |
+--------------------------+
| Hard rock                |
| -Heavy metal             |
| --Power metal            |
| ---Extreme power metal   |
| ---Symphonic power metal |
| --Trash metal            |
| --ZDeath metal           |
| ---Brutal death metal    |
+--------------------------+
8 rows in set (0.00 sec)