The hierarchical data storage problem is a non-trivial task in relational database context. For example, your online shop has goods of different categories and subcategories creating tree spans for 5 levels. How should they be stored in a database?

Luckily, there are several approaches (design patterns) that will help the developer to design database structure without both odd tables and code. As a result, the site will work faster and any changes, even on database layer, won’t cause troubles. We will study these approaches below.

Adjacency List

This design pattern is the simplest solution to store and work with hierarchical data structures. Below you can see what it looks like:

Representation of hierarchy structure for adjacency table

Representation of hierarchy structure for adjacency table

Quite simple, isn’t it? The only thing needed is to add some explanatory statements to furnish a thorough understanding of the adjacency list:

  • Parent_id is a foreign key which is connected to category_id.
  • The no-children nodes are called leaf nodes. The children and grandchildren are called descendants, and parents with grandparents are ancestors. Root node is an element that has no parents. In our case, it is an Electronics element.

 The last comment is valid for every design pattern which will be mentioned in this article. Now, let’s create a table to see how an adjacency list works in practice.

Note, we use the foreign key with ON DELETE CASCADE ON UPDATE CASCADE – it allows deleting and updating foreign key values.

Firstly, we add a root node element:

After that, we can add the root node’s descendants
And the next levels:
The listing shows how we realize the structure on picture 1. As you can see, there is nothing hard at all. All we need is to insert values with the right parent_id.

Now let’s consider approaches for how to use some specific operations in a context of an adjacency list.

To get the root element:

The result is:

design-patterns-hierarchical-data-storage (1)

Finding Television’s children:

Here they are:

design-patterns-hierarchical-data-storage (2)

 

 

 

Let’s go to more engaging operations. To unfold the full tree we need:

Self-join unfolds the full tree. As a result we have:

design-patterns-hierarchical-data-storage (3)

 

 

 

 

And the last example – getting all leaf nodes. The leaf nodes are elements which have no children.

And here are all our no-children elements:

design-patterns-hierarchical-data-storage

 

 

 

 

So, if you are creating tree-structured data, an adjacency list is a convenient pattern to ease your DB design process.

Materialized Path

This design pattern is usually used for the realization of a comment system on a website. Why is it convenient to use a materialized path? Let’s create a comment table and look into the matter:

In the insertions listed above, we add new comments and paths that show the hierarchy of comments.

design-patterns-hierarchical-data-storage

For example, path 1/3/5 equals Nice article->really nice->good

To get all comments that are related to 1/3/[value] hierarchy:

design-patterns-hierarchical-data-storage

 

 

To delete the hierarchy which starts with “wow” comment

Result:

design-patterns-hierarchical-data-storage

 

 

 

Materialized Path, or path enumeration, isn’t the best choice to use with relational databases. In the MongoDB database it works more easily and doesn’t look as frightful as in the examples above.

Nested sets

Another approach in building hierarchical structures is Nested sets. The main pattern attribute is the presence of left key and right key. The main advantage of nested sets is data selection speed. Main disadvantage – changes in the DB are time-consuming, because it’s necessary to change left and right keys.

To demonstrate the basic capabilities of nested sets, we will create a tree table and fill it with data:

At first glance, it can be hard to understand what the values of keys rgt and lft mean. Suppose the root elements are lft = 1 and rgt = 28. It means that number of elements equals 28/2 = 14 (the golden rule is that the root’s rgt value is always a double of the total number of nodes in DB).

The root elements lft = 1 and rgt = 28, and all child elements in the range lft >1 and rgt < 28 belong to the main_branch1 element. This works for other elements (in an example below you will see what it looks like).

design-patterns-hierarchical-data-storage

 

 

 

 

Retrieving data from our tree by nested sets is much easier than in the Adjacency list. We avoid lots of JOINs when using Nested sets. The SQL request below is more expanded compared to the SQL query above, but it’s easier in use to select the needed values:

design-patterns-hierarchical-data-storage

 

 

 

 

See, how easy the selection of data in hierarchic structures can be. Now we are going to the last design pattern – Closure Table.

Closure table

One of the most popular and cross functional ways to store hierarchic structures in a relational database is the Closure table. A simple and elegant solution that allows you to work with tables without extra harassment and to enjoy the process.

Instead of the concepts listed above, the Closure table works by using two tables. The first table will store basic information, while the second stores the connections between entities (ancestor-descendant relations).

Basic table:

Relations table:
Adding needed data:
To read data, we join the family and relation tables. Thanks to the relation @SON-@SON we are able to get all the son’s side members including their ancestor:
design-patterns-hierarchical-data-storage

 

 

 

 

If we don’t want to get ancestor, we can simply add r.depth > 0

design-patterns-hierarchical-data-storage (8)design-patterns-hierarchical-data-storage

 

 

 

 

Developers all over the world strongly recommend using a Closure table, because it’s easy both in understanding and use. Now we will compare all approaches to find the best solution.

Making the best decision

After considering each concept, we are able to compare and find the best design pattern. As the first step, let’s highlight the main demands for each design pattern for hierarchical data:

  • Minimal number of requests (for example, we want to use only one query to get branch)
  • High-performance and low request runtime
  • Data validity
  • Convenient working with MySQL. It is the most popular DBMS which, unfortunately, doesn’t support recursion, and it would be good if the purchased concept furnishes us with reliable service.

The table below shows which hierarchical data storage concept is most convenient and reliable:

Adjacency list Materialized Path Nested Sets Closure Table
Number of tables 1 1 1 2
Selection one descendants Easy Hard Hard Easy
One element deletion Hard Easy Hard Easy
One element adding Easy Easy Hard Easy
Moving the branch Easy Easy Hard Easy
Data validity Yes No No Yes

So, Closure table is the best option. Of course, it has its own disadvantages, but in fact, considering a developer’s basic needs, Closure table would be a right variant for everyone who wants to implement the hierarchical structure in his project. More about considered concepts you can read in the book SQL Antipatterns: Avoiding the Pitfalls of Database Programming (pdf).