Recursive query in SQL can be as useful as recursive functions that developers use all the time. Snowflake has documentations on how to use CTE to construct recursive queries. Example can be found here: Working with CTEs. It uses a simple company employees hierarchy to demonstrate how to run recursive queries, and it also describes in a bit of detail on how it works. However, the doc does not detail in what exactly happened when query was run. So I think it might be helpful to write a bit of my own understanding and share on my blog.
To start with, below are the queries from the official Snowflake doc that were used to demonstrate this. I will copy them below for your convenience:
CREATE OR REPLACE TABLE EMPLOYEES ( TITLE VARCHAR, EMPLOYEE_ID INTEGER, MANAGER_ID INTEGER ); INSERT INTO EMPLOYEES (TITLE, EMPLOYEE_ID, MANAGER_ID) VALUES ('President', 1, NULL), -- THE PRESIDENT HAS NO MANAGER. ('Vice President Engineering', 10, 1), ('Programmer', 100, 10), ('QA Engineer', 101, 10), ('Vice President HR', 20, 1), ('Health Insurance Analyst', 200, 20);
-- This is the WITH clause, of course. WITH RECURSIVE MANAGERS -- Column names for the "view"/CTE (INDENT, EMPLOYEE_ID, MANAGER_ID, EMPLOYEE_TITLE) AS -- Common Table Expression ( -- Anchor Clause SELECT '' AS INDENT, EMPLOYEE_ID, MANAGER_ID, TITLE AS EMPLOYEE_TITLE FROM EMPLOYEES WHERE TITLE = 'President' UNION ALL -- Recursive Clause SELECT INDENT || '--- ', EMPLOYEES.EMPLOYEE_ID, EMPLOYEES.MANAGER_ID, EMPLOYEES.TITLE FROM EMPLOYEES JOIN MANAGERS ON EMPLOYEES.MANAGER_ID = MANAGERS.EMPLOYEE_ID ) -- This is the "main select". SELECT INDENT || EMPLOYEE_TITLE AS TITLE, EMPLOYEE_ID, MANAGER_ID FROM MANAGERS; +----------------------------------+-------------+------------+ | TITLE | EMPLOYEE_ID | MANAGER_ID | |----------------------------------+-------------+------------| | President | 1 | NULL | | --- Vice President Engineering | 10 | 1 | | --- Vice President HR | 20 | 1 | | --- --- Programmer | 100 | 10 | | --- --- QA Engineer | 101 | 10 | | --- --- Health Insurance Analyst | 200 | 20 | +----------------------------------+-------------+------------+
In this simple example, we can see that there are 3 levels in the hierarchy, so it is clear that there will be 3 JOINs combined with 2 unions, to concatenate the result of those 3 JOINs. Let’s see what happens here step by step.
Firstly, we need to find the top level employee as the starting point. In this case is the President. The query is obvious as we have seen from the example above:
SELECT TITLE AS EMPLOYEE_TITLE, EMPLOYEE_ID, MANAGER_ID FROM EMPLOYEES WHERE TITLE = 'President'; +----------------+-------------+------------+ | EMPLOYEE_TITLE | EMPLOYEE_ID | MANAGER_ID | |----------------+-------------+------------| | President | 1 | NULL | +----------------+-------------+------------+
Secondly, after we have found the President, we need to find the employees who are immediately below President. This is done by a simple JOIN using employees table with the President data that we have got from step 1. Query as below:
SELECT '--- ' || VICE_PRESIDENT.TITLE AS EMPLOYEE_TITLE, VICE_PRESIDENT.EMPLOYEE_ID, VICE_PRESIDENT.MANAGER_ID FROM ( SELECT '' AS INDENT, EMPLOYEE_ID, MANAGER_ID, TITLE AS EMPLOYEE_TITLE FROM EMPLOYEES WHERE TITLE = 'President' ) AS PRESIDENT JOIN EMPLOYEES VICE_PRESIDENT ON VICE_PRESIDENT.MANAGER_ID = PRESIDENT.EMPLOYEE_ID; +--------------------------------+-------------+------------+ | EMPLOYEE_TITLE | EMPLOYEE_ID | MANAGER_ID | |--------------------------------+-------------+------------| | --- Vice President Engineering | 10 | 1 | | --- Vice President HR | 20 | 1 | +--------------------------------+-------------+------------+
Once this step is done, we have two data sets ready, one for the President, and another for Vice President, so Snowflake will combine them now with UNION ALL as below:
SELECT TITLE AS EMPLOYEE_TITLE, EMPLOYEE_ID, MANAGER_ID FROM EMPLOYEES WHERE TITLE = 'President' UNION ALL SELECT '--- ' || VICE_PRESIDENT.TITLE AS EMPLOYEE_TITLE, VICE_PRESIDENT.EMPLOYEE_ID, VICE_PRESIDENT.MANAGER_ID FROM ( SELECT '' AS INDENT, EMPLOYEE_ID, MANAGER_ID, TITLE AS EMPLOYEE_TITLE FROM EMPLOYEES WHERE TITLE = 'President' ) AS PRESIDENT JOIN EMPLOYEES VICE_PRESIDENT ON VICE_PRESIDENT.MANAGER_ID = PRESIDENT.EMPLOYEE_ID;
This combined resultset is then saved and ready to be combined again after next iteration finishes.
Now, we have found the Vice Presidents, we can then use it to find all the employees whom they manage. This, again, can be achieved by JOINing back with employees table to find out whose manager_ids are 10 and 20:
SELECT '------ ' || DEVELOPERS.TITLE AS EMPLOYEE_TITLE, DEVELOPERS.EMPLOYEE_ID, DEVELOPERS.MANAGER_ID FROM ( SELECT '' AS INDENT, EMPLOYEE_ID, MANAGER_ID, TITLE AS EMPLOYEE_TITLE FROM EMPLOYEES WHERE TITLE = 'President' ) AS PRESIDENT JOIN EMPLOYEES VICE_PRESIDENT ON VICE_PRESIDENT.MANAGER_ID = PRESIDENT.EMPLOYEE_ID JOIN EMPLOYEES DEVELOPERS ON DEVELOPERS.MANAGER_ID = VICE_PRESIDENT.EMPLOYEE_ID ; +---------------------------------+-------------+------------+ | EMPLOYEE_TITLE | EMPLOYEE_ID | MANAGER_ID | |---------------------------------+-------------+------------| | ------ Programmer | 100 | 10 | | ------ QA Engineer | 101 | 10 | | ------ Health Insurance Analyst | 200 | 20 | +---------------------------------+-------------+------------+
Again, the resultset returned at this stage will be combined with the previous resultset.
At this stage, we have got all the employees data, including President, Vice President and Developers. However, Snowflake does not know we have done yet. It will have to perform another call, so find the employees who are managed by those Programmer/QA Engineer and Analyst. So the query looks something like below:
SELECT '------ ' || DEVELOPERS.TITLE AS EMPLOYEE_TITLE, DEVELOPERS.EMPLOYEE_ID, DEVELOPERS.MANAGER_ID FROM ( SELECT '' AS INDENT, EMPLOYEE_ID, MANAGER_ID, TITLE AS EMPLOYEE_TITLE FROM EMPLOYEES WHERE TITLE = 'President' ) AS PRESIDENT JOIN EMPLOYEES VICE_PRESIDENT ON VICE_PRESIDENT.MANAGER_ID = PRESIDENT.EMPLOYEE_ID JOIN EMPLOYEES DEVELOPERS ON DEVELOPERS.MANAGER_ID = VICE_PRESIDENT.EMPLOYEE_ID JOIN EMPLOYEES MORE_EMPLOYEES ON MORE_EMPLOYEES.MANAGER_ID = DEVELOPERS.EMPLOYEE_ID ; +----------------+-------------+------------+ | EMPLOYEE_TITLE | EMPLOYEE_ID | MANAGER_ID | |----------------+-------------+------------| +----------------+-------------+------------+
We can see that this time, the resultset returned is empty, now Snowflake understands that we have reached the end of the recursive call. So the combined UNION ALL resultsets will be made available to the outside SELECT query.
Of course, this is not the exact query that will be run by Snowflake behind the scene, but you get the idea of what had happened.
So to conclude, the effect of the CTE recursive query that was used in the beginning of the post can be rewritten as below UNION ALL query:
SELECT TITLE AS EMPLOYEE_TITLE, EMPLOYEE_ID, MANAGER_ID FROM EMPLOYEES WHERE TITLE = 'President' UNION ALL SELECT '--- ' || VICE_PRESIDENT.TITLE AS EMPLOYEE_TITLE, VICE_PRESIDENT.EMPLOYEE_ID, VICE_PRESIDENT.MANAGER_ID FROM ( SELECT '' AS INDENT, EMPLOYEE_ID, MANAGER_ID, TITLE AS EMPLOYEE_TITLE FROM EMPLOYEES WHERE TITLE = 'President' ) AS PRESIDENT JOIN EMPLOYEES VICE_PRESIDENT ON VICE_PRESIDENT.MANAGER_ID = PRESIDENT.EMPLOYEE_ID UNION ALL SELECT '--- --- ' || DEVELOPERS.TITLE AS EMPLOYEE_TITLE, DEVELOPERS.EMPLOYEE_ID, DEVELOPERS.MANAGER_ID FROM ( SELECT '' AS INDENT, EMPLOYEE_ID, MANAGER_ID, TITLE AS EMPLOYEE_TITLE FROM EMPLOYEES WHERE TITLE = 'President' ) AS PRESIDENT JOIN EMPLOYEES VICE_PRESIDENT ON VICE_PRESIDENT.MANAGER_ID = PRESIDENT.EMPLOYEE_ID JOIN EMPLOYEES DEVELOPERS ON DEVELOPERS.MANAGER_ID = VICE_PRESIDENT.EMPLOYEE_ID ; +----------------------------------+-------------+------------+ | EMPLOYEE_TITLE | EMPLOYEE_ID | MANAGER_ID | |----------------------------------+-------------+------------| | President | 1 | NULL | | --- Vice President Engineering | 10 | 1 | | --- Vice President HR | 20 | 1 | | --- --- Programmer | 100 | 10 | | --- --- QA Engineer | 101 | 10 | | --- --- Health Insurance Analyst | 200 | 20 | +----------------------------------+-------------+------------+
You can compare the result we got using the recursive query above to confirm the output:
+----------------------------------+-------------+------------+ | TITLE | EMPLOYEE_ID | MANAGER_ID | |----------------------------------+-------------+------------| | President | 1 | NULL | | --- Vice President Engineering | 10 | 1 | | --- Vice President HR | 20 | 1 | | --- --- Programmer | 100 | 10 | | --- --- QA Engineer | 101 | 10 | | --- --- Health Insurance Analyst | 200 | 20 | +----------------------------------+-------------+------------+
This concludes this post. In my next post, I will discuss using another type of recursive query using START WITH …. CONNECT BY to achieve the same goal.
Comments are welcome!