Using recursive SQL to solve the parent-child problem
I recently came across an interesting SQL problem:
Suppose you are given a table with two columns. The first column contains names of parents, and the second column contains names of their children.
For example, you have the following:
In this example, you have Cornelius as the father of Billy, who is the father of Sarah, who is the Mother of Casey.
Therefore, Cornelius is the great grandfather, or ultimate ancestor of Casey. But how would you determine this if you had a table with thousands or millions of rows in length?
Would it be possible to write SQL code that generates the name of the ultimate parent in the first column and the corresponding child in the second column?
Intuitively, you would want to find instances where the child column has commonalities with the parent column. For example, if you duplicated the above table and joined parent column of the first table with the child column of the second table, you would find that Billy and Sarah are both parents and children. Therefore, Billy nor Sarah cannot be an ultimate ancestor nor an ultimate child.
Notice that joining these two columns links a child with its grandparent. Now if you did this recursively, you would eventually be able to find the relationship between the child and the ultimate parent.
Here is the solution that I came up with using Recursive SQL:
Suppose you are given a table with two columns. The first column contains names of parents, and the second column contains names of their children.
For example, you have the following:
Parent
|
Child
|
Billy
|
Sarah
|
Sarah
|
Casey
|
Cornelius
|
Billy
|
In this example, you have Cornelius as the father of Billy, who is the father of Sarah, who is the Mother of Casey.
Therefore, Cornelius is the great grandfather, or ultimate ancestor of Casey. But how would you determine this if you had a table with thousands or millions of rows in length?
Would it be possible to write SQL code that generates the name of the ultimate parent in the first column and the corresponding child in the second column?
Intuitively, you would want to find instances where the child column has commonalities with the parent column. For example, if you duplicated the above table and joined parent column of the first table with the child column of the second table, you would find that Billy and Sarah are both parents and children. Therefore, Billy nor Sarah cannot be an ultimate ancestor nor an ultimate child.
Notice that joining these two columns links a child with its grandparent. Now if you did this recursively, you would eventually be able to find the relationship between the child and the ultimate parent.
Here is the solution that I came up with using Recursive SQL:
DROP TABLE IF EXISTS ancestor; WITH RECURSIVE ancestor(child, parent,n) AS (( SELECT child ,parent ,1 -- track the number of iterations FROM parent2 ) UNION ( SELECT a1.child ,a2.parent ,n+1 -- add one for each iteration FROM parent2 a1 JOIN ancestor a2 on a1.parent = a2.child -- find cases where a parent is also a child WHERE a1.child <> a2.parent -- keep on iterating until child = parent )) SELECT * INTO TEMP ancestor FROM ancestor ; DROP TABLE IF EXISTS temp1; SELECT a1.child ,max(a1.n) as max_n -- only return max iterations for a given child, in order to find ultimate parent INTO temp1 FROM ancestor a1 GROUP BY a1.child ; SELECT t1.child ,a1.parent ,t1.max_n FROM temp1 t1 join ancestor a1 on t1.child = a1.child and t1.max_n = a1.n ;
Comments
Post a Comment