Using recursive SQL to solve the parent-child problem

By Luwei March 22, 2013

I recently came across an interesting SQL problem:

Suppose you are given a table with two columns. The first column contains names of parents, and the second column contains names of their children.

For example, you have the following:

Parent	Child
Billy	Sarah
Sarah	Casey
Cornelius	Billy

In this example, you have Cornelius as the father of Billy, who is the father of Sarah, who is the Mother of Casey.

Therefore, Cornelius is the great grandfather, or ultimate ancestor of Casey. But how would you determine this if you had a table with thousands or millions of rows in length?

Would it be possible to write SQL code that generates the name of the ultimate parent in the first column and the corresponding child in the second column?

Intuitively, you would want to find instances where the child column has commonalities with the parent column. For example, if you duplicated the above table and joined parent column of the first table with the child column of the second table, you would find that Billy and Sarah are both parents and children. Therefore, Billy nor Sarah cannot be an ultimate ancestor nor an ultimate child.

Notice that joining these two columns links a child with its grandparent. Now if you did this recursively, you would eventually be able to find the relationship between the child and the ultimate parent.

Here is the solution that I came up with using Recursive SQL:

DROP TABLE IF EXISTS ancestor;
WITH RECURSIVE ancestor(child, parent,n) AS
((
SELECT 
   child
  ,parent
  ,1                                              -- track the number of iterations
FROM 
   parent2
)

UNION

(
SELECT
   a1.child
  ,a2.parent
  ,n+1                                            -- add one for each iteration
FROM parent2 a1
   JOIN ancestor a2 on a1.parent = a2.child       -- find cases where a parent is also a child
WHERE a1.child <> a2.parent                       -- keep on iterating until child = parent
))
SELECT * INTO TEMP ancestor FROM ancestor
;

DROP TABLE IF EXISTS temp1;
SELECT 
   a1.child
  ,max(a1.n) as max_n                             -- only return max iterations for a given child, in order to find ultimate parent
INTO temp1
FROM ancestor a1
GROUP BY a1.child
;
SELECT
   t1.child
  ,a1.parent
  ,t1.max_n
FROM temp1 t1 join ancestor a1 on t1.child = a1.child and t1.max_n = a1.n
;

luwei likes data science

Using recursive SQL to solve the parent-child problem

Comments

Post a Comment

Popular posts from this blog

building a chess ai - part 4: learning an evaluation function using deep learning (keras)

Recommendation Systems - Brief overview of Matrix Factorization

Brief intro to recurrent neural networks