Hierarchical SQL
Pages: 1, 2
Searching for Superiors
Given a node, find all of its superiors. This requires disassembling the path back into the identifiers that constructed it. We can use a table of sequential integers to find the required substrings:
SELECT SUBSTRING (P1.path_string
FROM (seq * CHAR_LENGTH(P1.emp_id))
FOR CHAR_LENGTH(P1.emp_id)) AS emp_id
FROM Personnel_OrgChart AS P1,
Sequence AS S1
WHERE P1.emp_id = :search_emp_id
AND S1.seq <= CHAR_LENGTH(path_string)/CHAR_LENGTH(emp_id);
The problem is that this does not tell you the relationships among the superiors, only who they are. Those relationships are actually easier to report.
SELECT P2.*
FROM Personnel_OrgChart AS P1,
Personnel_OrgChart AS P2
WHERE P1.emp_id = :search_emp_id
AND POSITION (P2.path_string IN P1.path_string) = 1;
Deleting a Subtree
Given a node, delete the subtree rooted at that node. This can use the same predicate as for finding the subordinates.
DELETE FROM Personnel_OrgChart
WHERE path_string LIKE (SELECT path_string FROM Personnel_OrgChart
WHERE emp_id = :dead_guy) || '%';
Deleting a Single Node
Once more we have to face the problem that removing a non-leaf node from a tree means that it is no longer a tree. We need to have rules for changing the structure.
Assuming that we simply move everyone up a level in the tree, we can first
remove that node emp_id from the Personnel_OrgChart
table and then remove that emp_id from the paths of the other
nodes.
BEGIN ATOMIC
DELETE FROM Personnel_OrgChart
WHERE emp_id = :dead_guy;
UPDATE Personnel_OrgChart
SET path_string = REPLACE (path_string, :dead_guy, '');
END;
There are other methods of rebuilding the tree structure after deleting a
node: promoting a subordinate based on some criteria to the newly vacant
position or leaving a vacancy (a dummy employee named {{vacant}})
in the organizational chart are options. These options usually require some
combination of node deletions and insertions.
Inserting a New Node
When inserting a new node, simply concatenate the new emp_id to
the end of the path of its parent node.
INSERT INTO Personnel_OrgChart
VALUES (:new_guy, :new_emp_id,
(SELECT path_string
FROM Personnel_OrgChart
WHERE emp_id = :new_guy_boss)
|| :new_emp_id);
This basic statement can work for insertion of a subtree with some modification:
INSERT INTO Personnel_OrgChart
SELECT N1.emp, N1.emp_id,
(SELECT path_string
FROM Personnel_OrgChart
WHERE emp_id = :new_tree_boss)
|| N1.emp_id
FROM NewTree AS N1;
Splitting Up a Path String
The path string contains information about the nodes in the path it
represents, so you will often want to split it back into the nodes it
represents. This is easier to do if the path string uses a separator character, such as a comma or slash. I will use a slash so this will look like a directory
path in UNIX. You will also need a table called Sequence that contains a set of integers from 1 to (n).
CharIndex(<search string>, <target
string>, <starting position>) is a vendor version
of the Standard SQL function POSITION(<search string> IN
<target string>). It begins the search at a position in
the target string, thus when the <starting position< = 1, the
two are equivalent. One definition is:
CREATE FUNCTION CharIndex (IN search_str VARCHAR(1000),
IN target VARCHAR(1000), IN start_point INTEGER) RETURNS INTEGER
RETURN
(POSITION (search_str
IN SUBSTRING (target FROM start_point)) + start_point -1);
Version 1
SELECT CASE WHEN SUBSTRING('/' || P1.path_string ||
'/' FROM S1.seq FOR 1) = '/'
THEN SUBSTRING('/' || P1.path_string || '/' FROM (S1.seq +1)
FOR CharIndex('/', '/' || P1.path_string || '/', S1.seq +1)
- S1.seq - 1)
ELSE NULL END AS emp_id
FROM Sequence AS S1, Personnel_OrgChart AS P1
WHERE S1.seq BETWEEN 1 AND CHAR_LENGTH('/' || P1.path_string || '/') - 1
AND SUBSTRING('/' || P1.path_string || '/' FROM S1.seq FOR 1) = '/'
Version 2
This uses the same idea, but with two sequence numbers to bracket the
emp_id embedded in the path string. It also returns the position
of the subordinate emp_id in the path.
CREATE VIEW Breakdown (emp_id, step_nbr, subordinate_emp_id)
AS
SELECT emp_id,
COUNT(S2.seq),
SUBSTRING ('/' || P1.path_string || '/', MAX(S1.seq || 1)
FROM (S2.seq - MAX(S1.seq || 1))
FROM Personnel_OrgChart AS P1, Sequence AS S1, Sequence AS S2
WHERE SUBSTRING ('/' || P1.path_string || '/', S1.seq, 1) = '/'
AND SUBSTRING ('/' || P1.path_string || '/', S2.seq, 1) = '/'
AND S1.seq < S2.seq
AND S2.seq <= CHAR_LENGTH(P1.path_string) +1
GROUP BY P1.emp_id, P1.path_string, S2.seq;
The S1 and S2 copies of Sequence help to locate bracketing pairs of
separators. The entire set of substrings located between them is extracted in
one step. The trick is to be sure that the left-hand separator of the
bracketing pair is the closest one to the second separator. The
step_nbr column tells you the relative position of the subordinate
employee to the employee in the path.
Version 3
This version is the same as above, but more concise and easy to comprehend.
SELECT SUBSTRING('/' || P1.path_string || '/'
FROM S1.seq +1
FOR CharIndex('/',
'/' || P1.path_string || '/',
S1.seq +1)- S1.seq - 1) AS node
FROM Sequence AS S1, Personnel_OrgChart AS P1
WHERE SUBSTRING('/' || P1.path_string || '/'
FROM S1.seq FOR 1) = '/'
AND seq < CHAR_LENGTH('/' || P1.path_string || '/');
Version 4
Here's another way, using the LIKE predicate:
SELECT SUBSTRING(P1.path_string
FROM seq +1
FOR CharIndex('/', P1.path_string, S1.seq +1) - (S1.seq +1))
FROM Sequence AS S1
INNER JOIN
(SELECT '/' || path_string || '/'
FROM Personnel_OrgChart) AS P1.(path_string)
ON S1.seq <= CHAR_LENGTH(P1.path_string)
AND SUBSTRING(P1.path_string
FROM S1.seq
FOR CHAR_LENGTH(P1.path_string))
LIKE '/_%';
The Edge Enumeration Model
So far, we have seen the node enumeration version of the path enumeration
model. In the edge enumeration model, the "driving directions" for following
the path from the root to each node are integers. The path column contains a
string of the edges that make up a path from the root (Albert) to
each node, numbering them from left to right at each level in the tree.
Personnel_OrgChart
emp_name edge_path
==================
'Albert' '1.'
'Bert' '1.1.'
'Chuck' '1.2.'
'Donna' '1.2.1.'
'Eddie' '1.2.2.'
'Fred' '1.2.3.'
For example, Donna is the second child of the first child
(Chuck) of the root (Albert). This assigns a partial
ordering to the nodes of the trees. The main advantage of this notation is
that you do not have to worry about long strings, but there is no real
difference in the manipulations. The numbering does give an implied ordering
to siblings that might have meaning.
Summary
Path enumeration models have problems with deeper trees and with trees that do not have a natural way of naming the nodes or edges. Maintaining proper constraints can be difficult in actual SQL products because of the lack of support for Full SQL-92 features.
On the plus side, path enumeration is a very fast and natural technique for trees without great depth and frequent changes to the structure. If you perform most of your searches and aggregations from the root down, it can handle surprisingly wide tree structures.
Joe Celko was a member of the ANSI X3H2 Database Standards Committee from 1987 to 1997 and helped write the ANSI/ISO SQL-89 and SQL-92 standards.
Return to ONLamp.com.
-
Find ALL parents
2010-07-15 13:58:36 clownbaby [View]
-
Great but Some More Commentary Would Be Helpful
2009-01-19 06:43:01 anywaste [View]
-
path strings
2004-11-13 06:37:26 r937 [View]
-
Hierarchical SQL
2004-10-26 14:45:13 dev@ [View]
-
How about using integer type for path?
2004-08-19 01:44:10 Carfield Yim [View]
-
tangentially related....
2004-08-09 04:09:42 bazzargh [View]
-
You might want to run the numbers ...
2004-08-07 10:14:24 --CELKO-- [View]
-
Same old non-solution
2004-08-06 19:07:45 poodlebum [View]



