Conference on Applied Mathematics
University of Central Oklahoma
Spring 1995
Abstract
Implementing the Heap Sort using Dynamically Allocated Storage.
Bill McDaniel
Heaps, a type of binary tree, are used for priority scheduling algorithms,
because the highest priority item is always at the top of the tree.
In a conventional Heap Sort, numbers are inserted into a virtual binary tree.
The nodes are inserted into the tree in the same fashion as in a breadth
first traverssal. Historically, the Heap Sort has been implemented using
arrays. This paper describes a variation of the Heap Sort that does NOT use
arrays, but uses dynamic storage and pointer variables. Since storage is not
always allocated sequentially, the algorithm for determining the addresses of
the nodes has been changed.
=============================================================================
Introduction:
Heaps, a type of binary tree, are used for priority scheduling algorithms,
because the highest priority item is always at the top of the tree.
The Heap Sort is described as follows:
In a conventional Heap Sort, numbers are inserted into a virtual binary tree.
The nodes are inserted in the same fashion as in a breadth first traverssal.
The following figure...
1
/ \
2 3
/ \ / \
4 5 6 7
/ \ / \ / \ / \
8 9 10 11 12 13 14 15
... demonstrates that a given level will become full before the next level
is started.
As a number (the son) is inserted, it is inserted into the tree at the bottom
and compared with the closest number above (the father), and if it is less
than the father then the numbers are swapped. The swapping process
continues until the new number reaches the top or until the father
is less than the current number.
Historically, the Heap Sort has been implemented using arrays. Calculation
of the addresses (subscripts) of the father and son nodes is straightforward.
Given the location of a son node 's', the location of the father is s / 2.
Given the location of the father 'f', the location of the first son is f*2.
(The location of the other son is f*2+1.)
In the figure above, the father of node 12 is node 6, the father of node 6
is node 3, and the father of node 3 is node 1. The sons of node 3 are 6 & 7.
The sons of node 6 are 12 and 13.
Properties of a Heap
A Heap always has the following properties:
þ The smallest (or the largest) item it always at the root. This paper always
assumes 'smallest at the top', but the algorithm can be easily changed if the
other type tree is desired.
þ One level is filled before another level is started.
þ The location of the first number to be deleted is the same as the location
of the last number inserted.
Heap Sort Using Dynamic Storage
This paper describes a variation of the Heap Sort that does NOT use arrays,
but uses dynamic storage and pointer variables. Since storage is not always
allocated sequentially, the algorithm for determining the addresses of the
sons has been changed.
Differences between the Heap Sort using arrays and using Pointers.
The 2 column text, shown below, list some of the differences between
the two methods of building a heap.
þ The numbers are inserted into the tree as shown below.
Arrays Pointers
1 1
/ \ / \
2 3 2 3
/ \ / \ / \ / \
4 5 6 7 4 6 5 7
/ \ / \ / \ / \ / \ / \ / \ / \
8 9 10 11 12 13 14 15 8 12 10 14 9 13 11 15
þ An array is used to hold the þ Records are used with pointers
numbers. to the different nodes.
þ The location of the father and þ For any given node, the location
son nodes are calculated using of the father cannot be
the formulas given above. determined. The son is chosen
using the 'direction' field (during
insertion), or by using the
link fields (during deletion).
þ The numbers are inserted at the þ The number is compared to the
bottom and 'bubble' up the path root and either swapped or not.
to the root. The insertion In either case, it is then
algorithm proceeds from the bottom compared with the designated
of the tree to the top. son. The insertion algorithm
proceeds from the top of the
tree to the bottom.
þ Only the 'info' part is kept. þ The 'info' is kept in a node
in a tree. To maintain the
structure of the tree, link
fields are also necessary.
Structures Needed
With arrays, the only information that is needed is the information field that
holds the keys. With the pointer method there are several fields that are
necessary in order to maintain the tree. The tree is made up of nodes, and
the structure of a given node, in Pascal, is as follows:
node =
record
links : array [false..true] of node_ptr;
info : integer;
direction : boolean;
end;
Since, there are 2 values that are in the range false..true, then
there will be 2 links in each node: 'links[false]' will serve as the left
link and 'links[true]' will serve as the right link. The 'info' field will
hold the actual data. The 'direction' field will hold the direction value
that is interrogated (and changed) during insertion. These direction values
determine the path that will be used when inserting a new node.
The Insertion Algorithm
The insertion algorighm has 3 basic steps:
1. If the 'info' field of the root (of this subtree) is greater than the
'info' field of the new node, then swap the 2 numbers.
2. Change the current value of the direction field of the root.
3. If the link corresponding to the current direction is null
Then attach the new node to the corresponding link
Else Recurse (passing the link corresponding to the current direction)
The insertion algorithm, in Pascal, follows. The parameter 'root' points to
the root of a subtree, and the parameter 'p' points to the new node to be
inserted.
procedure insert_node_into_heap(var root,p : node_ptr);
begin
if root^.info > p^.info then swap_infos(root^.info , p^.info);
root^.direction := not root^.direction;
if root^.links[root^.direction] = nil then
root^.links[root^.direction] := p
else
insert_node_into_heap(root^.links[root^.direction],p);
end;
The essence of the algorithm is shown in the figures below. The numbers
show the position in the tree where the data will be inserted. (The actual
data may move up in the tree, but that is of no concern at this point.) The
dots beside the numbers represent the value of the direction field. When a
new node is inserted into the tree, the direction field is set to the value
'right'. When a node is involved in the insertion of a new node (ie. in
the path), then the direction field will be changed.
The first number is inserted and the direction field is set to 'right'.
1
The insertion algorithm, for a given node, is: change the direction field,
then GO THAT DIRECTION down the tree. To insert the 2'nd number, change the
direction field of 1'st number (since it is involved in the insertion) to
the value left, then insert the 2'nd number to the left of the 1'st
number. (Note: the direction field of the 2'nd node is 'right'.)
1
/
2
To insert the 3'rd number, change the direction field of 1'st number (since
it is involved in the insertion) to the value right, then insert the 3'rd
number to the right of the 1'st number.
1
/ \
2 3
To insert the 4'th number, change the direction field of 1'st number (since
it is involved in the insertion) to the value left, recurs and drop down a
level, change the value field of 2'nd number to 'left', then insert the 4'th
number to the left of the 2'nd number.
1
/ \
2 3
/
4
To insert the 5'th number, change the direction field of 1'st number (since
it is involved in the insertion) to the value right, recurs and drop down a
level, change the value field of 3'rd number (since it is involved in the
inserted) to 'left', then insert the 4'th number to the left of the
3'rd number.
1
/ \
2 3
/ /
4 5
To insert the 6'th number, change the direction field of 1'st number to the
value left, recurs and drop down a level, change the value field of 2'nd
number to 'right', then insert the 6'th number to the right of the
2'nd number.
1
/ \
/ \
2 3
/ \ /
4 6 5
To insert the 7'th number, change the direction field of 1'st number to the
value right, recurs and drop down a level, change the value field of 3'rd
number to 'right', then insert the 6'th number to the right of the
3'rd number.
1
/ \
/ \
2 3
/ \ / \
4 6 5 7
The reader will note that, starting at the root, the direction fields
determine the path to the last item inserted.
The Deletetion Algorithm
If a deletion (of the last item inserted) is required, then the algorithm
can be described by the following.
Starting at the root, and for each node, change the direction field, then go
the OTHER directon down the tree. The last number inserted is the 7. To find
the 7'th number (for purposes of deletion), and starting at the root, change
the direction of 1 to left, then go right. Change the direction of the 3'rd
number to left then look to the right for the 7. Delete the 7'th number.
1
/ \
/ \
2 3
/ \ /
4 6 5
The status of the tree is EXACTLY the way it was before the 7'th number
was inserted.
The deletion algorighm has 2 basic steps.
1. Change the current value of the direction field of the root.
3. If the link corresponding to the OTHER direction is null
Then delete the node corresponding to the other dirction
Else Recurse (passing the link corresponding to the other direction)
The deletion algorithm, in Pascal, follows. The parameter 'root' points to
the root of a subtree.
function delete_last_one(var root : node_ptr) : integer;
var
item : integer;
begin
root^.direction := not root^.direction;
if root^.links[not root^.direction] = nil then
begin
item := root^.info;
dispose(root);
root := nil;
delete_last_one := item;
end
else
delete_last_one := delete_last_one(root^.links[not root^.direction]);
end;
Restructuring After Deletion
Normally, when a node is deleted, the information portion of the root is
processed and then replaced by the last number inserted into the tree. The
last node is then deleted (as described in the previous paragraphs). The
tree is (probably) no longer a heap, since the value in the root node is
(probably) greater than one of the sons. Before any more insertion or
deletion operations can be performed on the tree, the values in the tree
must be moved so the heap is maintained. The restructuring algorithm
involves swapping the number (that is currently in the info portion of the
root), down the tree until the heap is restored. The restructuring algorithm
is as follows:
If the root of the current subtree (a parameter) has no descendents then
exit (we're done), else assign q to point to the smaller of the 2 sons.
If the info of the node pointerd to by the root is greater then the info
of the node pointed to by q then swap the 2 info's pointed to by root and
q, and then recurse, passing q.
Using the restructuring algorithm, the number that is in the root of the
tree is swapped with other numbers below it until it's father is less than and
both sons are greater than the number.
Conclusion
Heaps, using pointer variables, can be a viable alternative when the size
of arrays are limited, as in Turbo Pascal when the size of the data space
is less then 64K.
If one wanted to build a heap of strings (with 256 characters in each string),
the following declaration would generate a compile time error (in Turbo
Pascal) because the total space used is 256000 bytes, which exceeds the 64K
limit.
var
string_array : array [1..1000] of string;
However, since storage can be allocated dynamically, the entire tree can be
stored in 265000 bytes. (Each node would require 256 bytes for the string,
8 bytes for the links, and 1 byte for the direction field.)
The Pascal Code
A complete program implementing dynamic heaps written in Turbo Pascal follows.
program heaptr;
{ Build a heap using pointer variables }
uses
crt;
const
ll = false;
rl = true;
type
node_ptr = ^ node;
node =
record
links : array [false..true] of node_ptr;
info : integer;
direction : boolean;
end;
procedure swap_integers(var i,j : integer);
var
t : integer;
begin
t := i; i := j; j := t
end;
procedure insert_node_into_heap(var root,p : node_ptr);
{ insert a node ( p->node ) into a heap whose root is pointed to by root }
begin
if root^.info > p^.info then swap_integers(root^.info , p^.info);
root^.direction := not root^.direction;
if root^.links[root^.direction] = nil then
root^.links[root^.direction] := p
else
insert_node_into_heap(root^.links[root^.direction],p);
end;
function delete_last_one(var root : node_ptr) : integer;
{ delete the last node inserted into the tree, and change the direction
fields so that tree is the status of the tree (before the insertion
occurred) is restored }
var
item : integer;
begin
root^.direction := not root^.direction;
if root^.links[not root^.direction] = nil then
begin
item := root^.info;
dispose(root);
root := nil;
delete_last_one := item;
end
else
delete_last_one := delete_last_one(root^.links[not root^.direction]);
end;
procedure restructure(var root : node_ptr);
{ This routine is invoked when a node is deleted }
var
q : node_ptr;
begin
if root^.links[ll] <> nil then
begin
{ assign q to point to the smaller of the 2 sons }
if root^.links[rl] = nil then
{ only 1 son - point q to it }
q := root^.links[ll]
else
{ 2 sons - point q to the smaller }
q := root^.links[root^.links[ll]^.info >= root^.links[rl]^.info];
{ if the number in the current node is out of place then swap it out
and go down the tree }
if root^.info > q^.info then
begin
swap_integers(root^.info,q^.info);
restructure(q)
end
end
end;
var
i,j : integer;
p,root,last : node_ptr;
begin
ClrScr;
{ generate some random numbers and insert them into the tree }
root := nil;
for i := 1 to 15000 do
begin
j := random(1000);
{write(j:4);}
new(p);
with p^ do
begin
links[false] := nil;
links[true] := nil;
info := j;
direction := true
end;
if root = nil then
root := p
else
begin
insert_node_into_heap(root,p);
end;
end;
writeln;
{ display the numbers in order }
writeln('output');
while root <> nil do
begin
write(root^.info:4);
if root^.links[root^.direction] = nil then
begin
j := root^.info;
root := nil
end
else
j := delete_last_one(root);
if root <> nil then
begin
root^.info := j;
restructure(root);
end
end;
writeln;
end.
Further Research
An obvious extension to Heaps is Tri-Heaps and Quad-Heaps using pointers.
A Tri-Heap (Quad-Heap)is a heap where each node can have 3 (4) sons. The
problem of designing an algorithm for determining the locations of the sons
in Tri_Heaps and Quad Heaps has been left for the reader.
References:
Heaps (using arrays) is a well-known topic in Computing Science and
descriptions of the algorithms can be found in nearly any book about
Data Structures.