B-trees—Database Indexes

© 2005 Martin Rinehart


Indexing was invented by libraries. You have Joan Jones classic book, Jones on Widgets shelved under the correct Dewey Decimal number for the "widgets" topic. You have three index cards, one for topic, another for author and a third for title. They are filed in the appropriate indexes. Anyone who knows what book they want goes to the title index. If you are researching widgets you go to the topic index. In any case, the card tells you the correct Dewey Decimal number for this classic book and you can then go to the stacks (or submit a request for someone to go to the stacks) to get the book.

To a database, an index serves the same purpose as a card file serves in a library. Just as a library will have indexes on title, author and subject, a database table will also have multiple indexes. There will usually be two to three indexes for each table, and there may be more. Consider this table.

Customer Table

...
10005432 Jones Thomas 147 North Maple St.   SomeTown AState 54321 800 555 1212
...

The most heavily used index is one based on the record ID. Each record in, for example, a customer table has an ID assigned (typically, an integer number) that is unique to the customer. Find that ID and you find the customer's other data. Insist that your telemarketers ask the customer for their ID number and you are commiting the worst sin in business: customer abuse.

Most tables also have indexes on at least one natural order field. Customer files will be indexed on last name. These will not be unique (unless you have very few customers) so other criteria, such as postal code, will also need to be indexed. You want your telemarketers asking friendly questions, such as, "What is your ZIP code, Mr. Smith?" (That's the U. S. postal code.) And then, "Thank you. Do you still live at ... ?" (At this point the telemarketer should be looking at the data from the customer table, showing Smith's name, address and other data. If your system is first rate it has also retrieved the last few orders placed by Smith, which it gets almost instantly because the sales table is indexed on customer IDs, among others.)

The classic data structure used for indexes is the B-tree (where the "B" stands for balanced, or for its inventor Rudolph Bayer, or for Bayer's employer, Boeing). B-trees (and various improved forms, such as B+trees and B*trees) are the form chosen to index most Relational Database Management Systems (RDBMSs) such as Oracle, DB2 and MySQL. They excel when the amount of data is more than can fit in memory and disk accesses will prove the limiting speed factor. With a B-tree you can find one record in a billion with just 2 or 3 disk reads.

The Classic B-tree

The classic B-tree organizes sequential records into groups called nodes. The node is a combination of keys and pointers to other nodes. The name "tree" is misleading. It is taken from other data processing examples, where a tree grows downward, having a root node at its top. Botanists and back packers do not share this view of trees.

Will the Real Tree Please Stand Up?

picture of large oak tree in field drawing of geometric pyramid, labeling apex and base
Courtesy Wikipedia definitions of "tree" and "pyramid."

A better analogy would be pyramid. A B-tree is a stack of nodes. Let's look at an example using 4-item nodes. (Real nodes might hold 64k items or some other rather large number. They work the same, regardless of number.) In my book Java Database Development I start with an algorithm for building a pyramid from blocks. It went this way.

Pyramid Building

Following Rule 1:

A

With the second block, Rule 3B applies:

A B

With a third block, rule 3A applies:

  C  
A B

Now you can see the pattern. Rule 3B applies, then rule 3A applies once for every level in the pyramid:

  C  
A B D
 
  C E  
A B D
 
  F  
  C E  
A B D

You'll see that this also models the construction of B-tree pyramids.

The B-tree's Nodes

Each node alternates pointers and values, this way:

pointer value pointer value pointer value pointer value pointer

The number of pointers is called the node's order. Since a node starts and ends with a pointer, its order is one more than its number of values, which I call a node's size.

Within each node, the values are stored in ascending sequence. The pointer to the left of the first value points to the node containing values less than the first. The pointer to the right of the first value points to a node containing the values greater than the first, but less than the second. At the bottom level the pointers are null (point to node -1).

The B-tree's Operations

There are three fundamental operations:
  1. Find—search for a value in the B-tree
  2. Insert—add a new value to the B-tree
  3. Delete—delete a value from the B-tree
As the Find operation is part of the Insert and Delete operations, we should cover it first, but it won't make sense until we have a B-tree within which to Find something. So we'll start with insert.

The Insert Operation

Let's look at the creation of a node which has a size of four.

# 1 -1 -1 -1 -1 -1

Add value "Sally":
# 1 -1 Sally -1 -1 -1 -1

Add value "Joan":
# 1 -1 Joan -1 Sally -1 -1 -1

Note that you had to move "Sally" when you entered "Joan". Next add "Alice" and "Zoe":
# 1 -1 Alice -1 Joan -1 Sally -1 Zoe -1

Now your node is filled. Assume you next add "Thelma" to your B-tree. This begins the process of growing the pyramid. The goal is to place the middle value on top of two nodes holding the lower and higher values, this way:

Sally
AliceJoan
Thelma Zoe

First you decide which value will be the middle value after the addition of the new value. In this case, "Sally" is the middle value. Then you follow a three step procedure:

  1. Create another node. It gets all the values above the middle value.
  2. Clear the middle and higher values from the existing node.
  3. Create a third node. It gets the middle value, surrounded by pointers to the node with the lower values and the node with the higher values.
That will end with these three nodes:
# 1 -1 Alice -1 Joan -1 -1 -1

# 2 -1 Thelma -1 Zoe -1 -1 -1

# 3 1 Sally 2 -1 -1 -1

And now we can interrupt this section to talk about the Find operation.

The Find Operation

The topmost node of the pyramid (our node #3) is called the root node. It is where you start searching. You simply scan, left-to-right, for a value greater than the one you want to insert. The new value goes to the left. If you are in a ground-level node (the left pointer is null) you have found the location for an insert. If you are not in a ground-level node, you grab the node pointed to by the pointer and repeat this process. If there is no value greater than the one for which you search, you follow the rightmost pointer.

That's almost all there is to it. What about the case where you are not inserting, but are searching for a value? The procedure is the same but you stop when your comparison returns an exact match. (For some requirements you may want to return the number of exact matches, so a bit more searching is needed. Remember that the first "Smith" in a node at level N may have more "Smith"s both to its left and to its right at level N-1 and right on down to the ground level.)

What about the case where you are inserting and you find an exact match? The typical rule is that you insert after the last exact match (maintaining the original order) but your application may call for something else.

Now that we know how to find things, we can return to inserting, which is much more interesting.

The Insert Operation, Resumed

If you want to add "Anne" you compare her to "Sally" and decide that "Anne" goes to Sally's left. You follow the pointer to the ground-level node, where you see that "Anne" goes between "Alice" and "Joan" giving you three values. Let's add "Arlene", too. Now our pyramid looks like this:

Sally
AliceAnneArleneJoan
Thelma Zoe

Our next insertion is "Barbara" who also goes into the node left of Sally, but that node is full. What now? Again, find the new middle value: "Arlene". We'll need another ground-level node to hold the higher values, "Barbara" and "Joan". We'll leave just the two smaller values, "Alice" and "Ann" in the original ground node and we'll pop the middle value, "Arlene", up into the root node. When we're done it will look like this:

ArleneSally
AliceAnne
BarbaraJoan
Thelma Zoe

What do we do if the root node is full? Actually, we've been there already. Our original ground node was also the root node (as it was the only node). You split the root node into two and push up the middle value into a new node that then becomes the root. Here's the full algorithm.

The Insert Algorithm

This is from Java Database Development, by me.

Assume that we are inserting value V into a B-tree. Ground level is the number of pointers followed from the root to the bottom level.

The Delete Operation

At ground level, you search for the value V that you want to delete. When you find it, you delete it. Above ground level you have to adjust the pointers of V's left and right siblings before you delete. It's almost that simple.

The basic operation of the B-tree (remember that it may stand for balanced tree) has all nodes from 50% to 100% full. Deletions may cause a drop below 50%. If that would happen you do one of two things.

That will maintain the tree's balance, but it may not be very smart. Compared to simply inserting a value into a non-full node, splitting a full node is an expensive operation. By combining two half-full nodes into a single full node, you maximized the probability that the new node will need to do a split operation to accommodate an insert operation.

So in the above discussion replace "half" with a tuning parameter that you can test at, for instance, 40%, 35%, and so on, running your mix of finds, inserts and deletes to see what gives the best overall performance.

The Value Field

So far we have spoken of the value. In truth, we want the value and we need to know what record it comes from. It does no good to report, "I found Smith" in response to a search for "Smith". You have to say, "Smith is at record number 123456."

So the value field is really a value/record pointer pair. That's a pointer to the physical record.

For names there is another consideration. Should we have a fixed length value, or should we have a pointer into a table of variable-length values? You really have to program both and test for best performance with your application. Let the database administrator have a nice set of performance tuning tools.

For a more capable B-tree, see my article, "The B++tree."


java consultant home page icon