Trees vs Webs (Apr 26)

Goldenseal currently manages its accounting data with NeoAccess (a popular object database library in the 1990s). To locate records, it uses a binary tree.

You may have used binary search while looking for something in an old-fashioned paper phone book or dictionary. Open at the middle, and compare it to what you want. Go left or right depending, then open the middle of that half. Repeat with smaller and smaller divisions until you get to the right page.

Binary search is very efficient, especially when the data is huge. It takes 10 steps to cover a thousand items, 20 steps for a million, 30 for a billion, and so on.

It’s easy to set up a binary search in sorted items of identical size. However, that is rarely the case for database records. For those, it’s hard to find the middle, so they are better managed with a binary tree. The tree has a single root, which grows increasingly ‘branchy’ as you go up. To find something you start at the bottom, compare at each node, then move up through the tree. Eventually you arrive at a leaf node, which links to the actual record. This kind of navigational tree is also called an index.

NeoAccess was optimized for speed and small size. Each node included a list of file addresses for its children, and nothing more. Sparseness made sense, back when memory was expensive and processors were slow.

Unfortunately, a lean index is also very fragile. If a NeoAccess node is corrupted, everything above it becomes invisible. Those records are still on the hard drive, but they lack a path that says where they are. NeoAccess had no alternate route. Any damage was fatal.

These days, memory is cheap and processors are very fast. As a result, it’s nearly always better to build software for reliability rather than speed. Probably the best example for bullet-proof design is the Internet. It began as Arpanet, which was meant to keep running even after nuclear annihilation. Turns out, that kind of sturdiness helps, even in a world of small, normal, ordinary snafus.

Instead of a tree, the Internet is a web. Lots of connections from here to there. There is always a way to find something, even if one path breaks. Cell towers work the same way.

Goldenseal Pro still use a tree structure for normal finds. But it also includes more web-like connections and alternate routes. With their help, it should be able to rescue records, even if a tree branch is damaged. The accounting database can even be self-repairing.

Meanwhile, we still need to use the old NeoAccess code to read data from the current Goldenseal accounting software, so it can move into the new Pro format. Unfortunately, the Sales breakdowns in our TurtleSoft data have some corrupted index nodes. More than half can’t be found. Without them, we won’t know what people bought.

Fortunately, coming to the rescue is the File Manager, which we added to NeoAccess in 2002. Its main function is to prevent records from overwriting each other, but it also provides an alternate way to find record locations on the disk. We just added some code to it, so it can read and recover lost records when the tree can’t find them. VoilĂ , it rescued our missing data! The new code will also help any other users with damaged trees.

There is a cost for the extra connections, webbishness and redundancy. Our old TurtleSoft company file was 51 megabytes, and the new one weighs in at 127. It will probably grow even larger, as we add more safety features.

3x the file size for 3x the reliability seems a worthy trade.

Dennis Kolva
Programming Director



Author: Dennis Kolva

Programming Director for Turtle Creek Software. Design & planning of accounting and estimating software.