Goldenseal Pro Database (June 28)

Goldenseal stores a lot of data. Many users have more than 100,000 records in their company file, and some have millions. The code that keeps track of everything is an important part of our ‘back end’. It needs to access records quickly, and store them reliably on disk.

For the original Goldenseal, we licensed an object database called NeoAccess. It ran on both Mac and Windows, and was very popular in the 1990s. Unfortunately, the source code had serious bugs that were never fixed, and we ended up rewriting a lot of it (especially after its developer folded their business).

For Goldenseal Pro, we looked at a couple dozen open-source and commercial database libraries, both object and relational. Using a third-party library has its advantages: someone else did all the work, maybe they are experts, and maybe they will keep it updated. However, as we found with NeoAccess, other people’s code also has disadvantages. For one thing, it’s usually easier to fix one’s own bugs and design flaws. Every library also has a learning curve, plus quirks and limitations. There is no perfect way to manage a database.

After a few trials, we finally decided to write our own database code for Goldenseal Pro. Since we had already rebuilt half of someone else’s database, designing a whole new one didn’t seem too risky. We kept some of the design from NeoAccess, but replaced the parts that didn’t work well.

The main thing any database needs is a list of where each record is in the file. NeoAccess used a ‘binary tree’, where each branch splits only two ways. In theory that finds a record in very few comparisons while using little memory. However, modern computers are designed to move relatively big blocks of memory, so it makes sense to have a tree that is ‘wide’ (with more records in each index, and fewer levels to step through). A wide tree gives faster record access on modern machines, and also uses simpler, cleaner code.
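
To put rough numbers on the difference, here is a small sketch (not part of the actual Goldenseal Pro code) that counts how many index levels are needed to reach one record. The function name LevelsNeeded and the fan-out of 256 used in the note below are assumptions chosen for illustration; the post does not say how many entries each index really holds.

```cpp
#include <cstdint>
#include <limits>

// Number of index levels needed so that every one of `recordCount` records
// can be reached, when each index holds up to `fanOut` entries.
// A fanOut of 2 approximates a binary tree; larger values give a wide tree.
// Returns -1 if fanOut is too small to form a tree.
int LevelsNeeded(std::uint64_t recordCount, std::uint64_t fanOut)
{
    if (fanOut < 2) return -1;

    int levels = 1;
    std::uint64_t capacity = fanOut;  // records reachable with one level

    while (capacity < recordCount)
    {
        // If the next multiplication would overflow, one more level
        // is certainly enough to cover any 64-bit record count.
        if (capacity > std::numeric_limits<std::uint64_t>::max() / fanOut)
            return levels + 1;

        capacity *= fanOut;
        ++levels;
    }
    return levels;
}
```

For one million records, LevelsNeeded(1000000, 2) gives 20 levels, while LevelsNeeded(1000000, 256) gives 3. Each level can mean another read from disk, so fewer levels usually means faster access.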

We wrote the first part of the database last summer, then tested and debugged it over the winter. Over the past few months we finished the remaining ‘tree’ code, which handles larger numbers of records.

If you want to see what C++ source code looks like, here is our function that looks for an empty spot to index a new record, and creates a new index if the current one is full:

    // Returns the file mark where the record was indexed.
    // A return value of 0 means this index is full, so the parent must make room.
    SInt64 DB_RecordIndex::AddObjectToIndex(const DB_PersistentObject *aObject, CTCS_FileStream *inStream)
    {
        TCS_FailNILMsgID(aObject, errID_BadObject);
        TCS_FailNILMsgID(inStream, errID_BadStream);

        SRecordIndexInfo info;

        if (mTreeLevel > 0)
        {   // add to the last subindex
            DB_RecordIndex *lastIndex = FetchLastSubindex();
            DB_ObjectWatcher watcher (lastIndex);
            SInt64 subMark = lastIndex->AddObjectToIndex(aObject, inStream);
            if (subMark != 0) return subMark;   // was added to subindex

            if (mRecordArray.GetAvailableSpace() > 0 && !IsFilled())
            {   // still room for more subindexes here, so add one
                return AddSubindex(aObject, inStream);
            }
            else if (IsRootIndex())
            {   // Need to increase tree level by one
                mRecordArray.FetchFirstItem(info);
                subMark = ShiftItemsToSubindex(inStream, info.startID);
                IncrementTreeLevel();
                return AddSubindex(aObject, inStream);
            }
            else
            {   // subindex is full. Pass back to parent
                SetIsFilled();
                return 0;
            }
        }
        else if (mRecordArray.GetAvailableSpace() > 0 && !IsFilled())
        {   // base level index with room here.
            info.fileMark = aObject->getFileMark();
            info.itemID = aObject->GetDBID();
            mRecordArray.Append(info);
            MakeDirty();
            return info.fileMark;
        }
        else if (IsRootIndex())
        {   // the array is full, so it's time to become a new parent!
            mRecordArray.FetchFirstItem(info);
            ShiftItemsToSubindex(inStream, info.itemID);
            IncrementTreeLevel();
            return AddSubindex(aObject, inStream);
        }
        else return 0;  // this base index is full. Pass back to the parent
    }

There are plenty of quirks to managing a database. We need to keep track of empty spaces, so the file stays compact. We also store each record’s location in two different places, so it will be possible to fix a corrupted index (something that NeoAccess could not do).
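
One common way to keep track of empty spaces is a ‘free list’. The sketch below is only an illustration under assumed names (FreeSpan, FreeSpaceList, Release, Allocate), not the Goldenseal Pro implementation. It remembers the gaps left by deleted records and hands the first gap that is big enough to the next new record.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One gap in the file: where it starts and how many bytes it covers.
struct FreeSpan
{
    std::int64_t fileMark;
    std::int64_t length;
};

// Hypothetical free-space tracker using a simple first-fit search.
class FreeSpaceList
{
public:
    // Remember the space left behind by a deleted record.
    void Release(std::int64_t fileMark, std::int64_t length)
    {
        if (length > 0) mSpans.push_back({fileMark, length});
    }

    // Find room for a record of `length` bytes.
    // Returns the file mark to write at, or -1 if no gap is big enough
    // (the caller would then append the record at the end of the file).
    std::int64_t Allocate(std::int64_t length)
    {
        if (length <= 0) return -1;

        for (std::size_t i = 0; i < mSpans.size(); ++i)
        {
            if (mSpans[i].length >= length)
            {
                const std::int64_t mark = mSpans[i].fileMark;
                mSpans[i].fileMark += length;   // shrink the gap from the front
                mSpans[i].length -= length;
                if (mSpans[i].length == 0)      // gap fully used, so forget it
                    mSpans.erase(mSpans.begin() + static_cast<std::ptrdiff_t>(i));
                return mark;
            }
        }
        return -1;
    }

private:
    std::vector<FreeSpan> mSpans;
};
```

A real database would also merge neighboring gaps and save the list to disk, which this sketch leaves out to stay short.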

To keep the database code simpler, we take advantage of the fact that record IDs are always increasing, so we can just add them at the end and have a sorted list. We also just leave empty spaces when you delete a record, rather than rearrange the whole tree. Our primary goal is to make the code reliable and easy to maintain. So far, that approach has been successful.
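
Here is a minimal sketch of that idea for a single base-level index. It is a simplified illustration, not our production code: the class name SimpleIndex and its methods are assumptions, and only the field names itemID and fileMark are borrowed from the function shown earlier. Because IDs only go up, new entries are appended at the end, the list stays sorted, and a binary search can find any record. Deleting just marks the slot as empty.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct IndexEntry
{
    std::int64_t itemID;    // record ID, always larger than the one before
    std::int64_t fileMark;  // location of the record in the file; 0 = deleted
};

// Hypothetical base-level index that relies on ever-increasing IDs.
class SimpleIndex
{
public:
    // Add a record at the end. Refuses IDs that would break the sort order.
    bool Append(std::int64_t itemID, std::int64_t fileMark)
    {
        if (fileMark <= 0) return false;
        if (!mEntries.empty() && itemID <= mEntries.back().itemID) return false;
        mEntries.push_back({itemID, fileMark});
        return true;
    }

    // Binary search. Returns the file mark, or 0 if missing or deleted.
    std::int64_t Find(std::int64_t itemID) const
    {
        auto it = std::lower_bound(mEntries.begin(), mEntries.end(), itemID, CompareID);
        if (it == mEntries.end() || it->itemID != itemID) return 0;
        return it->fileMark;
    }

    // Delete by leaving an empty slot, so nothing else has to move.
    bool Remove(std::int64_t itemID)
    {
        auto it = std::lower_bound(mEntries.begin(), mEntries.end(), itemID, CompareID);
        if (it == mEntries.end() || it->itemID != itemID || it->fileMark == 0) return false;
        it->fileMark = 0;
        return true;
    }

    std::size_t SlotCount() const { return mEntries.size(); }

private:
    static bool CompareID(const IndexEntry &entry, std::int64_t id)
    {
        return entry.itemID < id;
    }

    std::vector<IndexEntry> mEntries;
};
```

The trade-off is that deleted slots are never reclaimed in this sketch, so the index can hold some dead entries. In exchange, adding and deleting never require shuffling the rest of the list.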

Dennis Kolva
Programming Director
TurtleSoft.com
