Database Reliability- Part 3 (Dec 5)

The most important task for any database is to remember where each record is within its file. To accomplish that, our accounting and estimating software uses a b-tree (a relative of the binary tree that allows more than two branches at each node) to store record locations. It has a different tree for each class of records. Every tree has a root (starting point) and nodes (branches). Nodes at the end are “leaf nodes”, which store the record locations.

A true binary tree has 2 branches at each node, which makes a tall tree (11 levels for 1,000 records, or 21 levels for a million). The NeoAccess database we’ve used for Goldenseal versions 1.0 to 4.9x came with a default of 8 branches per node. We upped it to 32. For unknown reasons NeoAccess also had two levels at each node, so it took 8 levels for 1,000 records, or 12 levels for a million. Still a rather complex tree.

Back in the 1980s and 90s, RAM and hard drives were cramped, expensive, and relatively slow. Because of that, it was best to keep nodes small. To save space, NeoAccess also didn’t create nodes until there were records to fill them. As a result, it scattered nodes randomly throughout the file (a bit more than 1 node for every 32 records). To save space, NeoAccess also stored very little data in each node. They only contained 4 or 8 bytes for each branch: just a file position, plus a record ID in leaf nodes.

We started to use Goldenseal to run TurtleSoft in 2000, just before releasing version 1.0. Unfortunately, a few NeoAccess nodes in our company file are corrupted, and have been for more than 10 years. When we try to look at the affected records, the node points to a very wrong file location. The OS is not happy with that, so it crashes.

We’ve spent dozens of hours stepping through code and looking at our raw file data, trying to find the errors. The problem is, our company file contains over 9,000 NeoAccess nodes, and they all look identical. If we could find the lost or damaged nodes in a raw file editor, we might be able to fix them. Unfortunately, it’s the proverbial needle in a haystack.

Over the years, we also have looked at 30 or 40 corrupted Goldenseal files sent in by users.  When we scanned them with a raw file editor, about half contained data from other programs entirely. Our code has no access to anything beyond our file, so we didn’t put it there. Those files probably were damaged when the OS or hard drive firmware got confused, and wrote someone else’s data on top of ours. A few bad files were caused by TurtleSoft bugs, but we haven’t seen any of those since 2006. The rest were probably caused by small NeoAccess errors: possibly due to unknown bugs in their code, possibly from “bit rot” screwing up file addresses. Change one bit in the middle of a tree, and the whole thing stops working.

For Goldenseal Pro, our goal is to make the database less fragile, and more repairable. Your hardware (and the OS) will never be 100% perfect, but we can at least make it possible to diagnose and repair any database errors.

As a first step, Goldenseal Pro creates all its basic tree structures right at the beginning. Doing so has a cost. New, empty Goldenseal Pro files start out at 6+ megabytes (compared to 106K for original Goldenseal). A file that big would have been absurdly huge in the 1980s or 90s, but now it’s just a few millionths of a hard drive. The new setup puts the most important database structures in the same places in every file, and they won’t ever move. A uniform file structure will be much easier to diagnose and repair.

Goldenseal Pro also uses bigger nodes: with 128 to 4,096 branches, depending on how many items you are likely to have there. Almost all trees will only be one level deep. If they go two levels deep, they’ll hold a few million records before needing a third. The result is less clutter, and easier debugging.
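The arithmetic behind those broad, shallow trees is easy to check. Here is a minimal sketch, assuming every node is completely full; the function name treeCapacity is made up for illustration and is not taken from the Goldenseal Pro source.

```cpp
#include <cstdint>

// Maximum number of records reachable through a tree in which every node
// has `branches` children and the tree is `levels` deep.
// Hypothetical helper for illustration; assumes completely full nodes.
std::uint64_t treeCapacity(std::uint64_t branches, unsigned levels)
{
    std::uint64_t capacity = 1;
    for (unsigned i = 0; i < levels; ++i)
        capacity *= branches;   // each extra level multiplies the reach
    return capacity;
}
```

With these assumptions, treeCapacity(4096, 1) is 4,096 records in a single level, and treeCapacity(2048, 2) is 4,194,304, which is where a figure like "a few million" for two levels comes from.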

The new nodes include text for their name and contents, so they are easily visible in the raw file. They also include a short text tag for each branch, and ‘safety tags’ at the beginning and end. All these ‘file navigation markers’ help us now during development and debugging. They also will help anyone looking at damaged files, in the future.
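To give a feel for how safety tags can work, here is a rough sketch. The struct layout, tag text, field sizes and function names are all assumptions made up for this example; the real Goldenseal Pro node format is not shown here, and the per-branch text tags are left out to keep it short.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical tag text: 7 readable characters plus a terminator.
constexpr char kStartTag[8] = "NODE>>>";
constexpr char kEndTag[8]   = "<<<NODE";

// Hypothetical on-disk node: readable tags bracket the payload, so a
// damaged or misplaced node stands out in a raw file editor or in code.
struct TaggedNode
{
    char startTag[8];                               // safety tag at the beginning
    char name[24];                                  // human-readable node name
    std::array<std::uint64_t, 128> branchOffsets;   // file position per branch
    char endTag[8];                                 // safety tag at the end
};

// Build an empty node with both safety tags filled in.
TaggedNode makeNode(const char* name)
{
    TaggedNode node{};                              // zero-fills every field
    std::memcpy(node.startTag, kStartTag, sizeof kStartTag);
    std::strncpy(node.name, name, sizeof node.name - 1);
    std::memcpy(node.endTag, kEndTag, sizeof kEndTag);
    return node;
}

// A node is only trusted if both tags survived intact.
bool looksIntact(const TaggedNode& node)
{
    return std::memcmp(node.startTag, kStartTag, sizeof kStartTag) == 0
        && std::memcmp(node.endTag, kEndTag, sizeof kEndTag) == 0;
}
```

If a stray write lands on top of a node, there is a good chance it clobbers one of the tags, so looksIntact returns false before any bad file position is followed.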

For extra security, all the important database structures are duplicated somewhere else in the file. At least in theory, we can rebuild anything that suffers damage. Most likely we’ll wait for actual damaged files before writing the repair tools, since it’s hard to predict in advance how data may go wrong.

All of this extra security adds more than 100 bytes per record, so Goldenseal Pro files will be significantly larger. At $60 to $300 per terabyte, we figure you probably won’t mind spending a penny or two for a few extra megabytes, to get more reliable data.

You still should have a good backup system, using multiple locations and methods. There are plenty of ways for data to die, and we can’t prevent all of them. However, we can at least make our files stronger and more survivable. As a bonus, it makes our programming and debugging easier.

Dennis Kolva
Programming Director
TurtleSoft.com

Database Reliability- Part 2 (Nov 28)

When writing new code for our estimating and accounting software, we have a rule of thumb. When it first works, it’s 1/3 done. When it works for everything the programmer can think of to break it, it’s 2/3 done.  After it survives testing by people actually using it, it (eventually) reaches 100%.

We built the new database for Goldenseal Pro more than a year ago, but it was bare bones, and really just the first 1/3 of the work. It was sufficient for initial interface testing, and to make sure the basic approach was valid. The past couple weeks we added the rest of what a database needs, so it’s now at the 2/3 mark. The final 1/3 will be polishing, tweaking and testing, done gradually over the next few months. The final test will be using it to handle our TurtleSoft business accounting for a while.

Our primary goal for the new database code is to make it extremely reliable. We suspect that there are still a few subtle bugs remaining in the NeoAccess database that runs Goldenseal 4.9x, and we want to avoid that for Goldenseal Pro.

One specific way to increase reliability is to simplify the basic structure of the database.

NeoAccess used something called a b-tree (a broader relative of the binary tree) to store the location of each record within the file. B-trees find records by repeatedly narrowing the search, much like a binary search, which is one of the most efficient ways to find an item in sorted data. A binary search takes about log2(n) steps to find any record in a sorted list of n items. That means a binary search in a thousand records only takes 10 steps, a million records only takes 20 steps, and a billion takes 30. Binary search is amazingly fast when you are indexing, say, all humans on Earth.
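The step counts above come from repeated halving. A minimal sketch (the function name is ours, purely for illustration):

```cpp
#include <cstdint>

// Worst-case number of halvings needed to narrow `count` sorted items down
// to a single candidate, i.e. the ceiling of log2(count).
unsigned binarySearchSteps(std::uint64_t count)
{
    unsigned steps = 0;
    std::uint64_t reach = 1;    // how many items `steps` halvings can cover
    while (reach < count) {
        reach *= 2;
        ++steps;
    }
    return steps;
}
```

binarySearchSteps(1000) gives 10, a million gives 20, and a billion gives 30. Eight billion items would take 33.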

The problem is, b-trees are complicated. The branches need to be balanced and trimmed, and there are many ways to go wrong. If there are still bugs left in our modified version of NeoAccess, they are probably buried in the tree management code. It’s just too complex, so we never touched it.

For Goldenseal Pro we use simple lists, rather than b-trees. They are slower to search, but on our scale it’s only a matter of microseconds.  B-trees are gross overkill for small businesses with mere hundreds or thousands of records. Simple linear searches are easy to debug, and much less fragile. Even when a b-tree is appropriate, it’s far better to use a well-tested library, rather than “roll your own” code as in NeoAccess.

For large numbers of records, we still use a tree structure. But it is a very “broad” tree that rarely grows more than two levels deep. It’s more like a shrub or low-maintenance ground cover. Theoretically, the new database won’t be as efficient as a true binary tree. However, we much prefer increased simplicity and reliability, over a tiny performance gain.

It’s possible to do binary searches without a b-tree, and binary search is even built into the C++ standard library that we use. However, binary searches are harder to debug, so we will only add them in places where linear search is visibly slow. That’s part of the tweaking process.
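For comparison, here is roughly what the two approaches look like with the C++ standard library. This is a sketch only: IndexEntry and the two function names are invented for the example, not taken from our source code. std::find_if and std::lower_bound are the standard library pieces.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical index entry: which record, and where it lives in the file.
struct IndexEntry
{
    std::uint32_t recordId;
    std::uint64_t fileOffset;
};

// Linear search: checks entries one by one, works on a list in any order.
const IndexEntry* findLinear(const std::vector<IndexEntry>& index, std::uint32_t id)
{
    auto it = std::find_if(index.begin(), index.end(),
                           [id](const IndexEntry& e) { return e.recordId == id; });
    return it == index.end() ? nullptr : &*it;
}

// Binary search: far fewer steps, but only correct if the list is
// already sorted by recordId.
const IndexEntry* findBinary(const std::vector<IndexEntry>& index, std::uint32_t id)
{
    auto it = std::lower_bound(index.begin(), index.end(), id,
                               [](const IndexEntry& e, std::uint32_t key) {
                                   return e.recordId < key;
                               });
    if (it == index.end() || it->recordId != id)
        return nullptr;     // lower_bound found the insertion point, not a match
    return &*it;
}
```

The binary version has more ways to go wrong: if the list ever falls out of order it silently returns the wrong answer, and a sloppy comparison can mistake the insertion point for a match. The linear version has neither problem, which is the debugging trade-off described above.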

************************

Another part of making code reliable is to write cautious code. That means we frequently “sanity check”, and give an error message if anything is amiss. It makes our development process much easier, since most bugs give a message that takes us right to the problem. If any bugs survive into the final version, users will get a warning, rather than a hidden problem that bites them later.

The main reason that Goldenseal 4.9x rarely crashes is that it sanity checks first. There are 6,700 places in our code where we confirm that something really exists, and 1,600 places where we check for a reasonable value. Thousands more sanity checks are built into the code logic, but we can’t count those easily.
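Both kinds of check are simple. Here is a sketch of what they can look like; the Record struct, the function name and the messages are made up for illustration, and the code assumes C++17 for std::optional.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical record descriptor: where the record sits, and how big it is.
struct Record
{
    std::uint64_t fileOffset;
    std::uint32_t byteLength;
};

// Returns an error message if anything is amiss, or std::nullopt if the
// record passes both kinds of check.
std::optional<std::string> sanityCheck(const Record* record, std::uint64_t fileSize)
{
    // Check 1: does the thing really exist?
    if (record == nullptr)
        return "record does not exist";

    // Check 2: are its values reasonable?
    if (record->byteLength == 0)
        return "record has zero length";
    if (record->fileOffset > fileSize
        || record->byteLength > fileSize - record->fileOffset)
        return "record lies outside the file";

    return std::nullopt;
}
```

The caller shows the message and stops, instead of following a bad pointer or reading past the end of the file.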

Along with our usual cautious coding, Goldenseal Pro includes several layers of sanity checking to make sure the database stays healthy. I’ll talk more about that in a future post.

Dennis Kolva
Programming Director
TurtleSoft.com

 

Database Reliability (Nov 20)

TurtleSoft started out in 1987 with MacNail, a set of Excel templates for construction estimating. Later we added accounting and scheduling features. The MacNail software was extremely popular, and it still has a few hundred die-hard users. Unfortunately, we could only do so much with Excel macros and spreadsheets. MacNail was complicated to use, and difficult to support.

In 1989 we released a second estimating program, built atop Apple’s HyperCard. It had a much nicer human interface, but severe limitations on the back end. Among other problems, HyperCard introduced us to “race conditions”. That’s where the app has intermittent bugs, depending on which message path completes first (unpredictably). They are extremely hard to track down.

In the early 90s, Microsoft replaced the Excel macro language with Visual Basic, and Apple halted development on HyperCard. It was clearly time for us to move on.

For the next software generation, we looked at many database programs to build upon: FileMaker, Omnis, Double Helix, Sirius Developer, Prograph. Each was better in some ways than what we had already, but in some ways each was worse. Rather than build a mediocre app on top of some other software, we finally decided to write a “real” application in C++. Coding from the ground up takes longer, but allows full control. As it happens, it also prevented us from riding a platform to its death, since all but FileMaker are now long gone.

We were too inexperienced to write our own database code, so we tested several C++ database libraries, and settled on one called NeoAccess. It was quite popular in the mid 1990s. The database performed well, and wasn’t too difficult to build upon. Unfortunately, when Goldenseal was almost completed, we discovered that NeoAccess had serious bugs. Files eventually became corrupted, and all data was lost. The developers at NeoLogic never fixed the bugs, and eventually stopped answering emails. 

AOL 4.0 and Netscape Communicator also used NeoAccess. AOL abandoned it in their version 5.0. Netscape retired their software prematurely (and was soon swallowed by AOL). Dozens of other companies went through similar drama, thanks to an unreliable database engine.

We also considered abandoning NeoAccess, but the alternatives were just as buggy. So we put about 2 programmer-years into rewriting their code to make it more reliable. We learned a lot in the process, which is why we decided to build our own database for Goldenseal Pro. You might say that bugs that you create yourself are much easier to fix, than ones made by other people.

The first step in writing reliable software is to make the code maintainable. That means simple logic, clear comments, and easily readable code that anyone can understand. NeoAccess was completely the opposite, and that was its primary flaw. To fix it, we stepped through their code hundreds of times, and gradually rewrote it to make more sense. A few of the bugs were simple logic errors that finally stood out when the code was less cryptic. Some of the bugs were in code that was so convoluted that we just gave up, and rewrote it entirely.

BTW, in our experience, much open-source code is equally unreadable. Whenever you hear in the news about a major security breach, there is a fair chance the root cause was some important open-source code that few people understand. If other programmers can’t read it, then they can’t fix it or improve it.

There are more specific methods we are using to make the database more reliable in Goldenseal Pro. NeoAccess taught us a lot about database design, both from its good parts and its bad. I’ll cover them in more detail, in a future post.

Meanwhile, the past couple weeks we have been working on the new database code for Goldenseal Pro. It is mostly working, but probably still needs another week to finish.

Dennis Kolva
Programming Director
TurtleSoft.com

 

Goldenseal Pro Progress Report (Nov 10)

Last week, Goldenseal Pro saved record changes to disk for the first time. It was very exciting to enter data, quit, reopen the file, and see the changes still there. This may not seem like much, but it is a major milestone. It’s the last big connection needed between the new human interface code, and the business logic that runs our estimating and accounting software.

Since then, we have been working on the database code, which manages how records are stored on the hard drive.  The new code we wrote last year has performed well while importing existing Goldenseal files, and then viewing records. However, it needs more bells and whistles to properly manage changes, deletions, and other daily usage.

We often complain about the NeoAccess database library used in Goldenseal, but it did have some very good features. We had to fix a few serious bugs in the late 1990s, but since then it has run thousands of companies’ files, with few problems. As we write the code that replaces it, our goal is to keep the good features from the old database, but make it more reliable, and easier to maintain.

When Goldenseal 4.x opens any record, it reads the data from hard drive to RAM. The record then stays “cached”, either until memory gets low, or until our code deliberately purges the cache. Reducing disk accesses makes many operations run much faster.

Goldenseal Pro uses a similar cache system. The main difference is that we simplified the list of cached records, so it is easier to troubleshoot. The same list also handles “dirty” records, as described in the previous post.

Every database has to decide where to put new and changed records during a save. The easiest place is at the end of the file. Unfortunately, that eventually results in a huge file that is mostly holes. For example, FileMaker used to save everything at the end (and maybe still does). Files can grow to tens or hundreds of megabytes, and eventually need a compression step to get back to a reasonable size.

NeoAccess solved that problem with a list of gaps between records. That way, new or changed records could go into the empty spaces. Unfortunately, their code was hard to debug, and we suspected that it sometimes wrote records on top of existing ones (extremely bad news). In 2002 we replaced their code with a File Manager to track every record and every empty space. That system worked well, but it used an extremely large record that sometimes crashed, if memory was too low.

For Goldenseal Pro, we’ve split the original file manager into separate, smaller Sector Managers. Each tracks records in just one portion of the file (currently 32,000 records). There is also a separate Gap Manager that tracks all empty spaces. It decides where to put each record, when it is saved. The sector managers have been working great for over a year, and we are refining the gap manager now. It’s tricky, because the manager needs to combine gaps that touch, and some records work better with empty space before or after.
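The "combine gaps that touch" part can be sketched in a few dozen lines. This is not our actual Gap Manager: the class below, its method names, and the first-fit placement rule are assumptions for illustration. It also ignores the extra-space preferences of particular record types, and does not check for overlapping gaps.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical free-space tracker: maps each gap's start offset to its length.
class GapManager
{
public:
    // Record a freed region, merging it with any neighbor that touches it.
    void addGap(std::uint64_t offset, std::uint64_t length)
    {
        if (length == 0)
            return;
        auto next = gaps_.lower_bound(offset);
        // Merge with the gap before, if it ends exactly where this one starts.
        if (next != gaps_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                offset = prev->first;
                length += prev->second;
                gaps_.erase(prev);
            }
        }
        // Merge with the gap after, if it starts exactly where this one ends.
        if (next != gaps_.end() && offset + length == next->first) {
            length += next->second;
            gaps_.erase(next);
        }
        gaps_[offset] = length;
    }

    // First fit: returns where to put a record of `needed` bytes, or
    // std::nullopt if no gap is big enough (caller appends to the file).
    std::optional<std::uint64_t> takeSpace(std::uint64_t needed)
    {
        for (auto it = gaps_.begin(); it != gaps_.end(); ++it) {
            if (it->second >= needed) {
                const std::uint64_t offset = it->first;
                const std::uint64_t leftover = it->second - needed;
                gaps_.erase(it);
                if (leftover > 0)
                    gaps_[offset + needed] = leftover;  // keep the remainder
                return offset;
            }
        }
        return std::nullopt;
    }

    std::size_t gapCount() const { return gaps_.size(); }

private:
    std::map<std::uint64_t, std::uint64_t> gaps_;
};
```

For example, after freeing bytes 100 to 149 and 200 to 249, freeing 150 to 199 leaves one 150-byte gap instead of three small ones, so a 120-byte record now fits without growing the file.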

Goldenseal Pro stores the location of every record in two different places: in the main indexing system, and in the sector managers. That will allow the new database to recover from file corruption, by fixing the one that is wrong.  There are places in Goldenseal 4.x files where a single bit change can totally trash the file. In Pro, recovery will be possible from almost all types of “bit rot”.
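Two copies only help if there is some way to tell which one is wrong. The sketch below assumes a test that can confirm whether a given file position really holds the record (for instance by reading a tag or record ID stored there). That test, passed in as holdsRecord, and the function name are hypothetical, not our actual recovery code.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

// Decide which stored location to trust for one record.
// `indexOffset` comes from the main index, `sectorOffset` from the sector
// manager. `holdsRecord` is a caller-supplied check (an assumption of this
// sketch) that returns true if the record is really at that offset.
std::optional<std::uint64_t> reconcileLocation(
    std::uint64_t indexOffset,
    std::uint64_t sectorOffset,
    const std::function<bool(std::uint64_t)>& holdsRecord)
{
    if (holdsRecord(indexOffset))
        return indexOffset;         // index copy is good (sector copy may need repair)
    if (holdsRecord(sectorOffset))
        return sectorOffset;        // index copy is damaged; sector copy wins
    return std::nullopt;            // both copies are bad: report it, don't guess
}
```

If a single flipped bit sends one copy to a wildly wrong address, the check fails there and the other copy is used to find the record and repair the damaged entry.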

Dennis Kolva
Programming Director
TurtleSoft.com


Dirty Records (Nov 2)

At heart, Goldenseal is a database program. It stores your company data on a hard drive so you can use it later. One important part of that process is knowing which data is “dirty”. In programmer talk, that means a record has changed, so it needs to be saved. After it’s saved, it becomes “clean” again. It’s hard to say whether the origins of the term are from laundry, or religion!

The old NeoAccess database had a true/false “dirty bit” in each record, to mark whether it had changed. It also had a “busy bit” for items currently in use. At every save, it looked through every record stored in RAM, and saved anything that was dirty but not busy. The system usually worked OK, but it was extremely hard to troubleshoot. If something didn’t work right, it meant half an hour in the debugger, stepping through many hundreds of lines of code.

We ended up adding internal lists of busy and dirty records, just so we could see what was going on more easily. Those lists worked so well that they now replace the old NeoAccess system entirely, in our new database code.
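Here is a rough sketch of the list-based idea. The class and method names are invented for this example and the real code differs, but it shows why explicit lists are easier to inspect than bits scattered across records.

```cpp
#include <cstdint>
#include <set>
#include <vector>

using RecordId = std::uint32_t;

// Hypothetical tracker: one list of dirty records, one list of busy records.
class RecordTracker
{
public:
    void markDirty(RecordId id) { dirty_.insert(id); }  // record was changed
    void markBusy(RecordId id)  { busy_.insert(id); }   // record is in use
    void release(RecordId id)   { busy_.erase(id); }    // no longer in use

    bool hasUnsavedChanges() const { return !dirty_.empty(); }

    // Returns the records to write now: dirty but not busy.
    // Those become clean; busy ones stay dirty until a later save.
    std::vector<RecordId> takeSavable()
    {
        std::vector<RecordId> savable;
        for (auto it = dirty_.begin(); it != dirty_.end(); ) {
            if (busy_.count(*it) == 0) {
                savable.push_back(*it);
                it = dirty_.erase(it);
            } else {
                ++it;
            }
        }
        return savable;
    }

private:
    std::set<RecordId> dirty_;
    std::set<RecordId> busy_;
};
```

In the debugger, both lists can be viewed at a glance, rather than hunting through every cached record for a stray bit.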

When you first look at an existing record, it starts out clean. Then it turns dirty the moment you change anything. To make that work, each data entry field has to send a message, and change the record status to dirty. Unfortunately, the existing code is a hodge-podge. There is makeDirty, setDirty and MakeDirty (case matters), and there are bits stored in 3 different places. It’s partly because different programmers reinvented the wheel, partly because NeoAccess was ugly, and partly because PowerPlant didn’t separate view and controller functions.

When we “refactor” something like this, it’s a bit like knocking out support members one by one, to see if the building collapses. If it does, we hit undo and decide whether to keep it, or rebuild it. If not, we probably can remove that code, or merge it with something else.

This all started because we need a quick way to decide whether the Save button should be grayed-out, or not. However, the dirty status has important consequences in the multi-user version, where more than one person might change the same record. It’s worth spending a day or two to redesign the system, so it’s more understandable.

Dennis Kolva
Programming Director
TurtleSoft.com

Goldenseal Pro Progress Report (Oct 24)

The Windows version of Goldenseal Pro is now caught up with the Mac version. Both show data entry layouts, load records, and move through them with the browser controls.

The Windows MFC library includes something called a combo box. It turned out to be perfect for our clairvoyant fields, which show a list of accounts (or whatever else) that goes into a field. You can click and choose from an alphabetical list, or start typing the first few letters until it jumps to the correct item.

Combo boxes also work for lists that don’t change (for example, status options or account classes). In the original Goldenseal, those are in a popup menu, which requires a mouse click. With the Windows combo box you can still do that, but you can also tab into the field, and type to choose. It makes data entry faster, since everything can be done from the keyboard.

When we wrote the Macintosh interface last winter, we used popup menus that require a mouse click, similar to Goldenseal 4.x. But, now that the Windows version works so much better, we will revisit that code and see if we can make it more similar.

Up until now, we have leap-frogged every 2 or 3 months between the Mac and Windows versions. As we move forward on one platform, we often discover improvements that apply to the other. We will continue to leap-frog, but on a faster cycle. The big stuff is out of the way, and the remaining tasks usually take days apiece, instead of months.

Programming for modern computers is turning out to be significantly easier than what we went through, during the 1990s. The libraries are more mature, and the hardware is more powerful. Back then, desktops typically had 4 megabytes of RAM, so we had to be very frugal about what we loaded into memory. Now that the norm is 4 gigabytes, life is much easier.

For example, in the current version of Goldenseal, we look up the data for popups and clairvoyant fields when we first load each record, so we know what text to display. But we don’t store the whole list, since it’s several hundred kilobytes of memory. When you click in a field, the list is retrieved a second time, to build the menu. It means a fraction of a second delay.

For Goldenseal Pro, it’s no big deal if we use an extra megabyte of RAM per record tab or window. Since we need to fetch the lists near the beginning anyhow, we might as well keep them around. That saves some milliseconds, if you click in the field later. We do have to worry about updating issues (what happens if you add an account?), but at least we have more options, now.

Next on the agenda: saving records, and the Find commands.

Dennis Kolva
Programming Director
TurtleSoft.com

Goldenseal(s) and High Sierra (Oct 17)

We recently tested Goldenseal 4.96 with the new Mac OS version 10.13, High Sierra. It worked properly, with one possible exception.

High Sierra uses a new file system called APFS (Apple File System). It is optimized to run on SSDs, as well as hard drives. We were concerned about how well the original Goldenseal would handle it, and our test results were inconclusive. The first time we used the Save As Text button to create a file, it gave an error message and did not save. That also happened a second time. After that, it worked fine, and the problem did not recur in any of the file-save operations.

The error message sounded like it was related to user permissions or app code signing, rather than a file system error. Unfortunately we did not write down the exact text (and then never saw it again). The error occurred with a fresh install of the OS and a freshly downloaded copy of the Goldenseal app, so it may have been related to some quirk in Apple’s code-signing system. Whatever it was, it went away on its own.

We decided to try testing with a fresh install on a different computer, with hopes of seeing the error message again, and actually writing it down. Unfortunately, High Sierra would not install on an external drive, and we didn’t want to replace the OS on the internal. People are reporting freezes and other problems with High Sierra 10.13.0, and we don’t want to get too committed to the new OS until 10.13.1 or later.

So, there may be obscure problems when the current Goldenseal app first saves files on Mac High Sierra, or there may not be. We will appreciate hearing from any users about this possible problem, whether or not it occurs.

Meanwhile, we installed the latest Mac development software (Xcode 9.0) and used it to build Goldenseal Pro. That went smoothly, and it ran fine on High Sierra after zero changes.

When we build any version of Goldenseal, we use something called an SDK (Software Development Kit). The Mac SDK includes Cocoa and other Apple code. Newer SDK versions add features, but also prevent the app from running on older OS versions. For each release we need to decide which SDK to use, since it limits the range of computers that can run the final product.

For now, we will continue to build Goldenseal Pro on slightly faster machines running OS 10.11 El Capitan. Before releasing Pro, there will be a user survey, to help make a better decision on which SDK to use.

Dennis Kolva
Programming Director
TurtleSoft.com

 

Goldenseal Pro Progress Report (Oct 10)

For each Goldenseal software update, we use a fairly standardized development flow. At the beginning, we schedule large changes and major redesigns. Those are most likely to add subtle new bugs, and it’s good to have plenty of time to catch them before users do. As work progresses, we gradually become less daring. Near the end, we only make the smallest of changes, proceeding with extreme caution.

Goldenseal Pro is by far our biggest upgrade to date, so it has been an opportunity to make very large changes to the guts of the software. Some are fixes for design choices that we later regretted. Some is refactoring, to take advantage of modern C++ and modern hardware. Some is just rewriting mediocre code so it is simpler, more reliable, better organized, and/or easier to maintain.

Last week, we merged the two classes that managed the basic data entry interface, and then split off five helper classes. It took some futzing to get everything working together, but by mid-week the code ran just like it did before. The new setup will be much easier for our staff to navigate, over the next few months.

After that, we started on the Windows interface, which was soon ready to load record data onto the screen. It then needed some real data so we could actually run the code and see something.

More than a year ago, we wrote a translator that converts existing Mac files to the new Pro format. It is mostly there so users can transfer their existing data, but it also helps us. We use a converted version of the Sample Company File for testing, which is much easier than creating temporary records. So, we ran the translator code on Windows for the first time, but it gave a zillion errors.

Goldenseal Pro uses a new database system to manage records. However, the translator still uses NeoAccess (our former database engine) to read existing data files. It is old, 1980s-style code that is the C++ equivalent of a crawl space filled with spider webs and mummified rodents. Sadly, that’s what was breaking. Our staff spent a couple days slithering around in the murk, converting obscure 32-bit code to 64-bit. Fortunately, that was all it needed, and the Windows translator now runs OK.

With the prep work finished, we can move on to interface programming. It is a lot more fun! There are frequent small triumphs and visible changes, which makes the effort seem more rewarding. The past few months have mostly been spent slogging through libraries and groundwork, so it’s a treat to switch to more tangible work.

Dennis Kolva
Programming Director
TurtleSoft.com

 

Goldenseal Pro- Farewell to DB_Editor (Oct 2)

 

Goldenseal’s source code uses object-oriented programming (OOP). That means its C++ code is divided into about 600 object classes, each in separate text files. For example, CEstimate manages all the data for estimates, and CEstimateViewer handles their screen display. Those files are the first places to look, when fixing a bug in Estimates or adding a feature.

OOP is a fantastic organizational tool, akin to sorting your construction stuff into tool boxes, bags, buckets and totes. Goldenseal Pro’s source code contains over 350,000 lines of C++ code, and 150,000 lines of comments. There is no way we could ever navigate it all, without OOP.

Inside the original Goldenseal software, we had two main controller classes to run data entry windows. DB_Editor handled basic window functions, and  everything in the gray regions on the left and top. DB_RecordViewer managed everything in the big colored rectangle on the right. They worked together for loading, scrolling, editing, saving, finding, and anything else involving data records.

Last Spring we connected DB_RecordViewer to the Cocoa and MFC interfaces, and it went very smoothly. We started on DB_Editor a couple weeks ago, but it proved to be more complicated. We fought with it for a while, then decided that the editor class really isn’t needed for Goldenseal Pro. All of its functions can be handled just as easily by other classes, and its basic concept doesn’t make sense any more.

We gradually moved all the code out of DB_Editor last week, but soon discovered a deeper problem. The best place to put most of its code is in the record viewer, since they already work together to manage the data entry process. Unfortunately, DB_RecordViewer is an enormous file. It has 275 functions, when the ideal is more like 10 or 20. If OOP classes are tool boxes, this is a van stuffed solid with equipment and construction materials. It needs less clutter, not more.

To accomplish that, we will split the record viewer class into smaller pieces. All the find/skip/replace code is now in a new DB_RecordFinder class, and we are starting on others. It’s mostly just copying and pasting code from one file to another, but we will also reorganize and modernize as we go along. The design may evolve.

This code is pretty much the heart of our software, so it’s worth taking some time to make it more understandable. Linking in the new interfaces will be much easier if they connect to a structure that is solidly built, and well prepared. Spending a week or two now can easily save us many weeks of work, later on.

Dennis Kolva
Programming Director
TurtleSoft.com

Goldenseal Pro- Saving & Posting (Sept 25)

At heart, Goldenseal is a database program. It shows you screens to enter different types of company info, which it then saves to your hard drive. After that, Goldenseal helps you to use all that hard-earned data. You can look up past records, print business forms, view reports, reconcile bank statements, run special operations like Pay Bills and Write Payroll, etc.

Business data is more complicated than your usual address book database, because it is very interrelated. For example, when you enter a material purchase, it needs to link with a vendor account (for Accounts Payable), a project account (for job costing and T&M billing), and a bank account (when you pay for it, now or later). It may also link to Cost Items and Assemblies, to update their prices for future estimates.

When you save a new record, we also update all the related records. That process is called posting. It happens during the “thunk” sound you hear, after a save. During posting, we first open and revise all the linked items. Then we write everything to disk at the same time, to reduce the risk they get out of sync.
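The "check everything, then update everything together" idea can be sketched as follows. The Purchase struct, the Balances maps and the function name are assumptions made up for this example. Real posting touches many more record types, and the actual disk write is not shown.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical ledgers: account name -> running total, in cents.
using Balances = std::map<std::string, std::int64_t>;

// Hypothetical material purchase, linked to a vendor and a project.
struct Purchase
{
    std::string vendor;
    std::string project;
    std::int64_t amountCents;
};

// Posts a purchase to both linked accounts.
// Returns false, and changes nothing, if either linked account is missing.
bool postPurchase(const Purchase& p, Balances& vendorsOwed, Balances& projectCosts)
{
    // Step 1: open every linked item first.
    auto vendor = vendorsOwed.find(p.vendor);
    auto project = projectCosts.find(p.project);
    if (vendor == vendorsOwed.end() || project == projectCosts.end())
        return false;   // refuse a half-finished post

    // Step 2: revise all of them together, so they cannot get out of sync.
    vendor->second += p.amountCents;
    project->second += p.amountCents;
    return true;
}
```

Because nothing is changed until every link has been found, a purchase that points at a missing project never leaves the vendor balance updated on its own.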

Goldenseal currently posts and saves whenever you close a window, print it, or move to a different record. You also can force a save when you hit the Enter key, or choose Save Record from the Edit menu. Before the save happens, we check to make sure everything makes sense. If important data is missing, you’ll see a warning message. That prevents half-finished records that won’t post properly.

When you switch from one window to another, we don’t post or save. That way, if you are in the middle of an estimate and the phone rings, you can just leave it unfinished, and go look at something else. It can stay that way until you switch estimates, close that window, or quit/exit.

Goldenseal Pro uses a single-window interface, with each type of record in a tab, rather than a separate window. The saving and posting process is exactly the same. We probably can mark the tabs that have unfinished records.

We are currently working on the code that saves data from new or changed records. We wrote that part of Goldenseal almost 20 years ago, so half the battle is remembering how the current version does it. At the moment we are stepping through the code, simultaneously and slowly, on an old Mac, a new Mac and a Windows machine. It could easily take another week or two to fully understand the old system, so we can write the new one.

Fortunately, the existing database ‘back end’ and the posting process still work fine, and won’t need any changes.

Dennis Kolva
Programming Director
TurtleSoft.com