The CTextString update is almost complete. Everything runs properly with the new text strings (so far). Some of the changed code is still untested, but that will shake out during our regular app testing. We set breakpoints in possible problem areas, so the app will stop in the debugger automatically the first time each one runs.
Besides making the class more reliable, the update was a chance to remove obsolete code. No sense in rewriting stuff that is no longer needed.
One thing that went is Pascal strings. They were the standard way to handle text on 68K Macs. 32-bit Mac OS X still used them for a few things, but they are gone from the 64-bit OS. Now they're gone from our code too. A bit less clutter.
We also decided to toss the complex code that converts text to money. Instead, we will rewrite CMoney to make its data storage simpler, better and faster. It’s a change that has been on our to-do list for years, and this is the perfect time to do it. More about CMoney in the next post.
We also cleared out some Microsoft-specific code that handled their many text formats. It's left over from when we tried using MFC to build a Windows version (and ultimately failed). Their string types drove us nuts. There were BSTR, CString, CStringA, CStringT, CStringW, LPSTR, LPCSTR, LPCTSTR, LPWSTR and LPCWSTR. I probably missed a few. Use the wrong one or convert improperly, and the app crashed.
Clearing out that obsolete string cruft was a reminder that TurtleSoft dodged a bullet by procrastinating on the text upgrade.
Since 1844, people have used many formats to move text over wires. The first digital one was a 5-bit system that replaced Morse code in the late 1800s. It had 32 possible values: enough for an all-cap alphabet, spaces, and a few control characters. You may have seen that style of telegram in old movies: HELP STOP NO LOWERCASE LETTERS OR PUNCTUATION EXCLAMATION MARK
ASCII was developed in the 1960s for teletypes and computers. It uses 7 bits, enough for 128 characters. That's enough for upper and lower case alphabets, numerals, punctuation similar to a typewriter, plus control characters (tab, carriage return, line feed, etc.). For a few decades, ASCII was the primary way to display and store text. It's still very common.
Apple extended ASCII to 8 bits in the Macintosh. The extra 128 characters included vowels with accents, tildes, a handful of Greek letters, smart quotes, and a fancier set of punctuation marks. Option-Shift-K types the Apple logo, character #240 in Apple extended ASCII.
Even 8-bit ASCII was not big enough to include Russian, Arabic, Sanskrit, Thai and other alphabets. As computing grew more international, Microsoft and Apple switched to "wide" 16-bit characters for a while. Those can handle 65,536 different characters: good enough to cover most languages and alphabets.
Switching between standard and wide text was hard. Wide strings are the reason for half those weird text types listed above. Mixing two different character sizes is what caused the crashes. Wide characters also made ordinary English text twice as big. Fortunately, TurtleSoft won't need to deal with wide characters. They were a dead end. We jumped right over them.
Problem was, Chinese and a few neighboring languages use a separate symbol for nearly every word or syllable. That alone is 80,000+ symbols. Ethnographers wanted support for niche alphabets. Historians wanted support for dead languages. Texters wanted emojis. Other folks wanted their own special character sets. 16 bits was not enough room for everyone.
That's why Unicode happened. It has room for over 1.1 million characters (code points U+0000 through U+10FFFF), and encodings such as UTF-8 store each one in 1 to 4 bytes. Unicode aims to cover every written language. It also includes geometric shapes, dingbats, musical notes, math symbols, emojis, and more. If someone discovers hieroglyphs on Mars, Unicode will probably find room for them.
Qt text fields can display full Unicode, and Qt can convert text to and from a format called UTF-8. UTF-8 is built from 8-bit bytes, the same size as ASCII characters, and also the same size as the characters inside a CTextString. Plain ASCII text is already valid UTF-8; other characters take 2 to 4 bytes each. Our code doesn't need to know anything about Unicode. We just store a string of bytes, and Qt will convert it into plain text, 故, 🍇 or whatever.
The first release of TurtleSoft Pro probably won’t display Unicode. There are plenty of other things to worry about first. However, we can easily slip it in later.
Dennis Kolva
Programming Director
TurtleSoft.com