BASIC program for smart quotes

01-23-2007, 08:25 AM
I have been reformatting Gutenberg texts and wanted to change the straight quote marks (CHR$(34) and CHR$(39)) to curly ones (their codes vary, since curly quotes are not part of ASCII).

I do not have MS Word on my PC, only OpenOffice and WordPad. On my elderly Mac, which does have Word, using find-and-replace to turn plain quotation marks into curly ones causes problems. Single quote marks next to double quotes sometimes come out wrong. Also, smart-quote algorithms don't know how to handle an apostrophe at the start of a word that stands for a missing syllable (as in 'tis); they treat it as an opening single quote.
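To see why the start-of-word case trips up a simple rule, here is a minimal sketch in Python (not the author's QBasic). It treats a single quote at the start of the text or after whitespace as opening, and anywhere else as closing. With `use_elisions=True` it first checks a short, hypothetical list of elided words (`ELISIONS`) and gives those an apostrophe instead. A real list would need many more entries, and no list will catch every case.

```python
# Hypothetical, deliberately short list of words that begin with an
# apostrophe standing for a missing syllable. Real texts need many more.
ELISIONS = ("'tis", "'twas", "'em", "'til", "'cause")
LONGEST = max(len(word) for word in ELISIONS)

LEFT_SINGLE = "\u2018"   # opening single quote
RIGHT_SINGLE = "\u2019"  # closing single quote, also used as the apostrophe


def starts_with_elision(text: str, i: int) -> bool:
    """True if text[i:] begins with a listed elision followed by a non-letter."""
    rest = text[i:i + LONGEST + 1].lower()  # only look a few characters ahead
    for word in ELISIONS:
        if rest.startswith(word) and not rest[len(word):len(word) + 1].isalpha():
            return True
    return False


def convert_single_quotes(text: str, use_elisions: bool = True) -> str:
    """Opening quote at start of text or after whitespace, closing quote elsewhere."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        at_word_start = i == 0 or text[i - 1].isspace()
        if at_word_start and not (use_elisions and starts_with_elision(text, i)):
            out.append(LEFT_SINGLE)
        else:
            out.append(RIGHT_SINGLE)
    return "".join(out)


print(convert_single_quotes("'Tis the season", use_elisions=False))  # ‘Tis the season
print(convert_single_quotes("'Tis the season"))                      # ’Tis the season
```

The word-boundary check keeps a word such as 'embrace' (where the quote really is an opening quote) from being mistaken for the elision 'em.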

So I wrote a very simple, not to say clunky, BASIC program to help. It definitely runs under QBasic in a DOS window under Windows XP, but will probably be OK in QuickBasic, etc., as well.

Before you start, give your text file a DOS-friendly 8.3 name (at most eight characters, then a dot and an extension of at most three).
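If you are not sure whether a name qualifies, the 8.3 rule is easy to check mechanically. A minimal Python sketch, assuming a deliberately conservative character set (letters, digits, underscore and hyphen); DOS itself accepts a few more punctuation marks, so this check is stricter than it needs to be.

```python
import re

# Base name of 1-8 characters, optionally a dot and a 1-3 character extension.
# The character set is a conservative assumption, not the full DOS set.
DOS_83 = re.compile(r"[A-Za-z0-9_-]{1,8}(\.[A-Za-z0-9_-]{1,3})?")


def is_dos_83_name(name: str) -> bool:
    """Return True if the whole name fits the 8.3 pattern."""
    return DOS_83.fullmatch(name) is not None
```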

The program produces a new version of it, called OUTPUT.TXT, in the current directory. The quote marks are converted as follows:

Opening double quote: {odq}
Closing double quote: {cdq}
Opening single quote: {osq}
Closing single quote: unchanged (')
Single quote needing human attention: {?}

It also flags characters with a code greater than 126 by wrapping them in braces (as "{[char]}"); this is handy when converting other text files that may contain accents and whatnot.
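If you would rather see a list of those characters before converting, the following Python sketch (an addition, not part of the original program) reads the file as raw bytes and reports the line, column and code of every byte above 126. The name `find_high_bytes` is hypothetical.

```python
def find_high_bytes(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line, column, byte value) for every byte above 126, 1-based."""
    hits = []
    line, col = 1, 0
    for b in data:
        if b == 10:  # LF ends the current line
            line, col = line + 1, 0
            continue
        col += 1
        if b > 126:
            hits.append((line, col, b))
    return hits
```

Pass it the contents of a file opened in binary mode, e.g. `open(path, "rb").read()`.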

Then all you have to do is find-and-replace on {odq}, {cdq}, {osq}, and '. When you're done, a search for "{" or "}" will reveal anything else that needs your attention.
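The final find-and-replace step can also be scripted. This Python sketch assumes you want Unicode curly quotes (U+201C, U+201D, U+2018 and U+2019) and will save the result as UTF-8; if you need an 8-bit code page instead, substitute its codes. `leftover_flags` mimics the last search for braces by listing every remaining `{...}` marker.

```python
import re

# Marker -> Unicode curly quote. Any ' still in the text is a closing
# single quote or apostrophe, so it becomes U+2019.
REPLACEMENTS = {
    "{odq}": "\u201c",  # LEFT DOUBLE QUOTATION MARK
    "{cdq}": "\u201d",  # RIGHT DOUBLE QUOTATION MARK
    "{osq}": "\u2018",  # LEFT SINGLE QUOTATION MARK
    "'": "\u2019",      # RIGHT SINGLE QUOTATION MARK
}


def apply_smart_quotes(marked: str) -> str:
    """Replace each marker with its curly quote."""
    for marker, curly in REPLACEMENTS.items():
        marked = marked.replace(marker, curly)
    return marked


def leftover_flags(text: str) -> list[str]:
    """List every remaining {...} marker, such as {?} or a flagged character."""
    return re.findall(r"\{[^{}]*\}", text)
```

The {?} markers survive the replacement on purpose, so they still need a human decision.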

[Note for people even less computer-literate than me ... there must be a few out there ... Cut and paste the text from "REM Quotes.bas" down to "END" and save it as "QUOTES.BAS"]

I hope this is found useful!

REM Quotes.bas
REM Flags text files for quotes
REM 23 Jan 07

PRINT "Flags text files for quotation marks"
INPUT "Which file do you want to process? ", file$

tempfile$ = "output.txt"

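OPEN file$ FOR INPUT AS #1
OPEN tempfile$ FOR OUTPUT AS #2

REM Treat the start of the file like the start of a line
prevchar$ = CHR$(10)

DO WHILE NOT EOF(1)
  char$ = INPUT$(1, #1)
  REM Flag anything beyond plain ASCII (code > 126)
  IF char$ > "~" THEN char$ = "{" + char$ + "}"
  REM Double quote at line start or after a space opens; otherwise it closes
  IF char$ = CHR$(34) AND prevchar$ = CHR$(10) THEN char$ = "{odq}"
  IF char$ = CHR$(34) AND prevchar$ = " " THEN char$ = "{odq}"
  IF char$ = CHR$(34) THEN char$ = "{cdq}"
  REM Single quote at line start or after a space opens
  IF char$ = CHR$(39) AND prevchar$ = CHR$(10) THEN char$ = "{osq}"
  IF char$ = CHR$(39) AND prevchar$ = " " THEN char$ = "{osq}"
  REM Single quote straight after an opening double quote is ambiguous
  IF char$ = CHR$(39) AND prevchar$ = "{odq}" THEN char$ = "{?}"
  PRINT #2, char$;
  REM Remember the converted token, so "{odq}" can be tested above
  prevchar$ = char$
LOOP

CLOSE #1, #2
PRINT "Done"
END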
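For anyone who would rather not run QBasic, here is the same set of rules as a Python sketch (a translation, not the original program). As in the BASIC version, the previous *output* token is remembered, which is how the {?} rule sees an opening double quote. Reading the file with the Latin-1 codec maps each byte to exactly one character, so `ord(ch) > 126` matches the BASIC program's byte test.

```python
def mark_quotes(text: str) -> str:
    """Apply the QUOTES.BAS rules to text and return the marked-up version."""
    out = []
    prev = "\n"  # treat the start of the text like the start of a line
    for ch in text:
        if ord(ch) > 126:
            mark = "{" + ch + "}"            # flag non-ASCII characters
        elif ch == '"':
            mark = "{odq}" if prev in ("\n", " ") else "{cdq}"
        elif ch == "'":
            if prev in ("\n", " "):
                mark = "{osq}"
            elif prev == "{odq}":
                mark = "{?}"                 # needs a human decision
            else:
                mark = "'"                   # closing quote or apostrophe
        else:
            mark = ch
        out.append(mark)
        prev = mark  # remember the converted token, as the BASIC code does
    return "".join(out)
```

To process a file, read it with `open(path, encoding="latin-1")`, pass the text to `mark_quotes`, and write the result to OUTPUT.TXT with the same encoding.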

01-24-2007, 02:26 AM
Thanks Puffball! You got me excited remembering my command-line days. Back when, I was able to cook up a filter program in C to do something like what you did with the QBASIC code. I have missed having that ability in Windows. I guess I am just too lazy to install a C compiler.

But I recently ran across WSH. That stands for Windows Scripting Host. You can read about it here:

Basically, it lets you write VBScript (".vbs") and JScript (".js") files and run them as programs from the command line. You can also put a shortcut to a script on the desktop and run it directly. I am using Windows XP Pro, which has WSH already installed.

This site is a good source of scripts. I downloaded a small set of text-manipulation VBScripts. Attached here is one that filters text. You must rename it to remove the ".txt" extension. Then you can modify it as you wish. Beware that it will overwrite your input file.
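If you adapt a filter like this yourself, you can avoid the overwrite risk. This Python sketch (the function name and the `.bak` convention are illustrative assumptions) writes the result to a temporary file in the same folder, copies the original to a backup, and only then swaps the new file into place with `os.replace`, so a failure part-way leaves your input untouched.

```python
import os
import shutil
import tempfile
from typing import Callable


def filter_file(path: str, transform: Callable[[str], str],
                encoding: str = "latin-1", backup: bool = True) -> None:
    """Rewrite path with transform(text), replacing it only once the new text is written.

    newline="" keeps the file's original line endings (e.g. CR LF) untouched.
    """
    with open(path, encoding=encoding, newline="") as src:
        result = transform(src.read())

    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as tmp:
            tmp.write(result)
        if backup:
            shutil.copy2(path, path + ".bak")  # keep a copy of the original
        os.replace(tmp_path, path)             # swap the finished file into place
    except BaseException:
        os.remove(tmp_path)  # clean up; the input file has not been touched
        raise
```

Any text-to-text function can be passed as `transform`, for example `filter_file("BOOK.TXT", str.upper)`. If the transform produces characters that the chosen encoding cannot represent (curly quotes in Latin-1, say), the write fails and the original file is left as it was; pass `encoding="utf-8"` in that case.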

This link will tell you everything you might wish to know about VBScript.

Now we can deal with some of the annoyances we find in text files. I recently used Mozilla Composer to export an HTML ebook as a text file. I found that it contained many non-ASCII characters that confused the MP4 player I use for reading. I spent a lot of time using macros in the Emacs editor to replace the odd characters and to reformat the text to my liking.
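The same clean-up can be done with a short script instead of editor macros. This Python sketch uses a hypothetical substitution table (extend it to suit your files) and replaces anything else outside ASCII with a placeholder, so leftovers are easy to find.

```python
# Hypothetical substitution table; add whatever characters your files contain.
ASCII_SUBSTITUTES = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-",                  # en dash
    "\u2014": "--",                 # em dash
    "\u2026": "...",                # horizontal ellipsis
    "\u00a0": " ",                  # no-break space
}
TABLE = str.maketrans(ASCII_SUBSTITUTES)


def to_plain_ascii(text: str, placeholder: str = "?") -> str:
    """Apply the substitutions, then replace any remaining non-ASCII character."""
    text = text.translate(TABLE)
    return "".join(ch if ord(ch) < 128 else placeholder for ch in text)
```

For example, `to_plain_ascii("\u201cWait\u2026\u201d")` returns `'"Wait..."'`.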

Best WSHes.

01-24-2007, 05:13 AM
Good stuff, Mogui!

I'm afraid my computer abilities are stuck somewhere around 1990, so BASIC is my tool of choice. But I do like EMACS, which I used way back to do some heavyweight editing of Elizabethan poetry -- the results are here:

I'm thinking about writing a BASIC program to reformat raw Gutenberg files so they look better for reading. That mainly involves removing certain carriage returns (so that the text flows smoothly), without interfering with other carriage returns which are used for layout. Along the way it could deal with quote marks, en and em dashes, and maybe italics too.
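The core of such a reformatter is deciding which line breaks to drop. A minimal Python sketch, under two assumptions that are a heuristic rather than a Gutenberg rule: blank lines separate paragraphs, and lines that begin with a space or tab are layout (verse, indented quotations) whose breaks should be kept. `convert_dashes` handles the double hyphen that many Gutenberg texts use for a dash.

```python
def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines into one line per paragraph, keeping layout lines."""
    out_lines: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Emit the paragraph collected so far as a single line.
        if current:
            out_lines.append(" ".join(current))
            current.clear()

    for line in text.splitlines():
        if not line.strip():                  # blank line: end of paragraph
            flush()
            out_lines.append("")
        elif line.startswith((" ", "\t")):    # indented: keep its line break
            flush()
            out_lines.append(line)
        else:                                 # ordinary wrapped prose
            current.append(line.strip())
    flush()
    return "\n".join(out_lines)


def convert_dashes(text: str) -> str:
    """Turn a double hyphen into an em dash (U+2014)."""
    return text.replace("--", "\u2014")
```

Indentation is only one possible signal; a real reformatter might also look at short lines or all-capital headings before deciding to keep a break.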

I'll try to make time to have a go at that, unless something like it already exists, which I suspect it does! But if not I'll post the results here for criticism & testing.

Bob Russell
01-24-2007, 07:42 AM
Thanks Mogui! I didn't know about that scripting possibility. That would be very useful for many things! Is it very much changed in Vista? (If so, I might want to wait to learn it.)

01-24-2007, 10:43 AM
With Windows Vista there are two meanings for the WSH acronym. First is the old Windows Scripting Host. The second is Windows Service Hardening! Doesn't that just make you want to ROTFL?

I don't expect much change, if any, in WSH itself, because change is more likely to appear in the scripting languages themselves, VBScript and JScript (Microsoft's JavaScript). Such change would come slowly and on a schedule independent of Vista. Basically, WSH provides an interface that lets you make Windows jump through hoops using VBScript, JScript and a host of others. The first two languages are already in place; the others need to be installed if you want them.

You can also make WSH host scripting languages such as Perl. See the ActiveState site for Perl, Python or Tcl. ActiveState lets you download distributions of Perl, Python and Tcl, complete with documentation, for free!

Other languages are available to be hosted by WSH. Just google "Rexx WSH", for example. For more ideas about WSH, look here:

What is great about WSH is that it gives you a way to write simple programs to accomplish the tasks you need done. It is a blessing for those of us who grew up on BASIC. JScript is close enough to C that C programmers will be happy with it.

To get started, just get some sample code in your favorite editor. Open the devguru site mentioned before, and start changing the code little by little. I keep a command prompt open to execute the script. WSH puts a complaint in a requester box if I make a mistake. It tells me the offending line and character number and a hint at the nature of my offense.

There are development GUIs available too if you care to try them. I was in a hurry, so I just used the native tools. I like it simple.

If WSHes were HRSes . . .