MobileRead Forums > E-Book Formats > Workshop
Old 01-23-2007, 07:25 AM   #1
Puffball
Member
Posts: 20
Karma: 181
Join Date: Jan 2007
Location: Scotland, UK
Device: Sony Reader
BASIC program for smart quotes

I have been reformatting Gutenberg texts, and wanted to change the straight quote marks (CHR$(34) and CHR$(39)) to curly ones (whose character codes vary from one character set to another).

I do not have MS Word on my PC, only OpenOffice and WordPad. On my elderly Mac, which does have Word, using find-and-replace to turn plain quotation marks into curly ones gives problems. Single quote-marks next to double-quotes are sometimes wrong. Also, smart-quote algorithms don't know how to handle an open-single-quote at the start of a word, when that quote indicates a missing syllable.

So I wrote a very simple, not to say clunky, BASIC program to help. It definitely runs under QBasic in a DOS window under Windows XP, but will probably be OK in QuickBasic, etc., as well.

Before you start, give your text file a DOS-friendly (i.e. 8.3) name: up to eight characters, a dot, and a three-character extension.

The program produces a new version of it, called OUTPUT.TXT, in the current directory. The quote marks are converted as follows:

Opening double quote: {odq}
Closing double quote: {cdq}
Opening single quote: {osq}
Closing single quote: unchanged (')
Single quote needing human attention: {?}

It also flags (as "{[char]}") characters with an ASCII code greater than 126; this is handy when converting other text files which may contain accents and whatnot.

Then all you have to do is find-and-replace on {odq}, {cdq}, {osq}, and '. When you're done, a search for "{" or "}" will reveal anything else that needs your attention.
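For anyone who would rather script that last find-and-replace step, here is a rough sketch in Python (not BASIC, and not part of the program below). The names `REPLACEMENTS`, `apply_curly_quotes` and `leftover_flags` are made up for illustration, and it assumes you want Unicode curly quotes in the result.

```python
# Illustrative mapping from the placeholders to Unicode curly quotes.
# The plain ' is handled last, exactly as in the manual procedure.
REPLACEMENTS = [
    ("{odq}", "\u201c"),  # opening double quote
    ("{cdq}", "\u201d"),  # closing double quote
    ("{osq}", "\u2018"),  # opening single quote
    ("'", "\u2019"),      # closing single quote / apostrophe
]


def apply_curly_quotes(text):
    """Replace each placeholder with its curly-quote character."""
    for placeholder, curly in REPLACEMENTS:
        text = text.replace(placeholder, curly)
    return text


def leftover_flags(text):
    """Return the lines that still contain a brace and need a human look."""
    return [line for line in text.splitlines() if "{" in line or "}" in line]
```

Running `leftover_flags` on the result plays the part of the final search for "{" or "}": anything it returns, such as a line still holding {?}, has to be fixed by hand.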

[Note for people even less computer-literate than me ... there must be a few out there ... Cut and paste the text from "REM Quotes.bas" down to "END" and save it as "QUOTES.BAS"]

I hope this is found useful!

REM Quotes.bas
REM Flags text files for quotes
REM 23 Jan 07

CLS
PRINT "QUOTES"
PRINT
PRINT "Flags text files for quotation marks"
PRINT
INPUT "Which file do you want to process? ", file$

tempfile$ = "output.txt"

OPEN tempfile$ FOR OUTPUT AS #2

OPEN file$ FOR INPUT AS #1

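REM Start as if a line feed had just been read, so that a quote
REM at the very start of the file counts as an opening quote
prevchar$ = CHR$(10)

DO UNTIL EOF(1)
  char$ = INPUT$(1, #1)
  REM Flag anything above "~" (ASCII 126), e.g. accented letters
  IF char$ > "~" THEN char$ = "{" + char$ + "}"
  REM Double quote: opening after a line feed or a space, otherwise closing
  IF char$ = CHR$(34) AND prevchar$ = CHR$(10) THEN char$ = "{odq}"
  IF char$ = CHR$(34) AND prevchar$ = " " THEN char$ = "{odq}"
  IF char$ = CHR$(34) THEN char$ = "{cdq}"
  REM Single quote: opening after a line feed or a space, flagged for
  REM review straight after an opening double quote, otherwise unchanged
  IF char$ = CHR$(39) AND prevchar$ = CHR$(10) THEN char$ = "{osq}"
  IF char$ = CHR$(39) AND prevchar$ = " " THEN char$ = "{osq}"
  IF char$ = CHR$(39) AND prevchar$ = "{odq}" THEN char$ = "{?}"
  PRINT #2, char$;
  REM Remember the converted form; the {odq} test above relies on it
  prevchar$ = char$
LOOP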

CLOSE #1
CLOSE #2

PRINT "Done"
END
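
To see what the rules produce on a sample sentence without running QBasic, here is a rough restatement in Python. It is a sketch, not the program above: the name `flag_quotes` is made up, it works on a string rather than a file, and it assumes the start of the text should be treated like the start of a line.

```python
def flag_quotes(text):
    """Apply the placeholder rules character by character."""
    out = []
    prev = "\n"  # assumption: start of text behaves like start of a line
    for ch in text:
        if ch > "~":
            # Anything above ASCII 126 is wrapped in braces for review.
            ch = "{" + ch + "}"
        elif ch == '"':
            # Opening after a line feed or a space, otherwise closing.
            ch = "{odq}" if prev in ("\n", " ") else "{cdq}"
        elif ch == "'":
            if prev in ("\n", " "):
                ch = "{osq}"
            elif prev == "{odq}":
                ch = "{?}"  # needs human attention
            # otherwise the single quote is left unchanged
        out.append(ch)
        prev = ch  # remember the converted form, as the BASIC loop does
    return "".join(out)
```

For example, `flag_quotes("\"'Tis late,\" he said.")` gives `{odq}{?}Tis late,{cdq} he said.`, which shows the {?} flag catching the case the smart-quote algorithms get wrong.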

Last edited by Puffball; 01-23-2007 at 05:01 PM. Reason: Forgot something ...
Old 01-24-2007, 01:26 AM   #2
mogui
eNigma
Posts: 503
Karma: 1335
Join Date: Dec 2006
Location: The Philippines
Device: HTC G1 Android FBReader
cool idea -- a filter program

Thanks Puffball! You got me excited remembering my command-line days. Back when, I was able to cook up a filter program in C to do something like what you did with the QBASIC code. I have missed having that ability in Windows. I guess I am just too lazy to install a C compiler.

But I recently ran across WSH. That stands for Windows Script Host (often called Windows Scripting Host). You can read about it here: http://support.microsoft.com/kb/232211 Basically it lets you write VBScript (".vbs") and JScript (".js") files and simply execute them as programs at the command line. You can also put a shortcut to a script on the desktop and run it from there. I am using Windows XP Pro, which has WSH already installed.

This site is a good source of scripts: http://www.jsware.net/jsware/scripts.php3 I downloaded a small set of text-manipulation VBScript files. Attached here is one that filters text. You must rename it to remove the ".txt" extension before you can run it. Then you can modify it as you wish. Beware that it will overwrite your input file.

This link will tell you everything you might wish to know about VBScript: http://www.devguru.com/Technologies/...ipt_intro.html

Now we can deal with some of the annoyances we find in text files. I recently used Mozilla Composer to export an HTML ebook as a text file. I found it had many non-ASCII characters in it that confused the MP4 player I use for reading. I spent much time using macros in Emacs to replace the odd characters and to reformat the text to my liking.
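
That kind of clean-up can also be scripted. Below is a rough sketch in Python, not the Emacs macros or the attached script. The table `ASCII_EQUIVALENTS` and the function names are made up for illustration, and the handful of characters it maps is only a guess at what such an export might contain.

```python
# Illustrative table of typographic characters and plain-ASCII stand-ins.
ASCII_EQUIVALENTS = {
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}


def to_ascii(text):
    """Swap the characters listed above for their ASCII stand-ins."""
    return text.translate(str.maketrans(ASCII_EQUIVALENTS))


def non_ascii_chars(text):
    """List the distinct non-ASCII characters still left in the text."""
    return sorted({ch for ch in text if ord(ch) > 127})
```

Calling `non_ascii_chars` after `to_ascii` shows which characters the table missed, so the table can be extended one character at a time.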

Best WSHes.
Attached Files
File Type: txt Replace.vbs.txt (1.2 KB, 516 views)

Last edited by mogui; 01-24-2007 at 03:54 AM.
Old 01-24-2007, 04:13 AM   #3
Puffball
Good stuff, Mogui!

I'm afraid my computer abilities are stuck somewhere around 1990, so BASIC is my tool of choice. But I do like EMACS, which I used way back to do some heavyweight editing of Elizabethan poetry -- the results are here:

http://www.gutenberg.org/etext/6930

I'm thinking about writing a BASIC program to reformat raw Gutenberg files so they look better for reading. That mainly involves removing certain carriage returns (so that the text flows smoothly), without interfering with other carriage returns which are used for layout. Along the way it could deal with quote marks, en and em dashes, and maybe italics too.
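
As a starting point for discussion, the carriage-return part might look something like this. It is a rough sketch in Python, not BASIC, and it leans on two guesses about Gutenberg files: a blank line separates paragraphs, and a line that starts with a space or tab is layout (verse, tables) that should be left alone. The name `unwrap` is made up.

```python
def unwrap(text):
    """Join hard-wrapped lines into paragraphs, keeping indented lines."""
    out = []
    para = []  # lines of the paragraph currently being collected

    def flush():
        # Emit the collected paragraph as one long line.
        if para:
            out.append(" ".join(para))
            para.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()
            out.append("")             # keep the paragraph break
        elif line[0] in " \t":
            flush()
            out.append(line.rstrip())  # indented: treat as layout
        else:
            para.append(line.strip())  # ordinary wrapped text
    flush()
    return "\n".join(out)
```

Real files will break both guesses somewhere (unindented verse, for one), so the output would still need checking by eye.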

I'll try to make time to have a go at that, unless something like it already exists, which I suspect it does! But if not I'll post the results here for criticism & testing.
Old 01-24-2007, 06:42 AM   #4
Bob Russell
Recovering Gadget Addict
Posts: 5,381
Karma: 676161
Join Date: May 2004
Location: Pittsburgh, PA
Device: iPad
Thanks Mogui! I didn't know about that scripting possibility. That would be very useful for many things! Is it very much changed in Vista? (If so, I might want to wait to learn it.)
Old 01-24-2007, 09:43 AM   #5
mogui
You get two WSHes

With Windows Vista there are two meanings for the WSH acronym. The first is the old Windows Script Host. The second is Windows Service Hardening! Doesn't that just make you want to ROTFL?

I don't expect much change, if any, in WSH itself, because change would be more likely to appear in the scripting languages, VBScript and JScript (Microsoft's JavaScript). Such change would be slow and not tied to Vista's release. Basically WSH provides an interface that lets you make Windows jump through hoops using VBScript, JScript and a host of others. The first two languages are already in place. The others need to be installed if you want them.

You can make WSH host script languages like Perl. See the ActiveState site: http://www.activestate.com/Products/ActivePerl/ for Perl, Python or Tcl. ActiveState lets you download distributions of all three, complete with documentation, for free!

Other languages are available to be hosted by WSH. Just google "Rexx WSH" for example. For more ideas about WSH, look here: http://www.robvanderwoude.com/index.html

What is great about WSH is that it gives you a way to write simple programs to accomplish the tasks you need done. It is a blessing for those of us who grew up on BASIC. JScript is close enough to C that the C programmers will be happy with it.

To get started, just get some sample code in your favorite editor. Open the devguru site mentioned before, and start changing the code little by little. I keep a command prompt open to execute the script. WSH puts a complaint in a requester box if I make a mistake. It tells me the offending line and character number and a hint at the nature of my offense.

There are development GUIs available too if you care to try them. I was in a hurry, so I just used the native tools. I like it simple.

If WSHes were HRSes . . .