12-18-2010, 11:10 PM
caleb72
Indie Advocate
Posts: 2,863
Karma: 18794463
Join Date: Sep 2010
Location: Melbourne, Australia
Device: Kindle
An interesting experience converting formats

I recently got hold of a few freebie novels in PDF format. Nice and legal as well.

Anyway, PDF is no good for me, so I thought I'd have a go at converting one to ePub.

I had many passes at it until I got something almost useful:
  • I used a free pdftohtml program, which dumped everything into one HTML file.
  • I opened the HTML in Vim and then used regular expressions to remove all of the headers/page numbers and forced line breaks (except for what looked like end-of-paragraph breaks).
  • There were quite a few passages in italics throughout the novel, and the initial conversion left a lot of superfluous tags, which I tidied up with more regular expressions.
  • I opened the modified HTML in Sigil and used search and replace to force 5 spaces at the start of each paragraph, as the book looks stupid without indenting.
  • I split a couple of obvious front pages (foreword, etc.) into separate HTML files so that I could force page breaks. Luckily the novel didn't actually have chapters, so I didn't have to worry about those.
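
For anyone wanting to try the same cleanup outside Vim, here is a rough Python sketch of the regex steps above (page clutter, forced line breaks, split italics). It assumes pdftohtml-style output where every visual line ends in `<br/>`, each page starts with an anchor like `<a name=3></a>` and pages are separated by `<hr/>`. Check your own file, since the exact markup varies. The `running_header` parameter and the "ends in punctuation means end of paragraph" rule are my own assumptions, not anything the converter guarantees.

```python
import re
from typing import Optional

# Assumed pdftohtml-style markup (check your own file): a page anchor such as
# <a name=3></a>, an <hr/> between pages, and <br/> after every visual line.
PAGE_ANCHOR = re.compile(r'<a name=\d+></a>')
PAGE_RULE = re.compile(r'<hr\s*/?>')
# A line holding nothing but a number (assumed to be a page number).
PAGE_NUMBER = re.compile(r'^\s*\d+\s*<br\s*/?>\s*\n', re.MULTILINE)
# Heuristic: a <br/> NOT preceded by sentence-ending punctuation or a closing
# quote is a forced line break inside a paragraph, so join it with a space.
FORCED_BREAK = re.compile(r'(?<![.!?"\u201d])<br\s*/?>\s*\n')
# Italic runs split by the conversion, e.g. "</i> <i>" or "</i><i>".
SPLIT_ITALIC = re.compile(r'</i>(\s*)<i>')
EMPTY_ITALIC = re.compile(r'<i>\s*</i>')


def clean(html: str, running_header: Optional[str] = None) -> str:
    """Strip page clutter, join forced line breaks and merge italic runs."""
    html = PAGE_ANCHOR.sub('', html)
    html = PAGE_RULE.sub('', html)
    if running_header:
        # Hypothetical parameter: the book title or author name repeated
        # at the top of every page.
        header = re.compile(
            r'^\s*' + re.escape(running_header) + r'\s*<br\s*/?>\s*\n',
            re.MULTILINE,
        )
        html = header.sub('', html)
    html = PAGE_NUMBER.sub('', html)
    html = FORCED_BREAK.sub(' ', html)
    html = SPLIT_ITALIC.sub(r'\1', html)  # keep any space between the runs
    html = EMPTY_ITALIC.sub('', html)
    return html
```

The paragraph rule is only a heuristic: a sentence that ends inside italics (`said.</i><br/>`) still gets joined to the next line, and a mid-paragraph line that happens to end on a full stop stays broken, so a proofread is still needed.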

So by now I had something that was formatted OK. However, I remembered that conversion from PDF sometimes joins words together here and there, particularly in sections with italics. So I copied and pasted the entire text into Word, thinking I could use the grammar and spell checker to identify anomalies and then tidy them up.
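
If you'd rather not drag the whole text into Word, a rough way to hunt for joined words is to flag any token that isn't a word on its own but splits cleanly into two words. Below is a minimal Python sketch, assuming you supply your own lowercase word list (the `dictionary` set is hypothetical; on many Unix-like systems `/usr/share/dict/words` is one possible source). Tags are stripped with no replacement, so a missing space next to an italic run still shows up as one fused token.

```python
import re
from typing import List, Set, Tuple

TAG = re.compile(r'<[^>]+>')
WORD = re.compile(r'[A-Za-z]+')


def find_joined_words(text: str, dictionary: Set[str]) -> List[Tuple[str, str, str]]:
    """Return (token, left, right) for each token that is not itself in the
    lowercase dictionary but splits into two dictionary words."""
    plain = TAG.sub('', text)  # "the<i>door</i>" becomes "thedoor"
    hits = []
    seen = set()
    for token in WORD.findall(plain):
        lower = token.lower()
        if lower in dictionary or lower in seen:
            continue
        # Try every split point and report the first one that yields two words.
        for i in range(1, len(lower)):
            left, right = lower[:i], lower[i:]
            if left in dictionary and right in dictionary:
                hits.append((token, left, right))
                seen.add(lower)
                break
    return hits
```

It only catches two-word fusions. Ordinary misspellings that don't split into real words are left to a proper spell checker, and with a small word list it will also flag legitimate words the list happens to lack.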

This is where everything came unstuck. The grammar and spelling of this author were awful.

I had come quite a long way, so I did the best I could to correct the most glaring mistakes, but at the end of it I wondered why I'd bothered. Am I ever going to read a book like this?

I'm certainly not perfect, but if I were writing a novel I'd at least run it through a spell checker before publishing it.

Can anyone relate to this?
Have you picked up a free novel only to find yourself staggering under the weight of poor spelling and suspect grammar?

Regards
Caleb