09-03-2011, 12:33 PM   #1
therealjoeblow
Zealot
Posts: 106
Karma: 52102
Join Date: Jun 2010
Device: Samsung Android Tablet w/Moon+ Pro Reader
Looking for a tool to find/fix mismatched quotes...

In a number of poorly OCR'd documents, paragraphs where a character is speaking are often either broken into two paragraphs or missing a quote, either the opening one or the closing one.

I'm looking for a plugin or tool of any kind that can identify these, to make fixing them easier than reading through the entire book and fixing each one as I go.

When I have to edit/fix documents, I normally convert to HTMLZ, then edit the HTML file with Notepad++.

If the tool did exist, here's what I believe it should do:

Paragraphs are generally (though not always) formatted like:

<p class="calibre#">This is some text</p>

What it should do is look at each <p></p> pair (ignoring the class) and count the " characters it finds within. In cases where proper curly open/close quotes have been used, it should count the occurrences of each of these.

Then it should identify the paragraphs where any of these counts is odd so the user can review and fix them (I realize that part would have to be manual, since no tool can tell where the missing quote belongs, but at the very least it would make finding these *much* easier).
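
For illustration, the check described above could be sketched in Python roughly as follows. This is only a minimal sketch, not an existing calibre or Notepad++ feature: the function name `find_mismatched_quotes` and the sample text are made up for the example, it assumes paragraphs are plain, non-nested <p> elements, and for curly quotes it compares the number of opening and closing quotes rather than testing each count for oddness, which is a slightly stricter variation on the idea.

```python
import re
from html import unescape

# One <p ...>...</p> element; attributes such as class are ignored.
PARAGRAPH_RE = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)
# Any inline tag inside a paragraph (<i>, <span class="...">, and so on).
TAG_RE = re.compile(r"<[^>]+>")


def find_mismatched_quotes(document):
    """Return (paragraph_number, text) for paragraphs with unbalanced quotes."""
    suspects = []
    for number, match in enumerate(PARAGRAPH_RE.finditer(document), start=1):
        # Strip inline tags first so quotes inside attributes are not counted,
        # then decode entities such as &quot; and &ldquo;.
        text = unescape(TAG_RE.sub("", match.group(1)))
        straight = text.count('"')
        opening = text.count("\u201c")  # left curly double quote
        closing = text.count("\u201d")  # right curly double quote
        if straight % 2 != 0 or opening != closing:
            suspects.append((number, text.strip()))
    return suspects


# Hypothetical sample in the style of calibre's HTML output.
sample = (
    '<p class="calibre1">"Hello," he said.</p>\n'
    '<p class="calibre2">"Where are you going?</p>\n'
    '<p class="calibre1">\u201cFine.\u201d</p>\n'
    '<p class="calibre3">She said, \u201cNo.</p>\n'
)

for number, text in find_mismatched_quotes(sample):
    print(f"Paragraph {number}: {text}")
```

A script like this only reports the suspect paragraphs; the fixing would still be manual. Dialogue that legitimately runs across several paragraphs (an opening quote with no closing quote until a later paragraph) would be flagged as well, so some false positives are to be expected.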

...kind of like an extended search-and-replace - "Find mismatched quotes"

I'm not particularly hung up on whether it works in Notepad++ or not; any editor would do. An extension/plugin for calibre would be even better if it could be integrated into a semi-automated conversion process that prompts for user input on each mismatched set of quotes.

Anyone know if something like this exists?

If not - Kovid, is there any chance you could add this to calibre? It would save a *huge* amount of time and frustration!

Cheers
The REAL Joe