Old 01-23-2011, 08:56 PM   #11
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by Archon
Hmm, yeah, I am not at that level quite that quickly.

I had a book that someone OCR'd, and it had replaced all the italics with _foo_.

I was quite pleased with myself for using:

Find
_(.+)_

and replacing with

\\i \1 \\i0

With some tweaking I was able to replace a couple hundred occurrences with spaces, commas and periods after the words.

Now the book is back closer to its original with italics in it.

Not bad for my first foray, but I am sure I will learn how to search and replace with just one search instead of repeating it with different variations.

For anyone interested, Bare Bones Software has a free program called TextWrangler for Mac OS X that does regular expression search and replace.
http://www.barebones.com/products/te.../download.html

I will keep at it though until I understand everything you typed.
Thanks for the help
Archon
You should also check out the 'Heuristics' section of Calibre's conversion settings. Many of the things you want to do may already be covered there. For example, the _foo_ italics case is handled by the 'italicize common cases' function. It was still a good learning exercise, so not all was lost, but for many common cases the work has already been done.

TextWrangler is a good app. It supports most of Python's regular expression syntax, and I use it to test most of my expressions when debugging.
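
If you want to try an expression like the quoted one outside the editor, here is a minimal Python sketch. It reuses the quoted find and replace strings as-is; the lazy `.+?` variant is my own suggestion, not something from the quoted post, and the names GREEDY, LAZY, and to_rtf_italics are made up for illustration. This is not how Calibre's heuristics are implemented, just a way to see how the two patterns behave when a line holds more than one italic run.

```python
import re

# The pattern from the quoted post: ".+" is greedy, so it runs to the LAST
# underscore on the line.
GREEDY = re.compile(r"_(.+)_")

# Assumed alternative: ".+?" is lazy, so it stops at the NEAREST closing
# underscore.
LAZY = re.compile(r"_(.+?)_")

# Same replacement as in the quoted post: "\\" is a literal backslash and
# "\1" is the text captured between the underscores.
REPLACEMENT = r"\\i \1 \\i0"


def to_rtf_italics(text, pattern=LAZY):
    """Replace _foo_ style markers with RTF italic on/off control words."""
    return pattern.sub(REPLACEMENT, text)


sample = "He said _no_, then _yes_."
print(to_rtf_italics(sample, GREEDY))  # one match spanning both italic runs
print(to_rtf_italics(sample))          # one match per italic run
```

With the lazy pattern, a single pass should also cope with commas and periods after the closing underscore, since the match ends at the underscore and leaves the punctuation untouched.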

Last edited by ldolse; 01-23-2011 at 10:57 PM.