MobileRead Forums - View Single Post

Tex2002ans · 01-08-2015, 09:28 PM

Quote:

Originally Posted by theducks

Among other things, * is a wildcard and will need to be escaped.

I think he was just using asterisks to emphasize the part he was talking about, not saying that the actual source document contains them!

Quote:

Originally Posted by Johann Cat

I have a simple text-block book that has within it, as some gutenberg.org books do, page numbers within the text block (not coded footers, etc.).

Mind just linking to the specific Gutenberg example?

If I understand correctly, a regex as simple as this might do it:

Search: \s+[―—] [0-9]+ [―—]\s+
Replace: (insert a single space here)

In plain English, this says: "look for one or more whitespace characters (spaces, tabs, or line breaks)" + "look for a dash (an em dash, or the look-alike horizontal bar) followed by a space" + "look for a number (one or more digits)" + "look for a space followed by a dash" + "look for one or more whitespace characters". Replace all of that with "a single space".
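For anyone who wants to test the pattern outside an editor, here is a minimal Python sketch of the same substitution. Python's `re` module is just my choice for illustration, not something Calibre requires. The `[\u2014\u2015]` class (em dash or horizontal bar) is an assumption about which dash the book actually uses, and the function name `strip_page_numbers` and the sample text are made up.

```python
import re

# Page-number marker: whitespace, a dash, a space, digits, a space, a dash, whitespace.
# The class [\u2014\u2015] accepts either an em dash (U+2014) or a horizontal bar (U+2015).
PAGE_NUMBER = re.compile(r"\s+[\u2014\u2015] [0-9]+ [\u2014\u2015]\s+")


def strip_page_numbers(text: str) -> str:
    """Replace each page-number marker, plus the whitespace around it, with one space."""
    return PAGE_NUMBER.sub(" ", text)


sample = "the end of one page\n\n\u2014 12 \u2014\n\nand the start of the next"
print(strip_page_numbers(sample))
```

One side effect worth knowing: because the surrounding `\s+` also swallows line breaks, a marker that sits between two paragraphs (rather than mid-sentence) will join those paragraphs with a single space, so it is worth skimming the result.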

I would then clean up the file in a text editor using the regex above, and feed the cleaned document through Calibre for conversion.
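If you would rather script the cleanup than do it by hand in an editor, here is a rough sketch of the file-level step. It assumes the book is a UTF-8 plain-text file (many Gutenberg downloads are, but check yours), and `clean_file` is a hypothetical helper, not part of Calibre.

```python
import re
from pathlib import Path

# Same marker as the search pattern: whitespace, dash, space, digits, space, dash, whitespace.
# Accepts an em dash (U+2014) or a horizontal bar (U+2015).
PAGE_NUMBER = re.compile(r"\s+[\u2014\u2015] [0-9]+ [\u2014\u2015]\s+")


def clean_file(src: Path, dst: Path) -> int:
    """Strip page-number markers from src, write the result to dst, and return how many were removed."""
    text = src.read_text(encoding="utf-8")
    # subn returns (new_text, number_of_substitutions)
    cleaned, count = PAGE_NUMBER.subn(" ", text)
    dst.write_text(cleaned, encoding="utf-8")
    return count
```

Afterwards the cleaned file can go to Calibre as usual, either through the GUI or with its command-line converter, e.g. `ebook-convert cleaned.txt book.epub` (the file names here are placeholders).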