01-08-2015, 09:28 PM #3
Tex2002ans
Quote:
Originally Posted by theducks
Among other things, * is a wildcard and will need to be escaped.
I think he was just trying to emphasize the portion he was speaking of with asterisks, not that the actual source document had them!

Quote:
Originally Posted by Johann Cat
I have a simple text-block book that has within it, as some gutenberg.org books do, page numbers within the text block (not coded footers, etc.).
Would you mind linking to the specific Gutenberg example?

If I understand correctly, a regex as simple as this might do it:

Search: \s+― [0-9]+ ―\s+
Replace: (insert a single space here)

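In plain English, this says: "match one or more whitespace characters (including line breaks)" + "a dash followed by a space" + "one or more digits" + "a space followed by a dash" + "one or more whitespace characters". Replace the whole match with a single space. Note that the dash in the pattern above is ― (U+2015, HORIZONTAL BAR), not an em dash (U+2014); use whichever character actually appears in your book, or the pattern will not match.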
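To see what the pattern does, here is a minimal Python sketch that applies the same search-and-replace to a short string. Python is just my choice for illustration; the function name `strip_page_numbers` and the sample text are made up, and the dash is written as the escape `\u2015` so there is no doubt which character is being matched.

```python
import re

# The pattern from above. \u2015 is HORIZONTAL BAR (the character in the post);
# swap in \u2014 (EM DASH) if that is what your book actually uses.
PAGE_NUMBER = re.compile(r"\s+\u2015 [0-9]+ \u2015\s+")


def strip_page_numbers(text: str) -> str:
    """Hypothetical helper: replace each page-number marker, plus the
    whitespace/line breaks around it, with a single space."""
    return PAGE_NUMBER.sub(" ", text)


sample = "the end of one page\n\n\u2015 12 \u2015\n\nand the start of the next"
print(strip_page_numbers(sample))
# prints: the end of one page and the start of the next
```

Because `\s+` swallows the blank lines on either side of the marker, the sentence that was split across the page break is joined back together.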

I would then clean up the file in a text editor using the regex above, and feed the cleaned document through Calibre for conversion.
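If you'd rather script the cleanup than do it by hand in an editor, a sketch along these lines should work, assuming the book is a UTF-8 plain-text file. `clean_file` and the file names are hypothetical. It also returns how many markers were removed, which is a quick sanity check against the book's page count before you convert.

```python
import re
from pathlib import Path

# Same pattern as above; \u2015 is HORIZONTAL BAR (assumed to be the book's dash).
PAGE_NUMBER = re.compile(r"\s+\u2015 [0-9]+ \u2015\s+")


def clean_file(src: Path, dst: Path) -> int:
    """Hypothetical helper: write a copy of src with page-number markers
    replaced by a single space, and return how many were replaced."""
    text = src.read_text(encoding="utf-8")
    cleaned, count = PAGE_NUMBER.subn(" ", text)  # subn also returns the match count
    dst.write_text(cleaned, encoding="utf-8")
    return count
```

For example, `clean_file(Path("book.txt"), Path("book-clean.txt"))`, then add book-clean.txt to Calibre (or run Calibre's command-line tool: `ebook-convert book-clean.txt book.epub`). Writing to a separate output file keeps the original around in case the pattern catches something it shouldn't.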

Last edited by Tex2002ans; 01-08-2015 at 09:34 PM.