|  01-07-2015, 01:49 PM | #1 | 
| Member  Posts: 21 Karma: 10 Join Date: Nov 2014 Device: Kobo Aura HD; Kindle III; Kindle PWII; Boyue T62D; Onyx Boox i86 | 
				
				Removing page numbers within text?
			 
			
I think there is a way to do this using "search and replace" within Calibre conversion, but I do not know the shorthand or code for indicating these characters. I have a simple text-block book that, as some gutenberg.org books do, has page numbers inside the text block itself (not coded footers, etc.). So some lines are interrupted like this:

In chemistry, we find such assertions as that hydrogen being univalent while oxygen is bivalent, "makes it plain that we must expect to find no more than three compounds of those elements." It did not make the matter plain to those who held to the strict univalence of chlorine; and Dr. Williams says nothing about variable	 * *― 281 ― * *valencies, but rather implies their fixity. The history of opinion concerning Mendeléef's law is inexcusably inaccurate after the admirable history of the matter by Venable.

I.e., I want to remove the gap between "variable" and "valencies", page number included, and remove all similar page numbers throughout the text. I am not a code writer: can someone walk me through this or point me to some correct, precise instructions? (I especially need to know how to indicate the dash as a running character, and those page numbers.)

Last edited by Johann Cat; 01-09-2015 at 02:54 AM. | 
|   |   | 
|  01-07-2015, 03:22 PM | #2 | |
| Well trained by Cats            Posts: 31,240 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | 
  You are probably going to need to learn some REGEX (regular expression) editing skills. Among other things, * is a special character in REGEX (a quantifier meaning "zero or more of the preceding item"), so it needs to be escaped as \* to match a literal asterisk. Conversion search and replace is meant for simpler tasks. BTW, there are a few REGEX tutorials and REGEX help threads here at MR for when you do get stuck. IMHO, 100%: work in code view for this one. There are at least 4 lines of code involved in your example. | |
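To illustrate the point about escaping, here is a minimal Python sketch. It uses Python's `re` module purely as an illustration (the thread itself is about Calibre, Sigil and text editors), and the `pasted` string is just the fragment from the first post, asterisks and all.

```python
import re

# Fragment as it appeared when pasted into the forum, stray asterisks included.
pasted = "variable\t * *― 281 ― * *valencies"

# On its own, "*" is a quantifier with nothing to repeat, so the pattern is rejected.
try:
    re.compile("*")
except re.error as exc:
    print("unescaped * is rejected:", exc)

# Escaped as "\*", it matches a literal asterisk.
literal_star = re.compile(r"\*")
star_count = len(literal_star.findall(pasted))
print(star_count)

# re.escape() escapes the special characters in a fixed string for you.
escaped = re.escape("* *")
print(re.search(escaped, pasted) is not None)
```

Most regex-capable editors follow the same rule: put a backslash in front of any special character you want treated as plain text.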
|   |   | 
|  01-08-2015, 09:28 PM | #3 | ||
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook | 
 If I am understanding correctly, I think a Regex as simple as this might do it:

 Search: \s+― [0-9]+ ―\s+
 Replace: (insert a single space here)

 What this says in English is "look for one or more whitespace characters (spaces, tabs, line breaks)" + "look for the dash character followed by a space" + "look for one or more digits" + "look for a space followed by the dash character" + "look for one or more whitespace characters". Replace with "a single space".

 What I would then do is just clean up the file in a Text Editor using the above Regex, and then feed that document through Calibre for conversion. Last edited by Tex2002ans; 01-08-2015 at 09:34 PM. | ||
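For anyone who prefers to try the pattern outside an editor, here is a rough Python sketch of the same search and replace. The pattern is the one from the post above, unchanged. The `sample` string is hypothetical: it assumes the page marker sits on its own line with no stray asterisks, which is only a guess at what the real file looks like.

```python
import re

# Search pattern from the post above. The dash is copied from the forum text;
# copy it from your own book instead, since look-alike dashes
# (em dash, horizontal bar) are different characters.
PAGE_NUMBER = re.compile(r"\s+― [0-9]+ ―\s+")


def strip_page_numbers(text: str) -> str:
    """Replace each embedded page-number marker with a single space."""
    return PAGE_NUMBER.sub(" ", text)


# Hypothetical sample: assumes the marker is surrounded only by whitespace.
sample = "says nothing about variable\n― 281 ―\nvalencies, but rather implies their fixity."
print(strip_page_numbers(sample))
# prints: says nothing about variable valencies, but rather implies their fixity.
```

Note that the pattern will not match the fragment exactly as pasted in the first post, because there the dashes are hemmed in by asterisks rather than whitespace.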
|   |   | 
|  01-09-2015, 02:50 AM | #4 | 
| Member  Posts: 21 Karma: 10 Join Date: Nov 2014 Device: Kobo Aura HD; Kindle III; Kindle PWII; Boyue T62D; Onyx Boox i86 | 
			
			Thanks for the suggestions. The asterisks that show in the left margin did not appear in the original text; they somehow appeared when I pasted the text into this editor, so I should have deleted them for accuracy's sake. I think the asterisks may indicate paragraph or line-break symbols, but, again, they were not apparent in the original. I will try the regex suggestion; that code makes sense. Do you know if I can do this using OpenOffice's text editor? I have the "alternative" find-replace app installed. Or should I use calibre? If neither of those, what is the regex editor of choice?
		 | 
|   |   | 
|  01-09-2015, 07:40 AM | #5 | 
| Addict            Posts: 250 Karma: 1702156 Join Date: Nov 2010 Device: Kindle Voyage | |
|   |   | 
|  01-09-2015, 09:02 AM | #6 | |
| Well trained by Cats            Posts: 31,240 Karma: 61360164 Join Date: Aug 2009 Location: The Central Coast of California Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A | 
 The nice thing is that OO/Calibre/Sigil each have an interactive editor where you can try out and hone your REGEX. Avoid Replace All at first: step through a few dozen perfect finds before even thinking of "lettin' 'er rip". Remember: File: DISCARD for when things go deep South. | |
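The "step through the finds first" advice can also be followed in a script. Below is a small Python sketch, not a feature of any of the editors mentioned: `preview_matches` is a made-up helper name, and the `sample` text is hypothetical. It lists each hit with a little context on either side so you can check the matches before replacing anything.

```python
import re

# Same search pattern as suggested earlier in the thread.
PAGE_NUMBER = re.compile(r"\s+― [0-9]+ ―\s+")


def preview_matches(text, limit=24, context=20):
    """Return up to `limit` (before, match, after) tuples for manual review."""
    previews = []
    for m in PAGE_NUMBER.finditer(text):
        before = text[max(0, m.start() - context):m.start()]
        after = text[m.end():m.end() + context]
        previews.append((before, m.group(0), after))
        if len(previews) >= limit:
            break
    return previews


# Hypothetical sample with two page markers, each on its own line.
sample = "about variable\n― 281 ―\nvalencies, and later the\n― 282 ―\nsame law"
for before, hit, after in preview_matches(sample):
    print(repr(before), "|", repr(hit), "|", repr(after))
```

If every previewed hit looks right, running the replacement on the whole text is much less of a gamble.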
|   |   | 
|  01-09-2015, 03:45 PM | #7 | ||
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook | 
			
			Hmmm, from what I was quickly able to find, OpenOffice Writer only lets you search within a single paragraph at a time. It doesn't let you search across paragraphs. I would also avoid fancy GUI word processors if you can, because they add A TON of cruft on top of the text (fonts, font sizes, spacing, etc.).

 What is this "alternative" you speak of? Is it an add-on for OpenOffice? I am not familiar with it at all... perhaps it allows that functionality.

 For EPUB, you can also use Sigil or Calibre's "Edit Book" feature. It all depends on the source format of the document you are getting. If you got it from Gutenberg, I assume it is TXT or EPUB? If you link to the Gutenberg copy you are working on, perhaps we could figure out an even more specific answer to remove these page numbers.

 The Regex I listed above should work in Notepad++, Sigil, and Calibre (and whatever other programs use a compatible Regex engine). And good list of warnings: it should always be stressed that you should SAVE BACKUP COPIES BEFORE YOU DO HUGE REGEX CHANGES. | ||
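As one way to make the backup habit automatic, here is a hedged Python sketch that copies the file before touching it. The function name `clean_file_with_backup`, the `.bak` suffix, the `book.txt` filename and the UTF-8 encoding are all assumptions for illustration; adjust them to your own file.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Same search pattern as suggested earlier in the thread.
PAGE_NUMBER = re.compile(r"\s+― [0-9]+ ―\s+")


def clean_file_with_backup(path: Path) -> int:
    """Copy `path` to `<name>.bak`, then strip page numbers in place.

    Returns the number of replacements made. Assumes a UTF-8 text file.
    """
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # keep an untouched copy before any change
    text = path.read_text(encoding="utf-8")
    cleaned, count = PAGE_NUMBER.subn(" ", text)
    path.write_text(cleaned, encoding="utf-8")
    return count


if __name__ == "__main__":
    # Demo on a throwaway file so nothing real is modified.
    with tempfile.TemporaryDirectory() as tmp:
        book = Path(tmp) / "book.txt"  # hypothetical filename
        book.write_text("variable\n― 281 ―\nvalencies", encoding="utf-8")
        print(clean_file_with_backup(book), "replacement(s)")
        print(book.read_text(encoding="utf-8"))
```

If the result looks wrong, the `.bak` copy next to the file is the untouched original.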
|   |   | 