Old 10-03-2010, 01:43 PM   #1
ChaoZ
Connoisseur
Posts: 92
Karma: 192
Join Date: Jul 2010
Location: Toronto
Device: Kindle 3
[Old Thread] Removing page numbers.

I have an epub book that has page numbers hardcoded into it. Is it possible to have Calibre remove them automatically when I convert it to mobi for Kindle use?

Each one is a one- or two-digit number in its own paragraph, and it also breaks the flow of the existing paragraphs. I guess the file was converted from PDF at some point?

Last edited by ChaoZ; 10-03-2010 at 02:45 PM.
Old 10-03-2010, 03:16 PM   #2
speakingtohe
Wizard
Posts: 4,596
Karma: 25170848
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
You could try converting it back to a PDF and using a program to crop the pages.
Google "crop pdf" for utilities that do this.
It works sometimes and sometimes not.
Helen
Old 10-03-2010, 03:29 PM   #3
Perkin
Guru
Posts: 644
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD
Open up the epub in Sigil, do a search for
Code:
<p>(\d+)</p>
with the regex and minimal-match options ticked,

and replace with an empty string.

If you're sure the only all-digit paragraphs are page numbers, you can do a Replace All. To be safe, though, step through and replace each match individually, so that you pick up anything odd.

(A date or a year, for example. Numbers that fall out of page order are a giveaway.)
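
For anyone who would rather script that check than click through Sigil, here is a rough Python sketch of the same idea. It uses the pattern above and only removes numbers that continue the page sequence, leaving the rest for manual review. The function name, the max_gap tolerance, and the rule that the first match is trusted are all assumptions made for illustration, not anything Sigil or Calibre does.

```python
import re

# The pattern from the post above: a paragraph holding only digits.
PAGE_NUMBER = re.compile(r"<p>(\d+)</p>")


def strip_page_numbers(html, max_gap=5):
    """Remove digit-only paragraphs that continue the page sequence.

    Returns (cleaned_html, suspicious), where suspicious lists the numbers
    left in place because they fell out of page order.
    max_gap is a hypothetical tolerance for skipped pages.
    """
    suspicious = []
    last_page = None

    def decide(match):
        nonlocal last_page
        number = int(match.group(1))
        # Assumption: the first digit-only paragraph is a real page number.
        if last_page is None or 0 < number - last_page <= max_gap:
            last_page = number
            return ""            # in sequence: drop the paragraph
        suspicious.append(number)
        return match.group(0)    # out of sequence: keep it for review

    return PAGE_NUMBER.sub(decide, html), suspicious


sample = (
    "<p>It began in</p><p>1</p><p>the winter of</p>"
    "<p>1984</p><p>and ended soon after.</p><p>2</p>"
)
cleaned, suspicious = strip_page_numbers(sample)
print(cleaned)
print(suspicious)
```

In this sample the year 1984 is kept and reported, while pages 1 and 2 are removed. Real books may need a different tolerance, and anything reported should still be checked by eye.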
Old 10-03-2010, 04:00 PM   #4
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Calibre has an option to remove headers and/or footers during conversion. See the Structure Detection part of the conversion settings.
Old 10-03-2010, 05:58 PM   #5
ChaoZ
I don't think it's formatted as a header or a footer though.
I broke open the epub using the Tweak option and saw it was actually a paragraph tag.

I also noticed what seems like a bad OCR job. Looks like the file is just bad.
Old 10-03-2010, 06:10 PM   #6
Manichean
Quote:
Originally Posted by ChaoZ
I don't think it's formatted as a header or a footer though.
I broke open the epub using the Tweak option and saw it was actually a paragraph tag.
It doesn't matter how it is formatted: if it can be described by a regexp, Calibre can remove it. But be careful, because you could easily remove something you don't want to remove.
Old 05-28-2013, 03:21 PM   #7
Leonatus
Addict
Posts: 225
Karma: 157352
Join Date: Mar 2013
Location: Berlin, Germany
Device: Kobo Touch
Is there a regex expert who could help?
I have some ebooks where page numbers appear in the running text (probably referring to the original printed version), sometimes even with a hyperlink pointing to the original TOC. I would like to delete them, but I have no clue about regex. In one particular book, the numbers appear in square brackets, such as [Pg 4]. Those numbers have up to three digits. The tags look like this: <span class="pagenum"><a class="pcalibre pcalibre1" id="Page_4">[Pg 4]</a></span>. Is there a way of removing them with one single regex command in Sigil or Calibre?

Thanks in advance! And please remember, I'm completely ignorant in this field.

I hope someone reads this, the thread being quite old.
Old 10-14-2014, 08:52 PM   #8
WhiteAbeLincoln
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2014
Device: none
I know that this thread is really old, and Leonatus's request is also really old, but here's a regex for you, with a breakdown:
Quote:
<span class="\w*"><a class="[A-Za-z_ \d]*" id="[A-Za-z_ \d]*">\[[A-Za-z_ \d]*\]<\/a><\/span>

\w* matches zero or more word characters (alphanumerics and _).
[A-Za-z_ \d]* matches zero or more characters that are each a letter (A-Z or a-z), an underscore, a space, or a digit.

Everything else is pretty much an exact match. The seemingly random backslashes are escape characters, so that the following character isn't interpreted as regex syntax: for example, \/ escapes the / character and \[ escapes the [ character.
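
To see it in action outside Sigil or Calibre, here is a small Python sketch that runs the regex above against the tag Leonatus posted. The surrounding sentence, the footnote tag, and the tighter TIGHT pattern are made up for illustration. The tighter variant is only a suggestion for cases where the posted pattern catches other bracketed links; it assumes the markers always use class="pagenum" and the form [Pg N].

```python
import re

# The regex from this post, unchanged (split over two lines for width).
LOOSE = re.compile(
    r'<span class="\w*"><a class="[A-Za-z_ \d]*" id="[A-Za-z_ \d]*">'
    r'\[[A-Za-z_ \d]*\]<\/a><\/span>'
)

# Hypothetical tighter variant: only pagenum spans holding [Pg 1] to [Pg 999].
TIGHT = re.compile(r'<span class="pagenum"><a [^>]*>\[Pg \d{1,3}\]</a></span>')

# Leonatus's tag, placed inside an invented sentence.
page_marker = (
    '<p>He opened the door <span class="pagenum"><a class="pcalibre pcalibre1" '
    'id="Page_4">[Pg 4]</a></span>and went in.</p>'
)
# An invented bracketed link that is not a page marker.
footnote = '<p>As stated<span class="note"><a class="fn" id="n1">[see note]</a></span>.</p>'

print(LOOSE.sub("", page_marker))   # page marker removed
print(LOOSE.sub("", footnote))      # the footnote link is removed as well
print(TIGHT.sub("", footnote))      # the footnote link survives
```

Both patterns strip the page marker, but only the loose one also strips the invented footnote link, which is the kind of accidental removal warned about earlier in the thread.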

http://www.regexr.com/ is a great website for quickly and easily building regexes, and it has a reference sidebar where you can look up syntax.

Last edited by WhiteAbeLincoln; 10-14-2014 at 08:54 PM.
Old 10-20-2014, 03:02 PM   #9
Leonatus
WhiteAbeLincoln, hearty thanks for your help, even though some time has passed. By chance, the problem I described above came up again just last week, and I removed the items manually. Next time, however, I shall be pleased to test your proposal. Luckily, I'm no longer as ignorant in this business as I wrote a year ago, but I'm still far from being an expert.