#1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2015
Device: iPhone
|
Getting rid of the footer text and page count
I'm converting a book from PDF, so I have a page footer with a page count that I want to lose.
I came up with:
Code:
<p class="calibre1"> <i class="calibre4">Title of the Book by Author</i> <i class="calibre4">[0-9]*</i></p>
It would seem calibre didn't remove even one of these. Why? Doesn't calibre support the regex above? How can I make sure every occurrence throughout the book is removed?
#2 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Are you sure you are matching the right stuff?
You need to match against the input content, not the post-processed result. In the S&R tab of the conversion dialog, click the wand button to get a preview of the pdftohtml result, which is what the regex will operate on. Or in the Editor, you can S&R the EPUB (?) with more granularity.
Last edited by eschwartz; 12-13-2015 at 07:45 PM.
#3 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2015
Device: iPhone
|
Quote:
Code:
Title of the Book by Author [0-9]*
#4 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Well, once again check the pdftohtml intermediate content. Use the Regex Builder wizard to make sure you match the right stuff.
There will be HTML, not just text. pdftohtml is a third-party utility that comes from poppler, and it should be predictable enough -- calibre performs the S&R before stomping all over the markup with its CSS-flattening algorithm.
Normally the regex is applied to the raw contents of the input format, i.e. unzipped EPUB/AZW3 (X)HTML. But PDF is, ah, complicated, so it has to be turned into HTML before you can convert that HTML to something else.
Last edited by eschwartz; 12-14-2015 at 02:31 PM.
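A rough sketch of why a pattern copied from the converted book finds nothing, assuming the S&R expressions behave like Python's `re` module. The `intermediate_html` fragment is made up for illustration and is not real pdftohtml output; check the wand preview for what your own PDF produces. The variable names are hypothetical too.

```python
import re

# Hypothetical stand-in for the pdftohtml intermediate output.
# Real output will look different; inspect it in the wand preview.
intermediate_html = (
    "<p>Some body text.<br/>\n"
    "<i>Title of the Book by Author</i> 12<br/>\n"
    "More body text.<br/>\n"
    "<i>Title of the Book by Author</i> 13<br/>\n"
)

# Pattern copied from the converted book. The calibre1/calibre4 classes
# are only added later in the conversion, so they are absent here.
post_conversion_pattern = re.compile(
    r'<p class="calibre1"> <i class="calibre4">Title of the Book by Author</i> '
    r'<i class="calibre4">[0-9]*</i></p>'
)

# Pattern written against the (assumed) intermediate markup instead.
intermediate_pattern = re.compile(
    r"<i>Title of the Book by Author</i>\s*[0-9]+<br/>\n?"
)

missed = len(post_conversion_pattern.findall(intermediate_html))
found = len(intermediate_pattern.findall(intermediate_html))
cleaned = intermediate_pattern.sub("", intermediate_html)

print(missed, found)
print(cleaned)
```

The first pattern matches nothing because the markup it describes does not exist yet at the point where S&R runs. The second matches both footers in this made-up fragment.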
#5 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2015
Device: iPhone
|
Quote:
This was quite hard to discover in calibre. Hopefully I learned something for my next title.
#6 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
That last bit I said was just an explanation as to why you cannot merely open the book in calibre's Editor to see what needs changing, etc.
You ALWAYS need to use the Regex Builder in calibre in order to find out what to change, or do something else to examine the SOURCE INPUT. PDF is slightly more complicated because the source input comes from pdftohtml. But you still need to look at the SOURCE INPUT.
I don't know how many times I need to say this, but calibre's S&R operates on the SOURCE INPUT! Looking at the converted book in the Editor is worthless, because the book has already been converted and therefore it is no longer the SOURCE INPUT.
Here is a picture, in case it helps....
#7 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2015
Device: iPhone
|
Quote:
I'll try the wizard again for my next title. It's wonderful to get my fav PDFs as e-books, just like the books that came in e-book format from the beginning. Calibre is a useful tool.
#8 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Oh, well if you just need help with using the right regex to match specific text, why didn't you say so?
Post some code examples and we can help you write a regex.
#9 |
Well trained by Cats
Posts: 30,959
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
MIB
I find it Oh So Much easier to do these kinds of tasks with the E-book Editor, where you can see exactly what you get (and what works) when searching. Also, you can do multiple passes, rather than trying for a catchall on a single pass.
#10 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
@theducks
You cannot edit PDF in the Editor. Or in Sigil. You can dump the PDF with pdftohtml and import that into the Editor I guess...
#11 |
Well trained by Cats
Posts: 30,959
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
|
#12 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Yeah, after CSS flattening.
Conversion is ideally the last of all possible steps...
#13 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Dec 2015
Device: iPhone
|
Quote:
The real problem is that after this I have to find out why it didn't work. It would be better if I could see interactively, before conversion, how many matches a pattern gets in calibre. It was way easier to clean the converted results in Sublime, but this does feel like a lot of work with copy and paste.
A side issue is how you can influence the HTML generation, as much of the output uses non-optimal tag structures. For example, I'd love it if you could help the conversion by suggesting what in the current title signifies a heading, what's a list and so on.
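For what it's worth, one way to count matches outside calibre before converting is a few lines of Python run over a saved copy of the intermediate HTML. This is only a sketch: `count_matches` is a hypothetical helper, not part of calibre, and the `html` string below is a made-up stand-in for the pdftohtml output you would paste in or read from a file.

```python
import re


def count_matches(pattern, text, samples=3):
    """Return (number of matches, first few matched strings) for a regex."""
    matches = [m.group(0) for m in re.finditer(pattern, text)]
    return len(matches), matches[:samples]


# Made-up stand-in for the intermediate HTML; replace it with the
# real text copied from the preview or read from a file.
html = (
    "Chapter 1<br/>\n"
    "Title of the Book by Author 1<br/>\n"
    "Some text<br/>\n"
    "Title of the Book by Author 2<br/>\n"
)

# Candidate patterns to compare before putting one into S&R.
candidates = [
    r"Title of the Book by Author [0-9]*",
    r"Title of the Book by Author\s*[0-9]+<br/>",
    r'<i class="calibre4">[0-9]*</i>',
]

report = {pattern: count_matches(pattern, html) for pattern in candidates}

for pattern, (count, examples) in report.items():
    print(count, pattern, examples)
```

A count of zero for a pattern is a quick hint that it was written against the wrong markup rather than that the regex engine is at fault.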