#16
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Eventually the new PDF input engine will be complete, and removing the header and footer will be automatic. This regex-based system will be renamed to "remove content".
#17
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
It is just a bit confusing, since it's not shown anywhere in the debug folder (probably it's not well documented or explained).
An (easy) example with explanations somewhere near the field where you insert the regex would probably help people find the right regex. I had to check out various sites to find the right regex syntax, and probably not everyone has the patience or time to do so. I actually like being able to check or uncheck the "remove footer" checkbox (or even a "remove content" checkbox), since I don't want to apply the regex to every book (some have page numbers, others don't). The way it is now, I don't have to copy or delete the regex I generally use; I just check or uncheck that box.
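As an example of the kind of explained pattern suggested above, here is a sketch that uses Python's `re` verbose mode so every piece of the regex carries its own comment. It assumes calibre accepts Python regex syntax and that the footer line ends in a `<br>` tag; the sample line is hypothetical, not taken from a real book.

```python
import re

# Assumption: calibre's footer regex follows Python re syntax (not confirmed in this thread).
# Verbose mode (re.VERBOSE) ignores whitespace in the pattern and allows # comments.
FOOTER = re.compile(r"""
    \s*      # any whitespace (spaces, newlines) before the page number
    \d+      # the page number itself: one or more digits
    \s*      # optional whitespace after the number
    <br>     # the line break that ends the footer line
""", re.VERBOSE)

# Hypothetical end of a page: a text line, then a footer containing only "8".
sample = "up the bank,<br> 8<br>"

# Replace every footer match with nothing and show what is left.
print(FOOTER.sub("", sample))
```

The compact equivalent of this pattern, written on one line without comments, is `\s*\d+\s*<br>`.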
#18
Connoisseur
Posts: 61
Karma: 36
Join Date: Jan 2010
Location: Reston, Virginia, US
Device: ipad
I'm having problems with this as well. Here are the details:
Using calibre 0.6.33, I'm trying to convert a PDF to an EPUB. In the PDF, the last line of each page is a line number, and I'm trying to write a regex to remove it. With debugging enabled for the conversion, I've been able to look at the input, parsed and processed directories. An example from the input directory shows the last couple of lines of a page:
Code:
The stranger had clambered through the ditch and up the bank,<br> 8<br>
Code:
The stranger had clambered through the ditch and up the bank, 8</p><p>
The processed directory shows:
Code:
The stranger had clambered through the ditch and up the bank, 8</p><p class="calibre1">
Code:
\d+<br>
Code:
\d+</p><p>
It's not clear to me when the regex processing is done, that is, whether it happens before or after the conversion to XHTML and the line unwrapping. Speculating, I'd say it's after. The problem is that it appears impossible to refer to <p> tags in the regex: they never work. I've tried everything suggested in this thread, and so far nothing has worked. Anyone have any ideas?
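One way to narrow down which stage the regex sees, without running a full conversion, is to test each candidate pattern against the snippets copied from the debug directories. The sketch below does that with Python's `re`; it assumes calibre's patterns follow Python regex syntax, and because the second snippet in the post has no directory label, it is keyed "unlabeled" here rather than guessed.

```python
import re

# Assumption: calibre evaluates footer regexes with Python re syntax.
# Snippets copied from the debug output quoted in the post above.
SNIPPETS = {
    "input": "The stranger had clambered through the ditch and up the bank,<br> 8<br>",
    "unlabeled": "The stranger had clambered through the ditch and up the bank, 8</p><p>",
    "processed": 'The stranger had clambered through the ditch and up the bank, 8</p><p class="calibre1">',
}

# The two patterns tried in the post.
PATTERNS = [r"\d+<br>", r"\d+</p><p>"]


def match_table(patterns, snippets):
    """Map (pattern, snippet name) to True if the pattern matches anywhere in that snippet."""
    return {
        (pattern, name): re.search(pattern, text) is not None
        for pattern in patterns
        for name, text in snippets.items()
    }


for (pattern, name), hit in match_table(PATTERNS, SNIPPETS).items():
    print(f"{pattern:<14} {name:<10} {'match' if hit else 'no match'}")
```

Each pattern matches exactly one stage: `\d+<br>` only the input markup, `\d+</p><p>` only the unlabeled one. Neither matches the processed markup, because `<p class="calibre1">` has an attribute between `<p` and `>`; a pattern ending in `<p[^>]*>` would allow for that.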
#19
Member
Posts: 10
Karma: 10
Join Date: Jan 2010
Device: Sony PRS 600
#20
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
Hello ac4lt,
Try this one. I had a similar problem, and when I tried it, it worked for me (that was still in a previous version, but I don't think it has changed since):
Code:
\d+<p>
Code:
\s*\d+<p>
I hope it still works, but I can't test it at the moment because I don't have calibre installed at work.
Last edited by matthias; 01-14-2010 at 03:01 AM.
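To see what the leading `\s*` in the second suggestion changes, the sketch below compares the text each pattern matches on a hypothetical snippet where the page number sits on its own line just before a `<p>` tag (this markup is assumed, not taken from the thread). If calibre deletes exactly the matched text, which is an assumption here, then a pattern ending in `<p>` removes that tag too; the sketch therefore adds a lookahead variant, `\s*\d+(?=<p>)`, a standard Python `re` feature that requires the `<p>` to follow without including it in the match. I have not confirmed how calibre handles lookaheads.

```python
import re

# Assumption: calibre uses Python re syntax and removes exactly the matched text.
# Hypothetical markup: a page number on its own line, followed by the next paragraph.
SNIPPET = "up the bank,\n 8<p>Next page"

PATTERNS = {
    "plain": r"\d+<p>",             # digits immediately followed by <p>
    "leading_ws": r"\s*\d+<p>",     # also swallows the newline/spaces before the digits
    "lookahead": r"\s*\d+(?=<p>)",  # requires <p> next, but leaves it out of the match
}


def matched_text(pattern, text):
    """Return the first substring the pattern matches, or None if it never matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def removed(pattern, text):
    """Return the text with every match deleted (each match replaced by '')."""
    return re.sub(pattern, "", text)


for name, pattern in PATTERNS.items():
    print(name, repr(matched_text(pattern, SNIPPET)), repr(removed(pattern, SNIPPET)))
```

Only the lookahead version leaves `<p>Next page` intact; the other two delete the opening tag along with the number.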
#21
Member
Posts: 10
Karma: 10
Join Date: Jan 2010
Device: Sony PRS 600
I have been applying the above to one of my ebooks, and it starts off highlighting the appropriate areas in the book, but when I convert, it still doesn't work. I have "Remove footer" checked, so what am I missing?
#22
Connoisseur
Posts: 61
Karma: 36
Join Date: Jan 2010
Location: Reston, Virginia, US
Device: ipad
As with poodlemama, neither works for me.
#23
Member
Posts: 10
Karma: 10
Join Date: Jan 2010
Device: Sony PRS 600
OK, let me ask if this is the correct process:
1. I have a PDF book, and I add it to my calibre library.
2. I go to Convert ebook, update all the metadata, and then go to Page Setup to make sure the input and output formats are correct.
3. I go to Structure Detection, check "Remove Footer", and click the wizard tool.
4. I type in (\d+\s*</p><p>) and see the page numbers and codes highlighted.
5. I push OK and then click OK to start the conversion.
What am I missing?
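A quick check of the pattern typed into the wizard, using Python's `re` and assuming calibre uses compatible syntax: it matches the `</p><p>` form of the page ending quoted earlier in the thread, but not the raw `<br>` form. The parentheses only create a capture group and do not change what is matched. If the matched text is deleted, the `</p><p>` boundary after the number is deleted with it.

```python
import re

# Assumption: calibre's footer regex follows Python re syntax.
WIZARD_PATTERN = r"(\d+\s*</p><p>)"  # the pattern typed into the wizard in the post above

# Two forms of the same page ending, as quoted earlier in the thread.
BR_FORM = "The stranger had clambered through the ditch and up the bank,<br> 8<br>"
P_FORM = "The stranger had clambered through the ditch and up the bank, 8</p><p>"

for label, text in (("<br> form", BR_FORM), ("</p><p> form", P_FORM)):
    m = re.search(WIZARD_PATTERN, text)
    print(label, "->", repr(m.group(1)) if m else "no match")
    # Deleting the match also deletes the paragraph boundary that follows the number.
    print("   after removal:", repr(re.sub(WIZARD_PATTERN, "", text)))
```

So a pattern can be highlighted against one form of the markup and still find nothing in the other; which form the conversion step actually applies the regex to is not settled in this thread.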
#24
Wizard
Posts: 2,899
Karma: 6995721
Join Date: Dec 2008
Location: Idaho, on the side of a mountain
Device: Kindle Oasis, Fire 3d Gen and 5th Gen and Samsung Tab S
I have not been able to get PDF conversion to remove headers and footers. I convert to PRC with Mobipocket Creator and then import into calibre.
#25
Enthusiast
Posts: 25
Karma: 4212
Join Date: Nov 2009
Location: South Tyrol, Italy
Device: Sony Reader PRS-505
Try leaving out the </p>. It won't be highlighted in the preview, but it worked for me in the conversion.
#26
Connoisseur
Posts: 61
Karma: 36
Join Date: Jan 2010
Location: Reston, Virginia, US
Device: ipad
#27
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
I can't help you get it to work in calibre; I use a workaround.
I import the PDF into the free download of Mobipocket Creator. Importing it strips out some trashy headers and creates an HTML file.
UPDATE: It was too long ago for my memory. Mobipocket Creator removed many atrocious headers for me in the past, so I fooled my old memory into thinking it worked all the time. Even when it removed a bad text header, I had to manually remove page numbers or other junk using WordPad or MS Word. I primarily used Mobipocket Creator to remove some trash headers and convert a PDF to HTML so I could do a quick cleanup of the HTML file. Sorry for the confusion... now where else did I post this error?
Last edited by DoctorOhh; 01-17-2010 at 09:49 PM. Reason: My info was wrong
#28
Connoisseur
Posts: 61
Karma: 36
Join Date: Jan 2010
Location: Reston, Virginia, US
Device: ipad
I'll take a look at that option. Thanks for the info!
#29
Junior Member
Posts: 6
Karma: 10
Join Date: Nov 2009
Device: sony ebook PR-505
An easy solution at last?
I have no idea what a "header regular expression" (or "regex") is, or how to work the wand button, although I have tried many times. Like everyone else, I'm trying to convert a PDF, but in my case to LRF, the Sony ebook format. I've read what dwanthny said about Mobipocket Creator, and I think it sounds much easier than removing headers and footers with calibre. My questions are: What exactly does "build an ebook" mean? Once I create an HTML file from a PDF with Mobipocket, can I just drag that HTML into calibre and convert it to LRF for my Sony reader, or do I have to convert the HTML to PRC first and then to LRF? I apologize if my questions are not very clear, and I thank you for any help with this!
#30
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Mobipocket Creator removed many atrocious headers for me in the past, so I fooled my old memory into thinking it worked all the time. Even when it removed a bad text header, I had to manually remove page numbers or other junk using WordPad or MS Word. I primarily used Mobipocket Creator to remove some trash headers and convert a PDF to HTML so I could do a quick cleanup of the HTML file. Sorry for the confusion... now where else did I post this error?