10-31-2007, 04:29 PM | #1 |
Enthusiast
Posts: 29
Karma: 10
Join Date: Aug 2007
Device: none yet
|
PDF Conversion - Removing Header / Footer Text
I am trying to convert a PDF file which has the title and page number information in the header and footer. These are being converted when I use pdftohtml, and although I can crop the PDF in pdftohtml, the text is too small.
Is there any way to find and replace a chunk of text several lines long? In Word, as the header contains the book's title, it goes and removes those words from within the text as well. For the footer I suppose it would have to support wildcards for the page numbers. It's driving me nuts and I reckon there must be a nice simple solution! Thanks, Richard. |
10-31-2007, 04:35 PM | #2 |
creator of calibre
Posts: 44,333
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use pdftohtml in text mode so it generates a text-based HTML file, and then use a regular expression to remove the offending text.
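A minimal sketch of the regular-expression step, in Python. It assumes (none of this is stated in the thread) that each header sits on a line of its own containing only the book title, that each footer is a line containing only a page number such as "12" or "Page 12", and that lines may end in a `<br>` tag. The function name `strip_headers_footers` is made up for illustration. Anchoring the pattern to whole lines is what keeps the title from being removed where it appears inside the body text.

```python
import re


def strip_headers_footers(html, title):
    """Remove lines that hold only the title or only a page number."""
    # Optional trailing <br> tag plus the line ending (assumed layout).
    line_end = r"[ \t]*(?:<br\s*/?>)?[ \t]*\r?\n"
    # Header: the title alone on a line. re.escape keeps any
    # punctuation in the title from being read as regex syntax.
    header = re.compile(
        r"^[ \t]*" + re.escape(title) + line_end,
        re.MULTILINE | re.IGNORECASE,
    )
    # Footer: digits alone on a line, optionally preceded by "Page".
    footer = re.compile(
        r"^[ \t]*(?:Page[ \t]+)?\d+" + line_end,
        re.MULTILINE | re.IGNORECASE,
    )
    return footer.sub("", header.sub("", html))


if __name__ == "__main__":
    sample = (
        "My Book Title<br>\n"
        "It was a dark night in My Book Title land.<br>\n"
        "12<br>\n"
    )
    print(strip_headers_footers(sample, "My Book Title"))
```

If the converted file lays out headers differently (for example, title and page number on the same line), the two patterns would need adjusting to match.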
|
10-31-2007, 05:17 PM | #3 |
Enthusiast
Posts: 29
Karma: 10
Join Date: Aug 2007
Device: none yet
|
What do you use to actually manipulate the text though?
Sorry but my experience has been limited to things I can edit in a text editor. Last edited by heb; 10-31-2007 at 05:22 PM. |
10-31-2007, 05:39 PM | #4 |
Enthusiast
Posts: 29
Karma: 10
Join Date: Aug 2007
Device: none yet
|
OK, I found a free program called ReplaceEM that let me select and replace a range based on the start and end of the text. It also supports regular expressions.
Thanks for your help! |
11-05-2007, 01:51 PM | #5 |
Connoisseur
Posts: 95
Karma: 38
Join Date: Jul 2007
Device: Android tablets and phones, Windows tablet, Kobo Aura One
|
I've been using MS Word when I can to clean up, and then exporting to HTML or RTF and converting from there.
Word has a simple search and replace syntax (although you can do regex if you like), so you can say something like "[File:*2007]" and replace it with nothing, which is a lot easier than regex IMO. That said, if you were doing something more complex, regex would definitely be the way to go; it just takes me 20 minutes to get even a simple regex right. |
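For comparison, here is a rough regex counterpart of that "[File:*2007]" search, sketched in Python. It reads the pattern as: a literal "[File:", then anything, then "2007]". That reading and the name `remove_file_stamps` are assumptions for illustration, not something confirmed in the thread.

```python
import re

# \[ and \] match literal brackets; .*? is the non-greedy "anything",
# so each stamp is matched separately rather than from the first "["
# on a line to the last "2007]".
FILE_STAMP = re.compile(r"\[File:.*?2007\]")


def remove_file_stamps(text):
    """Delete every [File: ... 2007] stamp from the text."""
    return FILE_STAMP.sub("", text)


if __name__ == "__main__":
    print(remove_file_stamps("a[File:x.pdf 10/31/2007]b[File:y.pdf 11/05/2007]c"))
```

The non-greedy `.*?` is the part that usually trips people up: a plain `.*` would swallow everything between the first and last stamp on a line.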
12-23-2007, 06:49 AM | #6 |
dstampe
Posts: 50
Karma: 17
Join Date: Jan 2007
Location: Canada
Device: Sony PRS-500
|
If you happen to have a full version of Adobe Acrobat, you can crop the top and bottom of the pages in the document before conversion.
|
01-10-2009, 11:11 PM | #7 |
Bookaholic
Posts: 72
Karma: 142
Join Date: Dec 2008
Device: Kindle 1, 3, Ipad
|
The Adobe PDF printer, which is part of the full version of Adobe, takes its header and footer info from IE 7. So if you open Page Setup in IE 7 and delete the header info there, the PDF printer will not print headers and footers. Other PDF printers may do the same; I have not tried those yet.
|
01-22-2009, 12:59 PM | #8 | |
Member
Posts: 14
Karma: 12
Join Date: Jan 2009
Location: Lima, Perú
Device: Kindle 2 and Sony Reader PRS 505
|
Quote:
Use Nitro PDF (www.nitropdf.com/). Use the crop option under the Edit menu. Just select the part you want to keep, leaving the header and footers outside the selection, and then double-click on the selection. Then choose to apply it to all pages, and that's all. Now there is a new PDF with no header and footer. PS: Sorry about my English :P |
|
07-11-2010, 01:22 PM | #9 |
Junior Member
Posts: 4
Karma: 10
Join Date: Jul 2010
Device: Nook
|
I have the newest version of Calibre and they told me to click remove header and footer; however, I needed to write a regular expression to match them. Does anyone know what to write?
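One way to work out what to write is to try a candidate expression on a few sample lines first. Calibre is written in Python and uses Python regex syntax, so Python's `re` module is a reasonable test bench. The sketch below assumes the header or footer occupies a whole line; how Calibre actually applies the expression to the converted text is not stated in this thread. The helper name `preview_matches`, the sample lines, and the candidate pattern are all made up for illustration.

```python
import re


def preview_matches(pattern, lines):
    """Return the lines that the candidate pattern matches in full."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.fullmatch(line.strip())]


if __name__ == "__main__":
    sample = [
        "A Tale of Two Cities",
        "It was the best of times,",
        "Page 3",
        "it was the worst of times,",
        "Page 4",
    ]
    # Candidate: either the running title or "Page <number>".
    candidate = r"Page \d+|A Tale of Two Cities"
    for line in preview_matches(candidate, sample):
        print("would remove:", line)
```

If a body line shows up in the output, the pattern is too broad and needs tightening before it goes anywhere near a real conversion.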
|
07-11-2010, 11:02 PM | #10 |
Curmudgeon
Posts: 3,085
Karma: 722357
Join Date: Feb 2010
Device: PRS-505
|
The place to ask this is in the Calibre forum, not necroposting in a years-old thread (one which, by the way, predates Calibre) in a device forum for some unrelated device.
If nobody is answering your question, the odds are that it's because you haven't explained the question enough for them to realize it's something they can answer, not because you haven't posted in enough random forums. Take a look at your other posts and see what you could do to make it easier for someone to understand what you need. |
|