MobileRead Forums > E-Book Readers > Sony Reader

Old 10-31-2007, 04:29 PM   #1
heb
Enthusiast
heb began at the beginning.
 
Posts: 29
Karma: 10
Join Date: Aug 2007
Device: none yet
Question PDF Conversion - Removing Header / Footer Text

I am trying to convert a PDF file which has the title and page number information in the header and footer. These are being converted when I use pdftohtml, and although I can crop the PDF in pdftohtml, the text is too small.

Is there any way to find and replace a chunk of text several lines long? In Word, because the header contains the book's title, a find and replace removes those words from within the body text as well. For the footer, I suppose it would have to support wildcards for the page numbers.

It's driving me nuts, and I reckon there must be a nice simple solution!

Thanks,

Richard.
Old 10-31-2007, 04:35 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,744
Karma: 22446736
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use pdftohtml in text mode so it generates a text-based HTML file, and then use a regular expression to remove the offending text.
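For anyone who wants to see what such a regular expression might look like, here is a minimal Python sketch. It assumes the converted text has the running title alone on its own line and the page number alone on its own line; the function name, the title string, and the sample text are all made up for illustration, and real pdftohtml output will likely need the patterns adjusted (for example, to allow for HTML tags around each line).

```python
import re

def strip_headers_footers(text, title):
    """Remove lines that consist only of the running title or a page number."""
    # Header: the title alone on a line, so the same words inside a
    # paragraph are left untouched.
    header = re.compile(r"^[ \t]*" + re.escape(title) + r"[ \t]*\n", re.MULTILINE)
    # Footer: a line holding only a page number, optionally written as "Page 12".
    footer = re.compile(r"^[ \t]*(?:Page[ \t]+)?\d+[ \t]*\n", re.MULTILINE)
    return footer.sub("", header.sub("", text))

# Hypothetical sample: two pages, each with a title header and a page number.
sample = (
    "My Book Title\n"
    "The hero opened My Book Title and began to read.\n"
    "12\n"
    "My Book Title\n"
    "He kept reading until dawn.\n"
    "13\n"
)
cleaned = strip_headers_footers(sample, "My Book Title")
print(cleaned)
```

Anchoring the pattern to a whole line is what keeps the title from being removed where it appears inside the body text. The trade-off is that a body line containing nothing but a number would be removed as well.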
Old 10-31-2007, 05:17 PM   #3
heb
Enthusiast
heb began at the beginning.
 
Posts: 29
Karma: 10
Join Date: Aug 2007
Device: none yet
What do you use to actually manipulate the text though?

Sorry, but my experience has been limited to things I can edit in a text editor.

Last edited by heb; 10-31-2007 at 05:22 PM.
Old 10-31-2007, 05:39 PM   #4
heb
Enthusiast
heb began at the beginning.
 
Posts: 29
Karma: 10
Join Date: Aug 2007
Device: none yet
OK, I found a free program called ReplaceEM that let me select and replace a range based on the start and end of the text. It also supports regular expressions.
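The same "replace a range based on its start and end" idea can be sketched in Python with a regular expression. This is only an illustration of the technique, not how ReplaceEM works internally; the function name and the marker strings are hypothetical.

```python
import re

def remove_range(text, start, end):
    """Delete everything from `start` through the nearest following `end`, inclusive."""
    # re.DOTALL lets "." cross line breaks, so the range can span several lines.
    # The non-greedy ".*?" stops at the first `end` after each `start`.
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    return pattern.sub("", text)

# Hypothetical sample with a multi-line header block between two pages.
sample = (
    "end of page one.\n"
    "<<HEADER\nMy Book Title\nChapter 3\nHEADER>>\n"
    "start of page two."
)
result = remove_range(sample, "<<HEADER\n", "HEADER>>\n")
print(result)
```

Escaping the start and end strings with `re.escape` means they are matched literally, so they can contain punctuation without being treated as regex syntax.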

Thanks for your help!
Old 11-05-2007, 01:51 PM   #5
coleman
Connoisseur
coleman began at the beginning.
 
Posts: 95
Karma: 38
Join Date: Jul 2007
Device: Android tablets and phones, Windows tablet, Kobo Aura One
I've been using MS Word when I can to clean up, and then exporting to HTML or RTF and converting from there.

Word has a simple search-and-replace syntax (although you can do regex if you like), so you can search for something like "[File:*2007]" and replace it with nothing, which is a lot easier than regex IMO. That said, if you were doing something more complex, regex would definitely be the way to go; it just takes me 20 minutes to get a regex right for even something simple.
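For comparison, a rough regex counterpart of that search might look like the Python sketch below. It assumes the target is literal bracketed text that starts with "[File:" and ends with "2007]"; the sample string is invented.

```python
import re

# Literal "[File:", then any characters other than "]" or a newline,
# then literal "2007]". The brackets are escaped because they are
# special characters in regex syntax.
pattern = re.compile(r"\[File:[^\]\n]*?2007\]")

# Hypothetical sample text containing two such markers.
sample = "Intro text [File:report_v2 10-31-2007] more text [File:notes 2007] end."
cleaned = pattern.sub("", sample)
print(cleaned)
```

Excluding "]" from the middle part keeps one match from running on into the next bracketed marker on the same line.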
Old 12-23-2007, 06:49 AM   #6
dstampe
dstampe
dstampe began at the beginning.
 
Posts: 50
Karma: 17
Join Date: Jan 2007
Location: Canada
Device: Sony PRS-500
If you happen to have the full version of Adobe Acrobat, you can crop the top and bottom of the pages in the document before conversion.
Old 01-10-2009, 11:11 PM   #7
Bookathon
Bookaholic
Bookathon doesn't litter
 
Posts: 72
Karma: 142
Join Date: Dec 2008
Device: Kindle 1, 3, Ipad
The Adobe PDF printer, which is part of the full version of Adobe Acrobat, takes its header and footer info from IE 7. If you open Page Setup in IE 7 and delete the header info there, the PDF printer will not print headers and footers. Other PDF printers may do the same; I have not tried those yet.
Old 01-22-2009, 12:59 PM   #8
jefferson_frantz
Member
jefferson_frantz began at the beginning.
 
Posts: 14
Karma: 12
Join Date: Jan 2009
Location: Lima, Perú
Device: Kindle 2 and Sony Reader PRS 505
Quote:
Originally Posted by heb
I am trying to convert a PDF file which has the title and page number information in the header and footer. These are being converted when I use pdftohtml, and although I can crop the PDF in pdftohtml, the text is too small.

Is there any way to find and replace a chunk of text several lines long? In Word, because the header contains the book's title, a find and replace removes those words from within the body text as well. For the footer, I suppose it would have to support wildcards for the page numbers.

It's driving me nuts, and I reckon there must be a nice simple solution!

Thanks,

Richard.
It's easy to do.
Use Nitro PDF (www.nitropdf.com/).
Use the crop option under the Edit menu.
Select the part you want to preserve, leaving the headers and footers outside the selection, and then double-click on the selection. Then choose to apply it to all pages, and that's all.
Now there is a new PDF with no headers or footers.

PS: Sorry about my English :P
Old 07-11-2010, 01:22 PM   #9
preciousferret
Junior Member
preciousferret began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Jul 2010
Device: Nook
I have the newest version of Calibre and they told me to click remove header and footer; however, I needed to write a regular expression to match them. Does anyone know what to write?
Old 07-11-2010, 11:02 PM   #10
Worldwalker
Curmudgeon
Worldwalker ought to be getting tired of karma fortunes by now.
 
Posts: 3,085
Karma: 722357
Join Date: Feb 2010
Device: PRS-505
The place to ask this is in the Calibre forum, not necroposting in a years-old thread (one which, by the way, predates Calibre) in a device forum for some unrelated device.

If nobody is answering your question, the odds are that it's because you haven't explained the question enough for them to realize it's something they can answer, not because you haven't posted in enough random forums. Take a look at your other posts and see what you could do to make it easier for someone to understand what you need.


All times are GMT -4.