#1 |
Member
Posts: 13
Karma: 60
Join Date: Feb 2009
Device: PRS-700
|
Evading headers in PDF->EPUB conversion
Some things are just only available in PDF. (boo!)
Calibre does a pretty good job converting basic text PDFs to EPUBs. (yea!) Except for the freakin' headers/footers. (boo!)

If it was just the author's name over and over... Or the book title over and over... I could easily do a global replace in the EPUB source and get rid of 'em. But there always have to be page numbers, which makes them all different. Just eliminating the author/title leaves these numbers strewn through the book.

So I wondered... is there some way to get Calibre to ignore anything it finds in the top (or bottom) inch (or so) of the PDF page? Or some other approach...

Thanks!
Dave
#2 |
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
In the latest 0.6.x series of Calibre there is a facility to provide a regular expression that identifies the headers and/or footers.
This is a bit esoteric, since many people do not understand regular expressions, but I believe there are plans to provide a GUI interface for it at some point in the future.
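To make the idea concrete, here is a minimal sketch of how a regular expression copes with the changing page numbers that defeat a plain global replace. It uses Python's standard `re` module purely for illustration and is not Calibre's own code; the header text `My Book Title` and the sample text are hypothetical.

```python
import re

# Hypothetical running header: a fixed book title followed by a page number
# that differs on every page. \d{1,4} matches whatever number is there.
HEADER = re.compile(r"(?m)^[ \t]*My Book Title[ \t]+\d{1,4}[ \t]*$\n?")

# Made-up text standing in for converted PDF output.
text = (
    "first page text\n"
    "My Book Title 12\n"
    "second page text\n"
    "My Book Title 13\n"
    "third page text\n"
)

# Remove every header line, regardless of its page number.
stripped = HEADER.sub("", text)
print(stripped)
```

The same pattern, minus the Python wrapper, is the kind of expression you would experiment with before handing it to a converter.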
#3 |
eBook Enthusiast
Posts: 85,544
Karma: 93383043
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
|
Another approach would be to use a tool like Book Designer (which does a better job at PDF import than Calibre, anyway). Import the PDF into BD, get rid of all the extraneous stuff, and then save as HTML, and import the HTML into Calibre.
|
#4 |
Member
Posts: 13
Karma: 60
Join Date: Feb 2009
Device: PRS-700
|
Thanks to both!
I downloaded Book Designer and will have to find some time to play. I also grabbed Harry's Book Designer tutorial which I'm sure will be a huge help. Thanks again... |
#5 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
If you can afford it, get ABBYY FineReader or PDF Transformer. They do a great job of converting PDFs, both text- and image-based.
|
#6 |
Zealot
Posts: 115
Karma: 150
Join Date: Jul 2008
Location: Netherlands Veenendaal
Device: Palm T5, Sony PRS-505, Nook Color
|
Another good one is from Nuance (http://www.nuance.com/imaging/pdfcon...ofessional.asp). This one managed to convert a pdf to something that actually resembled the original. Tried the same pdf with Acrobat and it ended up as one big paragraph :-((
Regards, Joop |
#7 |
Zealot
Posts: 115
Karma: 150
Join Date: Jul 2008
Location: Netherlands Veenendaal
Device: Palm T5, Sony PRS-505, Nook Color
|
Just to let you know that I might have found something that might help you too regarding the removal of headers/footers.
The following is what I copied from the debug output of Calibre (0.6.10) and that I want removed:
Code:
<br> 5<br> <hr> <A name=7></a>
Code:
(?ims)<br>\s*\d{1,3}\s*<br>\s<hr>\s<a name=\d{1,3}></a>
It isn't perfect, because sentences that continue on the next page aren't always strung together, but it beats manually removing page numbers ;-)
Googling for some help I found two programs that really helped me, YMMV:
Regex Coach: http://weitz.de/regex-coach/
Kodos: http://kodos.sourceforge.net/
I found Regex Coach the better one, with more possibilities and better info on what is happening.
Regards, Joop
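For anyone who would rather check the expression in a script than in Regex Coach or Kodos, here is a rough sketch using Python's standard `re` module. The pattern is the one quoted in the post; the surrounding sample sentence and the choice to replace each match with a single space are assumptions made for illustration, not what Calibre does internally.

```python
import re

# Pattern quoted in the post. The (?ims) prefix turns on case-insensitive
# matching (so <a ...> also matches <A ...>), multiline mode, and dot-all.
PAGE_BREAK = re.compile(r"(?ims)<br>\s*\d{1,3}\s*<br>\s<hr>\s<a name=\d{1,3}></a>")

# Hypothetical snippet shaped like the debug output quoted in the post,
# with made-up text on either side of the page break.
sample = "end of one page<br> 5<br> <hr> <A name=7></a>start of the next"

# Replace the page-number block with a single space so the two halves
# of the text are joined rather than run together.
cleaned = PAGE_BREAK.sub(" ", sample)
print(cleaned)  # end of one page start of the next
```

Note that the pattern expects exactly one whitespace character around `<hr>`. If the real debug output has more (or none), swapping those `\s` for `\s*` may be more forgiving, though that is untested against actual Calibre output.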