#1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jun 2010
Device: iPhone
|
PDF to ePub conversion issue - headers getting left in
I'm hoping someone can give me some pointers on where I'm going wrong here. I'm trying to convert a PDF to ePub, but no matter what I do the header text is left in. According to both the wizard and RegexBuddy, both headers are matched, but after the conversion they're still there.
Here's an example of the debug code.
Input\Index.html Text:
Code:
will soon manifest themselves.” <br> “I sense nothing.” <br> <hr> <A name=12></a>2 <br> Richard A. Knaak <br> “Your skills are not honed as mine are, my lord, but that <br>
Code:
(?i)(?<=)<hr>\s*<A name=/d+></a>(/d+ <br>\sRichard A\. Knaak|Moon of the Spider <br>\s/d+) <br>
Parsed\Index.html:
Code:
themselves.” </p><p> “I sense nothing.” </p><p> 2 </p><p> Richard A. Knaak </p><p> “Your skills are not honed as mine are, my lord, but that shall be remedied soon enough, yes?” </p><p>
Code:
(?i)(?<=)(Moon of the Spider\s*</p><p>\s\d+\s*</p><p>|\s\d+\s</p><p>\sRichard A\. Knaak\s*</p><p>)
|
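One detail in the first pattern is worth checking: it is written with `/d+`, while the second pattern uses `\d+`. In Python-style regular expressions, `/d+` means a literal slash followed by one or more letters "d", whereas `\d+` matches digits. Whether the slashes were actually typed into the conversion settings or are only a slip in the post is not known, so the following is a minimal sketch under that assumption, not a diagnosis. It runs both variants against the Input\Index.html sample above; the names `SAMPLE`, `AS_POSTED`, `WITH_BACKSLASHES` and `strip_header` are made up for illustration, and the empty `(?<=)` lookbehind is left out because it matches at every position and does not affect the result.

```python
import re

# Sample text copied from the Input\Index.html excerpt in the post.
SAMPLE = (
    "will soon manifest themselves.” <br> “I sense nothing.” <br> <hr> "
    "<A name=12></a>2 <br> Richard A. Knaak <br> "
    "“Your skills are not honed as mine are, my lord, but that <br>"
)

# The first pattern as posted (without the empty lookbehind).
# "/d+" here is a literal "/" followed by one or more "d" characters.
AS_POSTED = (
    r"(?i)<hr>\s*<A name=/d+></a>"
    r"(/d+ <br>\sRichard A\. Knaak|Moon of the Spider <br>\s/d+) <br>"
)

# The same pattern with "\d+" (one or more digits) in place of "/d+".
WITH_BACKSLASHES = (
    r"(?i)<hr>\s*<A name=\d+></a>"
    r"(\d+ <br>\sRichard A\. Knaak|Moon of the Spider <br>\s\d+) <br>"
)


def strip_header(html: str, pattern: str) -> str:
    """Return html with every match of pattern removed."""
    return re.sub(pattern, "", html)


if __name__ == "__main__":
    # Show whether each variant finds the header in the sample.
    for label, pattern in (("as posted", AS_POSTED),
                           ("with backslashes", WITH_BACKSLASHES)):
        found = re.search(pattern, SAMPLE) is not None
        print(f"{label}: header matched = {found}")
    print(strip_header(SAMPLE, WITH_BACKSLASHES))
```

If the backslash variant matches here but the header still survives the conversion, the pattern itself is probably not the problem, and it is worth checking which intermediate file the pattern is actually being applied to.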
#2 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
|
Well, I don't know how to do it using regex, but I have an easier way to convert PDF to ePub. I use Nitro PDF.
Here we go:
1. Crop the PDF document using Nitro PDF to eliminate page numbers and repeated titles, then save the cropped PDF and use it for the next step. Remember: although the page numbers and repeated titles are no longer visible in the cropped PDF, they are still there, safely hidden. To remove them permanently, go to step 2.
2. Convert the cropped PDF to Word (.doc). Now they are finally gone, but you still have to go to step 3.
3. Convert the .doc document back to PDF (you may use CutePDF, PrimoPDF, etc.). Now you have a new PDF with the page numbers and repeated titles completely gone.
4. Finally, convert the new PDF to ePub.
And voilà: the page numbers are gone and the repeated titles are gone. It worked like a charm for me; you may want to try it.
Regards
|
#3 |
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
If you have a good Word file, I would simply save it from Word as "web page (filtered)" and use that as the basis for the conversion to ePub. It is likely to convert far more reliably than a PDF file.
|
#4 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
|
Does "web page (filtered)" mean converting the .doc to .htm or .html? I have never tried it this way, but I will. Thanks, itimpi.
|
#5 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
|
I just tried it. Is it normal for the .htm to .epub conversion to take 9 minutes to complete? The book has 210 pages in both PDF and Word format, and when I converted PDF -> ePub it took less than 1 minute.
|
#6 |
US Navy, Retired
Posts: 9,890
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Most of the time, if you save the doc file as HTML (filtered) rather than as plain HTML, the resulting HTML file converts very quickly. When calibre converts a PDF, the first step is converting the PDF to HTML, which is then converted to ePub.
|
#7 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
|
Hi dwanthny, I am certain I saved the document as .htm (filtered) before I added it to calibre. calibre recognizes it as a ZIP file, and when I converted it a second time I saw no improvement in conversion speed; it still took 9 minutes, as before.
|
#8 | |
US Navy, Retired
Posts: 9,890
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
|
|