06-25-2010, 02:55 PM #1
deadSkip
Junior Member
Posts: 1
Karma: 10
Join Date: Jun 2010
Device: iPhone
PDF to ePub conversion issue - headers getting left in

I'm hoping someone can give me some pointers on where I'm going wrong here. I'm trying to convert a PDF into ePub, but no matter what I do, the header text is left in. According to both the wizard and RegexBuddy, both headers are matched, but when I run the conversion they're still there.

Here's an example of the debug output.

Input\Index.html Text:
Code:
will soon manifest themselves.”&nbsp;<br>
“I sense nothing.”&nbsp;<br>
<hr>
<A name=12></a>2&nbsp;<br>
Richard A. Knaak&nbsp;<br>
“Your skills are not honed as mine are, my lord, but that&nbsp;<br>
Regex:
Code:
(?i)(?<=)<hr>\s*<A name=/d+></a>(/d+&nbsp;<br>\sRichard A\. Knaak|Moon of the Spider&nbsp;<br>\s/d+)&nbsp;<br>
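A minimal sketch for testing the raw-input expression outside calibre, assuming the conversion evaluates it with Python-compatible regex syntax (calibre is written in Python). The text is the Input\Index.html snippet above. The pattern as posted uses `/d+` where a digit class needs `\d+`, so the sketch compiles both spellings to show the difference, and it drops the empty lookbehind `(?<=)`, which always succeeds. It only checks the pattern against this snippet; it does not reproduce how calibre applies the Remove Header option.

```python
import re

# Input\Index.html snippet from the debug output above.
raw = (
    "will soon manifest themselves.”&nbsp;<br>\n"
    "“I sense nothing.”&nbsp;<br>\n"
    "<hr>\n"
    "<A name=12></a>2&nbsp;<br>\n"
    "Richard A. Knaak&nbsp;<br>\n"
    "“Your skills are not honed as mine are, my lord, but that&nbsp;<br>\n"
)

# Header pattern written with \d+ (a digit class). Alternative 1 is the
# page-number-then-author header; alternative 2 is presumably the other
# page's title-then-page-number header.
header_re = re.compile(
    r"(?i)<hr>\s*<A name=\d+></a>"
    r"(\d+&nbsp;<br>\sRichard A\. Knaak|Moon of the Spider&nbsp;<br>\s\d+)"
    r"&nbsp;<br>"
)

# Same pattern spelled with /d+ as in the post: "/d+" means a literal slash
# followed by one or more "d" characters, so "name=12" can never match.
typo_re = re.compile(header_re.pattern.replace(r"\d", "/d"))

match = header_re.search(raw)
print(match is not None)              # True
print(typo_re.search(raw))            # None

# Remove every matched header and confirm the author line is gone.
cleaned = header_re.sub("", raw)
print("Richard A. Knaak" in cleaned)  # False
```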
Doing this leaves the header text in. I tried the same thing with the parsed file, and it's still left in.

Parsed\Index.html:
Code:
themselves.” </p><p>
“I sense nothing.” </p><p>
2 </p><p>
Richard A. Knaak </p><p>
“Your skills are not honed as mine are, my lord, but that shall be remedied soon enough, yes?” </p><p>
Regex:
Code:
(?i)(?<=)(Moon of the Spider\s*</p><p>\s\d+\s*</p><p>|\s\d+\s</p><p>\sRichard A\. Knaak\s*</p><p>)
And yes, I've remembered to check the Remove Header boxes.
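The parsed-file expression can also be tested outside calibre. This is another sketch assuming Python-compatible regex syntax, with the empty lookbehind `(?<=)` dropped. The odd-page sample is hypothetical, built from the pattern's first alternative, because no such page appears in the debug output. The gap before each `</p>` is assumed to be an ordinary space; the sketch also tries a non-breaking space in case `&nbsp;` was turned into U+00A0 during parsing.

```python
import re

# Parsed\Index.html snippet from the debug output above
# (assumes the gap before each </p> is an ordinary space).
parsed = (
    "themselves.” </p><p>\n"
    "“I sense nothing.” </p><p>\n"
    "2 </p><p>\n"
    "Richard A. Knaak </p><p>\n"
    "“Your skills are not honed as mine are, my lord, "
    "but that shall be remedied soon enough, yes?” </p><p>\n"
)

# Hypothetical odd-page header, inferred from the regex's first alternative.
odd_page = "enough. </p><p>\nMoon of the Spider </p><p>\n13 </p><p>\nThe next line </p><p>\n"

# The parsed-file regex from the post, minus the empty lookbehind.
header_re = re.compile(
    r"(?i)(Moon of the Spider\s*</p><p>\s\d+\s*</p><p>"
    r"|\s\d+\s</p><p>\sRichard A\. Knaak\s*</p><p>)"
)

for name, text in [("even page", parsed), ("odd page", odd_page)]:
    m = header_re.search(text)
    print(name, "->", repr(m.group(0)) if m else "no match")

# In a str pattern, \s also matches U+00A0 (non-breaking space), so a
# converted &nbsp; would not break the match.
nbsp_version = parsed.replace(" </p>", "\u00a0</p>")
print("nbsp ->", header_re.search(nbsp_version) is not None)

cleaned = header_re.sub("", parsed)
print("Richard A. Knaak" in cleaned)  # False
```

If a pattern matches in a test like this but the header still survives conversion, the open question is where in the pipeline calibre applies it, which this sketch does not cover.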
07-08-2010, 10:02 PM #2
tirtarevolusi
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
Well, I don't know how to do it using regex, but I have an easier way to convert PDF to ePub: I use Nitro PDF.

Here we go:

1. Crop the PDF document in Nitro PDF to eliminate the page numbers and repeated titles, then save the cropped PDF and use it for the next step. Remember: even though the page numbers and repeated titles no longer show in the cropped PDF, they are still there, just hidden. To remove them permanently, go to step 2.

2. Convert the cropped PDF to Word (.doc). Now they are finally gone, but you still need step 3.

3. Convert the .doc document back to PDF (you can use CutePDF, PrimoPDF, etc.). Now you have a new PDF with the page numbers and repeated titles completely gone.

4. Finally, convert the new PDF to ePub, and voilà: the page numbers and repeated titles are gone.

It worked like a charm for me, so you may want to try it.

Regards
07-08-2010, 10:24 PM #3
itimpi
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
If you have a good Word file, I would simply save it from Word as "web page (filtered)" and use that as the basis of the conversion to ePub. It is likely to convert far more reliably than a PDF file.
07-08-2010, 10:48 PM #4
tirtarevolusi
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
Does "web page (filtered)" mean converting the .doc to .htm or .html? I've never tried it that way, but I will. Thanks, itimpi.
07-08-2010, 11:16 PM #5
tirtarevolusi
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
I just tried it. Is it normal for the .htm to .epub conversion to take 9 minutes? The book has 210 pages in both PDF and Word format, and converting the PDF to ePub took less than 1 minute.
07-09-2010, 12:26 AM #6
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by tirtarevolusi
I just tried it. Is it normal for the .htm to .epub conversion to take 9 minutes? The book has 210 pages in both PDF and Word format, and converting the PDF to ePub took less than 1 minute.
Most of the time, if you save the .doc file as HTML (filtered) rather than plain HTML, the resulting HTML file converts very quickly. When calibre converts a PDF, its first step is converting the PDF to HTML; that HTML is then converted to ePub.
07-09-2010, 01:57 AM #7
tirtarevolusi
Junior Member
Posts: 5
Karma: 10
Join Date: Jun 2010
Location: Jakarta
Device: Nook
Hi dwanthny, I am quite sure I saved the document as .htm (filtered) before adding it to calibre. Calibre recognizes it as a ZIP file, and when I converted it a second time the speed did not improve: it still took 9 minutes, just like the first time.
07-09-2010, 02:07 AM #8
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by tirtarevolusi
Hi dwanthny, I am quite sure I saved the document as .htm (filtered) before adding it to calibre. Calibre recognizes it as a ZIP file, and when I converted it a second time the speed did not improve: it still took 9 minutes, just like the first time.
Every book is unique. All I can say is that this example isn't the norm.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
pdf to mobi conversion issue | dkritso109 | Calibre | 16 | 10-08-2010 06:10 AM
New conversion questions: Getting rid of huge left margin Epub to Mobi | geekgeek | Calibre | 2 | 08-31-2010 11:00 PM
Wide left margins in epub to kindle conversion | jchrist | Calibre | 0 | 02-02-2010 09:13 PM
Evading headers in PDF->EPUB conversion | davef | Calibre | 6 | 08-29-2009 03:26 PM
ePub conversion issue | phunkysai | Calibre | 17 | 01-07-2009 03:39 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.