02-24-2012, 01:18 AM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Feb 2012
Device: Kindle 4 NT
Problem while converting PDF to EPUB using calibre
Whenever I convert a PDF to EPUB using calibre, extra information is added in between the pages of the EPUB.
For example, the following text is added between the pages:

P:\010Comp\Begin8\189-0\ch02.vp Friday, February 11, 2005 7:28:17 AM Color profile: Generic CMYK printer profile Begin8 Java: A Beginner’s Guide, 3rd Ed Schildt 3189-0 2 Composite Default screen Blind Folio 2:38 38

This text is different every time, so I cannot use a plain search and replace. Is there any way to get rid of it while converting?

Also, when converting from PDF to EPUB, tables are not displayed as tables; their contents are laid out one after another as plain text instead. Is there any solution for this?
02-24-2012, 06:17 AM | #2 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
PDF conversion is always going to be unpredictable. You can try using Adobe Acrobat to extract text, or Mobipocket Creator to produce HTML, and then run that through calibre to get to EPUB. You might want to download Sigil as well. You can use it to clean up the various problems that will remain even with the best conversion.
You can use Sigil's search and replace function with regular expressions (regex) to try to remove the extra text from the EPUB you already have. It takes some study. If you set out to learn it, be sure to use recent postings, because the particular flavor of regex in Sigil has changed recently. If you look at your current EPUB in Code View, you may find that although the text itself always changes, the text leading into it and out of it is always the same. That gives you a handle for locating it when searching and replacing.
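To illustrate the lead-in/lead-out idea, here is a minimal sketch using Python's re module rather than Sigil's own regex engine (the syntax is similar but not identical, so test in Sigil before relying on it). The pattern and the names HEADER_RE and strip_headers are hypothetical: they assume every stray header begins with the same P:\010Comp\ path and ends with "Composite Default screen", optionally followed by a "Blind Folio" page marker, as in the sample from the first post.

```python
import re

# Hypothetical pattern, assuming every stray header starts with the same
# "P:\010Comp\" path (stable lead-in) and ends with "Composite Default screen"
# (stable lead-out), optionally followed by a "Blind Folio x:y n" page marker.
HEADER_RE = re.compile(
    r"P:\\010Comp\\.*?Composite Default screen"  # lead-in ... lead-out, non-greedy
    r"(?:\s*Blind Folio\s+\d+:\d+\s+\d+)?",      # optional folio and page number
    re.DOTALL,
)

def strip_headers(text: str) -> str:
    """Replace each matched header with a space, then collapse runs of spaces."""
    cleaned = HEADER_RE.sub(" ", text)
    return re.sub(r" {2,}", " ", cleaned)

sample = (
    "end of one page. "
    "P:\\010Comp\\Begin8\\189-0\\ch02.vp Friday, February 11, 2005 7:28:17 AM "
    "Color profile: Generic CMYK printer profile Begin8 Java: A Beginner's Guide, "
    "3rd Ed Schildt 3189-0 2 Composite Default screen Blind Folio 2:38 38"
    " start of the next page."
)

print(strip_headers(sample))  # prints "end of one page. start of the next page."
```

The non-greedy `.*?` matters: a greedy `.*` would run from the first header to the last "Composite Default screen" in the chapter and delete all the real text in between.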
02-24-2012, 06:25 AM | #3 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Just in case it is not obvious from the previous reply: calibre is not adding the extra information. It is already present in the PDF, typically as headers and footers.
As was mentioned, it is possible to use calibre's search and replace feature in regex mode to remove this text during conversion. However, that involves working out a regular expression that identifies the offending text, and that expression is specific to this book. Because every PDF differs slightly, it has not proved possible to get calibre to work out the required regex itself and fully automate such removal.
02-24-2012, 08:43 AM | #4 |
Well trained by Cats
Posts: 29,801
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Adding to itimpi
Sigil is great for this type of cleanup. In Code View (CV) you can see the underlying markup that your regex needs to match.
02-24-2012, 08:48 AM | #5 |
MR Drone
Posts: 1,613
Karma: 15612282
Join Date: Oct 2007
Location: DRONEZONE
Device: PB360+, Huawei MP5, Libra H20
As the venerable JSWolf would say (not sure what happened to him; self-exile or ostracized?), PDF to EPUB in calibre is like a roulette wheel. Sigil might be the way to go, or ABBYY PDF Transformer, a great program for PDF to RTF, PDF to Word, or PDF to searchable PDF. It used to cost about 100 USD, but it is well worth the cost. It also converts two-column PDFs to one column.
Last edited by hidari; 02-28-2012 at 05:38 PM. |