|02-24-2012, 02:18 AM||#1|
Join Date: Feb 2012
Device: Kindle 4 NT
Problem while converting pdf to epub using calibre
Whenever I convert a PDF to EPUB using calibre, extra information ends up in the middle of the EPUB text.
For example, the following text appears between the pages in the EPUB:
Friday, February 11, 2005 7:28:17 AM
Color profile: Generic CMYK printer profile
Begin8 Java: A Beginner’s Guide, 3rd Ed Schildt 3189-0 2
Composite Default screen
Blind Folio 2:38
This text is different every time, so I cannot use a simple search and replace.
Is there any way to get rid of this text while converting?
Also, when converting from PDF to EPUB, the tables are not displayed as tables; instead, their contents appear sequentially as plain text. Is there a solution for that?
|02-24-2012, 07:17 AM||#2|
Join Date: Apr 2008
Location: Central Oregon Coast
PDF conversion is always going to be unpredictable. You can try using Adobe Acrobat to extract the text, or Mobipocket Creator to get HTML, and then run that through calibre to get to EPUB. You might want to download Sigil as well. You can use it to clean up the various problems that will remain even after the best conversion.
You can use Sigil's search and replace function with regular expressions (regex) to try to get rid of the extra text in what you have now. It takes some study. Be sure to rely on recent postings if you try to learn about it, because the particular flavor of regex in Sigil has changed recently.
You might be able to look at the current epub you have in code view and see that although the text always changes, the lead into it and out of it is always the same. That could give you a handle to locate it while searching and replacing.
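To make the "same lead-in, same lead-out" idea concrete, here is a minimal Python sketch using the sample from the first post. It assumes the junk always starts with a timestamp line and always ends with a "Blind Folio" line, and that it sits in plain text with one item per line; in the actual EPUB the lines will be wrapped in markup, so the pattern would need adjusting. The names `BLOCK` and `strip_blocks` are made up for illustration.

```python
import re

# Sample built from the first post: the wording changes from page to page,
# but the block opens with a timestamp and closes with "Blind Folio n:nn".
sample = """End of the real paragraph.
Friday, February 11, 2005 7:28:17 AM
Color profile: Generic CMYK printer profile
Begin8 Java: A Beginner’s Guide, 3rd Ed Schildt 3189-0 2
Composite Default screen
Blind Folio 2:38
Start of the next paragraph."""

# Assumption: the lead-in and lead-out lines are stable across the book.
BLOCK = re.compile(
    r"^(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day, \w+ \d{1,2}, \d{4} "
    r"\d{1,2}:\d{2}:\d{2} [AP]M\n"  # lead-in: weekday, date, time
    r"(?:.*\n)*?"                    # the changing lines, matched lazily
    r"Blind Folio \d+:\d+\n?",       # lead-out: folio marker
    re.MULTILINE,
)

def strip_blocks(text: str) -> str:
    """Remove every lead-in ... lead-out block from the text."""
    return BLOCK.sub("", text)

cleaned = strip_blocks(sample)
print(cleaned)
```

Because the middle part is matched lazily, a timestamp line with no "Blind Folio" line after it on the same page could swallow real text up to the next folio marker, so check the results before saving.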
|02-24-2012, 07:25 AM||#3|
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Just in case it is not obvious from the previous reply: calibre is not adding the extra information. It is already present in the PDF, typically as headers and footers.
As was mentioned, it is possible to use the calibre search and replace feature in regex mode to get rid of this text during conversion. However, that involves working out a regular expression that identifies the offending text and is specific to this book. Because all PDF files differ slightly, it has not proved possible to get calibre to work out the required regex by itself and completely automate such removal.
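As a rough illustration of what "specific to this book" means, the sketch below tries out one pattern per header/footer line in plain Python before anything is pasted into a search and replace box. It assumes the tool accepts Python-style regular expressions and that each unwanted item sits on its own line; the `PATTERNS` list and `apply_patterns` are hypothetical names, and the patterns only fit the sample quoted in the first post.

```python
import re

# Hypothetical, book-specific patterns: one per header/footer line from the
# first post. A different PDF would need a different list.
PATTERNS = [
    r"(?m)^\w+day, \w+ \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M[ \t]*\n?",
    r"(?m)^Color profile: .*\n?",
    r"(?m)^Begin8 Java: .*\n?",
    r"(?m)^Composite Default screen[ \t]*\n?",
    r"(?m)^Blind Folio \d+:\d+[ \t]*\n?",
]

def apply_patterns(text, patterns=PATTERNS):
    """Delete every match of each pattern, one pattern at a time."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    return text

# The unwanted lines do not have to be adjacent for this approach to work.
page = (
    "public static void main\n"
    "Color profile: Generic CMYK printer profile\n"
    "Composite Default screen\n"
    "some body text\n"
    "Blind Folio 2:38\n"
    "Friday, February 11, 2005 7:28:17 AM\n"
    "more body text\n"
)

result = apply_patterns(page)
print(result)
```

Matching line by line is more forgiving than matching one whole block when the conversion scatters the header and footer pieces, at the cost of maintaining several expressions.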
|02-24-2012, 09:43 AM||#4|
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Adding to itimpi:
Sigil is great for this type of cleanup.
In Code View (CV) you can see exactly what markup your regex needs to match.
|02-24-2012, 09:48 AM||#5|
Join Date: Oct 2007
Device: OPUS/PB360,Nexus 7,GzONE, Kobo Mini
Last edited by hidari; 02-28-2012 at 06:38 PM.