Problem while converting pdf to epub using calibre

hszforu · 02-24-2012, 01:18 AM

Whenever I convert a PDF to EPUB using calibre, extra information is added in between the text of the EPUB.

For example, the following text is added in between the pages in the EPUB.

P:\010Comp\Begin8\189-0\ch02.vp
Friday, February 11, 2005 7:28:17 AM

Color profile: Generic CMYK printer profile
Begin8 Java: A Beginner’s Guide, 3rd Ed Schildt 3189-0 2
Composite Default screen
Blind Folio 2:38
38

This text differs every time, so I cannot use the search and replace function.
Is there any way to get rid of this text while converting?

Also, when converting from PDF to EPUB, the tables are not displayed as tables. Instead, their contents are laid out sequentially as plain text. Is there any solution for that?

mrmikel · 02-24-2012, 06:17 AM

PDF conversion is always going to be unpredictable. You can try using Adobe Acrobat to get text or Mobipocket Creator to get HTML to run through calibre to get to epub. You might want to download Sigil as well. You can use it to clean up the various problems that will still remain with the best conversion.

You can use Sigil's search and replace function and regular expressions (regex) to try to get rid of the extra text in what you have now. It takes some study. Be sure to use more recent postings if you try to learn about it, because the particular flavor of regex in Sigil has changed recently.

You might be able to look at the current epub you have in code view and see that although the text always changes, the lead into it and out of it is always the same. That could give you a handle to locate it while searching and replacing.
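As a rough illustration of that "fixed lead-in, fixed lead-out" idea, here is a minimal Python sketch run against the sample text from the first post. It assumes the junk block always starts with a `P:\...\*.vp` path line and ends with a `Blind Folio N:NN` line plus an optional bare page number; that is a guess from the one sample shown, not something confirmed for the whole book. The name `strip_print_marks` is made up for this example. In a real EPUB the same lines will be wrapped in HTML tags, so the pattern would need adjusting to whatever Code View shows.

```python
import re

# Assumed shape of the junk block, based only on the sample in post #1:
#   lead-in : a line like  P:\010Comp\Begin8\189-0\ch02.vp
#   lead-out: a line like  Blind Folio 2:38  then an optional bare page number
HEADER_BLOCK = re.compile(
    r"^P:\\[^\n]*\.vp\n"         # lead-in: the path line ending in .vp
    r".*?"                       # the part that changes every time (lazy)
    r"^Blind Folio \d+:\d+\n"    # lead-out: the folio line
    r"(?:\d+\n)?",               # optional line holding only the page number
    re.DOTALL | re.MULTILINE,
)


def strip_print_marks(text):
    """Remove every block that matches the assumed lead-in/lead-out shape."""
    return HEADER_BLOCK.sub("", text)


if __name__ == "__main__":
    sample = (
        "Real text before.\n"
        "P:\\010Comp\\Begin8\\189-0\\ch02.vp\n"
        "Friday, February 11, 2005 7:28:17 AM\n"
        "\n"
        "Color profile: Generic CMYK printer profile\n"
        "Begin8 Java: A Beginner’s Guide, 3rd Ed Schildt 3189-0 2\n"
        "Composite Default screen\n"
        "Blind Folio 2:38\n"
        "38\n"
        "Real text after.\n"
    )
    print(strip_print_marks(sample))
```

The lazy `.*?` is what lets the middle of the block vary from page to page while the two fixed ends still pin it down.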

itimpi · 02-24-2012, 06:25 AM

Just in case it is not obvious from the previous reply - calibre is not adding the extra information - it will be present in the PDF, typically as headers and footers.

As was mentioned, it is possible to use the calibre search and replace feature in regex mode to get rid of this during conversion. However, that involves working out a regex to identify the offending text that is specific to this book. As all PDF files differ slightly, it has not proved possible to get calibre to work out the required regex and completely automate such removal.
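Since the regex has to be worked out per book, it can help to preview what a candidate pattern would actually hit before trusting it in a conversion. The sketch below is only an illustration: it relies on an EPUB being a zip archive of HTML files, the helper name `preview_matches` is invented here, and the pattern in the usage note is an assumption based on the sample in the first post. calibre's search and replace uses Python regex syntax, so a pattern tested this way should carry over, but check it against the calibre preview as well.

```python
import re
import zipfile


def preview_matches(epub_path, pattern, limit=5):
    """Return up to `limit` (file name, matched text) pairs from an EPUB.

    Only the HTML/XHTML files inside the archive are searched; stylesheets,
    images and metadata files are skipped.
    """
    regex = re.compile(pattern, re.DOTALL)
    hits = []
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if not name.lower().endswith((".html", ".xhtml", ".htm")):
                continue
            text = book.read(name).decode("utf-8", errors="replace")
            for match in regex.finditer(text):
                hits.append((name, match.group(0)))
                if len(hits) >= limit:
                    return hits
    return hits
```

For example, `preview_matches("book.epub", r"Blind Folio \d+:\d+")` (with `book.epub` standing in for your converted file) lists the first few places that pattern matches, so you can confirm it only catches header and footer text.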

theducks · 02-24-2012, 08:43 AM

Adding to itimpi

Sigil is great for this type of cleanup.
You can see what code (in CV) is needed for your REGEX.

hidari · 02-24-2012, 08:48 AM

Quote:

Originally Posted by theducks

Adding to itimpi

Sigil is great for this type of cleanup.
You can see what code (in CV) is needed for your REGEX.

As the venerable JSWolf would say (not sure what happened to him: self-exile or ostracized?), PDF to EPUB in calibre is like a roulette wheel. Sigil might be the way to go, or ABBYY PDF Transformer. It's a great program for PDF to RTF, PDF to Word, or PDF to searchable PDF. It used to cost about 100 USD but is well worth the cost. It also converts two-column PDFs to one column.


