Old 02-08-2011, 01:22 AM   #1
tjung
Junior Member
tjung began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
Non-Convert for PDF

I have a very strange problem with converting several PDF books. Yes I know PDF conversion isn't perfect and has issues. I have read all the stuff on PDF conversions and can't figure out the problems I am having.

I have several PDF books in a series that are done in stylized font. I only mention this in case this is somehow a known issue. The book has been OCR-ed for sure. It is a perfect OCR and matches the graphic image of the PDF without any problem. The PDF books are roughly 3.5 megs. The PDF does not have any DRM so that isn't an issue. I can copy all the text I want from the PDF file via Adobe PDF reader on Windows 7 (64-Bit). I have copied several chapters from the book this way and posted them in Sigil to create an EPUB version of the books manually. This takes forever given the headers and footers to be removed manually.
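The manual header/footer cleanup described above could be partly scripted before pasting into Sigil. The following is only a rough sketch, assuming each copied page is available as one string and that the running header is the first non-blank line of a page and the footer is the last. The function name `strip_running_lines` and the `min_share` threshold are hypothetical choices for illustration, not anything provided by Calibre or Sigil.

```python
import re
from collections import Counter


def strip_running_lines(pages, min_share=0.6):
    """Drop first/last lines that repeat on most pages (likely headers/footers).

    `pages` is a list of strings, one per pasted PDF page. Blank lines are
    discarded. A first or last line is treated as a running header/footer
    when, with digit runs masked, it occurs on at least `min_share` of pages.
    """
    def key(line):
        # Mask digit runs so "Page 9" and "Page 10" count as the same line.
        return re.sub(r"\d+", "#", line.strip())

    split = [[ln for ln in page.splitlines() if ln.strip()] for page in pages]
    firsts = Counter(key(lines[0]) for lines in split if lines)
    lasts = Counter(key(lines[-1]) for lines in split if lines)
    # Require at least two occurrences so a one-page input is left alone.
    threshold = max(2, int(len(pages) * min_share))

    cleaned = []
    for lines in split:
        if lines and firsts[key(lines[0])] >= threshold:
            lines = lines[1:]
        if lines and lasts[key(lines[-1])] >= threshold:
            lines = lines[:-1]
        cleaned.append("\n".join(lines))
    return cleaned
```

A page whose real first line happens to match the header pattern would lose that line too, so the output still needs a read-through.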

When I try and use Calibre to convert from PDF to EPUB, all it ever does is convert each page into a resized graphic image for my iPhone format. This ends up making the book 35 megs or so compared to the 3.5 megs of the PDF.

It absolutely is not an issue of there not being any OCR-ed text; as I said, I can copy and paste the text without any problem. So the actual ASCII text is in the PDF, but for some reason Calibre is not able to see it and thinks it is just images. So it converts the PDF to a 35 meg EPUB with images for each page.

Any idea why I can copy text from the PDF but it won't convert to EPUB as text/html?
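One way to confirm that a converted EPUB really is page images rather than text is to look inside it: an EPUB is a ZIP container, so the standard library can total the image bytes against the HTML bytes. This is a minimal sketch; the function name `epub_size_breakdown` and the extension lists are assumptions made for illustration.

```python
import zipfile

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif", ".svg")
TEXT_EXTS = (".html", ".xhtml", ".htm")


def epub_size_breakdown(epub_path):
    """Return (image_bytes, text_bytes) of uncompressed content in an EPUB."""
    image_bytes = text_bytes = 0
    with zipfile.ZipFile(epub_path) as zf:
        for info in zf.infolist():
            name = info.filename.lower()
            if name.endswith(IMAGE_EXTS):
                image_bytes += info.file_size
            elif name.endswith(TEXT_EXTS):
                text_bytes += info.file_size
    return image_bytes, text_bytes
```

If nearly all of the bytes are images and the HTML files are tiny, the conversion produced one picture per page, as described above.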
Old 02-08-2011, 01:30 AM   #2
CazMar
Book Geek
CazMar ought to be getting tired of karma fortunes by now.
 
Posts: 596
Karma: 1499085
Join Date: Aug 2010
Location: Adelaide, Australia
Device: Kobo Touch, Asus MemPad 7" tablet, Nexus 5, Asus 10" tablet
This sounds really strange - usually it is the PDF file being an "image" not text that does this.
There is a program called Writer2EPUB (it has a section in the forums) which will convert OpenOffice documents to EPUB. As both are free programs, perhaps you could paste the text into OpenOffice and then convert it. I wonder if the fancy font is causing the problem? Maybe it is confusing Calibre.
Old 02-08-2011, 02:08 AM   #3
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
 
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by tjung
When I try and use Calibre to convert from PDF to EPUB all it ever does is convert each page into a resized graphic image for my iPhone format. This ends up making the book 35 megs or so compared to the 3.5 megs of the PDF.

~~~

Any idea why I can copy text from the PDF but it won't convert to EPUB as text/html?
No idea. What OS and version of calibre are you using?
Old 02-08-2011, 02:13 AM   #4
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123457
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
This is covered in the PDF faq... This is a common occurrence with OCR pdfs that use images. It's true that if there is good OCR text that this is normally what's extracted, but there are many ways to define a pdf, and clearly whatever way yours is defined is not compatible with Calibre.

If the underlying OCR is good and you want that text, then the only option I can think of is to use Acrobat Professional to try to change the way the pdf is formatted. There are a bunch of pdf 'optimization' options in Acrobat, for different pdf version compatibility or better compression. Sometimes optimizing/re-compressing the pdf using Acrobat will make it compatible with Calibre's libraries (but not always).

Other freeware/open source pdf tools may do the same thing, but I don't have any experience with those.
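As one example of such a tool (not named in this thread, so treat it as an assumption), Poppler's `pdftotext` command-line utility extracts the text layer of a PDF to a plain-text file. The sketch below wraps it from Python; the helper names are hypothetical, and whether it succeeds on these particular PDFs is untested.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Build the argument list for Poppler's pdftotext."""
    cmd = ["pdftotext"]
    if keep_layout:
        cmd.append("-layout")  # ask pdftotext to keep the physical page layout
    cmd += [pdf_path, txt_path]
    return cmd


def extract_text(pdf_path, txt_path):
    """Run pdftotext if it is installed; return True when it exits cleanly."""
    if shutil.which("pdftotext") is None:
        return False  # Poppler utilities are not on PATH
    result = subprocess.run(pdftotext_command(pdf_path, txt_path))
    return result.returncode == 0
```

Calling `extract_text("book.pdf", "book.txt")` would write the text layer to `book.txt` when the tool is available; both file names are placeholders.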
Old 02-08-2011, 03:17 PM   #5
tjung
Junior Member
tjung began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
Quote:
Originally Posted by dwanthny
No idea. What OS and version of calibre are you using?
I am on Windows 7 (64-Bit) with all the updates installed. I am using Calibre 0.7.44, which is the latest version posted on the Calibre website. In case it matters, which I doubt, I am using Sigil 0.3.4 to edit the EPUB.
Old 02-08-2011, 04:16 PM   #6
Archon
Zealot
Archon, Klaatu Barada Niktu!
 
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
If you can select and copy text and pictures, maybe you could just create a new document in Sigil or your favorite program and copy and paste everything into it.

Then Sigil will create the epub from the raw text and pictures instead of having Calibre do the conversion.

Archon
Old 02-08-2011, 04:53 PM   #7
tjung
Junior Member
tjung began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
Quote:
Originally Posted by ldolse
This is covered in the PDF faq... This is a common occurrence with OCR pdfs that use images. It's true that if there is good OCR text that this is normally what's extracted, but there are many ways to define a pdf, and clearly whatever way yours is defined is not compatible with Calibre.

If the underlying OCR is good and you want that text, then the only option I can think of is to use Acrobat Professional to try to change the way the pdf is formatted. There are a bunch of pdf 'optimization' options in Acrobat, for different pdf version compatibility or better compression. Sometimes optimizing/re-compressing the pdf using Acrobat will make it compatible with Calibre's libraries (but not always).

Other freeware/open source pdf tools may do the same thing, but I don't have any experience with those.

I tried doing what you suggested and told it to reformat/save in its "mobile" format, which made it Acrobat 7.X compatible. Acrobat Pro said the original file was Acrobat 2.3 compatible. Anyway, I didn't have any luck with that suggestion. So I guess I am stuck copying and pasting the text into Sigil and then editing the whole thing by hand. Not what I wanted, but at least it will get me the book in EPUB format or any other format I want in the future.

Thanks everyone for the suggestions and help.
Old 02-09-2011, 11:53 AM   #8
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by tjung
I tried doing what you suggested and told it to reformat/save in its "mobile" format, which made it Acrobat 7.X compatible. Acrobat Pro said the original file was Acrobat 2.3 compatible. Anyway, I didn't have any luck with that suggestion. So I guess I am stuck copying and pasting the text into Sigil and then editing the whole thing by hand. Not what I wanted, but at least it will get me the book in EPUB format or any other format I want in the future.

Thanks everyone for the suggestions and help.
It's odd that a pdf with good underlying OCR text can't be saved as text from inside Acrobat. It isn't clear if your attempt to "reformat" the pdf included simply asking Acrobat to save the document as text, but if you haven't tried that, it's worth looking at again.
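If a plain-text export does work, the resulting file could be handed to Calibre's command-line converter, `ebook-convert`, which takes an input file and an output file and picks the formats from the extensions. The sketch below only builds and runs that basic call; the helper names are hypothetical, and the output is written next to the input as an assumption for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def ebook_convert_command(txt_path, out_ext=".epub"):
    """Build an ebook-convert call that writes next to the input file."""
    src = Path(txt_path)
    return ["ebook-convert", str(src), str(src.with_suffix(out_ext))]


def convert(txt_path):
    """Run ebook-convert if it is on PATH; return True when it exits cleanly."""
    if shutil.which("ebook-convert") is None:
        return False  # Calibre's command-line tools were not found
    return subprocess.run(ebook_convert_command(txt_path)).returncode == 0
```

Text saved this way would still carry the page headers and footers, so some cleanup before or after conversion is likely.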
Old 02-09-2011, 06:33 PM   #9
tjung
Junior Member
tjung began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
Quote:
Originally Posted by Starson17
It's odd that a pdf with good underlying OCR text can't be saved as text from inside Acrobat. It isn't clear if your attempt to "reformat" the pdf included simply asking Acrobat to save the document as text, but if you haven't tried that, it's worth looking at again.
I didn't see an option in Acrobat Pro to save as text. I will go back and check for that. I was hoping to pull out a few of the graphics in the PDF, but given the issues I don't see how I can do that.

I should mention that I investigated the PDF again and there is no graphic text. Acrobat Pro says there is a font in the PDF. The only graphics, I think, are the publisher logo and one or two illustrations/pictures. The pages look like there is a background image to make them look like old paper/parchment.

I have like 6 PDFs like this and they all behave the same. It's just annoying because Calibre has handled everything else so well. It would be nice to be able to pull out the graphics but I don't recall Calibre doing that before on PDFs. I also mentioned all this in case it could be used to find the problem with Calibre or help it get better.
All times are GMT -4.