#1 |
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
|
Non-Convert for PDF
I have a very strange problem converting several PDF books. Yes, I know PDF conversion isn't perfect and has issues. I have read all the material on PDF conversion and still can't figure out the problem I am having.
I have several PDF books in a series that are set in a stylized font. I only mention this in case it is a known issue. The books have definitely been OCR-ed; the OCR is perfect and matches the page image exactly. Each PDF is roughly 3.5 MB and has no DRM, so that isn't the issue. I can copy all the text I want from the PDF in Adobe Reader on Windows 7 (64-bit). I have copied several chapters this way and pasted them into Sigil to build an EPUB manually, but that takes forever because the headers and footers have to be removed by hand.
When I use Calibre to convert from PDF to EPUB, all it ever does is convert each page into a graphic image resized for my iPhone. The resulting EPUB is about 35 MB, compared with 3.5 MB for the PDF. It is definitely not a case of missing OCR text, since I can copy and paste the text without any problem. So the actual text is in the PDF, but for some reason Calibre can't see it and treats each page as just an image. Any idea why I can copy text from the PDF but it won't convert to EPUB as text/HTML?
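Since removing the headers and footers by hand is the slow part of the manual route, here is a minimal Python sketch of one way to automate it. It assumes the copied text has already been split into one string per page. The function name `strip_running_lines` and the `min_share` threshold are made up for illustration and are not part of Calibre or Sigil.

```python
from collections import Counter
import re


def strip_running_lines(pages, min_share=0.6):
    """Drop first/last lines that repeat across most pages (likely headers/footers)."""

    def norm(line):
        # Mask digits so "Page 12" and "Page 13" count as the same line.
        return re.sub(r"\d+", "#", line.strip())

    # Split every page into its non-blank lines (blank lines are discarded).
    split_pages = [[ln for ln in p.splitlines() if ln.strip()] for p in pages]

    # Count how often each normalized first and last line occurs.
    firsts = Counter(norm(p[0]) for p in split_pages if p)
    lasts = Counter(norm(p[-1]) for p in split_pages if p)
    threshold = max(2, int(len(split_pages) * min_share))

    cleaned = []
    for lines in split_pages:
        if lines and firsts[norm(lines[0])] >= threshold:
            lines = lines[1:]   # remove the repeated header line
        if lines and lasts[norm(lines[-1])] >= threshold:
            lines = lines[:-1]  # remove the repeated footer line
        cleaned.append("\n".join(lines))
    return cleaned
```

This only looks at the first and last non-blank line of each page, so a two-line header would need a second pass, and a chapter that really does open every page with the same sentence would be trimmed by mistake.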
#2 |
Book Geek
Posts: 596
Karma: 1499085
Join Date: Aug 2010
Location: Adelaide, Australia
Device: Kobo Touch, Asus MemPad 7" tablet, Nexus 5, Asus 10" tablet
|
This sounds really strange. Usually this happens when the PDF file is an "image" rather than text.
There is a program called Writer2EPUB (it has a section in the forums) which converts OpenOffice documents to EPUB. As both are free programs, perhaps you could paste the text into OpenOffice and then convert it. I wonder if the fancy font is causing the problem? Maybe it is confusing Calibre.
#3 | |
US Navy, Retired
Posts: 9,888
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
|
Quote:
|
|
#4 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
This is covered in the PDF FAQ. It is a common occurrence with OCR PDFs that use images. It's true that if there is good OCR text, that is normally what gets extracted, but there are many ways to define a PDF, and clearly whatever way yours is defined is not compatible with Calibre.
If the underlying OCR is good and you want that text, the only option I can think of is to use Acrobat Professional to try to change the way the PDF is formatted. Acrobat has a number of PDF 'optimization' options, for compatibility with different PDF versions or for better compression. Sometimes optimizing or re-compressing the PDF in Acrobat will make it compatible with Calibre's libraries (but not always). Other freeware or open source PDF tools may do the same thing, but I don't have any experience with those.
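As one example of the "other open source tools" route, the sketch below calls Poppler's `pdftotext` command-line tool from Python. This is an assumption on my part: nobody in this thread has tried it on these files, it needs to be installed separately, and it may or may not read the text layer any better than Calibre does. The file names are placeholders.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, txt_path):
    # -layout asks pdftotext to keep the physical layout of the page text.
    return ["pdftotext", "-layout", pdf_path, txt_path]


def extract_text(pdf_path, txt_path):
    """Run pdftotext if it is installed; return True when the command was run."""
    if shutil.which("pdftotext") is None:
        return False  # tool is not on PATH, so nothing was run
    # check=True raises CalledProcessError if pdftotext reports a failure.
    subprocess.run(pdftotext_command(pdf_path, txt_path), check=True)
    return True
```

Calling `extract_text("book.pdf", "book.txt")` would write a plain-text file if the tool is present; if the result is empty or garbled, that at least confirms the text layer is stored in an unusual way.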
#5 |
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
|
I am on Windows 7 (64-bit) with all the updates installed. I am using Calibre 0.7.44, which is the latest version posted on the Calibre website. In case it matters, which I doubt, I am using Sigil 0.3.4 to edit the EPUB.
|
#6 |
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
|
If you can select and copy text and pictures, maybe you could just create a new document in Sigil or your favorite program and copy and paste everything into it.
Then Sigil will create the EPUB from the raw text and pictures instead of having Calibre do the conversion.
Archon
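For the copy-and-paste route, a small helper can turn the copied plain text into XHTML paragraphs before it goes into Sigil's code view. This is only a sketch: it assumes paragraphs in the copied text are separated by blank lines, and `text_to_xhtml_paragraphs` is a made-up name, not a Sigil feature.

```python
from html import escape


def text_to_xhtml_paragraphs(text):
    """Turn blank-line-separated plain text into <p> elements."""
    paragraphs = []
    for block in text.split("\n\n"):
        # Join the hard line breaks left over from the PDF page layout.
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            # escape() protects characters such as & and < that XHTML treats specially.
            paragraphs.append("<p>" + escape(joined) + "</p>")
    return "\n".join(paragraphs)
```

The output is a run of `<p>` elements that can be pasted inside the `<body>` of a chapter file; headings and italics would still have to be marked up by hand.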
#7 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
|
Quote:
I tried what you suggested and told it to reformat and save in its "mobile" format, which made the file Acrobat 7.x compatible. Acrobat Pro said the original file was Acrobat 2.3 compatible. Anyway, I didn't have any luck with that suggestion. So I guess I am stuck copying and pasting the text into Sigil and then editing the whole thing by hand. Not what I wanted, but at least it will get me the book in EPUB format, or any other format I want in the future. Thanks, everyone, for the suggestions and help.
|
#8 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
|
|
#9 | |
Junior Member
Posts: 8
Karma: 10
Join Date: Feb 2011
Device: iPhone
|
Quote:
I should mention that I investigated the PDF again and there is no graphic text. Acrobat Pro says there is a font in the PDF. I think the only graphics are the publisher logo and one or two illustrations. The pages appear to have a background image that makes them look like old paper or parchment. I have about six PDFs like this and they all behave the same way. It's just annoying because Calibre has handled everything else so well. It would be nice to be able to pull out the graphics, but I don't recall Calibre doing that before on PDFs. I mention all this in case it can be used to find the problem in Calibre or help it get better.
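The same kind of inspection can be roughed out without Acrobat by scanning the raw bytes of the file for font and image dictionaries. This is a crude sketch, not a PDF parser: it only sees dictionaries that are stored uncompressed, so newer PDFs that pack their objects into compressed streams will be undercounted. The function name `rough_pdf_inventory` is invented for this example.

```python
import re


def rough_pdf_inventory(data):
    """Count uncompressed font and image dictionaries in raw PDF bytes."""
    # "/Type /Font" marks a font dictionary; the lookahead skips /FontDescriptor.
    fonts = re.findall(rb"/Type\s*/Font(?![A-Za-z])", data)
    # "/Subtype /Image" marks an image XObject (a picture placed on a page).
    images = re.findall(rb"/Subtype\s*/Image(?![A-Za-z])", data)
    return {"fonts": len(fonts), "images": len(images)}
```

To try it on a real file, read the bytes with `open(path, "rb").read()` (where `path` is whichever PDF you want to check) and pass them in. Roughly one image per page alongside a handful of fonts would be consistent with a full-page background image sitting behind real text.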
|