12-20-2010, 02:37 PM | #1 |
I devour books!
Posts: 789
Karma: 1285226
Join Date: Mar 2009
Device: iPad Air, Kindle 3/Kobo Aura HD, iPhone 6
pdf to mobi problem
I posted this problem some time ago but never really got a good explanation, so I will try again. I have a 382-page PDF that shows up with no problem when opened in Adobe. The only graphic in the document is the cover. When I import this document into Calibre and try to convert it to mobi, the document converts with only 2 pages. It's as if the conversion process doesn't see the text in the file.
I am so frustrated and thought perhaps someone could help me with this. I have even tried some online conversion websites to see if perhaps I was doing something wrong in Calibre. When the websites produce the documents (I tried to convert the PDF into RTF, HTML, LIT and EPUB just on the off chance they would produce the entire doc), they still show only 2 pages. Has anyone else experienced this, and can someone tell me what I might do to get the ENTIRE PDF file to convert?
12-20-2010, 02:39 PM | #2 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
It's most likely that every page is actually a graphic image, with no underlying text. You would need to either use the OCR in Acrobat Pro to create underlying text, or extract all the pages to images and use a full-fledged OCR program like ABBYY.
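If you want to check this diagnosis yourself rather than take it on trust, one rough approach is to look inside the file: a scanned book is mostly image objects with few or no text-drawing operators, while a real text PDF has many. Below is a minimal stdlib-only Python sketch of that idea. It is a heuristic, not a real PDF parser: it assumes content streams are either uncompressed or FlateDecode-compressed and skips anything else, and `inspect_pdf` is a made-up name for illustration. A PDF that has already been through OCR carries a (usually invisible) text layer, so it will show text operators too, which is exactly what you want before converting.

```python
import re
import sys
import zlib

# Rough heuristics, NOT a full PDF parser (assumption: good enough to tell scans from text).
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\s*endstream", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")
# "(...) Tj", "[...] TJ" and "<hex> Tj" are text-showing operators in page content.
SHOW_TEXT_RE = re.compile(rb"[)\]>]\s*T[jJ]")


def inspect_pdf(data: bytes) -> dict:
    """Count image objects and text-showing operators in raw PDF bytes."""
    images = 0
    text_ops = 0
    for obj in OBJ_RE.findall(data):
        m = STREAM_RE.search(obj)
        header = obj[: m.start()] if m else obj  # the object's dictionary
        if IMAGE_RE.search(header):
            images += 1  # an image XObject, e.g. one scanned page
            continue  # never scan pixel data for text operators
        if not m:
            continue  # plain object without a stream
        body = m.group(1)
        if b"/FlateDecode" in header:
            try:
                body = zlib.decompressobj().decompress(body)
            except zlib.error:
                continue  # undecodable stream: skip rather than guess
        text_ops += len(SHOW_TEXT_RE.findall(body))
    return {"images": images, "text_ops": text_ops}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(inspect_pdf(f.read()))
```

Run it as `python inspect_pdf.py book.pdf` (hypothetical file name). Hundreds of images and only a handful of text operators would fit the picture in this thread: page scans plus a little real text.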
12-20-2010, 03:38 PM | #3 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
https://www.mobileread.com/forums/showthread.php?t=99400
You were told it was probably an image of a page of text. You came back and said it was text, and you were asked if you were sure: would you 1) check the size (images are bigger) and 2) try to select individual text with the selection tool? You didn't respond.
Last edited by Starson17; 12-20-2010 at 03:40 PM. |
12-20-2010, 06:24 PM | #4 |
I devour books!
Posts: 789
Karma: 1285226
Join Date: Mar 2009
Device: iPad Air, Kindle 3/Kobo Aura HD, iPhone 6
Well, thanks, Starson17, for making me feel like an idiot. I didn't realize this wasn't a forum to express frustration. And clearly I asked the same question because I didn't understand the answer the first time. However, I will say that Idolse's answer shed some light on the fact that perhaps, as was clearly pointed out before, this is a PDF OCR file.
I come to this board often because I find that most of the people here are very well versed in many different types of applications and methods for doing things related to ebooks. I have learned a great deal from this board, which I find more technical than other boards. Most people here have a way of explaining complex issues simply to laypersons like myself. My apologies for not being tech-savvy; I will refrain from annoying people with the same old questions. I appreciate your answer.
Last edited by chilady1; 12-20-2010 at 06:29 PM.
12-20-2010, 11:41 PM | #5 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Folks here tried to help you and you snubbed them.
I agree that Idolse's insight is most likely correct, but if you want to know for sure, and understand how to tell in the future, you might want to consider the questions that Starson17 presented to you.
Last edited by DoctorOhh; 12-20-2010 at 11:44 PM.
12-21-2010, 11:08 AM | #6 |
I devour books!
Posts: 789
Karma: 1285226
Join Date: Mar 2009
Device: iPad Air, Kindle 3/Kobo Aura HD, iPhone 6
Thank you all for the help!
12-21-2010, 11:17 AM | #7 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
So do I, but it would be nice to know. I meant no offense, but I felt it was unfair to say you didn't get "a good explanation" when you didn't answer the questions that would have confirmed the explanation you were given (at this point by 5 different people).
People who provide help here don't ask for much; a simple thank you and/or confirmation that the problem was solved is fine. Your case looked like the problem Idolse diagnosed, but none of us can be sure. It looked like the "scanned images of text in a PDF" problem when I answered you (20 minutes after you asked for help in the previous thread). You came back and said I was wrong: you had only text, not images of text. I suspected that you simply didn't understand what I wrote, but I didn't have to post my suspicion, since itimpi and Perkin had already done so. It looked like the same problem to them, and they wanted you to double-check your answer to me. Since you never answered their questions, we didn't know if Calibre had some kind of problem or if our initial diagnosis was correct. Be at peace.
12-21-2010, 04:22 PM | #8 |
I devour books!
Posts: 789
Karma: 1285226
Join Date: Mar 2009
Device: iPad Air, Kindle 3/Kobo Aura HD, iPhone 6
12-21-2010, 04:45 PM | #9 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
So what is the answer? Do you have a normal PDF problem caused by having scanned images of pages with text on them, or do you have something unusual going on where your documents have true text, but won't convert because of a bug or bad character in the text?
12-21-2010, 05:19 PM | #10 |
Guru
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
@chilady, how large is the PDF in MB, how many pages are in it, and how many pictures (covers, etc.) does it have?
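The size check suggested earlier in the thread (images are bigger) and the question above both come down to simple arithmetic: divide the file size by the page count. The Python sketch below does that. The 25 KB-per-page cut-off is an assumed rule of thumb, not a documented value, and the script name and its arguments are hypothetical.

```python
import os
import sys

# ASSUMPTION: rough rule of thumb, not a measured constant. Scanned page images
# usually cost tens of KB or more each; pages of real text only a few KB each.
SCAN_THRESHOLD_KB = 25.0


def kb_per_page(size_bytes: int, pages: int) -> float:
    """Average file size per page, in KB."""
    if pages <= 0:
        raise ValueError("page count must be positive")
    return size_bytes / 1024 / pages


def guess_from_size(size_bytes: int, pages: int,
                    threshold_kb: float = SCAN_THRESHOLD_KB) -> str:
    """Very rough guess: large pages suggest images of text, small ones real text."""
    if kb_per_page(size_bytes, pages) >= threshold_kb:
        return "probably scanned images"
    return "probably real text"


if __name__ == "__main__" and len(sys.argv) > 2:
    # usage (hypothetical script name): python pdfsize.py book.pdf 382
    path, pages = sys.argv[1], int(sys.argv[2])
    size = os.path.getsize(path)
    print(f"{kb_per_page(size, pages):.1f} KB/page: {guess_from_size(size, pages)}")
```

For example, a 382-page PDF of 40 MB works out to about 107 KB per page, which points to page images, while the same book at 1.5 MB is about 4 KB per page, which points to real text. Embedded fonts, a large cover image, or unusual compression can skew the average, so treat this as a hint, not proof.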
12-21-2010, 06:14 PM | #11 |
I devour books!
Posts: 789
Karma: 1285226
Join Date: Mar 2009
Device: iPad Air, Kindle 3/Kobo Aura HD, iPhone 6
It is scanned images of pages with text, which I believe is OCR, and for whatever reason it won't convert using Calibre.
12-22-2010, 04:26 AM | #12 |
Addict
Posts: 206
Karma: 547516
Join Date: Mar 2008
Location: Berlin, Germany
Device: KObo Clara, Kobo Aura, PRS-T1, PB602, CyBook Gen3
OCR (Optical Character Recognition) is a method to turn text on scanned images into actual text.
The OCR software tries to match the shape of each letter (as seen in the image) to an actual character. Depending on the quality of the scan and the font used in the original book, this can work well or quite horribly. For example, the letters "h" and "b" are often mixed up, and so are some other letter combinations. Character recognition is rather complicated, which is why good OCR software is often very pricey and why Calibre does not provide it.
As far as I understand the PDF conversion in Calibre, it first tries to decide whether the PDF is text-based or image-based. If it encounters an image-based PDF, it creates an output of the images. If it encounters a text-based PDF, it tries its best to convert the text into good text-based output; in the process, any images still in the text-based PDF get lost.
In your case, I think you have a mainly image-based PDF that contains some text, probably at the beginning. Calibre encounters the text in the PDF, decides that the PDF is text-based, and produces an output of the available text. It can neither know that the images are the actual important content, nor could it convert them into text if it did.
I hope this explanation is understandable. If you or someone else has further questions, I or someone else on this board will try to answer them; we just need to know what those questions are.
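The behaviour described above can be made concrete with a toy model. This is NOT Calibre's actual code: the `Page` type and both functions are hypothetical, and the split of 2 text pages plus 380 scanned pages is only an illustration chosen to match the 382-page book and the 2-page output from the first post. It shows why deciding "text or images" once for the whole file loses the scanned pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    has_text: bool   # does the page carry extractable text?
    has_image: bool  # does the page carry a scanned image or picture?


def convert_document_level(pages):
    """Hypothetical 'decide once for the whole file' logic, as described above."""
    if any(p.has_text for p in pages):
        # Treated as text-based: only the text survives, images are dropped.
        return [("text", i) for i, p in enumerate(pages) if p.has_text]
    return [("image", i) for i, p in enumerate(pages) if p.has_image]


def convert_per_page(pages):
    """Alternative: decide per page, keeping the image where there is no text."""
    out = []
    for i, p in enumerate(pages):
        if p.has_text:
            out.append(("text", i))
        elif p.has_image:
            out.append(("image", i))
    return out


# Illustrative book: 2 pages of real text, then 380 scanned pages.
book = [Page(True, False)] * 2 + [Page(False, True)] * 380
print(len(convert_document_level(book)), len(convert_per_page(book)))
```

Even the per-page variant only keeps the scanned pages as pictures; neither function turns them into text. That still takes an OCR program, as the other replies say.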
12-22-2010, 09:21 AM | #13 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
It's not OCR. Your problem is that the scanned images of pages of text have not been processed with an OCR program to produce text. Calibre isn't an OCR program and can't convert pictures of text into text (by now I'm pretty sure you understand this). Your only options are to use an OCR program for conversion or to keep the images and read those. The former can be done in Adobe Acrobat or ABBYY; the latter can be done by keeping and reading the original PDF, or by removing any leading text so the document is pure images and converting it the way a comic is converted.
12-22-2010, 09:56 AM | #14 |
I devour books!
Posts: 789
Karma: 1285226
Join Date: Mar 2009
Device: iPad Air, Kindle 3/Kobo Aura HD, iPhone 6
Understood. This makes sense, and I appreciate everyone's great info on the differences. I won't need to ask this question anymore. Thanks all!