#16 |
Evangelist
Posts: 426
Karma: 150782
Join Date: Aug 2014
Device: PRS-T1
|
I am also curious why it behaves this way.
Yes, I have read the FAQ, the sticky, and some of the related threads, but found no clear answer as to why. I also have a PDF that came out of a scanner. In Reader I can select the text, and it is fairly accurate, so the PDF file contains text as well as images. Yet calibre outputs a bunch of images, one per page. So, again, why does calibre not use the "hidden text"? Yes, I know PDFs are not the best source, but what can you do when a PDF is the only source?
#17 | |
Bibliophagist
Posts: 5,444
Karma: 25244745
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Aura One, Aura H2O, Aura HD, Nexus 7 HD, iPad Air, Tolino epos
|
Quote:
* The book was originally a two-volume set that includes household tips (use sulphuric acid on your windows to prevent frost), photography, including creating your own wet plates, and much else. If I were stranded on a desert island, that is a set of books that would be handy, though by today's standards many of the items would be considered extreme safety hazards.
|
#18 | |
Evangelist
Posts: 426
Karma: 150782
Join Date: Aug 2014
Device: PRS-T1
|
Quote:
Yet I would really like to know why calibre does not see that text, or does not want to use it. In my case it's a PhD thesis that was typewritten, and the text (a sort of Courier) is a piece of cake to OCR (and it was OCRed during the scanning). Maybe this was not clear: the problem is not that the files are large (they have to be, because they also contain images, or only images), but the PDF->EPUB conversion. I did not want to open a new thread for a problem that was "solved" in this manner: "DO NOT use PDFs!"
|
#19 |
creator of calibre
Posts: 34,047
Karma: 10261488
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's not calibre that decides whether to use the text; it's pdftohtml from the poppler project, which calibre uses for initial content extraction from PDF files.
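To see what poppler itself makes of a scanned PDF, independently of calibre, pdftohtml can be run directly. The following is a minimal sketch, assuming poppler-utils is installed and on the PATH; the file names are hypothetical, and the option set here (`-xml`, `-i`, `-hidden`) is chosen for illustration and is not claimed to match what calibre passes internally. OCR layers are often stored as invisible text, which is why the sketch exposes pdftohtml's `-hidden` option.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path, out_base, include_hidden=True):
    """Build a pdftohtml (poppler-utils) command line.

    -xml    : write XML output instead of HTML
    -i      : ignore images, so only text is extracted
    -hidden : also output text that the PDF marks as hidden
    """
    cmd = ["pdftohtml", "-xml", "-i"]
    if include_hidden:
        cmd.append("-hidden")
    cmd += [str(pdf_path), str(out_base)]
    return cmd


def extract_text_layer(pdf_path, out_base, include_hidden=True):
    """Run pdftohtml; raises if the tool is missing or exits non-zero."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install poppler-utils")
    cmd = pdftohtml_command(pdf_path, out_base, include_hidden)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(pdftohtml_command("thesis.pdf", "thesis")))
```

Running it once with `include_hidden=True` and once with `include_hidden=False`, then comparing the two outputs, should show whether the OCR text is only reachable as hidden text.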
|
#20 | |
Wizard
Posts: 1,213
Karma: 1452526
Join Date: May 2016
Device: Samsung tab s , fire HDX 8.9, fire hd 8
|
Quote:
|
|
#21 | |
Evangelist
Posts: 426
Karma: 150782
Join Date: Aug 2014
Device: PRS-T1
|
Quote:
|
|
#22 |
Wizard
Posts: 1,207
Karma: 5200000
Join Date: May 2016
Location: Transatlantic
Device: Sony, Nook, Onyx, Boyue, Kindle
|
You could split up the file into smaller files. At least it would read more quickly. Adobe Acrobat Pro can do it, but it is expensive. Surprisingly, it seems Chrome will do it.
https://superuser.com/questions/6847...ile-in-windows |
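Splitting can also be scripted. This is a minimal sketch, assuming the third-party pypdf package is installed; the function names, the chunk size of 50 pages, and the output naming scheme are all made up for illustration.

```python
from pathlib import Path


def page_ranges(total_pages, chunk_size):
    """Return half-open (start, end) page index ranges covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf(src, out_dir, chunk_size=50):
    """Write src as several smaller PDFs of at most chunk_size pages each."""
    # Imported here so page_ranges() works even without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(src))
    outputs = []
    ranges = page_ranges(len(reader.pages), chunk_size)
    for part, (start, end) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, end):
            writer.add_page(reader.pages[index])
        target = out_dir / f"{src.stem}_part{part:03d}.pdf"
        with open(target, "wb") as handle:
            writer.write(handle)
        outputs.append(target)
    return outputs
```

For example, `split_pdf("scan.pdf", "parts", chunk_size=50)` would write `parts/scan_part001.pdf`, `parts/scan_part002.pdf`, and so on. Splitting only makes each file smaller; the pages remain scanned images with whatever text layer they already had.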
#23 | |
null operator
Posts: 12,172
Karma: 10633638
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|