02-11-2019, 04:34 PM | #16
Fanatic
Join Date: Aug 2014
Device: PRS-T1
I too am curious why it works this way.
Yes, I have read the FAQ, the sticky and some of the related threads, yet I found no clear answer as to why. I too have a PDF that came out of a scanner. In Reader I can select the text, and it's mostly correct. So the PDF file also contains text, not only images. Yet calibre outputs a bunch of images, one per page. So, again, why doesn't calibre use the "hidden text"? Yes, I know PDFs are not the best source... but what to do when the only source is one of them?!
02-12-2019, 12:41 AM | #17
Bibliophagist
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
* The book was originally a two-volume set which includes household tips (use sulphuric acid on your windows to prevent frost), photography including creating your own wet plates, and much else. If I were stranded on a desert island, that is a set of books that would be handy, though by today's standards many of the items would be considered extreme safety hazards.
02-12-2019, 02:34 AM | #18
Fanatic
Join Date: Aug 2014
Device: PRS-T1
Quote:
Yet I would really like to know why calibre does not see that text, or doesn't want to use it. In my case it's a PhD thesis that was typewritten, and the text (a sort of Courier) is a piece of cake to OCR (and it was OCRed during scanning). Maybe this was not clear: my problem is not that the file is large (scanned PDFs have to be, since they contain images, or only images), but the PDF->EPUB conversion. I did not want to open a new thread for a problem that was "solved" in this manner: "DO NOT use PDFs!"
02-12-2019, 03:42 AM | #19
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's not calibre that decides whether or not to use the text; it's pdftohtml from the poppler project, which calibre uses for the initial content extraction from PDF files.
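Since the text extraction happens in poppler rather than in calibre itself, one rough way to confirm that a scanned PDF really carries a usable OCR layer is to run poppler's `pdftotext` on its first few pages. The Python sketch below assumes the poppler command-line tools are installed and on `PATH`; the function names and the 20-character threshold are hypothetical choices, and a positive result only shows that poppler can see the text, not how calibre's PDF conversion will treat it.

```python
import shutil
import subprocess


def has_meaningful_text(extracted: str, min_chars: int = 20) -> bool:
    """Return True if the text has at least min_chars non-whitespace characters.

    The threshold is an arbitrary, hypothetical cut-off: pages with no text
    layer usually yield nothing but whitespace and form feeds.
    """
    visible = sum(1 for ch in extracted if not ch.isspace())
    return visible >= min_chars


def pdf_text_sample(pdf_path: str, first: int = 1, last: int = 3) -> str:
    """Extract text from pages first..last with poppler's pdftotext CLI."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) is not installed or not on PATH")
    # -f / -l select the page range; "-" as the output file sends text to stdout.
    result = subprocess.run(
        ["pdftotext", "-f", str(first), "-l", str(last), pdf_path, "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return result.stdout
```

For example, `has_meaningful_text(pdf_text_sample("thesis.pdf"))` should be True for a scan with an OCR layer and False for a pure image scan (the file name is a placeholder).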
02-12-2019, 06:09 AM | #20
Wizard
Join Date: May 2016
Device: kobo forma, Kobo Libra, Huawei media Tab, fire HD10, PW3 HDX8.9
02-12-2019, 11:14 AM | #21
Fanatic
Join Date: Aug 2014
Device: PRS-T1
02-12-2019, 09:28 PM | #22
Wizard
Join Date: May 2016
Location: Canada
Device: Onyx Nova
You could split the file into smaller files. At least it would read quicker. Adobe Acrobat Pro can do it, but it is expensive. Surprisingly, it seems Chrome will do it too.
https://superuser.com/questions/6847...ile-in-windows
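The splitting that Acrobat Pro or Chrome does can also be scripted with the free `qpdf` command-line tool. This Python sketch is an illustration under assumptions, not a method from this thread: it assumes `qpdf` is installed and on `PATH`, and the helper names and the 50-page default chunk size are hypothetical. It asks qpdf for the page count (`--show-npages`) and then writes each page range to its own file with `--pages`.

```python
import shutil
import subprocess
from pathlib import Path


def chunk_ranges(total_pages: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges of at most chunk_size pages."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def page_count(pdf_path: str) -> int:
    """Ask qpdf how many pages the PDF has."""
    result = subprocess.run(
        ["qpdf", "--show-npages", pdf_path], capture_output=True, text=True, check=True
    )
    return int(result.stdout.strip())


def split_pdf(pdf_path: str, chunk_size: int = 50) -> list[Path]:
    """Write each page range of pdf_path to its own file next to the original."""
    if shutil.which("qpdf") is None:
        raise RuntimeError("qpdf is not installed or not on PATH")
    src = Path(pdf_path)
    outputs = []
    for start, end in chunk_ranges(page_count(pdf_path), chunk_size):
        out = src.with_name(f"{src.stem}_{start:04d}-{end:04d}.pdf")
        # "." means "the primary input file"; "--" ends the --pages list.
        subprocess.run(
            ["qpdf", str(src), "--pages", ".", f"{start}-{end}", "--", str(out)],
            check=True,
        )
        outputs.append(out)
    return outputs
```

Calling `split_pdf("thesis.pdf")` on a 120-page file would produce `thesis_0001-0050.pdf`, `thesis_0051-0100.pdf` and `thesis_0101-0120.pdf` next to the original (the file name is a placeholder).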
02-12-2019, 09:48 PM | #23
null operator (he/him)
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
BR