Old 02-11-2019, 05:34 PM   #16
Ghitulescu
Evangelist
Posts: 426
Karma: 150782
Join Date: Aug 2014
Device: PRS-T1
I am also curious why it works this way.
Yes, I have read the FAQ, the sticky and some of the related threads, yet found no clear answer as to why:

... so I too have a PDF that came out of a scanner.
In Reader I can select the text, and it's fairly accurate. So the PDF file also contains text, not only images.
Yet calibre outputs a bunch of images, one per page.

So, again, why does calibre not use the "hidden text"?
Yes, I know it's not best to use PDFs... but what to do when the only source is one of them?!
Old 02-12-2019, 01:41 AM   #17
DNSB
Bibliophagist
Posts: 5,453
Karma: 25244745
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Aura One, Aura H2O, Aura HD, Nexus 7 HD, iPad Air, Tolino epos
Quote:
Originally Posted by Ghitulescu
I am also curious why it works this way.
Yes, I have read the FAQ, the sticky and some of the related threads, yet found no clear answer as to why:

... so I too have a PDF that came out of a scanner.
In Reader I can select the text, and it's fairly accurate. So the PDF file also contains text, not only images.
Yet calibre outputs a bunch of images, one per page.

So, again, why does calibre not use the "hidden text"?
Yes, I know it's not best to use PDFs... but what to do when the only source is one of them?!
I have one commercially produced scan of a book from the 1870s.* The images are not the greatest, but they are what the original book looked like. The text plane is useful for searching, but when I take a close look at it, it has a multitude of OCR errors which make it painful to read.

* The book was originally a two-volume set which includes household tips (use sulphuric acid on your windows to prevent frost), photography including creating your own wet plates, and much else. If I were stranded on a desert island, that is a set of books that would be handy, though by today's standards many of the items would be considered extreme safety hazards.
Old 02-12-2019, 03:34 AM   #18
Ghitulescu
Evangelist
Posts: 426
Karma: 150782
Join Date: Aug 2014
Device: PRS-T1
Quote:
Originally Posted by DNSB
I have one commercially produced scan of a book from the 1870s*
...
The text plane is useful for searching but when I take a close look at it, it has a multitude of OCR errors which make it painful to read.
Yes, it's usually this way. Old books used heavily serifed or decorative fonts that are rather difficult for simple/cheap software to OCR. Yes, it's annoying to replace every "r n" or "i n" with "m", every "li" with "h", and so on. But that text exists.
Yet I would really like to know why calibre does not see that text, or doesn't want to use it.
In my case it's a PhD thesis that was typewritten, and the text (a sort of Courier) is a piece of cake to OCR (and it was OCRed during the scanning).

Maybe this was not clear: my problem is not that the file is large (such files have to be, because they also contain images, or images only), but the PDF->EPUB conversion. I did not want to open a new thread for a problem that was "solved" in this manner: "DO NOT use PDFs!"
Old 02-12-2019, 04:42 AM   #19
kovidgoyal
creator of calibre
Posts: 34,049
Karma: 10261488
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's not calibre that decides whether or not to use the text; it's pdftohtml from the poppler project, which calibre uses for initial content extraction from PDF files.
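
One way to see what poppler itself can read from a scanned PDF is to run one of its extractors directly. The following is only a minimal sketch, assuming poppler's `pdftotext` command-line tool is installed and on the PATH. Note that `pdftotext` is a sibling of `pdftohtml`, not the same program, so its behaviour may differ. The helper names (`build_pdftotext_cmd`, `looks_like_text`, `probe_text_layer`) and the 50-character threshold are made up for illustration and are not part of calibre or poppler.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, first_page=1, last_page=3):
    """Build a pdftotext command that writes the chosen pages to stdout ('-')."""
    return ["pdftotext", "-f", str(first_page), "-l", str(last_page), pdf_path, "-"]


def looks_like_text(extracted, min_chars=50):
    """Heuristic (an assumption): enough non-whitespace characters means a text layer."""
    return sum(1 for ch in extracted if not ch.isspace()) >= min_chars


def probe_text_layer(pdf_path):
    """Return True/False for whether text came out, or None if pdftotext is missing."""
    if shutil.which("pdftotext") is None:
        return None
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path),
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,
    )
    return looks_like_text(result.stdout)
```

If `probe_text_layer("thesis.pdf")` (a hypothetical file name) returns `True` while the conversion still yields one image per page, that would suggest the text layer is readable by at least one poppler tool, which narrows the question to how `pdftohtml` handles that particular file.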
Old 02-12-2019, 07:09 AM   #20
stumped
Wizard
Posts: 1,213
Karma: 1452526
Join Date: May 2016
Device: Samsung tab s , fire HDX 8.9, fire hd 8
Quote:
Originally Posted by DNSB
..

* The book was originally a two-volume set which includes household tips (use sulphuric acid on your windows to prevent frost), photography including creating your own wet plates, and much else. If I were stranded on a desert island, that is a set of books that would be handy, though by today's standards many of the items would be considered extreme safety hazards.
But not many desert islands have handy supplies of sulphuric acid, or even windows to apply it to???
Old 02-12-2019, 12:14 PM   #21
Ghitulescu
Evangelist
Posts: 426
Karma: 150782
Join Date: Aug 2014
Device: PRS-T1
Quote:
Originally Posted by stumped
But not many desert islands have handy supplies of sulphuric acid, or even windows to apply it to???
Then I suggest you read L'Île mystérieuse (The Mysterious Island) by Jules Verne, in particular Chapter XVII.
Old 02-12-2019, 10:28 PM   #22
Pajamaman
Wizard
Posts: 1,207
Karma: 5200000
Join Date: May 2016
Location: Transatlantic
Device: Sony, Nook, Onyx, Boyue, Kindle
You could split up the file into smaller files. At least it would read quicker. Adobe Acrobat Pro does it, but is expensive. Surprisingly, it seems Chrome will do it.
https://superuser.com/questions/6847...ile-in-windows
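
For anyone who prefers a script to a GUI tool, splitting can be sketched roughly as follows. This assumes the separate `qpdf` command-line tool is installed and on the PATH (it is not mentioned elsewhere in this thread), and that you already know the page count. The function names and the `part-NN.pdf` naming scheme are made up for illustration.

```python
import subprocess


def page_ranges(total_pages, chunk_size):
    """Return inclusive 1-based (first, last) page ranges covering the document."""
    if total_pages < 1 or chunk_size < 1:
        raise ValueError("total_pages and chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def build_qpdf_cmd(src, first, last, dest):
    """Build a qpdf command that copies pages first..last of src into dest."""
    return ["qpdf", "--empty", "--pages", src, f"{first}-{last}", "--", dest]


def split_pdf(src, total_pages, chunk_size, stem="part"):
    """Write one PDF per page range; requires qpdf on the PATH."""
    outputs = []
    for index, (first, last) in enumerate(page_ranges(total_pages, chunk_size), start=1):
        dest = f"{stem}-{index:02d}.pdf"
        # check=True raises on any non-zero exit status, which for qpdf
        # includes the status it uses to report warnings.
        subprocess.run(build_qpdf_cmd(src, first, last, dest), check=True)
        outputs.append(dest)
    return outputs
```

For example, `page_ranges(25, 10)` gives three ranges (pages 1-10, 11-20 and 21-25), so `split_pdf("scan.pdf", 25, 10)` would attempt to write `part-01.pdf` through `part-03.pdf`.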
Old 02-12-2019, 10:48 PM   #23
BetterRed
null operator
Posts: 12,178
Karma: 10633638
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by Pajamaman
You could split up the file into smaller files. At least it would read quicker. Adobe Acrobat Pro does it, but is expensive. Surprisingly, it seems Chrome will do it.
https://superuser.com/questions/6847...ile-in-windows
Good idea - I've used a freebie called PDFsam Basic to good effect to split PDFs and extract chapters from them.

BR
All times are GMT -4.