Old 10-26-2015, 08:55 AM   #1201
crankypants
Zealot
Posts: 112
Karma: 2016606
Join Date: Oct 2015
Device: Android 4.2 Google Play Reader
Google Books often has PDF files that are just a set of scanned page images from an old book. Does this software convert those scanned images (inside the PDF) to text or EPUB? Calibre does this, but only with 98% accuracy, and Calibre doesn't support ligatures (like "f" and "i" next to each other, which then become one electronic character). So if I have 1,000,000 words in total in the book, I have to find and correct 20,000 words that weren't identified correctly. And that usually means going back to the PDF to read the actual text and type it in.
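As a rough check of that estimate, here is a minimal Python sketch. It assumes the 98% figure is a per-word accuracy (the post doesn't say whether it was measured per word or per character), and the function name is made up for illustration.

```python
def expected_ocr_errors(total_words: int, accuracy_percent: int) -> int:
    """Estimate how many words need manual correction.

    Assumes accuracy is measured per word and given as a whole percent.
    """
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding noise.
    return total_words * (100 - accuracy_percent) // 100


print(expected_ocr_errors(1_000_000, 98))  # 20000
```

If the accuracy were per character instead, the share of words containing at least one error would be noticeably higher, since a word is only correct when all of its characters are.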
Old 10-26-2015, 11:09 PM   #1202
willus
Fuzzball, the purple cat
Posts: 883
Karma: 6370957
Join Date: Jun 2011
Location: California
Device: Kindle 2, iPad
Quote:
Originally Posted by crankypants
Google Books often has PDF files that are just a set of scanned page images from an old book. Does this software convert those scanned images (inside the PDF) to text or EPUB? Calibre does this, but only with 98% accuracy, and Calibre doesn't support ligatures (like "f" and "i" next to each other, which then become one electronic character). ...
K2pdfopt generates PDF output only, but it will add an OCR layer to the scanned text, or you can output the OCR'd text directly to an ASCII text file. It uses the Tesseract OCR engine, so that will govern its accuracy. I don't know if it is better than calibre--I'm not sure which OCR engine calibre uses. I'm also not sure what you mean by "supports ligatures." Do you mean you want it to generate a special "ligature" character code, or you want it to correctly break ligatures into their two separate letters? To be honest, I don't recall Tesseract's behavior on ligatures at the moment, either way. It's easy enough to try it out.

PS. Are you sure calibre is doing the OCR and the OCR layer isn't already in the scanned file? As far as I can tell, calibre does not have integrated OCR capability unless you are using it with a third-party tool. If the OCR is in the scanned file, it's probably done with Tesseract already, since Tesseract is supported by Google.
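To make the two ligature options concrete: Unicode has dedicated code points for some ligatures, for example U+FB01 (LATIN SMALL LIGATURE FI), and compatibility normalization (NFKC) splits them back into separate letters. Below is a minimal Python sketch using only the standard library. It is a post-processing step on text that has already been OCR'd, and it makes no claim about what Tesseract or k2pdfopt actually emit; the function name is illustrative.

```python
import unicodedata


def split_ligatures(text: str) -> str:
    """Replace compatibility characters such as ligatures with plain letters.

    Note: NFKC also folds other compatibility characters (for example
    full-width forms), so it changes more than just ligatures.
    """
    return unicodedata.normalize("NFKC", text)


# "final office" written with the fi (U+FB01) and ffi (U+FB03) ligatures.
sample = "\ufb01nal o\ufb03ce"
print(split_ligatures(sample))  # final office
```

Going the other way (producing ligature code points from plain letters) is usually left to the font at rendering time rather than stored in the text.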

Last edited by willus; 10-27-2015 at 08:52 AM.
Old 10-28-2015, 06:33 PM   #1203
oren64
I ❤ Reading
Posts: 2,387
Karma: 31162271
Join Date: Mar 2015
Location: Israel
Device: kobo glo
I want to say thanks for this excellent tool. Now I can read PDF books on my Kobo in a much more pleasant way.

The conversion of Hebrew right-to-left (RTL) PDFs is wonderful. Before, I converted PDFs to EPUB, but the result was not satisfying and the process was exhausting: pdf > word > pdf > word > htm > epub (the pdf > word step is done twice because the text comes out reversed the first time).

My request is to add an option to add a cover from an image file.
Some of my PDF files don't have a cover, so I convert the image file to PDF with other software and merge the image PDF into the book with Simpo PDF Merge.

Last edited by oren64; 10-29-2015 at 02:29 PM. Reason: with, image
Old 10-28-2015, 10:20 PM   #1204
willus
Fuzzball, the purple cat
Posts: 883
Karma: 6370957
Join Date: Jun 2011
Location: California
Device: Kindle 2, iPad
Quote:
Originally Posted by oren64
I want to say thanks for this excellent tool ...
My request is to add an option to add a cover from an image file.
Some of my PDF files don't have a cover, so I convert the image file to PDF with other software and merge the image PDF into the book with Simpo PDF Merge.
Thank you. That's a good idea. I'll add it to my feature request list.
Old 11-01-2015, 05:04 AM   #1205
kdalloway
Junior Member
Posts: 3
Karma: 10
Join Date: Oct 2015
Device: kindle paperwhite3
It crashes on Windows 10, and the older-CPU version cannot be downloaded.
Old 11-01-2015, 05:30 AM   #1206
willus
Fuzzball, the purple cat
Posts: 883
Karma: 6370957
Join Date: Jun 2011
Location: California
Device: Kindle 2, iPad
Quote:
Originally Posted by kdalloway
It crashes on Windows 10, and the older-CPU version cannot be downloaded.
Can you possibly post or PM me the source PDF file and the options you used to convert it? If your PC was made after 2008, the older-CPU version probably won't help. Are you using the 32-bit or the 64-bit version?
Edit: I just verified the older-CPU version. It downloaded on the first try and runs correctly.

Last edited by willus; 11-01-2015 at 09:35 AM.
Old 11-01-2015, 10:56 AM   #1207
eschwartz
Ex-Helpdesk Junkie
Posts: 19,148
Karma: 82980001
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by willus
K2pdfopt generates PDF output only, but it will add an OCR layer to the scanned text, or you can output the OCR'd text directly to an ASCII text file. It uses the Tesseract OCR engine, so that will govern its accuracy. I don't know if it is better than calibre--I'm not sure which OCR engine calibre uses. I'm also not sure what you mean by "supports ligatures." Do you mean you want it to generate a special "ligature" character code, or you want it to correctly break ligatures into their two separate letters? To be honest, I don't recall Tesseract's behavior on ligatures at the moment, either way. It's easy enough to try it out.

PS. Are you sure calibre is doing the OCR and the OCR layer isn't already in the scanned file? As far as I can tell, calibre does not have integrated OCR capability unless you are using it with a third-party tool. If the OCR is in the scanned file, it's probably done with Tesseract already, since Tesseract is supported by Google.
Yup -- calibre will rely on existing OCR in the file, but otherwise simply adds the images themselves.
Old 11-12-2015, 02:54 PM   #1208
isaacbh
Enthusiast
Posts: 36
Karma: 56000
Join Date: Mar 2015
Device: Aura H20, iPad Air 2
I'm using the cbox option quite a bit on complex PDFs. I was wondering why k2pdfopt ignores any pages that don't have a cbox. Is there a reason for this design? Right now I have to use -cbox 0,0 for every other page or page range. Wouldn't it be easier if k2pdfopt assumed that all pages without an explicit cbox should be treated with an implicit cbox 0,0? (This applies to ibox as well.)
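The implicit default described above amounts to taking the complement of the explicitly cropped pages. Here is a minimal Python sketch of that bookkeeping, assuming 1-based page numbers and inclusive ranges. The function names and the range-string format are illustrative only, so check the k2pdfopt usage text for the exact page-list syntax that -cbox accepts.

```python
def uncovered_ranges(total_pages, explicit_ranges):
    """Return inclusive (start, end) page ranges that have no explicit cbox."""
    covered = set()
    for start, end in explicit_ranges:
        covered.update(range(start, end + 1))

    result = []
    run_start = None  # first page of the current uncovered run, if any
    for page in range(1, total_pages + 1):
        if page not in covered:
            if run_start is None:
                run_start = page
        elif run_start is not None:
            result.append((run_start, page - 1))
            run_start = None
    if run_start is not None:
        result.append((run_start, total_pages))
    return result


def format_ranges(ranges):
    """Render ranges as a compact string such as '1-4,11-14,16-20'."""
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


# Pages 5-10 and 15 have an explicit cbox in a 20-page document.
print(format_ranges(uncovered_ranges(20, [(5, 10), (15, 15)])))  # 1-4,11-14,16-20
```

The resulting ranges are the pages that currently need the manual -cbox 0,0 workaround.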

Last edited by isaacbh; 11-12-2015 at 03:00 PM.
Old 11-12-2015, 10:28 PM   #1209
willus
Fuzzball, the purple cat
Posts: 883
Karma: 6370957
Join Date: Jun 2011
Location: California
Device: Kindle 2, iPad
Quote:
Originally Posted by isaacbh
I'm using the cbox option quite a bit on complex PDFs. I was wondering why k2pdfopt ignores any pages that don't have a cbox. Is there a reason for this design? Right now I have to use -cbox 0,0 for every other page or page range. Wouldn't it be easier if k2pdfopt assumed that all pages without an explicit cbox should be treated with an implicit cbox 0,0? (This applies to ibox as well.)
I'll consider adding an option to default to -cbox 0,0 on all pages that don't already have a -cbox specified. But I don't think I want -ibox to default to anything in particular for all unspecified pages.
Old 11-12-2015, 11:57 PM   #1210
isaacbh
Enthusiast
Posts: 36
Karma: 56000
Join Date: Mar 2015
Device: Aura H20, iPad Air 2
Oh right, defaulting ibox to 0,0 is kinda stupid. But thanks for considering it for cbox.
Old 11-14-2015, 12:26 PM   #1211
kahnwong
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2015
Device: Kindle Paperwhite 3 & Kindle Touch
Hi. The crop function works wonders. However, I have a problem with re-flowed text. I converted a PDF using re-flow (-dev kv -wrap+), and when I read it on my Paperwhite 3, I got the result shown in the attached screenshot. I wonder whether the problem is on my end or not.
Attached Thumbnails
Name: screenshot_2015_11_15T00_13_09+0700.png
Old 11-14-2015, 08:01 PM   #1212
willus
Fuzzball, the purple cat
Posts: 883
Karma: 6370957
Join Date: Jun 2011
Location: California
Device: Kindle 2, iPad
Quote:
Originally Posted by kahnwong
Hi. The crop function works wonders. However, I have a problem with re-flowed text. I converted a PDF using re-flow (-dev kv -wrap+), and when I read it on my Paperwhite 3, I got the result shown in the attached screenshot. I wonder whether the problem is on my end or not.
That's difficult to diagnose without more information. When you view the PDF on a PC reader, is it also cropped like that? Are you using any other options? Does the conversion look right in the preview window (if you're using Windows)? Can you post (or PM me) the source and converted PDF files?
Old 12-16-2015, 08:59 AM   #1213
fullspeedin2sun
Junior Member
Posts: 5
Karma: 15180
Join Date: Dec 2015
Device: Kindle Paperwhite 3
Can native PDF output alter text size?

I have a large document with tiny text that is tough to read, and I was hoping that I could make the text larger with k2pdfopt. Can I change the text size of the PDF when using native PDF output? The size of the text in the preview doesn't seem to change whether I set the DPI to 100 or to the 300 that my Paperwhite 3 runs at.
Old 12-16-2015, 05:03 PM   #1214
willus
Fuzzball, the purple cat
Posts: 883
Karma: 6370957
Join Date: Jun 2011
Location: California
Device: Kindle 2, iPad
Quote:
Originally Posted by fullspeedin2sun
I have a large document with tiny text that is tough to read, and I was hoping that I could make the text larger with k2pdfopt. Can I change the text size of the PDF when using native PDF output? The size of the text in the preview doesn't seem to change whether I set the DPI to 100 or to the 300 that my Paperwhite 3 runs at.
Yes, you should be able to magnify the text with k2pdfopt, but it depends on what settings you are using and how the document is laid out. If magnifying the text means you also have to re-flow it, then native mode won't work.

Can you post the source PDF or a couple of pages from it, or PM it to me? Also can you send a screen shot of the options you are using?
Old 12-21-2015, 02:52 PM   #1215
fullspeedin2sun
Junior Member
Posts: 5
Karma: 15180
Join Date: Dec 2015
Device: Kindle Paperwhite 3
Source PDF and screenshot of my k2pdfopt options

The options I'm using are in the attached screencap. I was hoping to make this PDF easier to read on my Kindle Paperwhite 3 without making it much bigger, as it looks like it was created natively.

Also, is there an option to add margins in the GUI?
Attached Thumbnails
Name: 2015-12-21_14-44-48.png

Tags
ebook apps, k5 tools, kindle tools, kindle touch, tools
