Old 09-11-2016, 04:45 PM   #1306
willus
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by knntmr View Post
OK, for now, renaming by adding -title ".." worked very well.

My second issue is with the tables. I assume there is no perfect way for the program to detect tables and leave them as they are.

In a 2-column pdf, I generally prefer to reflow into 1.

But,

A. Sometimes there are pages in landscape mode with a full-blown table.
B. Sometimes a table occupies half of the page and is followed by two columns.

In my experience, the conversion butchers the tables in these scenarios. This also happens occasionally in one-column articles. Is there any way to prevent this?

For example:

A: Leaving a selected page as it is and converting/reflowing the rest.
B: Manually selecting a section of one page (for a half-pager table) and reflowing the rest of the sections on that page.

It will be perfect for me if I can solve this issue.

Again, many thanks.
There is presently no option to treat a cropped region (-cbox) differently from the rest of the document (e.g. to leave it un-reflowed). I could think about how to do something like that. I believe Adobe Reader DC allows you to overlay graphical markings onto a PDF, so you could use that to draw boxes around your tables, which will prevent k2pdfopt from re-flowing them.

Edit: See this newly created help page on the topic...

Last edited by willus; 10-07-2016 at 11:17 PM.
Old 10-18-2016, 01:29 PM   #1307
mauricebis
Junior Member
Posts: 4
Karma: 42208
Join Date: Oct 2016
Device: kindle
How to improve conversion k2pdfopt

Hello,

I'm trying to use k2pdfopt. It did an almost perfect job on a few sample pages, but I don't understand why a block is not recognised in one column when the columns are all similar. The source document is a scanned document and I used the command: k2pdfopt age.pdf -ui- -w 560 -h 735 -dpi 150 -as -col 2 -ac -sm -o k2try.pdf. As can be seen in the attached result, the first lines of the first column on the second page are not recognised as a block. Did I miss an option?

Thanks for your help.
Attached file: age_marked.pdf (850.7 KB)
Old 10-18-2016, 11:33 PM   #1308
willus
Quote:
Originally Posted by mauricebis View Post
Hello,

I'm trying to use k2pdfopt. It did an almost perfect job on a few sample pages, but I don't understand why a block is not recognised in one column when the columns are all similar. The source document is a scanned document and I used the command: k2pdfopt age.pdf -ui- -w 560 -h 735 -dpi 150 -as -col 2 -ac -sm -o k2try.pdf. As can be seen in the attached result, the first lines of the first column on the second page are not recognised as a block. Did I miss an option?

Thanks for your help.
My guess is that the gray pixels left by scanning the page (see the circled regions on the attached image) are preventing the line detection. You may be able to improve this by tweaking the -gtr option. But for books like this I think you're better off running k2pdfopt in two passes. First, convert the PDF from two book pages per page to one book page per page by using the -cbox option (two crop boxes per page). Something like this:

Code:
k2pdfopt -mode crop -cbox 1.137in,0.3018in,4.427in,7.827in -cbox 5.735in,0.3119in,4.336in,7.727in source.pdf -o intermediate.pdf
You may have to adjust the crop boxes depending on how consistent the scanned pages are. If you do this, the auto-straighten and auto-contrast adjustments will work better than if you try to use the -ac option to auto-crop. You then process the intermediate output like so:

Code:
k2pdfopt -ui- -w 560 -h 735 -dpi 150 -as intermediate.pdf -o final.pdf
I wasn't able to test this since I do not have your source file, but I think it will work better than what you're doing.
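The two passes above could be chained in a small shell wrapper. This is only a sketch: the function name two_pass and the K2PDFOPT variable are made up for illustration, the flags are copied verbatim from the two commands above, and the crop boxes will likely need adjusting for other scans.

```shell
# Hypothetical wrapper around the two commands from the post.
# K2PDFOPT is an illustrative variable: set it to "echo" for a dry run.
K2PDFOPT=${K2PDFOPT:-k2pdfopt}

two_pass() {
    src=$1
    mid=$2
    final=$3

    # Pass 1: two crop boxes per scanned page, so each book page
    # becomes its own page in the intermediate file.
    "$K2PDFOPT" -mode crop \
        -cbox 1.137in,0.3018in,4.427in,7.827in \
        -cbox 5.735in,0.3119in,4.336in,7.727in \
        "$src" -o "$mid" || return 1

    # Pass 2: convert the intermediate file for the target screen size.
    "$K2PDFOPT" -ui- -w 560 -h 735 -dpi 150 -as "$mid" -o "$final"
}

# Example: two_pass source.pdf intermediate.pdf final.pdf
```

Setting K2PDFOPT=echo before calling two_pass prints both argument lists without touching any files, which is a cheap way to check the arguments before a long conversion.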
Attached image: column.png (247.8 KB)

Last edited by willus; 10-18-2016 at 11:34 PM. Reason: Forgot attachment
Old 10-20-2016, 04:21 PM   #1309
mauricebis
thanks

Many thanks for these detailed explanations.
Old 10-22-2016, 03:47 PM   #1310
willus
k2pdfopt v2.35 released

K2pdfopt v2.35 is released. This is a mostly bug-fix release with some minor new features and updated libraries and compiling platforms. See details at the web site.
Old 10-23-2016, 12:34 PM   #1311
drjd
The Couch Potato
Posts: 34,509
Karma: 230999999
Join Date: Aug 2015
Device: Kobo Glo, Kobo Touch, Archos 9, Onyx Boox C67ML Carta
Quote:
Originally Posted by willus View Post
K2pdfopt v2.35 is released. This is a mostly bug-fix release with some minor new features and updated libraries and compiling platforms. See details at the web site.
Thanks for the update! Getting it!
Old 11-17-2016, 04:39 PM   #1312
aanno
Junior Member
Posts: 2
Karma: 25000
Join Date: Nov 2016
Device: Kobo Aura One, Kindle Keyboard, Sony PRS-T2
k2pdfopt is a great tool

Dear willus,

Thank you for your amazing k2pdfopt. I tried several tools to make PDFs readable on an eBook reader (including calibre and commercial OCR tools like ABBYY), but your tool works best and fastest.

I'm still a bit curious about the technology: How do you achieve the result? I guess there is no way to get from your PDF result to a 'real' eBook format (like epub)?

Kind regards,

aanno
Old 11-18-2016, 11:46 AM   #1313
willus
Quote:
Originally Posted by aanno View Post
Dear willus,

Thank you for your amazing k2pdfopt. I tried several tools to make PDFs readable on an eBook reader (including calibre and commercial OCR tools like ABBYY), but your tool works best and fastest.

I'm still a bit curious about the technology: How do you achieve the result? I guess there is no way to get from your PDF result to a 'real' eBook format (like epub)?

Kind regards,

aanno
Thank you for the nice feedback. If you read through the k2pdfopt home page, it talks a little about how k2pdfopt works--by analyzing the visual image of each page and looking for rectangular regions (boxes) of text that it can break out into smaller pages. For word wrapping, it then breaks each region into text rows, again using pattern analysis, and then into individual words so that it can re-flow the text if desired. The algorithm is heuristic and does not always work correctly (as I am often told!), but for many "standard" formats that don't have a lot of variation, it works well.
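
As a toy illustration of the row-finding idea (not k2pdfopt's actual code, which is not shown in this thread), the sketch below treats a page region as lines of 0s and 1s, where 1 marks a dark pixel, and reports each run of consecutive lines containing ink as one text row. The function name find_text_rows and the 0/1 input format are assumptions made for the example.

```shell
# Toy row finder. Input: one line per pixel row, made of 0 (white)
# and 1 (dark). Output: "first-last" line numbers for each run of
# consecutive lines that contain at least one dark pixel.
find_text_rows() {
    awk '
        { ink = ($0 ~ /1/) }                         # does this pixel row contain ink?
        ink && !in_row { start = NR; in_row = 1 }    # a text row begins here
        !ink && in_row { print start "-" (NR - 1); in_row = 0 }  # it ended on the previous line
        END { if (in_row) print start "-" NR }       # row running to the bottom edge
    '
}

# Five sample pixel rows: blank, two inked lines, blank, one inked line.
printf '%s\n' 000000 011010 011110 000000 001100 | find_text_rows
```

For the five sample lines this prints 2-3 and 5-5. In this toy version a single stray dark pixel on a blank line would merge two rows into one, which gives a feel for why scan noise can upset heuristics of this kind.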

You could try using Office 365 (Word) to read your PDF file--it will directly read PDF files and has good capability to convert scanned PDFs to Word using OCR. You might even try opening the k2pdfopt conversion in Word and see what that looks like--if it's formatted closer to the way you want for an epub. Either way, if you can get your document into Word format, you'll have a lot more capability to convert to epub using Sigil, for example.
Old 11-18-2016, 02:19 PM   #1314
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
I think I once tried converting a k2pdfopt-converted PDF into EPUB using calibre... that was when I was initiated into the existence of invisible layers, which is of course so typically PDF.
It looked extra horrible, with triply-repeated lines through the whole thing.

Would Word be capable of ignoring that kind of hidden information?
Old 11-18-2016, 04:06 PM   #1315
willus
Quote:
Originally Posted by eschwartz View Post
I think I once tried converting a k2pdfopt-converted PDF into EPUB using calibre... that was when I was initiated into the existence of invisible layers, which is of course so typically PDF.
It looked extra horrible, with triply-repeated lines through the whole thing.

Would Word be capable of ignoring that kind of hidden information?
Take a look at the PDF-to-Word conversion examples in this post.
Old 11-23-2016, 12:53 AM   #1316
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
@willus

Your examples are pretty convincing and the quality of the results is baffling.

I noticed that the source PDF files are in PDF/A format (or at least that I can select any piece of text). Not all PDFs we come across have that feature. Does that mean that "image" PDFs (maybe there is a more precise term for them) cannot be processed that way?
Old 11-23-2016, 01:17 PM   #1317
willus
Quote:
Originally Posted by roger64 View Post
@willus

Your examples are pretty convincing and the quality of the results is baffling.

I noticed that the source PDF files are in PDF/A format (or at least that I can select any piece of text). Not all PDFs we come across have that feature. Does that mean that "image" PDFs (maybe there is a more precise term for them) cannot be processed that way?
If the PDF does not have text which can be selected, i.e. it is scanned or is a sequence of bitmapped pages, then you can use the OCR feature in k2pdfopt to convert the scanned text to a text layer which allows selection of text. See the OCR help page.
Old 11-26-2016, 01:55 PM   #1318
willus
k2pdfopt v2.36 released

K2pdfopt v2.36 is released. This is a mostly bug-fix release with some minor new features. See details at the web site.
Old 11-29-2016, 06:29 AM   #1319
behrooz
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2016
Device: kindel whitepaper 7th
Hello, first of all, apologies for my English, and congratulations on the site. I have a Kindle Paperwhite 7th (2015). Which is the best option in k2pdfopt for my device?
Old 11-29-2016, 09:37 PM   #1320
willus
Quote:
Originally Posted by behrooz View Post
Hello, first of all, apologies for my English, and congratulations on the site. I have a Kindle Paperwhite 7th (2015). Which is the best option in k2pdfopt for my device?
If your device is the Kindle Paperwhite 3 released in 2015, it would be:

-dev kp3

If it's a Kindle 7 released in 2014:

-dev k2

You can run

k2pdfopt -dev ?

to see a list of all devices and their resolutions and compare to this page.
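
For scripted use, the two cases above could be wrapped in a small lookup. A sketch only: the function name dev_flag and the labels paperwhite3 and kindle7 are invented for this example; the -dev values kp3 and k2 are the ones given above, and book.pdf is a placeholder file name.

```shell
# Hypothetical helper: map an informal device label to the -dev value
# suggested in the post. Only the two devices discussed are covered.
dev_flag() {
    case $1 in
        paperwhite3) echo kp3 ;;   # Kindle Paperwhite 3 (2015)
        kindle7)     echo k2  ;;   # Kindle 7 (2014)
        *)
            echo "unknown device '$1'; run: k2pdfopt -dev ?" >&2
            return 1
            ;;
    esac
}

# Example:
#   k2pdfopt -dev "$(dev_flag paperwhite3)" book.pdf
```

Anything not in the lookup falls through to the error branch, which points back at the built-in device list instead of guessing a value.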