Old 04-03-2013, 05:53 AM   #376
jldg
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2013
Location: france
Device: koboglo
Thank you for your quick answer.
I have discovered the "crop box"; the explanation is clear.
For this document (no columns), I can use calibre to convert the PDF to EPUB
(k2pdfopt is better for some details),
but calibre can't convert multi-column documents.
I'll see what happens next time, with a multi-column doc.
Old 04-03-2013, 10:09 PM   #377
roger64
Wizard
Posts: 2,607
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Hi

A question about internal margins.

Thanks for this wonderful software. I am a Linux user (64-bit) and I have a Kobo Glo. I tried k2pdfopt on one of my own PDFs (just for a ten-page trial). There is a header.

I used the following command:
Code:
/opt/k2pdfopt brune.pdf -w 748 -h 1024 -odpi 213 -m 0.5 -p 10-20
The resulting PDF leaves a 10 px right margin on my Kobo, which is expected because I should have used a width of 758 px instead of 748.

I would like my text to be centered, that is to have a 5px margin on both sides. I learnt there is an -omb parameter but I could get no result out of it.

What would be your advice to improve the left and right margin display of this text on the Kobo?
Attached Files
File Type: pdf brune.pdf (1.04 MB, 405 views)
File Type: pdf brune_k2opt.pdf (739.6 KB, 335 views)

Last edited by roger64; 04-03-2013 at 10:18 PM.
Old 04-04-2013, 12:35 AM   #378
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by roger64 View Post
Hi

A question about internal margins.

Thanks for this wonderful software. I am a Linux user (64-bit) and I have a Kobo Glo. I tried k2pdfopt on one of my own PDFs (just for a ten-page trial). There is a header.

I used the following command:
Code:
/opt/k2pdfopt brune.pdf -w 748 -h 1024 -odpi 213 -m 0.5 -p 10-20
The resulting PDF leaves a 10 px right margin on my Kobo, which is expected because I should have used a width of 758 px instead of 748.

I would like my text to be centered, that is to have a 5px margin on both sides. I learnt there is an -omb parameter but I could get no result out of it.

What would be your advice to improve the left and right margin display of this text on the Kobo?
Use the full pixel width for your device (758) along with margins of 5/213 inches ≈ 0.0235 in (0.025 in is close enough), so:

/opt/k2pdfopt brune.pdf -w 758 -h 1024 -odpi 213 -m 0.5 -oml 0.025 -omr 0.025

The -oml and -omr options set the left and right output device margins in inches.
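
The pixel-to-inch conversion above is just the margin in pixels divided by the output DPI. Here is a minimal Python sketch of that arithmetic; the helper names `px_to_inches` and `side_margin_args` are made up for illustration and are not part of k2pdfopt. For 5 px at 213 dpi it gives about 0.023 in, in the same ballpark as the 0.025 used in the command.

```python
def px_to_inches(pixels: float, dpi: float) -> float:
    """Convert a length in output pixels to inches at the given output DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels / dpi


def side_margin_args(margin_px: float, dpi: float) -> list[str]:
    """Build -oml/-omr arguments (in inches) for equal left and right margins."""
    inches = f"{px_to_inches(margin_px, dpi):.3f}"
    return ["-oml", inches, "-omr", inches]


# Values from the post: 5 px margins on a 213 dpi screen.
print(side_margin_args(5, 213))  # ['-oml', '0.023', '-omr', '0.023']
```

The returned list can be appended to the rest of the k2pdfopt arguments; whether a difference of a fraction of a pixel is visible on the device is something to check by eye.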
Old 04-04-2013, 03:33 AM   #379
roger64
Wizard
Posts: 2,607
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
@willus

Thanks for your explanation for computing the text margins. k2pdfopt is an impressive and amazingly precise tool.

Converting a standard book like mine may need two kinds of commands: one for the cover page (and any other pages without margins), and another for normal pages with margins. So we may end up with several output files for one book. I can use pdfsam to merge these output files.

Last edited by roger64; 04-04-2013 at 03:36 AM.
Old 04-04-2013, 08:36 AM   #380
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by roger64 View Post
@willus

Thanks for your explanation for computing the text margins. k2pdfopt is an impressive and amazingly precise tool.

Converting a standard book like mine may need two kinds of commands: one for the cover page (and any other pages without margins), and another for normal pages with margins. So we may end up with several output files for one book. I can use pdfsam to merge these output files.
Thanks. You are correct that I don't presently have a good way to apply different options to different pages within a single k2pdfopt conversion, so converting different sets of pages with different options using consecutive commands and then assembling the outputs is the way to go. I had not heard of pdfsam. Thanks for the tip. I use jpdftweak for general PDF file manipulation.
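
As a rough illustration of the "consecutive commands, then assemble" workflow, the Python sketch below only builds and prints one k2pdfopt command line per page range; it does not run anything. The page ranges ("1-2", "3-20"), the cover-page option (-m 0), and the `_partN.pdf` output names are assumptions made up for the example. The flags themselves (-p, -o, -w, -h, -odpi, -m, -oml, -omr) are the ones already used in this thread.

```python
import shlex


def build_jobs(source, common, jobs):
    """Return (output_name, argument_list) pairs, one per page-range job."""
    stem = source.rsplit(".", 1)[0]
    planned = []
    for index, (pages, extra) in enumerate(jobs, start=1):
        output = f"{stem}_part{index}.pdf"  # hypothetical naming scheme
        args = ["k2pdfopt", source, *common, *extra, "-p", pages, "-o", output]
        planned.append((output, args))
    return planned


# Options shared by every job (device size and resolution from the thread).
common = ["-w", "758", "-h", "1024", "-odpi", "213"]

# (page range, extra options) per job; the ranges are illustrative assumptions.
jobs = [
    ("1-2", ["-m", "0"]),                                       # cover pages
    ("3-20", ["-m", "0.5", "-oml", "0.025", "-omr", "0.025"]),  # body pages
]

planned = build_jobs("brune.pdf", common, jobs)
for output, args in planned:
    print(shlex.join(args))

# Files to hand to a merge tool (pdfsam, jpdftweak, ...) in this order.
merge_order = [output for output, _ in planned]
```

The merge step itself is left to whichever PDF tool you prefer; `merge_order` just records the order in which the parts should be joined.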
Old 04-04-2013, 05:41 PM   #381
MaxStirner
Connoisseur
Posts: 71
Karma: 18500
Join Date: Apr 2013
Device: Kindle Touch, Paperwhite
Bilingual texts?

Is there any way to OCR multi-language pages, for example a dictionary page, which is (usually) bilingual? I have no idea whether Tesseract allows this, so it might be impossible to achieve.
Old 04-04-2013, 11:08 PM   #382
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by MaxStirner View Post
Is there any way to OCR multi-language pages, for example a dictionary page, which is (usually) bilingual? I have no idea whether Tesseract allows this, so it might be impossible to achieve.
This is a better question for the Tesseract folks. You can always just try the English language OCR in Tesseract and see what you get. For fun, I tried OCR-ing the attached document (multilingual.pdf), which I created using Google Translate.

When I use the English Tesseract training pack (result in multi_eng.pdf), the first three pages (English, French, and German) OCR mostly correctly. Some of the special French characters come through, but others are lost or rendered incorrectly, the German umlaut doesn't come through, and the Russian (Cyrillic) page isn't recognized correctly at all. When I use the Russian training pack (result in multi_rus.pdf), the Russian page is (mostly) correct, but none of the others are.

So it depends partly on how different the languages are. I don't see any generic "Romance language" training packs for Tesseract, unfortunately. English is the largest training data package (other than Asian languages), so I'd guess it's your best bet for English/French/Spanish and other Latin-alphabet languages, though I can't say for certain. Again, a Tesseract expert would have to weigh in.

Note that to see the Russian characters correctly, you need to copy and paste the Russian PDF page into a Unicode-aware application (like the Google Translate box in a modern browser). K2pdfopt does not use the correct Cyrillic font. The commands I used were:

k2pdfopt -mode copy -ocr t -ocrvis t multilingual.pdf -ocrlang eng -o multi_eng.pdf

k2pdfopt -mode copy -ocr t -ocrvis t multilingual.pdf -ocrlang rus -o multi_rus.pdf
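
The two commands above differ only in the language code and the output name, so the same comparison can be repeated for other training packs with a small loop. The Python sketch below only builds and prints the command lines; the helper name `ocr_command` and the `prefix` parameter are assumptions for illustration, and each language code only works if the matching Tesseract training data is installed.

```python
def ocr_command(source: str, lang: str, prefix: str = "multi") -> list[str]:
    """Build the k2pdfopt OCR command used above for one Tesseract language."""
    output = f"{prefix}_{lang}.pdf"
    return [
        "k2pdfopt", "-mode", "copy", "-ocr", "t", "-ocrvis", "t",
        source, "-ocrlang", lang, "-o", output,
    ]


# Language codes from the post; extend the list to compare other packs.
for lang in ["eng", "rus"]:
    print(" ".join(ocr_command("multilingual.pdf", lang)))
```

Comparing the outputs side by side, as done above, is still the only way to judge which pack suits a given mix of languages.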
Attached Files
File Type: pdf multilingual.pdf (161.2 KB, 340 views)
File Type: pdf multi_eng.pdf (16.8 KB, 337 views)
File Type: pdf multi_rus.pdf (22.0 KB, 613 views)

Last edited by willus; 04-04-2013 at 11:14 PM.
Old 04-06-2013, 01:22 PM   #383
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
k2pdfopt v1.65 released

K2pdfopt v1.65 is released. This is a bug fix / maintenance release with some minor new features. See the web site for details.
Old 04-06-2013, 01:31 PM   #384
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Vanilla View Post
Quote:
Originally Posted by willus View Post
I know what the issue is, so you don't need to send more examples. It's exactly what you suggested--big blocks getting appended to each other without looking inside of them for line spacings ...
Thank you for answering. I will patiently wait for the next release, whenever that is.
@Vanilla -- Try v1.65, just released.
Old 04-07-2013, 06:51 AM   #385
Kornholio
Junior Member
Posts: 2
Karma: 10
Join Date: Mar 2013
Device: Sony PRS-T1
Quote:
Originally Posted by willus View Post
Sorry--I missed this post. The problem is that your document size (4.5 x 7 inches) combined with k2pdfopt's default output resolution (167 dpi) results in no wrapping being required. So you have two options if you want wrapped text: (1) increase the output dpi (will make everything larger) to something like 200, or (2) use -wrap+, which will un-wrap the narrow column on the right so that all the text fits the width of your reader screen. You also should use -m 0 to avoid having any clipping since your viewable region runs right to the edge of the page. Finally, for cases like this I like to use -sm so that I can verify how k2pdfopt is interpreting the page layout. Final commands, then:

k2pdfopt -m 0 -sm -fc- -odpi 200 page17.pdf

or

k2pdfopt -m 0 -sm -fc- -wrap+ page17.pdf

(you can also combine -odpi 200 and -wrap+).
Thank you. In the meantime, since my previous post was my first and took a while to be moderated (I suspect that's why you missed it as well), I've mostly solved the issue using these arguments (possibly I'm forgetting something):
-m 0 -col 1 -fc- -wrap-
to prevent any wrapping, layout changes, or text resizing. It worked pretty well, since the text size happens to fit the reader's (wide) screen nicely (so basically just dumb luck, plus switching to landscape orientation). I will try your solution to see how that works out.

Thanks again for your help!

Last edited by Kornholio; 04-07-2013 at 07:00 AM.
Old 04-07-2013, 09:01 AM   #386
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by Kornholio View Post
Thank you. In the meantime, since my previous post was my first and took a while to be moderated (I suspect that's why you missed it as well), I've mostly solved the issue using these arguments (possibly I'm forgetting something):
-m 0 -col 1 -fc- -wrap-
to prevent any wrapping, layout changes, or text resizing. It worked pretty well, since the text size happens to fit the reader's (wide) screen nicely (so basically just dumb luck, plus switching to landscape orientation). I will try your solution to see how that works out.

Thanks again for your help!
It sounds like you've rotated the document so that you are viewing it in landscape mode on your reader, which the above options would not do. Maybe you used this?

-m 0 -mode fw

The -mode fw is a shortcut for several options. See my command-line options help page for the details. (Actually, you don't need -m 0 anymore with v1.65. It's now the default.) If you didn't try the above command, you should try it. It's a good solution if you don't need text re-flow.
Old 04-07-2013, 03:03 PM   #387
JensW
Enthusiast
Posts: 29
Karma: 81500
Join Date: Apr 2013
Device: Kindle 4
Hi all,
just wanted to let you know that I have also updated my Windows GUI for k2pdfopt with a few of the new k2pdfopt options, most importantly the OCR functions. The GUI contains links to all Tesseract training files, so downloading them is pretty easy. The respective environment variable is set by the GUI; you only have to specify the path where you extracted the language files.

I did not want to build the download and extraction procedure into the GUI because of possible safety concerns users might have ("Why does that program connect to the internet?!"), so that part is handled by your trusted browser. ;-)

The new version 1.04.1 is available at my homepage


Great work with the updates, Willus. Thank you once more.
Old 04-08-2013, 02:26 PM   #388
dgvirtual
Enthusiast
Posts: 30
Karma: 2848
Join Date: Feb 2013
Location: Lithuania
Device: Kobo Glo
I have a problem with a three-column text. On one page, an image is placed so that it overlaps two of the columns, and the program does not read the page correctly: it does not recognize the text as three-column. It renders the second page fine. If I cut the image out by specifying a 3.4" bottom margin, the columns get recognized (although the lines separating the columns do not get ignored, which is a minor problem).

Could something be done about pages like this, or is it just too much to ask? Here is the file I have a problem with:

http://www.nzidinys.lt/files/various...iene%20txt.pdf

Last edited by dgvirtual; 04-08-2013 at 03:02 PM.
Old 04-08-2013, 10:27 PM   #389
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Quote:
Originally Posted by dgvirtual View Post
I have a problem with a three-column text. On one page, an image is placed so that it overlaps two of the columns, and the program does not read the page correctly: it does not recognize the text as three-column. It renders the second page fine. If I cut the image out by specifying a 3.4" bottom margin, the columns get recognized (although the lines separating the columns do not get ignored, which is a minor problem).

Could something be done about pages like this, or is it just too much to ask? Here is the file I have a problem with:

http://www.nzidinys.lt/files/various...iene%20txt.pdf
Layouts like this make me want to re-think the way I order regions in k2pdfopt, or at least to provide a couple more options, but I was able to get something reasonable with the existing version:

k2pdfopt -col 4 -cgr .4 -evl 1 -sm -mb 1.1 -ch 0.5 Az.pdf

-col 4 enables detection of up to 4 columns (2 levels of recursion).
-cgr .4 limits the horizontal search range for the column divider. The value of .4 gets k2pdfopt to treat the left column divider as the first divider, which is the key to correct layout on page 1.
-evl 1 erases the vertical lines, which helps k2pdfopt find the column dividers.
-sm shows you how k2pdfopt is flowing your document (in the ..._marked.pdf file). You can take that out on the final conversion since it slows things down considerably.
-mb 1.1 ignores the page numbers / footer on the bottom of each page by cropping off the bottom 1.1 inches from each source page.
-ch 0.5 allows regions as short as 0.5 inches in height to be separated into multiple columns, which is important for page 1 (the default is 1.5 inches).
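
For anyone scripting this, the option list above can be kept as annotated data and turned back into a command line, which makes it easy to drop -sm for the final conversion. This is only a sketch: the `OPTIONS` table and the `build_command` helper are illustrative names, and the code builds the argument list without running k2pdfopt.

```python
# (flag, value, purpose) for each option explained above.
OPTIONS = [
    ("-col", "4", "detect up to 4 columns"),
    ("-cgr", ".4", "limit the horizontal search range for the column divider"),
    ("-evl", "1", "erase vertical lines"),
    ("-sm", None, "write a marked-up PDF showing the detected flow"),
    ("-mb", "1.1", "crop 1.1 inches off the bottom of each source page"),
    ("-ch", "0.5", "allow regions as short as 0.5 inches to be split"),
]


def build_command(source, options, show_markup=True):
    """Assemble the k2pdfopt argument list, optionally leaving out -sm."""
    args = ["k2pdfopt"]
    for flag, value, _purpose in options:
        if flag == "-sm" and not show_markup:
            continue  # skip the slow diagnostic output on the final run
        args.append(flag)
        if value is not None:
            args.append(value)
    args.append(source)
    return args


print(" ".join(build_command("Az.pdf", OPTIONS)))
print(" ".join(build_command("Az.pdf", OPTIONS, show_markup=False)))
```

The first line printed reproduces the command given above; the second is the same command without the marked-up output.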
Old 04-13-2013, 09:07 AM   #390
Vanilla
Junior Member
Posts: 8
Karma: 8900
Join Date: Jan 2013
Device: kindle 4 nt
Thumbs up

Quote:
Originally Posted by willus View Post
@Vanilla -- Try v1.65, just released.
I just tested it, and it works great so far - thank you very much!