Old 12-05-2017, 08:51 AM   #1486
willus
Fuzzball, the purple cat
Posts: 1,272
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Please post examples where you are getting a segfault--the source file and the exact command options you are using to convert it, and what page it happens on. I can fix that for the next release. Are you using v2.42? If a particular page is causing a segfault, you can use the -px option to avoid processing it, e.g. -px 10,20,34-35 will skip pages 10, 20, and 34-35 of the source file.
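As a rough illustration of how a page list like the one above reads, here is a minimal Python sketch that expands "10,20,34-35" into individual page numbers. The function name is made up for this example, and it handles only the comma-and-dash form shown in the post; k2pdfopt's own parser may accept more than this.

```python
def parse_page_list(spec: str) -> list[int]:
    """Expand a page list such as "10,20,34-35" into individual page numbers.

    Hypothetical helper for illustration only; it is not k2pdfopt code.
    """
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A dash marks an inclusive range, e.g. "34-35" -> 34, 35.
            first, last = part.split("-", 1)
            pages.extend(range(int(first), int(last) + 1))
        else:
            pages.append(int(part))
    return pages


print(parse_page_list("10,20,34-35"))  # [10, 20, 34, 35]
```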

You need not split the pdf before running k2pdfopt. You can use the -p option to process specific page ranges. There are a number of programs that will join/merge PDFs without extra processing, e.g. cpdf is what I usually use--very fast. I don't know which ones are available for OSX, though.
Old 12-12-2017, 11:24 AM   #1487
RikyRap
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2017
Device: none
Sorry, I'm new to ebooks and I want to buy a Kindle Paperwhite 3. I found this incredible program, k2pdfopt, which I want to use for my PDFs on the Paperwhite 3, but when I select Paperwhite 3 in the program's menu, it automatically selects Kindle Voyage instead. Why? Is it because the Voyage has the same resolution as the Paperwhite 3, or is there a problem in the program? Thanks for the program and the work.
Old 12-12-2017, 05:53 PM   #1488
MarjaE
Guru
Posts: 924
Karma: 53902736
Join Date: Jun 2015
Device: multiple
Quote:
Originally Posted by willus View Post
Please post examples where you are getting a segfault--the source file and the exact command options you are using to convert it, and what page it happens on.
It was happening on different pages, but I had about 4 seg faults before 1 success on this pdf:

https://archive.org/details/tactics03balcgoog

I had already OCRed it in elucidate (Tesseract), and then ran -mode copy -dev dx. For the last 3 times, I closed most other apps, aside from scrolling and break-scheduling software.
Old 12-12-2017, 08:55 PM   #1489
willus
Quote:
Originally Posted by RikyRap View Post
Sorry, I'm new to ebooks and I want to buy a Kindle Paperwhite 3. I found this incredible program, k2pdfopt, which I want to use for my PDFs on the Paperwhite 3, but when I select Paperwhite 3 in the program's menu, it automatically selects Kindle Voyage instead. Why? Is it because the Voyage has the same resolution as the Paperwhite 3, or is there a problem in the program? Thanks for the program and the work.
See Post #1467.
Old 12-13-2017, 06:22 AM   #1490
RikyRap
Quote:
Originally Posted by willus View Post
Perfect, thanks a lot
Old 12-13-2017, 08:16 AM   #1491
willus
Quote:
Originally Posted by MarjaE View Post
It was happening on different pages, but I had about 4 seg faults before 1 success on this pdf:

https://archive.org/details/tactics03balcgoog

I had already OCRed it in elucidate (Tesseract), and then ran -mode copy -dev dx. For the last 3 times, I closed most other apps, aside from scrolling and break-scheduling software.
So I take it the segfaults do not occur consistently since you were able to get it to run to completion eventually? Or do they usually occur on the same page(s)? I ran the PDF file from the website (no elucidate processing) on MS Windows 7 64-bit with k2pdfopt v2.42 and the command options you listed and had no issues. I'll try it next on OSX after processing with elucidate.
Old 12-13-2017, 12:55 PM   #1492
MarjaE
They aren't especially consistent. One of my other pdfs would sometimes fail at the same page, but not all the time.
Old 12-16-2017, 03:56 AM   #1493
Steven630
Groupie
Posts: 154
Karma: 10
Join Date: May 2012
Device: Kindle Paperwhite2
Line breaks & Handling Chinese

Thank you for your great tool. This is the best software I've ever used for PDF optimization.

While converting a Chinese document, I found some unnecessary line breaks. I use v2.42 on Windows, and "smart line breaks" is unchecked for this conversion (when checked, more unnecessary line breaks appear).

Command line: -dev kp2 -fs 16.5 -col 1 -ws -0
Additional Options: -bp[-] -om 0.2 -y

For example, the underlined part is in one single sentence, separated by a comma, but k2pdfopt broke that into two lines (see images). I have more examples if you need them.

[Attached images: before.png, after.png]

Here is the PDF: line break.pdf. To save space, I have kept only the page in question. If you convert this file with the settings above, the problematic lines are the last line of result page 1 and the first line of result page 2 (they should be on the same line).

I don't know if it has something to do with the fact that the words are in Chinese. In Chinese, there is no space between words. In a sentence, there are only characters and punctuation marks. Non-native speakers can think of it roughly as numbers plus punctuation marks.

For example

Code:
234946543,************。
Suppose that each number is a Chinese character, and some words consist of multiple characters. Let's say that "23" stands for "students", "4" for "don’t", "94" for "like", "65" for "that", and "43" for "teacher". So "234946543" means "The students don’t like that teacher". There are no spaces between words or characters. We know how to separate the words (23-4-94-65-43) just by reading the segment 234946543.

And a Chinese word that has multiple characters can be separated between lines. Normally, a single line has a (relatively) fixed number of characters. If a line has, say, a width of eight characters, this sentence would be

Code:
23494654
3,*******
*****。
Code:
学生不喜欢那个老
师,◎◎◎◎◎◎
◎◎◎◎◎。

Even though 43(老师) is the word for "teacher", the two characters that make up the word—4 老 and 3 师—are still on different lines, and this is the norm. It’s a bit like

Quote:
The students don’t like that tea-
cher, because she always assigns
a lot of homework.
Since we don’t have spaces between words, there’s no need for hyphens to divide words at the end of lines either.
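The fixed-width wrapping described above can be sketched in a few lines of Python. This is only an illustration of the idea that a Chinese line can break after any character; the function name is made up, and real typesetting also avoids starting a line with certain punctuation marks, which this sketch ignores.

```python
def wrap_fixed_width(text: str, width: int) -> list[str]:
    """Break text into lines of `width` characters, ignoring word boundaries.

    Illustration only: no punctuation rules, no hyphenation.
    """
    return [text[i:i + width] for i in range(0, len(text), width)]


# Same sentence as in the example above: nine characters, a comma,
# eleven placeholder characters and a full stop.
sentence = "学生不喜欢那个老师" + "," + "◎" * 11 + "。"
for line in wrap_fixed_width(sentence, 8):
    print(line)
```

With a width of eight characters, 老 ends the first line and 师 starts the second, exactly as in the example above.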

K2pdfopt seems to treat everything between two punctuation marks as a long word since there's no noticeable space between two Chinese characters.

Chinese is one of the few major languages written without spaces between words (Japanese and Thai are others). In fact, ancient Chinese books don’t even have punctuation marks, so children would first learn how to divide sentences.


Would it be possible for k2pdfopt to adopt a different approach for Chinese (perhaps by adding a “source text is in Chinese” option to the interactive menu)? If the option is ticked, the software would look for characters instead of words.


Even if the issue in the attached images is unrelated to Chinese, I would still recommend an improved mode for handling Chinese documents. Thank you.

Last edited by Steven630; 12-16-2017 at 05:26 AM. Reason: add details and a PDF file
Old 12-16-2017, 11:28 PM   #1494
willus
Quote:
Originally Posted by Steven630 View Post
Thank you for your great tool. This is the best software I've ever used for PDF optimization.

While converting a Chinese document, I have found some unnecessary line breaks. I use the v2.42 on Windows and "smart line breaks" is unchecked for this conversion (when checked, more unnecessary line breaks appear).

Command line: -dev kp2 -fs 16.5 -col 1 -ws -0
Additional Options: -bp[-] -om 0.2 -y
...
1. The unnecessary line break you point out has nothing to do with the spacing between Chinese characters. It occurs because the one line, due to the punctuation mark, is not fully justified (see attachment, green circle). Because the rest of the text is fully justified, k2pdfopt thinks that line, because not fully justified, has ended a paragraph. I could consider having an adjustable threshold setting for this, but for now this determination cannot be adjusted.

2. The -ws option, as you have discovered, can be tuned to a small value to allow k2pdfopt to re-flow between the characters (really should be -ws 0, not -ws -0 -- I'll have to fix that). As you can see in the attachment, k2pdfopt is correctly separating all of the Chinese characters with this setting.

3. The additional option -bp[-] that you put doesn't do anything. The brackets are used to indicate optional suffixes, e.g. when the command-line usage says -bp[+|-|--], it means you can put one of these four options: -bp, -bp+, -bp-, or -bp--.
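To make the bracket notation in point 3 concrete, here is a small Python sketch that expands a usage string such as -bp[+|-|--] into the literal options it stands for. It is only a reading aid for the usage text; the function name is invented for this example and nothing here is taken from k2pdfopt's source.

```python
import re


def expand_optional_suffixes(usage: str) -> list[str]:
    """Expand "-bp[+|-|--]" into ["-bp", "-bp+", "-bp-", "-bp--"].

    Hypothetical helper: the brackets mark optional suffixes and "|"
    separates the alternatives. Strings without brackets are returned as-is.
    """
    match = re.fullmatch(r"(?P<base>[^\[\]]+)\[(?P<alts>[^\[\]]*)\]", usage)
    if match is None:
        return [usage]
    base = match.group("base")
    suffixes = match.group("alts").split("|")
    # The bare option is always valid, followed by one entry per suffix.
    return [base] + [base + suffix for suffix in suffixes]


print(expand_optional_suffixes("-bp[+|-|--]"))
```

Read the same way, -bp[-] would stand for either -bp or -bp-; the brackets themselves are never typed on the command line.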
[Attached image: screenshot.png]

Last edited by willus; 12-16-2017 at 11:30 PM.
Old 12-17-2017, 02:20 AM   #1495
Steven630
Right aligned text & KCC

Quote:
Originally Posted by willus View Post
1. The unnecessary line break you point out has nothing to do with the spacing between Chinese characters. It occurs because the one line, due to the punctuation mark, is not fully justified (see attachment, green circle). Because the rest of the text is fully justified, k2pdfopt thinks that line, because not fully justified, has ended a paragraph. I could consider having an adjustable threshold setting for this, but for now this determination cannot be adjusted.

2. The -ws option, as you have discovered, can be tuned to a small value to allow k2pdfopt to re-flow between the characters (really should be -ws 0, not -ws -0 -- I'll have to fix that). As you can see in the attachment, k2pdfopt is correctly separating all of the Chinese characters with this setting.

3. The additional option -bp[-] that you put doesn't do anything. The brackets are used to indicate optional suffixes, e.g. when the command-line usage says -bp[+|-|--], it means you can put one of these four options: -bp, -bp+, -bp-, or -bp--.
1. Thank you for the explanation. I found unnecessary line breaks throughout the book. In Chinese, a new paragraph almost always begins with an indentation of two characters. This might be a better indicator for a new paragraph.

When a line ends with a punctuation mark, especially a Chinese parenthesis, full justification is impossible. Note the difference between English brackets () and Chinese ones （）. A "）" always has blank space to its right (see image, underlined blue parts).
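The two paragraph indicators discussed here (a line that stops short of the right margin, versus an indented first line) can be compared with a toy Python sketch. Everything in it is an assumption made for illustration: the function names, the units, and the tolerance values are invented, and this is not how k2pdfopt is actually implemented (willus notes above that its determination cannot currently be adjusted).

```python
def short_lines(right_edges: list[float], column_right: float,
                tolerance: float) -> list[bool]:
    """True where a line stops short of the right margin by more than `tolerance`.

    Hypothetical "not fully justified, so the paragraph ended" test.
    """
    return [(column_right - edge) > tolerance for edge in right_edges]


def indented_lines(left_edges: list[float], column_left: float,
                   indent: float) -> list[bool]:
    """True where a line starts at least `indent` to the right of the margin.

    Hypothetical "indented, so a new paragraph starts" test.
    """
    return [(edge - column_left) >= indent for edge in left_edges]


# Assumed units: one character is 5.0 wide and the column spans 0..100.
# Line 2 ends in full-width punctuation that leaves half a character blank;
# line 4 is a genuinely short last line; line 1 is indented two characters.
right_edges = [100.0, 97.5, 100.0, 60.0]
left_edges = [10.0, 0.0, 0.0, 0.0]

print(short_lines(right_edges, 100.0, tolerance=1.0))  # line 2 is flagged too
print(short_lines(right_edges, 100.0, tolerance=3.0))  # only line 4 is flagged
print(indented_lines(left_edges, 0.0, indent=10.0))    # only line 1 is flagged
```

Under these assumptions, a tolerance of about half a character is enough to stop the punctuation gap from being read as a paragraph end, and the indentation test is unaffected by it.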

[Attached images: before1.png, before2.png, after1.png, after2.png]

2. So I just have to set ws to 0 when converting Chinese files, right?

3. Thank you for pointing out my mistake.

There is another odd behaviour (see image, two underlined red parts). Besides the unnecessary line break, the text is right-aligned.

Here are the original two pages as a PDF: before 1 and 2.pdf. (I have not underlined all of the unnecessary line breaks.)

I have found a program, Kindle Comic Converter (KCC), that converts images to manga (comic) mobi files. I first use k2pdfopt to convert and reflow the PDF to PNG images and then use KCC to convert them into a mobi file. Unlike PDF files, manga mobi files don't refresh on every page turn on a Kindle; they behave like text-based mobi files. Page refreshing is the reason I avoid reading PDFs on my Kindle even when the PDF files have been converted well. With the help of your tool and KCC, the PDF reading experience on a Kindle is about as good as it gets. I recommend that Kindle users use both k2pdfopt (for image optimization and text reflow) and KCC (for Kindle-friendly mobi files).

The only problem is that since KCC is not developed for scanned text files, it uses an aggressive auto-cropping mode, which cuts all margins produced by k2pdfopt.

Thank you again for your time!

Update

Just found Amazon's official tool—Kindle Comic Creator. I'll try it out and report the result.

Update again

Here is the result. Kindle Comic Creator, Amazon's official software, is amazing. It even supports PDFs as source files. Just follow the steps: fill in the title and author, select the page-turning mode and panel view (for scanned PDFs, there is no need to enable panel view), etc. The output is half the size of that from the unofficial Kindle Comic Converter.

So k2pdfopt is best used with Kindle Comic Creator.

Last edited by Steven630; 12-17-2017 at 09:20 AM. Reason: Recommend Amazon's official software—Kindle Comic Creator
Old 12-27-2017, 07:31 PM   #1496
MarjaE
I never got cpdf working. Is it possible to merge pdfs w/o reprocessing in k2pdfopt?
Old 12-30-2017, 08:37 PM   #1497
MarjaE
I asked around elsewhere, and the answer was that seg fault 11 implied some level of incompatibility between k2pdfopt and MacOS:

https://discussions.apple.com/thread/8217579

I don't get it all the time, but with some big files, I would get it several times in a row, without running anything else except my scrolling software and the finder.

If I could merge PDF files w/o breaking formatting, I could run k2 (sometimes with OCR) on smaller sections and then merge them together.
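For the "smaller sections" idea, the page ranges themselves are easy to generate. The Python sketch below only produces range strings; it assumes that the -p option mentioned in post #1486 accepts the same start-end form shown there for -px, and the function name and chunk size are arbitrary choices for this example.

```python
def page_chunks(total_pages: int, chunk_size: int) -> list[str]:
    """Split pages 1..total_pages into consecutive "start-end" range strings.

    Hypothetical helper: each string is meant to be passed as one page range,
    assuming the start-end form is accepted.
    """
    chunks = []
    for start in range(1, total_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, total_pages)
        chunks.append(f"{start}-{end}")
    return chunks


print(page_chunks(270, 100))  # ['1-100', '101-200', '201-270']
```

Each range would then be converted in a separate run and the outputs joined afterwards with a merge tool such as the cpdf mentioned earlier in the thread.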
Old 01-11-2018, 02:20 AM   #1498
MarjaE
In some cases, I get a seg fault from trying to k2 -ocr a single page.

P.S. I get bad results from -ocr on other pages. That's with GOCR. I don't know what to install to get k2 to use Tesseract.

Last edited by MarjaE; 01-11-2018 at 02:40 AM.
Old 01-11-2018, 07:59 PM   #1499
MarjaE
This repeatedly fails on page 270: http://www.colindarch.info/docs/1994...novshchina.pdf

I'm using -mode copy -dev dx -ocr with k2 defaulting to GOCR.

The error message refers to "line 2: 1674 Segmentation fault: 11".
Old 01-11-2018, 10:04 PM   #1500
willus
Quote:
Originally Posted by MarjaE View Post
This repeatedly fails on page 270: http://www.colindarch.info/docs/1994...novshchina.pdf

I'm using -mode copy -dev dx -ocr with k2 defaulting to GOCR.

The error message refers to "line 2: 1674 Segmentation fault: 11".
I was able to reproduce this in Windows. Thank you. Hopefully I can fix it for the next release, but it may be an issue with the GOCR library. Tesseract worked fine. You may want to try and figure out how to install Tesseract--it's a much better OCR engine than GOCR.