08-03-2013, 07:16 AM | #496 |
Junior Member
Posts: 3
Karma: 6000
Join Date: Aug 2013
Device: Sony PRS-T2
Problems with figures!
Hello willus,
First of all, I want to thank you a lot for the great work you have put in; this program is just awesome. Now, the problem: I have a Sony PRS-T2 reader (nearly the same specs as the Kindle 2) and I want to convert some PC PDFs into reader PDFs. As you can see in the attachment, when it comes to figures I get that horrible result: figures that should be much smaller and part of a page instead occupy an entire page and are split across several consecutive pages. The formulas and the text of the book look very good; it's only a problem with some figures! Can you help me? Thanks in advance!
P.S. If you want, I can send you the original DjVu file that I have to convert.
P.P.S. Should I use any particular options since I have a Sony rather than a Kindle?
08-03-2013, 04:14 PM | #497 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
08-04-2013, 04:20 AM | #498 | |
Junior Member
Posts: 3
Karma: 6000
Join Date: Aug 2013
Device: Sony PRS-T2
Yeah, the PRS-T2 is nearly the same as the Kindle 2 (6-inch diagonal, 600x800, or a little less for the Sony, around 580x790), so I think those options should be OK! Just for my own information: what is the problem with the PDF output? Why do the figures come out like that? If you get a good result, could you tell me the options you used for the conversion, so that I can experiment from them in future conversions?
Last edited by Pagliuz; 08-04-2013 at 05:07 AM.
08-04-2013, 05:30 PM | #499 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Option 1. Because the text is only 4.7 inches wide once you strip away the margins, it reads pretty well without any re-flow if you use the standard "fit width" mode (with -n- to turn off native PDF output, since your source is a DJVU file):

k2pdfopt -mode fw -n- myfile.djvu

This is guaranteed not to mess up the figures or alignment, so you get the best-looking output, but the text may be too small for you to read. If that is the case, then:

Option 2. Try to make sure k2pdfopt re-flows only the text and not the figures or equations:

k2pdfopt -col 1 -whmax 0.2 myfile.djvu

The "-col 1" disables detection of multiple columns, and -whmax 0.2 is an undocumented option which tells k2pdfopt not to re-flow any image taller than 0.2 inches (i.e. anything taller than a standard line of text). This does a better job of keeping some of the figures from being interpreted as lines of text that get re-flowed. It's not perfect, but it seems to be better than the default conversion you got.

Another option is to add -mt 0.75, which chops off the headers on each page so that the text is more continuous in the converted file, but this makes it harder to reference the original page numbers in the converted file.
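To put rough numbers on "the text may be too small": assuming a 6-inch, 600x800 screen with square pixels, the usable width is about 3.6 inches, so fit-width mode shows the 4.7-inch text block at roughly three quarters of its printed size (ignoring any margins the device itself adds). The sketch below does that arithmetic and also models the -whmax idea as a simple height threshold. The function names are hypothetical, and the threshold rule is only an illustration of the behavior described in the post, not k2pdfopt's actual code.

```python
import math


def screen_width_in(diagonal_in: float, px_w: int, px_h: int) -> float:
    """Physical width of a screen from its diagonal and resolution.

    Assumes square pixels, so width/diagonal equals px_w/hypot(px_w, px_h).
    """
    return diagonal_in * px_w / math.hypot(px_w, px_h)


def fit_width_scale(text_width_in: float, screen_w_in: float) -> float:
    """Size of the displayed text relative to print when the text block
    (margins already trimmed) is stretched to fill the screen width."""
    return screen_w_in / text_width_in


def should_reflow(region_height_px: int, dpi: float, whmax_in: float = 0.2) -> bool:
    """Hypothetical model of the -whmax idea: a region is treated as a
    re-flowable text line only if it is no taller than whmax_in inches."""
    return region_height_px / dpi <= whmax_in


if __name__ == "__main__":
    w = screen_width_in(6.0, 600, 800)   # 3.6 inches for a 6-inch 600x800 screen
    s = fit_width_scale(4.7, w)          # about 0.77
    print(f"screen width {w:.2f} in, text shown at {s:.0%} of printed size")
    # At 300 dpi: a 45-pixel line (0.15 in) would be re-flowed,
    # a 600-pixel figure (2 in) would be left intact.
    print(should_reflow(45, 300), should_reflow(600, 300))
```

A smaller -whmax makes the rule stricter (fewer regions re-flowed), which is why it helps keep figures whole at the cost of occasionally leaving a tall text line unflowed.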
08-04-2013, 06:20 PM | #500 | |
Junior Member
Posts: 3
Karma: 6000
Join Date: Aug 2013
Device: Sony PRS-T2
You saved my day! Keep up the fantastic work you are doing, man!
08-07-2013, 01:27 PM | #501 |
Connoisseur
Posts: 71
Karma: 18500
Join Date: Apr 2013
Device: Kindle Touch, Paperwhite
Sorry to bother you again, willus, but maybe you remember my question about multi-language support. Yesterday I was browsing the Tesseract Google group for no particular reason and stumbled across this post:
https://groups.google.com/forum/#!ms...I/QMMHDV_GWRIJ
I don't know if it is of any help to you, but just in case.
08-07-2013, 10:52 PM | #502 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
08-17-2013, 08:25 PM | #503 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Dual language OCR example with k2pdfopt and Tesseract
k2pdfopt -ocr dual_english_chinese.pdf -mode copy -ocrlang language

where I substituted different values for language: eng, chi_tra, chi_tra+eng, and eng+chi_tra. See the attached files. The best results, by far, came from using chi_tra alone, which sort of defeats the purpose of dual-language OCR(!), but each result was different, so I am assuming that the mechanism of passing lang1+lang2 to Tesseract is working and that this was just a particularly poor case for Tesseract. Maybe mixed European languages will work better?
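Repeating a comparison like this is easier if the four command lines are generated rather than retyped. Tesseract combines languages with a "+" (e.g. eng+chi_tra), which is what -ocrlang passes through. The minimal sketch below only prints the commands, using exactly the options from the post; it does not run them. Because successive runs may write to the same output file name, rename each result before starting the next run. The helper names are hypothetical.

```python
import shlex

# The four -ocrlang values compared in the post.
LANGS = ["eng", "chi_tra", "chi_tra+eng", "eng+chi_tra"]


def build_cmd(src: str, lang: str) -> list[str]:
    """Argument list for one run, using only the options shown in the post."""
    return ["k2pdfopt", "-ocr", src, "-mode", "copy", "-ocrlang", lang]


def all_cmds(src: str) -> list[str]:
    """One shell-quoted command line per language setting."""
    return [shlex.join(build_cmd(src, lang)) for lang in LANGS]


if __name__ == "__main__":
    for line in all_cmds("dual_english_chinese.pdf"):
        print(line)
```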
08-18-2013, 01:24 AM | #504 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
08-21-2013, 01:19 AM | #505 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
08-25-2013, 03:13 PM | #506 |
Junior Member
Posts: 1
Karma: 6000
Join Date: Aug 2013
Device: kindle touch
Hi there,
I am very new to e-readers in general, and also very new to k2pdfopt. To make matters worse, I am not very computer-savvy. I did attempt to set up Tesseract and the environment variable, but I still get the error shown in the screenshot. Any ideas? Do I have to set another environment variable for k2pdfopt itself? Also, is there a k2pdfopt guide for dummies? I appreciate the help sections on the site, but they are still a bit too fast-paced for me. I will be using the programme exclusively to convert linguistics PDFs (typically two-column, with diagrams and charts, like classic science articles). Thank you!
08-26-2013, 12:37 AM | #507 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
Are your linguistics PDFs mostly scanned or not? If they aren't scanned (i.e. they were generated directly from a source file with the original text), you should be able to use "-mode 2col" and skip OCR altogether, e.g.

k2pdfopt -mode 2col myfile.pdf

Otherwise, OCR is probably the way to go. Sorry, there's no "for dummies" guide at the moment. All I've got is my help pages, but again, the Windows GUI may make things easier for you. You might also want to watch the video on the Native PDF page.

Edit: I've attached a screenshot of my Tesseract data folder (on my D drive). To OCR English text, you need the files shown, which have to be extracted from the downloaded training-data file (it ends in .tar.gz).
Last edited by willus; 08-28-2013 at 08:31 AM. Reason: Typo corrected
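The .tar.gz extraction step is where many Windows users get stuck. Any archive tool (7-Zip, for example) will do it; as an alternative, here is a minimal Python sketch that copies every file out of the downloaded archive straight into a data folder, dropping the folder names stored inside the archive. The function and script names are hypothetical. Which folder Tesseract actually reads from, and what its environment variable should point to, depends on the Tesseract version, so check its documentation before choosing the destination.

```python
import os
import shutil
import sys
import tarfile


def extract_data_files(archive: str, dest_dir: str) -> list[str]:
    """Copy every regular file in a .tar.gz into dest_dir.

    Folder names inside the archive (e.g. a leading tesseract-ocr/tessdata/)
    are dropped, so the files land directly in dest_dir. Using only the base
    name also prevents a malicious archive from writing outside dest_dir.
    Returns the names of the extracted files.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            name = os.path.basename(member.name)
            src = tar.extractfile(member)
            with src, open(os.path.join(dest_dir, name), "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(name)
    return extracted


if __name__ == "__main__":
    # Usage: python extract_tessdata.py <downloaded .tar.gz> <data folder>
    for name in extract_data_files(sys.argv[1], sys.argv[2]):
        print("extracted", name)
```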
08-27-2013, 11:57 PM | #508 |
Member
Posts: 11
Karma: 18680
Join Date: Aug 2013
Device: none
Why do I sometimes get one or more very small text lines?
Last edited by WT Sharpe; 08-28-2013 at 12:53 AM. Reason: Hyperlink removed.
08-28-2013, 08:27 AM | #509 |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
One small text line between two large ones is a bit unusual (at least, I haven't seen it much), and it looks like you have plenty of space between the words. I would need to see your source document and any specific options you used for the conversion. It looks like your hyperlink was removed, though the note doesn't say why. Maybe the document is copyrighted? Can you please PM (private message) it to me (or just the troublesome page)?
One option you might try is to reduce the minimum gap between words that k2pdfopt requires before it will break a line there. This is set by -ws, which defaults to 0.375. Maybe try -ws 0.3 or -ws 0.25.
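Why a smaller -ws might help can be pictured with a toy model. This is an assumption about the mechanism, not taken from k2pdfopt's source: if a line can only be wrapped at gaps at least -ws wide, a line whose word gaps all fall below the threshold becomes one unbreakable chunk that must be shrunk to fit the screen. The sketch uses made-up glyph widths and gaps in units of text height (also an assumption). Lowering ws from the 0.375 default to 0.25 turns one 6.9-unit chunk into three 2.1-unit chunks, which fit a 2.5-unit-wide screen without shrinking. All names are hypothetical.

```python
def chunks(glyph_widths: list[float], gaps: list[float], ws: float) -> list[float]:
    """Widths of the unbreakable chunks in one line of text.

    glyph_widths[i] is the width of glyph i and gaps[i] is the space between
    glyph i and glyph i+1, all in units of text height (an assumption).
    A gap of at least ws counts as a word break where the line may be split.
    """
    if len(gaps) != len(glyph_widths) - 1:
        raise ValueError("need exactly one gap between each pair of glyphs")
    widths = [glyph_widths[0]]
    for gap, w in zip(gaps, glyph_widths[1:]):
        if gap >= ws:
            widths.append(w)          # wide enough: start a new chunk
        else:
            widths[-1] += gap + w     # too narrow: glue onto the current chunk
    return widths


def shrink_needed(glyph_widths: list[float], gaps: list[float],
                  ws: float, screen_width: float) -> float:
    """Scale factor (at most 1.0) needed so the widest chunk fits the screen."""
    return min(1.0, screen_width / max(chunks(glyph_widths, gaps, ws)))


if __name__ == "__main__":
    glyphs = [1.0] * 6
    gaps = [0.1, 0.3, 0.1, 0.3, 0.1]   # word gaps of 0.3 fall below the 0.375 default
    for ws in (0.375, 0.25):
        print(ws, chunks(glyphs, gaps, ws), shrink_needed(glyphs, gaps, ws, 2.5))
```

In this model, setting ws too low has the opposite risk: ordinary letter spacing starts to count as a word break, so words could be split apart.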
08-28-2013, 09:32 AM | #510 |
Member
Posts: 11
Karma: 18680
Join Date: Aug 2013
Device: none
[Image exceeds guidelines - MODERATOR]
Sometimes I get a block of text in a small size. Nice, it looks like it worked with -ws 0.15.
Last edited by Dr. Drib; 01-15-2014 at 11:25 AM.