08-19-2014, 06:30 PM | #886 |
Junior Member
Posts: 1
Karma: 12694
Join Date: Aug 2014
Device: kindle keyboard
|
Problem with text in notes
Hi all,
I used this tool to convert a PDF for my Kindle Keyboard. Everything seemed fine, but when I highlight text, it is saved in the "My Clippings" file with all the words run together and no whitespace. For example: "Thisisasentencecomplete" instead of "This is a sentence complete". This is strange, because if I copy and paste from Acrobat on a PC, the words are not run together.
Questions:
- Is this normal when using k2pdfopt? Can this problem be avoided? I ran the tool with the default options and also tested other settings, with no luck.
- Can I do something to recover my notes? I have a lot of them and would like to avoid separating all the words manually.
- I noticed that if I generate a new PDF file with Acrobat using OCR, the notes are captured correctly on the Kindle. However, I did not find a way to recover my previous notes with the right word separation.
- I also tried the kindle-annotations tool, with no luck.
Thank you for any advice. |
08-20-2014, 08:48 AM | #887 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
|
|
08-20-2014, 08:53 AM | #888 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
|
|
08-21-2014, 02:58 PM | #889 |
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2014
Device: Kindle keyboard
|
For example, you can try with the free book:
http://statweb.stanford.edu/~tibs/El...earn/main.html
http://statweb.stanford.edu/~tibs/El...II_print10.pdf
It seems to happen with many other similar files. As I understand it, these are native PDFs, not scans. I tried the options suggested. They change the captured notes, and sometimes some whitespace is preserved, but the contents of My Clippings.txt are bad in all three cases (no options, -ocrsp, -ocrsp+). Thanks. |
08-22-2014, 12:20 AM | #890 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
|
|
09-01-2014, 06:49 AM | #891 |
Enthusiast
Posts: 33
Karma: 12694
Join Date: Aug 2014
Device: kindle paperwhite
|
Fantastic. I only wish Calibre could further convert it to Kindle formats without distortion.
|
09-02-2014, 08:02 PM | #892 |
Junior Member
Posts: 2
Karma: 12694
Join Date: Sep 2014
Device: Nook Simple Touch
|
Hey, I have a PDF that is mostly text with some images. When I use k2pdfopt to convert it for my e-reader, it crops based on whitespace as it says, but it splits up some images and doesn't preserve their positions relative to each other. For example, I have a table like:
Code:
x    y
___  ___
1    5
2    10
3    15
4    20
Last edited by krogank9; 09-02-2014 at 08:07 PM. |
09-02-2014, 10:36 PM | #893 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
|
|
09-02-2014, 11:49 PM | #894 | |
Junior Member
Posts: 2
Karma: 12694
Join Date: Sep 2014
Device: Nook Simple Touch
|
Quote:
Sure, I don't mind posting the PDF. Here it is: https://384ceda309d543db8121e87d97c2...g/Unit%201.pdf
Also, it originally came in .odt (OpenOffice) format, and I converted it to PDF from that: https://googledrive.com/host/0BwfCKx...1oazg/Unit.odt
The other option I was thinking of, but have no idea how to do, is to put a black border around all the images in my document so that they aren't auto-cropped as whitespace. I'm not sure how that would work, though. Anyway, it totally messes up the formatting of the tables and images and makes the document unusable.
Last edited by krogank9; 09-02-2014 at 11:51 PM. |
|
09-03-2014, 08:47 AM | #895 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
k2pdfopt -wt 254 -c -gtr .002 -col 1 unit1.pdf
The high white threshold will help k2pdfopt see the light pink, light gray, and light blue colors in the figures as "black" rather than "white" (i.e., as content), which makes it harder for k2pdfopt to split them. I've also set the -gtr value lower than normal to try to prevent splitting figures, but I don't think it did much. And I used -col 1 since it's not a two-column document. But again, I'd definitely try working with the .odt file directly. I'll try it later myself if I get a chance. |
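As a rough illustration of what the white threshold does (a sketch only, not k2pdfopt's actual code): assuming gray levels run from 0 (black) to 255 (white) and that anything above the threshold is treated as white, a light tint of 250 counts as content at -wt 254 but as white at a lower threshold such as 240. The classify function and the sample values 250 and 240 are made up for this example.

```shell
#!/bin/sh
# classify GRAY THRESHOLD
# Hypothetical helper: prints "white" if the gray level (0-255) is above
# the threshold, otherwise "content". It only mimics the idea behind -wt.
classify() {
    if [ "$1" -gt "$2" ]; then
        echo "white"
    else
        echo "content"
    fi
}

classify 250 254   # light tint, high threshold
classify 250 240   # same tint, lower threshold
classify 255 254   # pure white page background
```

Under these assumptions, only pure white is treated as background at 254, so lightly tinted figure areas are kept together as content.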
|
09-03-2014, 11:55 PM | #896 |
Junior Member
Posts: 2
Karma: 12694
Join Date: Sep 2014
Device: Kindle Paperwhite
|
Can someone help me set up Tesseract in k2pdfopt?
I'm not used to coding at all, so I have run into a dead end on this. I have k2pdfopt installed on my MacBook Pro running OS X 10.9.4, and that is working well. I have been able to use it successfully to optimize files to read on my new Kindle Paperwhite. My problem is that I have a lot of PDFs made from photocopies and want to set up Tesseract to convert them. I downloaded and installed the Tesseract files and also RCEnvironment to set the environment variables.
I think I just don't know enough about this to figure out the correct value for the variable TESSDATA_PREFIX. I thought I needed to put in a path like:
/Library/Java/tesseract-ocr/
but obviously this isn't working. When I try it, k2pdfopt says:
Could not find Tesseract data (env var TESSDATA_PREFIX = (not assigned)). Using GOCR v0.49.
Can someone help me figure out what I need to do? I apologize if this has been answered in earlier posts; I just have not had the time to comb through everything to solve this. Thank you |
09-05-2014, 08:29 AM | #897 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Quote:
|
|
09-05-2014, 09:24 AM | #898 | |
Banned
Posts: 488
Karma: 1080260
Join Date: Sep 2012
Device: sony prs t1 kindle dx ipad
|
Quote:
Attached is the file reflowed in landscape mode with Max. Columns set to 1, with the table on page 2 left intact, i.e., the x and y columns stay next to each other on the same page (using 2 as the value for Max. Columns would split the columns onto separate pages).
You can also use a second e-ink reader or a tablet (color recognition) at the same time, just for zooming in on the pictures and tables in the original file and checking the original layout, while reading the reflowed text on the other device. For example, I can use the Sony PRS-T1 for scribbling, highlighting, dictionary lookup, annotation (by keyboard or scribbling), etc., while simultaneously reading the same PDF enlarged on the Kindle DX, which lacks those functions for PDFs.
A better idea for such letter-sized and A4 PDFs with very small type is to use a 10" e-ink reader, though. For example, I can even read it in portrait mode on the Kindle DX (140 mm screen width) zoomed in without margins. In landscape mode, zoomed in without margins (202 mm), I get magnification compared to the 185 mm text width of that letter-sized PDF on paper. Also, if you have a lot of such letter-sized PDF files and want a small e-ink reader, you would do better to try a 6.8" e-ink reader like the Kobo Aura HD, which has around 140 mm of screen width in landscape, i.e., about 20 mm more than the 120 mm of a 6" e-ink reader.
If you have ABBYY FineReader, you can turn every recognized table into a picture by selecting it and choosing Change-Area-Type. In Adobe Acrobat, you can select a picture, choose Edit in Paint, and then draw a rectangle around the picture.
Last edited by markom; 09-06-2014 at 07:57 AM. |
|
09-05-2014, 04:42 PM | #899 | |
Junior Member
Posts: 2
Karma: 12694
Join Date: Sep 2014
Device: Kindle Paperwhite
|
Help using Tesseract with k2pdfopt in OS X 10.9.4
Quote:
I'd be happy to do this through the terminal shell if I knew how. I found out about RCEnvironment through an earlier posting and thought that I had enough step-by-step info to be successful. I just don't have enough background (read: any at all) in code to even figure out where to look for the steps I need to take at this point. I really would love to use OCR with k2pdfopt (which otherwise has been really helpful) to make better use of my Kindle for coursework. Thanks for any feedback. I really appreciate it! |
|
09-06-2014, 10:22 PM | #900 | |
Fuzzball, the purple cat
Posts: 1,273
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
|
Setting the TESSDATA_PREFIX environment variable in Mac OS X
Quote:
Interestingly, if you delete an environment variable with RCEnvironment, you have to restart completely to get rid of it for good (you can't just log out and log in again). What this all does is save the environment variable info to a file in the .MacOSX folder underneath your home folder. There is another file, /etc/launchd.conf, which I believe stores system-wide environment variables. So you can also store the TESSDATA_PREFIX environment variable in /etc/launchd.conf by adding a line to it like so:
setenv TESSDATA_PREFIX /usr/local/Cellar/tesseract/3.02.02/share/
You'll need sudo privileges to edit /etc/launchd.conf, but hopefully RCEnvironment will do the trick. Let me know if you get it working.
Last edited by willus; 09-06-2014 at 10:36 PM. |
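For anyone who would rather do this from the terminal, here is a minimal sketch, assuming a POSIX-style shell and the same install path as in the post (yours may well differ). It sets TESSDATA_PREFIX for the current terminal window only and checks for a tessdata folder underneath it. The tessdata folder name is an assumption about where Tesseract 3.x keeps its language files, and check_tessdata is a made-up helper name.

```shell
#!/bin/sh
# Set the variable for this terminal session only.
# The path is the one quoted in the post; adjust it to your own install.
TESSDATA_PREFIX="/usr/local/Cellar/tesseract/3.02.02/share/"
export TESSDATA_PREFIX

# Hypothetical helper: prints "found" if PREFIX/tessdata exists,
# otherwise "missing". A trailing slash on PREFIX is tolerated.
check_tessdata() {
    if [ -d "${1%/}/tessdata" ]; then
        echo "found"
    else
        echo "missing"
    fi
}

echo "TESSDATA_PREFIX=$TESSDATA_PREFIX"
echo "tessdata folder: $(check_tessdata "$TESSDATA_PREFIX")"
```

If it reports "missing", the path probably does not match where Tesseract was installed. Run k2pdfopt from the same terminal window afterwards so that it inherits the variable; the setting is gone once the window is closed.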
|
Tags |
ebook apps, k5 tools, kindle tools, kindle touch, tools |
|