MobileRead Forums > E-Book Formats > PDF
06-09-2013, 06:04 PM   #436
willus
Fuzzball, the purple cat
Posts: 559
Karma: 2526455
Join Date: Jun 2011
Location: California
Device: Kindle 2, iPad
Originally Posted by curiouscat:
> ... Also with more than two options configured the programme crashes...
@curiouscat: Let's try to tackle one thing at a time, maybe starting with this bit about "more than two options." Can you send a screenshot of how the screen looks right before you start the conversion and how it looks as the conversion starts? And could you also post the PDF file that you are trying to convert? These things would be very helpful.
06-10-2013, 06:29 AM   #437
curiouscat
Member
Posts: 21
Karma: 11998
Join Date: Jun 2013
Device: Nook Simple Touch Glowlight
I deleted the environment variable I created and ran the programme, and it worked. I used the other option, 'g', and on a whim typed -opdi, which worked for the font (I still can't see a command for it on the menu, though).

Can you walk me through how to set up the environment properly so I can take advantage of the more accurate OCR, please?

Thanks.

Screenshot and file attached.
Attached Thumbnails
screenshot.jpg (119.5 KB)
Attached Files
ebook_k309_block3_e2i1_n9781848734913_l3.pdf (884.5 KB)
06-10-2013, 08:55 AM   #438
willus
Fuzzball, the purple cat
Originally Posted by curiouscat:
> Used the other option 'g' and on a whim typed -opdi and this worked for the font, (still can't see a command on the menu though.)
What do you mean by "the other option 'g'"? When I type 'g' at the "Enter option above" prompt from the user menu, I get:

** Unrecognized option: g. **


Originally Posted by curiouscat:
> Can you walk me through how to set up the environment properly so I can take advantage of the more accurate ocr please?
Regarding OCR with Tesseract, were you able to download and extract the training file? If so, can you send a screenshot of the folder that the files are in, or the name of the complete path? Did you set the TESSDATA_PREFIX environment variable?
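To check the setup asked about here, a minimal Python sketch can report whether TESSDATA_PREFIX is set and whether a language file is reachable under it. It assumes the English file is named eng.traineddata. Tesseract versions have differed on whether the variable should name the folder that contains "tessdata" or the "tessdata" folder itself, so the sketch checks both layouts instead of assuming one. The helper name find_traineddata is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_traineddata(lang: str = "eng",
                     env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the <lang>.traineddata file reachable via TESSDATA_PREFIX, or None.

    Hypothetical helper. Both layouts are checked because Tesseract versions
    differ on whether TESSDATA_PREFIX names the folder that *contains*
    "tessdata" or the "tessdata" folder itself.
    """
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # variable is not set (or is empty)
    base = Path(prefix)
    for candidate in (base / "tessdata" / f"{lang}.traineddata",
                      base / f"{lang}.traineddata"):
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_traineddata("eng")
    print(found or "TESSDATA_PREFIX is unset or eng.traineddata was not found")
```

Run it from the same command window you start k2pdfopt from; a newly set variable is usually only seen by windows opened after the change.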
06-10-2013, 09:31 AM   #439
JensW
Enthusiast
Posts: 28
Karma: 81500
Join Date: Apr 2013
Device: Kindle 4
Hi,
I'm currently working on a complete rewrite of my Windows GUI for k2pdfopt.
In the new version you can edit the behaviour of every command and also add new commands and restructure pretty much everything.
This is currently done by editing an INI file, but I will probably also make a profile editor in the near future.

I have included two work-in-progress screenshots as attachments, or you can check them out on my website:

http://www.students.uni-marburg.de/~...pdfoptgui.html


Best regards
Jens
Attached Thumbnails
guiprev1.png (21.9 KB)
guiprev2.png (25.2 KB)
06-10-2013, 10:21 AM   #440
markom
Addict
Posts: 346
Karma: 420000
Join Date: Sep 2012
Device: sony prs t1 kindle dx ipad
Originally Posted by curiouscat:
> hi, I have a Sony PRS-t2. I like the layout the k2pdfopt gives me by default but can't work out how do increase the font size of the output file. Also the option to highlight my pdf seems to disappear after conversion.
>
> Help appreciated. Thanks
In the meantime, while you are learning how to use k2pdfopt: for similar files that don't need PDF reflow, you can quickly and easily do without k2pdfopt by cropping with Pdfscissors (or Briss) and then printing the cropped PDF to file in Adobe Reader 11, using poster mode (tile mode in Adobe Acrobat) at 120x90 mm.

http://www.pdfscissors.com/
http://www.adobe.com/hr/products/reader.html

Cropping and printing to file together take only about five minutes, and no additional OCR is needed because the existing text stays in place.

Here is your PDF cropped close to the main text, and another copy cropped less tightly:

http://speedy.sh/cxDfs/ebook-k309.rar

Choose landscape orientation on the Sony PRS.
This way we can also handwrite on every page without switching back to normal view, which is necessary when using fit-to-landscape-width or any other zoom on the Sony PRS.
We can also navigate with the buttons alone instead of a pen or fingers.



For Kindle readers there is really no need to crop and print PDFs beforehand in this way, because we can install KPV kindlepdfviewer (KOReader) on them and use two-point cropping in landscape mode, or use its reflow capability for A4 pages (the reflow function in kindlepdfviewer is based on k2pdfopt).

http://www.mobileread.com/forums/sho....php?p=2466450

Last edited by markom; 06-10-2013 at 12:03 PM.
06-10-2013, 10:27 AM   #441
curiouscat
Member
Originally Posted by willus:
> What do you mean by "the other option 'g'"? When I type 'g' at the "Enter option above" prompt from the user menu, I get:
>
> ** Unrecognized option: g. **

See the OCR screenshot.

Originally Posted by willus:
> Regarding OCR with Tesseract, were you able to download and extract the training file? If so, can you send a screenshot of the folder that the files are in, or the name of the complete path? Did you set the TESSDATA_PREFIX environment variable?

See the Tesseract screenshot. I entered what you did in the example but got confused: where you describe editing the variable, your screenshots switch from 'user' to 'system', so I have no clue what to do.

Screenshot of language file also included.
Attached Thumbnails
ocr.jpg (122.9 KB)
environment variables.jpg (91.9 KB)
tesseract.jpg (42.6 KB)
06-10-2013, 10:30 AM   #442
curiouscat
Member
Markom, thanks.
06-10-2013, 12:40 PM   #443
JensW
Enthusiast
Originally Posted by curiouscat:
> See Tesseract screenshot. I entered what you did in the example but got confused. When you mention editing the variable it goes from 'user' to 'system' on your screenshots so no clue what to do.
>
> Screenshot of language file also included.
If you choose "G" as the OCR method, it will use GOCR no matter what, whether or not you have the Tesseract files.

Without wanting to promote myself: you could try my Windows GUI (see my post above). Just start the program, enter the path to your k2pdfopt.exe, go to the OCR tab, check "Enable OCR", select "Tessaract" from the drop-down menu, and click the "Get language file" button to download the correct file for your language. Extract this file to a directory of your choice and enter that directory in the "Tessaract language file directory" text box (make sure to check this option too).

Then just drag and drop your PDF files onto the file list and click Start. The GUI will set the correct environment variable itself (this needs administrator rights) and convert your files.
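The GUI presumably sets the variable system-wide, which is why it needs administrator rights. If you drive k2pdfopt from a script instead, one alternative is to set TESSDATA_PREFIX only in the environment handed to the k2pdfopt process; that needs no elevated rights and sidesteps the user-versus-system question. A minimal Python sketch, assuming k2pdfopt reads the variable as discussed in this thread; the helper names are hypothetical, and the sketch does nothing about k2pdfopt's interactive menu.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_with_tessdata(prefix: str,
                      base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with TESSDATA_PREFIX added.

    The parent's environment (os.environ) is left untouched, so the setting
    only affects processes that are explicitly given this dict.
    """
    env = dict(os.environ if base is None else base)
    env["TESSDATA_PREFIX"] = prefix
    return env


def run_k2pdfopt(exe: str, args: Sequence[str], tessdata_prefix: str) -> int:
    """Run k2pdfopt once with TESSDATA_PREFIX set for that process only."""
    result = subprocess.run([exe, *args], env=env_with_tessdata(tessdata_prefix))
    return result.returncode
```

For example, with hypothetical paths: run_k2pdfopt(r"C:\tools\k2pdfopt.exe", ["-ocr", "t", "mybook.pdf"], r"C:\tesseract").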

best regards
Jens



PS: Just asking around: in the past I have deliberately not included any online functionality in my GUI so as not to scare anyone. The thing is, it could really help a ton: I could update commands pretty much on the fly, download Tesseract files automatically, extract them, set the correct folder, and the like.
Would you mind the program connecting to the net and prefer it to stay "offline", or would you appreciate such functions?

Last edited by JensW; 06-10-2013 at 12:44 PM.
06-10-2013, 09:57 PM   #444
willus
Fuzzball, the purple cat
Originally Posted by JensW:
> If you chose "G" as the OCR-method it will use GOCR no matter what and it doesn't matter if you have the tessaract files or not.
>
> without wanting to promote myself: you could try my windows GUI (see my post above). ...
What he said.

Originally Posted by JensW:
> PS: Just asking around: In the past I have deliberatly not included any online functionality into my GUI as to not scare anyone. The thing is, it could really help a ton. I could update commands pretty much on the fly, could download tessaract files automatically, extract them and set the correct folder and the like.
> Would you mind the program connecting to the net and would want it to stay "offline" or would you appreciate such functions?
You could always make these explicit options, e.g. a user option to go online to get the Tesseract file and/or check for updates. That would be my vote.

This is really excellent work, Jens. One thing to consider: what about some kind of preview option that shows a specific marked or converted page on the fly as the controls are adjusted? I presume you'd need me to output a bitmap for you via some extra command-line option (e.g. -bpp <x> -bppf <filename>, where -bpp = create a bitmap preview page and <x> is the output page number), which wouldn't be difficult.
06-10-2013, 10:11 PM   #445
willus
Fuzzball, the purple cat
Originally Posted by markom:
> In the meantime while you are learning how to use k2pdfopt, for similar files where there is no need for pdf reflow you can get away without k2pdfopt quickly and easily by using Pdfscissors(or Briss) for cropping and then printing such cropped pdf to file in Adobe Reader 11 (using poster mode(tile mode in Adobe Acrobat) for 120x90 mm).
This can also be done nicely with a single k2pdfopt command using the fitwidth option and judiciously cropping margins:

k2pdfopt -mode fitwidth -mt 0.9 -ml 2.42 ebook.pdf

The -mt 0.9 and -ml 2.42 options crop the top 0.9 inches and the left 2.42 inches off each source page (which removes all the pink writing in the left margin). See the attached output (source pages 20-30 only). You can add -ls- to the command-line options to get portrait pages (-ls- = turn landscape off, since -mode fitwidth turns it on).
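If you run this conversion from a script, the options map directly onto an argument list. A minimal Python sketch that uses only the options from the command above; the helper name fitwidth_crop_args is hypothetical, and the margin values (in inches) are the ones chosen for this particular PDF.

```python
from typing import List


def fitwidth_crop_args(src: str, top_in: float, left_in: float,
                       portrait: bool = False) -> List[str]:
    """Build a k2pdfopt command line for fit-width output with cropped margins.

    -mt / -ml crop that many inches off the top / left of every source page.
    -ls- turns landscape output off again (-mode fitwidth turns it on).
    """
    args = ["k2pdfopt", "-mode", "fitwidth",
            "-mt", f"{top_in:g}", "-ml", f"{left_in:g}"]
    if portrait:
        args.append("-ls-")
    args.append(src)  # source file goes last, as in the example command
    return args
```

The list can be passed straight to subprocess.run, e.g. subprocess.run(fitwidth_crop_args("ebook.pdf", 0.9, 2.42)), which avoids quoting problems with file names that contain spaces.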

The resulting text will be searchable and highlightable without resorting to OCR, since -mode fitwidth defaults to native PDF output, but text reflow will not be possible. k2pdfopt also uses a little more intelligence about where to break the text across output pages than the cropping approach shown in Markom's screenshot, which cuts directly through lines of text. (I was not able to download the .rar file; why not just post it as an attachment, Markom?)
Attached Files
ebook_k2opt.pdf (193.5 KB)
06-11-2013, 02:19 AM   #446
JensW
Enthusiast
Originally Posted by willus:
> This is really excellent work, Jens. One thing to consider--what about some kind of preview option to show a specific marked or converted page on the fly, as controls are adjusted? You'd need me to drop out a bitmap for you, I presume, with some kind of extra command-line option (e.g. -bpp <x> -bppf <filename>, where bpp = create a bitmap preview page and <x> is the output page number), which wouldn't be difficult.
This sounds like a really good function. I would need some kind of dirty workaround for users with an older version of k2pdfopt, but I think this would work.
Along those lines: is it possible to compile your executables with file details, especially the version number? That would make it much easier to check whether certain features are available.
06-11-2013, 03:14 AM   #447
markom
Addict
Originally Posted by willus:
> ...(I was not able to download the .rar file--why not just post it as an attachment, Markom?).
The link to my .rar file is the inconspicuous "ebook k309.rar" at the top of the Speedy page, next to the yellow 1.48 MB tag. I have posted it here as an attachment again.

Yep, a file cropped by k2pdfopt won't have problems with the last line overlapping.

In the Adobe virtual printer (or any other) we can choose an overlap value. With a small value, e.g. the default 0.001 inch, some of the bottom lines get cut in half; with a bigger overlap, e.g. 0.05 inch, the bottom lines usually get repeated on the next page.
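A simplified model of why the overlap value matters: each printed tile starts one tile height minus the overlap below the previous one. A text line straddling a boundary is then cut on both tiles unless the overlap is at least as tall as the line; once it is, every line appears whole on at least one tile, and lines lying inside the overlap band appear on both (the repetition described above). This Python sketch is an assumption-level illustration of that geometry, not Adobe's actual tiling algorithm; tile_bounds and tiles_showing_whole are hypothetical helpers, and all lengths are in inches.

```python
import math
from typing import List, Tuple


def tile_bounds(page_h: float, tile_h: float,
                overlap: float) -> List[Tuple[float, float]]:
    """(top, bottom) of each printed tile down one page column, in inches."""
    if not 0 <= overlap < tile_h:
        raise ValueError("need 0 <= overlap < tile height")
    step = tile_h - overlap  # each tile starts this far below the previous one
    count = max(1, math.ceil((page_h - overlap) / step))
    return [(i * step, min(i * step + tile_h, page_h)) for i in range(count)]


def tiles_showing_whole(line_top: float, line_bottom: float,
                        tiles: List[Tuple[float, float]]) -> List[int]:
    """Indices of the tiles on which a text line appears uncut."""
    return [i for i, (top, bottom) in enumerate(tiles)
            if top <= line_top and line_bottom <= bottom]
```

To compare the two overlap values mentioned above, call tile_bounds with a tile height of 90 / 25.4 inches (the 90 mm side of the 120x90 mm tiles, assuming that is the vertical one) and overlap=0.001 versus overlap=0.05, then test a typical line height with tiles_showing_whole.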
Attached Files
ebook_k309.rar (1.48 MB)

Last edited by markom; 06-11-2013 at 03:58 AM.
06-11-2013, 03:30 PM   #448
Eithrial
Member
Posts: 17
Karma: 10000
Join Date: Jan 2013
Device: Kindle PW
Is there an option to use k2pdfopt just to OCR and autostraighten, leaving the original PDF layout untouched?
06-11-2013, 07:42 PM   #449
willus
Fuzzball, the purple cat
Originally Posted by Eithrial:
> Is there option to use k2pdfopt just to ocr and autostraigthen, leaving original pdf layout untouched?
Yes. Try the following:

k2pdfopt -mode copy -ocr t -as myfile.pdf

-mode copy = sets a number of options so that k2pdfopt copies the source document's size and contents.
-ocr t = turns on OCR with Tesseract.
-as = turns on autostraighten.

If the output resolution isn't satisfactory, you can use -dr to increase it, e.g. -dr 2 will double it. More detail on the options is on my command-line option help page.
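For batch use, the same OCR-plus-autostraighten command can be assembled in a script. A minimal Python sketch using only the options listed above (-mode copy, -ocr t, -as, and optionally -dr); the helper name is hypothetical, and Tesseract still needs its language file set up as discussed earlier in the thread.

```python
from typing import List, Optional


def ocr_straighten_args(src: str, dr: Optional[float] = None) -> List[str]:
    """k2pdfopt arguments: keep the original layout, OCR with Tesseract, autostraighten.

    dr, if given, is passed to -dr to scale the output resolution
    (e.g. 2 doubles it).
    """
    args = ["k2pdfopt", "-mode", "copy", "-ocr", "t", "-as"]
    if dr is not None:
        args += ["-dr", f"{dr:g}"]
    args.append(src)
    return args
```

Looping over a folder then reduces to calling subprocess.run(ocr_straighten_args(str(path), dr=2)) for each PDF path.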
06-12-2013, 10:46 PM   #450
Lili819
Member
Posts: 17
Karma: 12832
Join Date: Jun 2013
Device: MPB, iOS, Kindle PW
Hi willus, thanks so much for this AWESOME tool!

I love it and am trying to get the PDF into the most comfortable reading view (before discovering your program, I was simply trying to make the PDF usable).

I'm having trouble changing the size of the body text. I want to keep the PDF in native mode.

I tried:
-n -odpi 400 (only the headers changed size)
-mode fw -ls- -odpi 400 (nothing)

Is there something else I should be doing instead?
Thank you!
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.