Old 05-26-2010, 03:35 AM   #1
Mike_73
Guru
Posts: 736
Karma: 1123
Join Date: Dec 2009
Device: PRS-505, PRS-600, iPad 16GB Wifi
Advice for scanned PDF

I have scanned PDFs with very light fonts. Zooming into the page enlarges the text to a readable size, but it is still very grayish. I could extract the individual pages, increase the contrast of each one, and put them back together as a PDF. Unfortunately, the automatic contrast adjustment works from each page's average contrast, which can differ from page to page depending on the graphics and charts on it, so the outcome isn't really consistent.

Is there a program that can increase the contrast of, or somehow re-render, the text in a scanned PDF without having to deal with single pages?
Old 05-26-2010, 08:18 AM   #2
kartu
PRS+ author
Posts: 1,637
Karma: 2446233
Join Date: Dec 2007
Device: Sony PRS-300, 505, 600, 650, 950
Did you try this?
http://www.mobileread.com/forums/showthread.php?t=13135
Old 05-26-2010, 10:10 AM   #3
eksor
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
Try scantailor with color/grayscale output selected.

Regards
Old 05-26-2010, 11:45 AM   #4
jswinden
Astrophotographer
Posts: 5,374
Karma: 6830000
Join Date: Sep 2006
Location: USA
Device: iPad Mini 2, iPhone 5, Nexus 7.2
I've yet to view a scanned PDF that was very readable on a reader. Unless you can use OCR software, most of which does a lousy job of converting scanned images to reflowable text, you will generally have crap on your screen. If you can use OCR software and create reflowable text, and then if you spend hours editing to correct the copious mistakes of the OCR software and to format the book so that images and tables appear where and as they should, you might wind up with a decent ePub. It is a lot of work that might take as long as reading the doc in printed format. Notebook paper sized PDF files are not designed to be viewed on a book reader. They are designed to be printed. They really don't even work that well on large computer monitors.

Bottom line: If you can create truly reflowable text from the scanned doc, then with some work you can create a very readable ebook. If your scanned doc looks like a copy of a copy of a copy of a copy of a copy, that is rather fuzzy and difficult to read even when printed, then you probably won't be able to create a usable ebook.
Old 05-26-2010, 03:23 PM   #5
Mike_73
Guru
Posts: 736
Karma: 1123
Join Date: Dec 2009
Device: PRS-505, PRS-600, iPad 16GB Wifi
Quote:
Originally Posted by eksor View Post
scantailor when color/grayscale output is selected

Regards
I just gave this one a try, but it looks like all pages have to be single pictures. I tried it on a picture I had in my files, but I'm not sure what the program did. I made a project, applied some different settings and saved the project. How do I save my efforts?

On the other hand, processing all the pictures of a file is tedious, even if there's batch processing. I would also have to extract all the pages from my PDFs first. So it's quite a bit of work.

As stated in my first post, my PDFs are mostly readable. I would have liked to increase the contrast of the text, but I'm not too eager to jump through hoops as long as the files are sufficiently readable.

It is obvious that PDFs made for letter-sized prints or monitors are not the best thing for readers. Still, having the portability of the files on my reader is just plain awesome. I'm not looking for the 100% solution. I was just checking whether I could raise my current 80% to somewhere between 85% and 90% without too much effort.

Anyway, thanks a lot for the help
Old 05-27-2010, 07:13 AM   #6
eksor
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
Quote:
Originally Posted by Mike_73 View Post
I just gave this one a try, but it looks like all pages have to be single pictures. I tried it on a picture I had in my files, but I'm not sure what the program did. I made a project, applied some different settings and saved the project. How do I save my efforts?

On the other hand, processing all the pictures of a file is tedious, even if there's batch processing. I would also have to extract all the pages from my PDFs first. So it's quite a bit of work.

As stated in my first post, my PDFs are mostly readable. I would have liked to increase the contrast of the text, but I'm not too eager to jump through hoops as long as the files are sufficiently readable.

It is obvious that PDFs made for letter-sized prints or monitors are not the best thing for readers. Still, having the portability of the files on my reader is just plain awesome. I'm not looking for the 100% solution. I was just checking whether I could raise my current 80% to somewhere between 85% and 90% without too much effort.

Anyway, thanks a lot for the help


I apologize for the forgotten issues.

I thought that you had begun with single images, not with a PDF. Anyway, pdftk can "burst" a multipage PDF into single-page PDFs. Then, with "convert" from www.imagemagick.org, you can convert them to PNG, JPG, ... and process them with scantailor. After processing you will have a set of TIFF files. From these you can build a PDF with tiff2ps and ps2pdf or, if you prefer, a CBR (just rar the whole set after converting it to PNG or JPG, then rename it) and run it through calibre or pdflrf. I have done this with a badly scanned comic with good results, but not with text. I will try some experiments and post them to this thread.
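
The first two steps (burst, then convert to images for scantailor) could look roughly like this. It is only a sketch: the function name pdf_to_pngs and the PDFTK, CONVERT and DPI variables are made up for this example, and 300 dpi is an assumed resolution, not something taken from the files discussed here. ImageMagick normally needs Ghostscript installed to read PDFs, and on ImageMagick 7 the command may be called magick instead of convert.

```shell
#!/bin/sh
# Sketch: split a PDF into pages and rasterize each page to PNG so that
# scantailor can open them.
# PDFTK, CONVERT and DPI are hypothetical override variables for this example.
PDFTK=${PDFTK:-pdftk}
CONVERT=${CONVERT:-convert}
DPI=${DPI:-300}   # assumed resolution; use the scan's real resolution if known

pdf_to_pngs() {
    # $1: input PDF. Pages are written to the current directory.
    "$PDFTK" "$1" burst || return 1        # writes pg_0001.pdf, pg_0002.pdf, ...
    for page in pg_*.pdf; do
        [ -e "$page" ] || continue         # pattern matched nothing
        # -density goes before the input so it applies while the PDF is read
        "$CONVERT" -density "$DPI" "$page" "${page%.pdf}.png" || return 1
    done
}
```

Called as "pdf_to_pngs Sample.pdf" in an empty working directory, this should leave one pg_NNNN.png per page next to the burst PDFs.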

Regards
Old 05-27-2010, 12:36 PM   #7
Mike_73
Guru
Posts: 736
Karma: 1123
Join Date: Dec 2009
Device: PRS-505, PRS-600, iPad 16GB Wifi
Having a comparison would be awesome. I would appreciate your input.

Still, it sounds tedious!?!?
Old 05-28-2010, 05:43 AM   #8
eksor
Connoisseur
Posts: 94
Karma: 999884
Join Date: Jun 2009
Device: prs700, i-mate JAMin, smartq v7, GeeksPhone Zero, iPad 3rd Gen
Quote:
Originally Posted by Mike_73 View Post
Having a comparison would be awesome. I would appreciate your input.

Still, it sounds tedious!?!?
I'm glad to be useful.

Yes, it is tedious.

Let's begin with a PDF, Sample.pdf, in a Linux environment. In the end I managed to do the whole thing without scantailor.

Obtain individual pages with pdftk:

pdftk Sample.pdf burst

This writes pg_0001.pdf, pg_0002.pdf and so on to the same directory; in this case it ends with the third page.

Adjust the contrast of each page individually and perform some filtering:

for a in $(seq -w 1 3); do convert -contrast -enhance pg_000$a.pdf eq_pg_000$a.pdf;done

(Replace 3 with the last page number. Take care with the number of zeros: with more than 9 and fewer than 100 pages, use pg_00$a.pdf as input and eq_pg_00$a.pdf as output, and so on. The -w in seq pads the numbers with zeros on the left.)

After this we have eq_pg_0001.pdf, eq_pg_0002.pdf and eq_pg_0003.pdf in the working directory.
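
One way to avoid counting zeros altogether is to loop over the burst files with a shell pattern instead of seq. This is a sketch rather than a tested recipe: enhance_pages and the CONVERT variable are names invented for this example, and the operators are placed after the input file, which is the order the ImageMagick documentation describes.

```shell
#!/bin/sh
# Sketch: enhance every page written by "pdftk Sample.pdf burst"
# without hard-coding the number of leading zeros.
# CONVERT is a hypothetical override variable, not an ImageMagick feature.
CONVERT=${CONVERT:-convert}

enhance_pages() {
    for page in pg_*.pdf; do
        [ -e "$page" ] || continue     # pattern matched nothing
        # Same operators as the loop above, applied after the page is read
        "$CONVERT" "$page" -contrast -enhance "eq_$page" || return 1
    done
}
```

Running enhance_pages in the working directory should produce one eq_pg_NNNN.pdf per page, whatever the page count. The outputs start with eq_, so running it again does not pick them up as inputs.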

Now we want to build a new pdf with the processed pages:

pdftk eq*.pdf cat output eqSample.pdf

And that's all. convert is a cross-platform tool: http://www.imagemagick.org/script/index.php. I don't know how to write scripts in MS-DOS. I could try good old Digital DCL in VMS :-))

Hope this helps, regards.
Attached Files
File Type: pdf eqSample.pdf (307.6 KB, 206 views)
File Type: pdf Sample.pdf (245.8 KB, 150 views)