
MobileRead Forums > E-Book Readers > Sony Reader

10-11-2008, 01:00 PM #16
DeathtoToasters
Connoisseur
 
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Quote:
Originally Posted by DDHarriman
That depends on what final result you want.

If you just want a quick and dirty way of reading the book and can accept the errors that remain in the file, choose text, Word or HTML.

If you want a tidy, well-formatted eBook with the errors corrected (bad line breaks fixed, missing letters and words identified and restored, and so on), you need to enter the “purgatory” part of the process, called “proofreading”.

Here you have to correct all the errors and format the book so that it looks good when read: inserting page breaks before the beginning of each new chapter (if you use the one-book-one-file method), formatting the chapter names properly so that a table of contents is generated when you convert, and so on.
You can do part of the corrections in the OCR program itself, or part or all of them outside it, for example in Word.
If you choose the latter, I advise you to save the output as plain text so you can format everything from scratch. If you save as Word, parts of the text are often OCRed in different fonts, different sizes, bold, you name it, and you can spend hours just trying to un-format the text.

To convert to the final format, you have Calibre for the Sony, Mobipocket Creator for the Mobipocket format, etc…

One more thing: proofreading is always the most time-consuming, irritating and difficult part of any paper-to-digital workflow.
A normal book - say 400 pages, as you mention - can take anywhere from hours to days!!!
That’s why, for example, Project Gutenberg uses volunteers to proofread a book collectively - say 10 people for a 400-page book!

Good luck,

Thanks for all that info!

Here is what I have found.

FineReader does a great job with the OCR.

When I transfer the PDF to the Sony, the smallest size is too small to read, but the formatting looks great.

It is when I increase the font size, which I assume activates the reflow, that the formatting gets screwed up and weird.

There will be something like 4 lines on one page, then a full page, then 2 lines, etc.

So is there a way to fix the way the Sony reflows the document?
10-11-2008, 02:51 PM #17
vivaldirules
When's Doughnut Day?
Posts: 10,059
Karma: 13675475
Join Date: Jul 2007
Location: Houston, TX, US
Device: Sony PRS-505, iPad
DeathtoToasters, if you would like a simple way to read the pdf book you've created from scans on your Sony Reader, I recommend that you try the utility program PDFLRF which is available free at MobileRead. It works pretty well and I use it all the time. The result will be an LRF composed of rotated images of the upper and lower halves of your scanned pages as separate pages of the LRF. The text (and any graphics) will be very readable and there won't be any OCR-related errors.
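For the curious, the half-page trick described above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea only, not PDFLRF's actual code: a scanned page is modelled as a plain list of pixel rows, and the names `split_halves`, `rotate_clockwise` and the `overlap` parameter are made up for the example. Cutting a page in two gives two landscape-shaped pieces; rotating each by 90 degrees makes it portrait-shaped, so it fills the Reader's screen when the device is held sideways.

```python
from typing import List, Sequence, Tuple

Row = List[int]  # one row of grey-level pixels (hypothetical representation)


def split_halves(page: Sequence[Sequence[int]], overlap: int = 0) -> Tuple[List[Row], List[Row]]:
    """Cut a page into upper and lower halves.

    `overlap` rows around the middle are copied into both halves, so a text
    line that straddles the cut still appears whole in at least one half
    (as long as it is no taller than `overlap` rows).
    """
    height = len(page)
    mid = height // 2
    top = [list(row) for row in page[: min(height, mid + overlap)]]
    bottom = [list(row) for row in page[max(0, mid - overlap):]]
    return top, bottom


def rotate_clockwise(image: Sequence[Sequence[int]]) -> List[Row]:
    """Rotate an image 90 degrees clockwise (the last row becomes the first column)."""
    return [list(column) for column in zip(*image[::-1])]


if __name__ == "__main__":
    page = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]  # 4 rows x 3 columns
    top, bottom = split_halves(page)
    print(top)                    # [[1, 2, 3], [4, 5, 6]]
    print(rotate_clockwise(top))  # [[4, 1], [5, 2], [6, 3]]
```

The 2 x 3 (wide) top half comes out 3 x 2 (tall), which is the whole point: each LRF page then uses the full height of the screen.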
10-11-2008, 04:09 PM #18
Patricia
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
I would be tempted to convert the PDF file into LRF, either as VR suggests, or by using Book Designer.
10-11-2008, 04:57 PM #19
Pulp
Palm Addict
Posts: 477
Karma: 1001951
Join Date: Aug 2008
Device: Cybook Gen3 [512mb, FW: 1.5]
Quote:
Originally Posted by Patricia
No, OCR screws up the text in perfectly normal books too. I expect to spend about 10-15 hours unscrewing a straightforward novel, depending on its length, and on whether there are a lot of italics or em dashes to reinstate.
This really depends on the OCR software you are using. I scanned a 560-page book 2 days ago and converted it with FineReader 9 (trial), and the software is amazing - it comes with built-in dictionaries and "knows" which characters it had trouble with, so the "proofing" process was done within an hour.
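The time saving comes from checking only the spots the OCR engine was unsure about instead of rereading everything. A minimal Python sketch of that idea, assuming the OCR output is available as (word, confidence) pairs with confidence between 0 and 1; that format, the 0.9 threshold and the `words_to_review` name are hypothetical and are not FineReader's actual export format or interface:

```python
from typing import Iterable, List, Set, Tuple

PUNCTUATION = ".,;:!?\"'()"


def words_to_review(
    words: Iterable[Tuple[str, float]],
    dictionary: Set[str],
    min_confidence: float = 0.9,
) -> List[int]:
    """Return the positions of words worth a second look.

    A word is flagged when the OCR engine was unsure of it (confidence below
    `min_confidence`) or when it is not in the dictionary after stripping
    surrounding punctuation and lower-casing.
    """
    flagged = []
    for index, (word, confidence) in enumerate(words):
        core = word.strip(PUNCTUATION).lower()
        if confidence < min_confidence or (core and core not in dictionary):
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    ocr_output = [("The", 0.99), ("qu1ck", 0.62), ("brown", 0.97), ("fox,", 0.95), ("jumpd", 0.96)]
    known = {"the", "quick", "brown", "fox", "jumped"}
    print(words_to_review(ocr_output, known))  # [1, 4]
```

Combining both signals catches words the engine misread confidently ("jumpd") as well as ones it flagged itself ("qu1ck").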
10-11-2008, 06:51 PM #20
DDHarriman
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
The final result always comes down to the accuracy one wants.
And be careful: it’s not only misrecognized words; you get all kinds of little problems…

Example: in books with lots of dialogue, one thing that can happen, even with the 2 OCR programs I pointed out above, is that instead of:
- Hi, this is me speaking - said Peter - etc…
You can get:
-Hi, this is me speaking - said Peter -etc …
Not all these mistakes in one sentence, of course, but you get the picture.

I know of books that took 2 to 3 hours and others that took days.
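Errors like the dash spacing above are regular enough to catch with a search-and-replace. A minimal Python sketch, assuming (as in the example) that the book marks dialogue with spaced hyphens; the patterns and the `fix_dialogue` name are illustrative only, and books that use em dashes or quotation marks for dialogue would need different rules:

```python
import re

# Assumes dialogue dashes are spaced hyphens ("- Hi - said Peter").
# Only dashes glued to a following *letter* are touched, so "well-known"
# (no space before the hyphen) and "-5" (digit) are left alone.
OPENING_DASH = re.compile(r"^-(?=[^\W\d_])", re.MULTILINE)  # "-Hi" at line start
INNER_DASH = re.compile(r"(?<=\s)-(?=[^\W\d_])")            # "Peter -etc"
SPACE_BEFORE_ELLIPSIS = re.compile(r"[ \t]+(…|\.\.\.)")      # "etc …"


def fix_dialogue(text: str) -> str:
    """Restore the space after dialogue dashes and drop the space before ellipses."""
    text = OPENING_DASH.sub("- ", text)
    text = INNER_DASH.sub("- ", text)
    return SPACE_BEFORE_ELLIPSIS.sub(r"\1", text)


if __name__ == "__main__":
    print(fix_dialogue("-Hi, this is me speaking - said Peter -etc …"))
    # - Hi, this is me speaking - said Peter - etc…
```

Rules like these only cover the predictable cases; anything they miss still has to be caught by eye.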
10-11-2008, 06:52 PM #21
DDHarriman
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
DeathtoToasters,

Yes. That's exactly what people have reported with the Sony and PDFs after the last firmware upgrade.
10-12-2008, 06:49 AM #22
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
Quote:
Originally Posted by DDHarriman
If you want a tidy result, well formatted eBook with errors corrected, bad lines broken corrected, letters and words missing identified and corrected and so on, you need to enter the “purgatory” part of the process, called “proof reading”.
Personally, I find the "proofreading" mode too much work to be worth it. What I do is export the book to HTML, convert it to LRF and put it on my reader. Then I read the book (I would have done that anyway, so no extra work at all), bookmarking every page with errors. When I finish the book, I simply fix all the errors I found. Much easier than using the proofreading mode, IMHO.

Quote:
A normal book - as you say 400 pages - can take you from hours to days!!!
With my method, it takes (the time to read the book) + (a few hours - usually two to three for a common paperback book). And I can't really count the (time to read the book), since the reading part is why I scanned the book in the first place.
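This read-then-fix approach boils down to keeping a list of corrections while reading and applying them in one pass at the end. A small Python sketch of that last step, assuming the corrections are noted as (wrong, right) text pairs; `apply_fixes` is a hypothetical helper, not part of any conversion tool:

```python
from typing import List, Tuple


def apply_fixes(text: str, fixes: List[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Apply (wrong, right) replacements noted while reading.

    Every occurrence of `wrong` is replaced, so include a neighbouring word
    when the bad string could also be a real word elsewhere in the book.
    Returns the corrected text plus the fixes that matched nothing, which
    usually means a typo in the note itself.
    """
    not_found = []
    for wrong, right in fixes:
        if wrong not in text:
            not_found.append(wrong)
            continue
        text = text.replace(wrong, right)
    return text, not_found


if __name__ == "__main__":
    page = "<p>He turned the comer and saw rn any people.</p>"
    notes = [("the comer", "the corner"), ("rn any", "many"), ("teh", "the")]
    fixed, missed = apply_fixes(page, notes)
    print(fixed)   # <p>He turned the corner and saw many people.</p>
    print(missed)  # ['teh']
```

Noting "the comer" rather than just "comer" keeps a genuine "comer" elsewhere in the book from being changed.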
10-12-2008, 09:09 PM #23
DeathtoToasters
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Well, I am a firm believer that a few errors are not going to be that big of a deal.

Have you ever seen that experiment where they mix up the letters in the midlde of the wrod and nodoby notcies becusae the mind knows what is being said and corrects it automtacilly?

Well, I truly believe in that.

I can scan a 400-page book into my Mac within 30 minutes, so that is why I have no issues doing it.
10-12-2008, 10:15 PM #24
bbusybookworm
Tech Junkie
Posts: 1,027
Karma: 10080
Join Date: Aug 2007
Location: Earth
Device: iPad, MotoXStyle, OnePlusOne
Quote:
Originally Posted by DeathtoToasters
Well, I am a firm believer that a few errors are not going to be that big of a deal.

Have you ever seen that experiment where they mix up the letters in the midlde of the wrod and nodoby notcies becusae the mind knows what is being said and corrects it automtacilly?

Well, I truly believe in that.

I can scan a 400-page book into my Mac within 30 minutes, so that is why I have no issues doing it.
Glad to see that it's worked out for you.

Enjoy your Book(s)
10-12-2008, 11:56 PM #25
DeathtoToasters
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Quote:
Originally Posted by bbusybookworm
Glad to see that it's worked out for you.

Enjoy your Book(s)

For the most part it did... now if only I could find a solution for the terrible reflow issues.

Would converting to an LRF file solve this issue?
10-13-2008, 12:22 AM #26
bbusybookworm
Tech Junkie
Posts: 1,027
Karma: 10080
Join Date: Aug 2007
Location: Earth
Device: iPad, MotoXStyle, OnePlusOne
Quote:
Originally Posted by DeathtoToasters
For the most part it did... now if only I could find a solution for the terrible reflow issues.

Would converting to an LRF file solve this issue?
Probably.

Reflow is a problem with PDF, as the format is not really designed for resizing.

Using LRF or RTF should probably give you a better result.
10-13-2008, 04:57 AM #27
DDHarriman
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Do not save as PDF in FineReader; save as text instead.

Then format it as you like - forget proofreading if you want, but do correct any false paragraph breaks that may still appear in the middle of normal lines - with Word or any text editor, then create the final file in Sony’s own format. That will resolve your reflow problems.
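Many false paragraph breaks can be repaired automatically before any manual pass. A rough Python sketch, assuming the text export puts each paragraph on its own line and that a false break is a line that does not end in sentence punctuation followed by a line that starts in lower case; `join_false_breaks` is a made-up name, and the heuristic deliberately leaves alone breaks before capitalised words (so it misses breaks before proper nouns):

```python
END_PUNCT = (".", "!", "?", ":", '"', "\u201d", "\u2026")  # ., !, ?, :, quotes, ellipsis


def join_false_breaks(text: str) -> str:
    """Rejoin lines that were split in the middle of a sentence."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            out.append("")  # keep genuine blank lines between paragraphs
            continue
        prev = out[-1] if out else ""
        if prev and not prev.endswith(END_PUNCT) and stripped[0].islower():
            out.pop()
            if prev.endswith("-"):
                # "tor-" + "rents" -> "torrents". Caution: this also merges a
                # real compound split at the hyphen ("well-" + "known").
                out.append(prev[:-1] + stripped)
            else:
                out.append(prev + " " + stripped)
        else:
            out.append(stripped)
    return "\n".join(out)


if __name__ == "__main__":
    raw = "It was a dark and\nstormy night.\nThe rain fell in tor-\nrents.\n\nNext paragraph."
    print(join_false_breaks(raw))
    # It was a dark and stormy night.
    # The rain fell in torrents.
    #
    # Next paragraph.
```

Run it on the text export, then skim the result before converting; it is meant to shrink the manual work, not replace it.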
10-13-2008, 12:52 PM #28
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
If you really need to save the OCR document as PDF, then change the page size of the PDF you create. If you set the page size to 120mm x 90mm, the page will fit on the Sony just fine, and on any other 6" reader that supports PDF for that matter. Believe it or not, you do not have to create huge full letter-sized pages when you create a PDF.

Dale
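To put those numbers in PDF terms: PDF page sizes are measured in points of 1/72 inch, and an inch is 25.4 mm. A small Python calculation, assuming a portrait page 90 mm wide by 120 mm tall (the orientation is an assumption about "120mm x 90mm") and comparing it with a US Letter page of 612 x 792 points:

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72  # the default PDF user-space unit is 1/72 inch


def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


LETTER_W, LETTER_H = 612, 792  # US Letter (8.5 x 11 inches) in points

if __name__ == "__main__":
    width, height = mm_to_points(90), mm_to_points(120)
    print(f"{width:.2f} x {height:.2f} pt")  # 255.12 x 340.16 pt
    # Factor by which a full Letter page must shrink to fit that small page:
    scale = min(width / LETTER_W, height / LETTER_H)
    print(f"scale {scale:.2f}")  # scale 0.42
```

So if the usable screen area is roughly that size, a Letter-sized page is shown at about 42% of its printed size, which is why the text looks so tiny until the page size itself is reduced.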
10-13-2008, 02:08 PM #29
DeathtoToasters
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Quote:
Originally Posted by DDHarriman
Do not save as PDF in FineReader; save as text instead.

Then format it as you like - forget proofreading if you want, but do correct any false paragraph breaks that may still appear in the middle of normal lines - with Word or any text editor, then create the final file in Sony’s own format. That will resolve your reflow problems.
Wow, this worked out great.

What I did was save the FineReader end result as RTF, then use calibre to convert it, and the formatting is GREAT! Didn't have to edit a thing!

It does take about 10-25 seconds to bring the book up from the list or to change the font, but page turning is normal.
10-13-2008, 02:08 PM #30
DeathtoToasters
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Quote:
Originally Posted by DaleDe
If you really need to save the OCR document as PDF, then change the page size of the PDF you create. If you set the page size to 120mm x 90mm, the page will fit on the Sony just fine, and on any other 6" reader that supports PDF for that matter. Believe it or not, you do not have to create huge full letter-sized pages when you create a PDF.

Dale
Hmmmm I will try that next!
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.