03-01-2010, 03:15 PM   #1
PGA
Junior Member
PGA knows what's going on.
 
Posts: 9
Karma: 25648
Join Date: Mar 2010
Device: Sony PRS-600
Best way of scanning to PRS 600, newbie

Hello. I bought the PRS-600 just a few days ago, and I'm working on a project scanning some books. What is the best way to convert the files to EPUB? After I've scanned them, I run the PDF through OmniPage and then save the result as either PDF or DOC. But when I convert the PDF to EPUB with Calibre, the text gets mashed up, not at all the way I want it. What's the best procedure for getting the best text quality in EPUB format, and later on the e-reader itself?
03-01-2010, 03:52 PM   #2
Xochipilli2012
Zealot
Xochipilli2012 began at the beginning.
 
Posts: 101
Karma: 38
Join Date: Jan 2010
Location: Seattle
Device: Red PRS-600, Slate Blue Astak EZReader Pocket Pro
Hi there! It appears that OmniPage can save in HTML. You could then use Calibre to import the HTML and convert it to LRF or EPUB for your PRS-600. If you like playing with the command line, Calibre offers that option too.
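For the command-line route: Calibre ships a tool called `ebook-convert` that takes an input file and an output file and works out both formats from the file extensions. Here is a minimal Python wrapper around it, assuming `ebook-convert` is on your PATH (the script checks first and does nothing if it isn't). The file name `mybook.html` and the helper names are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, target_format: str = "epub") -> List[str]:
    """Build an ebook-convert call; Calibre infers the formats from the extensions."""
    target = source.with_suffix("." + target_format.lower())
    return ["ebook-convert", str(source), str(target)]


def convert(source: Path, target_format: str = "epub") -> bool:
    """Run the conversion if Calibre's ebook-convert is installed; True on success."""
    if shutil.which("ebook-convert") is None:
        print("ebook-convert not found; install Calibre or add it to your PATH")
        return False
    result = subprocess.run(build_convert_command(source, target_format),
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical input: the HTML file exported from OmniPage.
    convert(Path("mybook.html"), "epub")
```

Passing `"lrf"` instead of `"epub"` should give you the older Sony format, provided your Calibre build still includes LRF output.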

Cheers!
03-02-2010, 03:06 AM   #3
HarryT
eBook Enthusiast
HarryT ought to be getting tired of karma fortunes by now.
 
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
The important thing is not to try to convert PDF to other formats. PDF is not a "book" format - it doesn't contain paragraphs, sentences, or even words, and hence is extremely difficult to convert to other formats. You really need to use an OCR program and save your book as some "text" format, such as HTML or Word.
03-02-2010, 03:46 AM   #4
pietvo
Reader
pietvo can name that song in three notes
 
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
And even then you will have to edit the HTML/DOC to rejoin hyphenated words, reconnect paragraphs split across page breaks, remove headers/footers, etc.
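That cleanup is repetitive enough to script, at least partly. Below is a minimal sketch, assuming the OCR output is available as plain text with one string per page, that the running header is a known string you pass in, and that page numbers sit alone on the first or last line of a page. The function names are hypothetical, and it is a heuristic to reduce the manual work, not a replacement for proofreading.

```python
import re
from typing import List, Optional

# Assumption: a line made up only of digits is a printed page number.
PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")


def strip_running_lines(page: str, header: Optional[str] = None) -> List[str]:
    """Trim blank edges, a known running header, and bare page numbers from one page."""
    lines = [ln.rstrip() for ln in page.splitlines()]
    while lines and not lines[0].strip():
        lines.pop(0)
    while lines and not lines[-1].strip():
        lines.pop()
    if lines and header is not None and lines[0].strip() == header:
        lines.pop(0)
    if lines and PAGE_NUMBER.match(lines[0]):
        lines.pop(0)
    if lines and PAGE_NUMBER.match(lines[-1]):
        lines.pop()
    return lines


def rebuild_paragraphs(pages: List[str], header: Optional[str] = None) -> List[str]:
    """Join OCR'd pages into paragraphs; a blank line marks a real paragraph break.

    Because page edges are trimmed, a paragraph running over a page break simply
    continues. Caveat: a paragraph that ends exactly at the bottom of a page gets
    merged with the next one and must be split by hand.
    """
    lines: List[str] = []
    for page in pages:
        lines.extend(strip_running_lines(page, header))

    paragraphs: List[str] = []
    current = ""
    for line in lines:
        text = line.strip()
        if not text:
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and text[0].islower():
            # Rejoin a word split at the line end ("tor-" + "rents.").
            # This also wrongly joins real compounds such as "well-" + "known".
            current = current[:-1] + text
        elif current:
            current += " " + text
        else:
            current = text
    if current:
        paragraphs.append(current)
    return paragraphs
```

The resulting paragraph strings could then be wrapped in `<p>` tags to produce an HTML file for Calibre.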
03-03-2010, 04:27 AM   #5
Xochipilli2012
Zealot
Xochipilli2012 began at the beginning.
 
Posts: 101
Karma: 38
Join Date: Jan 2010
Location: Seattle
Device: Red PRS-600, Slate Blue Astak EZReader Pocket Pro
Quote:
Originally Posted by HarryT
The important thing is not to try to convert PDF to other formats. PDF is not a "book" format - it doesn't contain paragraphs, sentences, or even words, and hence is extremely difficult to convert to other formats. You really need to use an OCR program and save your book as some "text" format, such as HTML or Word.
Harry, I usually know better than to mess with a Dalek, but your statement is only correct if you are referring to "image-only" PDFs. Most commercially produced PDFs do have text, but because it's a format that is intended to be printer-friendly, one needs to contend with headers, footers, and page numbers when converting to other formats.

As PRS-600 users might notice, some PDFs allow text re-sizing and re-flow, and others can only be zoomed. In the latter case we are dealing with image-based PDFs.

There are good reasons for doing such conversions. For the most part, the PRS-600 does a really nice job rendering and re-flowing most of the PDFs I need to read; that's one of the main reasons I chose the Sony Reader, as I understood it excels at this. But I've recently encountered PDFs made in Linux (I'm not saying the weirdness was because of Linux) that look fine when printed or viewed full-screen on a computer, but suffer from weird mid-word line breaks on the PRS-600. I've found that converting such books to HTML using MobiPocket Creator, stripping out the header and footer code using regex in a text editor, and then converting to EPUB or LRF in Calibre makes these former PDFs more comfortable to read. But I wouldn't go through all that trouble unless the PDFs were misbehaving in the first place.
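The regex step could also be done with a small Python script instead of a text editor. The markup MobiPocket Creator actually emits isn't shown in this thread, so the patterns below rest on an assumption: the running header is a paragraph containing only the book title, and the footer is a paragraph containing only a page number. The function name is hypothetical; adjust the patterns to whatever your file really contains.

```python
import re


def strip_headers_footers(html: str, running_header: str) -> str:
    """Remove paragraphs that hold only the running header or only a page number.

    Assumed (hypothetical) markup: <p ...>Book Title</p> and <p ...>123</p>.
    """
    # \b after "p" keeps the pattern from matching tags like <pre>.
    header_re = re.compile(
        r"<p\b[^>]*>\s*" + re.escape(running_header) + r"\s*</p>\s*", re.IGNORECASE
    )
    # Caution: this also removes a legitimate paragraph that is just a number,
    # such as a bare chapter number, so check the output.
    page_number_re = re.compile(r"<p\b[^>]*>\s*\d+\s*</p>\s*", re.IGNORECASE)
    html = header_re.sub("", html)
    return page_number_re.sub("", html)
```

`re.escape` is used so that punctuation in a title (a colon, parentheses, a question mark) is matched literally rather than treated as regex syntax.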

I've been delighted to learn about some tools that can make image-only PDFs more convenient to view on Sony Readers, as well as do quick PDF-to-PDF conversions to minimize whitespace. While I'm willing to OCR an image-only PDF to extract the text, sometimes it's more trouble than it's worth. And if you're dealing with non-Roman characters (a book on Classical Greek grammar, for instance), it is even more difficult.
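Minimizing whitespace on a scanned page mostly comes down to finding where the ink is and cropping away the rest. This is a minimal, generic sketch of that idea, not how any particular tool does it. It assumes the page is already loaded as rows of grayscale values from 0 (black) to 255 (white), and the darkness threshold of 200 is an arbitrary choice; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def content_bbox(pixels: List[List[int]], threshold: int = 200) -> Optional[Box]:
    """Bounding box of pixels darker than `threshold`; None for a blank page.

    Dust specks near the edge will widen the box, so despeckle first if needed.
    """
    rows = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    if not rows:
        return None
    width = len(pixels[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def pad_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow the box by `margin` pixels on every side, clamped to the page."""
    left, top, right, bottom = box
    return (max(0, left - margin), max(0, top - margin),
            min(width, right + margin), min(height, bottom + margin))
```

If you load scans with Pillow, its `Image.crop` takes a box in the same (left, upper, right, lower) order, so the padded box can be passed to it directly.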

There are a lot of different approaches to these problems, and what might be ideal for me could be unacceptable to someone else. I think it's best to know what all the options are, in any case; PDF conversion to other formats is but one of them.

Cheers!
03-03-2010, 11:24 AM   #6
frabjous
Wizard
frabjous can solve quadratic equations while standing on his or her head reciting poetry in iambic pentameter
 
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
From my personal experience, unless the book is really oddly formatted, or the scans amazingly clean, it's just not worth OCRing a scanned book if all I plan to do is read it on my Sony myself, especially with all the manual correction that would need to be done after the OCR.

(If I plan to distribute the book to others, that's another ball of wax.)

Scan the book; then, if they're available on your OS, run the scans through something like unpaper or Scan Tailor; finally, process the resulting image-based PDF with something like PDFRead or pdflrf to divide the text into Sony-Reader-sized chunks.
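The "Sony-Reader-sized chunks" step is mostly arithmetic. Here is a minimal sketch of one way to do it, not necessarily what PDFRead or pdflrf actually do: read in landscape, scale the page width to the screen width, and cut the page into horizontal bands one screen tall, with a small overlap so a line sliced at a boundary appears whole in at least one band. The 800 x 600 screen size (the PRS-600's 600 x 800 panel turned sideways, ignoring any status bar), the 20-pixel overlap, and the function name are assumptions to tune.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in page pixels


def band_boxes(page_w: int, page_h: int, screen_w: int = 800, screen_h: int = 600,
               overlap: int = 20) -> List[Box]:
    """Split a page into full-width bands that each fill one screen at screen_w."""
    # Page pixels that map onto one screen height once the width is scaled to fit.
    band_h = page_w * screen_h // screen_w
    if band_h >= page_h:
        return [(0, 0, page_w, page_h)]  # the whole page already fits
    step = band_h - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the band height")
    boxes: List[Box] = []
    top = 0
    while True:
        bottom = min(top + band_h, page_h)
        boxes.append((0, top, page_w, bottom))
        if bottom == page_h:
            return boxes
        top += step
```

For example, a letter-size page scanned at 150 dpi (1275 x 1650 pixels) comes out as two bands, covering rows 0 to 956 and 936 to 1650, each of which can then be cropped and scaled to fill the screen.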


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.