#1 |
Junior Member
Posts: 9
Karma: 25648
Join Date: Mar 2010
Device: Sony PRS-600
Best way of scanning to PRS 600, newbie
Hello. I bought the PRS-600 just a few days ago and I'm working on a project to scan some books. What is the best way to convert the files to EPUB? Once I've scanned them, I run the PDF through OmniPage and save the result as either PDF or DOC. But when I convert the PDF to EPUB with Calibre, the text comes out mashed up, not the way I want it. What's the best procedure to get the best text quality in EPUB and, later, on the e-reader?
#2 |
Zealot
Posts: 101
Karma: 38
Join Date: Jan 2010
Location: Seattle
Device: Red PRS-600, Slate Blue Astak EZReader Pocket Pro
Hi there, it appears that OmniPage can save as HTML. You could then import the HTML into Calibre and convert it to LRF or EPUB for your PRS-600. If you like working on the command line, Calibre offers that option too.
Cheers!
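For the command-line route, Calibre ships a tool called `ebook-convert` that takes an input file and an output file. Here is a minimal Python sketch of how it might be scripted; the function names and the file name `book.html` are made up for illustration, and no extra conversion options are assumed.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(html_path, out_format="epub"):
    """Return the ebook-convert argument list for an HTML file (hypothetical helper)."""
    source = Path(html_path)
    # The output file sits next to the input, with the new extension.
    target = source.with_suffix("." + out_format)
    return ["ebook-convert", str(source), str(target)]


def convert(html_path, out_format="epub"):
    """Run the conversion, assuming Calibre's command-line tools are on the PATH."""
    command = build_convert_command(html_path, out_format)
    if shutil.which(command[0]) is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(command, check=True)
    return command[2]
```

Calling `convert("book.html")` would write `book.epub` beside the source file, and `convert("book.html", "lrf")` would produce LRF instead.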
#3 |
eBook Enthusiast
Posts: 85,544
Karma: 93383099
Join Date: Nov 2006
Location: UK
Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6
The important thing is not to try to convert PDF to other formats. PDF is not a "book" format - it doesn't contain paragraphs, sentences, or even words, and hence is extremely difficult to convert to other formats. You really need to use an OCR program and save your book as some "text" format, such as HTML or Word.
#4 |
Reader
Posts: 520
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
And even then you will have to edit the HTML/DOC to glue together hyphenated words, connect paragraphs at page breaks, remove headers/footers, etc.
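Part of that cleanup can be scripted. Here is a rough Python sketch for the first two chores, assuming the OCR output was saved as plain text with a form feed (`\f`) at each page break; that marker and both function names are assumptions, so adjust them to whatever your OCR program actually writes.

```python
import re


def join_hyphenated(text):
    """Rejoin words split by a hyphen at the end of a line.

    Caveat: a real compound broken at its hyphen ("well-\\nknown")
    is merged too, so proofread the result.
    """
    return re.sub(r"([a-z])-\n([a-z])", r"\1\2", text)


def join_across_page_breaks(text, page_break="\f"):
    """Merge a paragraph that continues over a page break."""
    pages = [page.strip() for page in text.split(page_break)]
    pages = [page for page in pages if page]
    if not pages:
        return ""
    result = pages[0]
    for page in pages[1:]:
        if re.search(r"[a-z]-$", result) and page[:1].islower():
            # Word split across the page break: drop the hyphen and join.
            result = result[:-1] + page
        elif result.endswith((".", "!", "?", '"')):
            # Previous page ended a sentence: treat as a paragraph boundary.
            result += "\n\n" + page
        else:
            # Sentence runs on: continue the same paragraph.
            result += " " + page
    return result
```

Run `join_hyphenated` first and `join_across_page_breaks` second. The sentence-end test is only a heuristic, so a paragraph that happens to end exactly at the bottom of a page is handled correctly, but a page ending mid-paragraph on a full stop gets an extra paragraph break.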
#5 |
Zealot
Posts: 101
Karma: 38
Join Date: Jan 2010
Location: Seattle
Device: Red PRS-600, Slate Blue Astak EZReader Pocket Pro
As PRS-600 users might notice, some PDFs allow text re-sizing and re-flow, and others can only be zoomed. In the latter case we are dealing with image-based PDFs.

There are good reasons for doing such conversions. For the most part, the PRS-600 does a really nice job rendering and re-flowing most of the PDFs I need to read. That's one of the main reasons I chose the Sony Reader, as I understood it excels at this. But I've recently encountered PDFs that were made in Linux (I'm not saying the weirdness was because of Linux) that look fine when printed or viewed full-screen on a computer, but suffer from weird mid-word line breaks on the PRS-600. I've found that converting such books into HTML using Mobipocket Creator, stripping out the header and footer code using regex in a text editor, and then converting to EPUB or LRF in Calibre makes these former PDFs more comfortable to read. But I wouldn't go through all that trouble unless the PDFs were misbehaving in the first place.

I've been delighted to learn about some tools that can help make image-only PDFs more convenient to view on Sony Readers, as well as do quick PDF-to-PDF conversions to minimize whitespace. While I'm willing to OCR an image-only PDF to extract the text, sometimes it's more trouble than it's worth. And if you're dealing with non-Roman characters (a book on Classical Greek grammar, for instance), it is even more difficult.

There are many different approaches to these problems, and what might be ideal for me could be unacceptable to someone else. I think it's best to know what all the options are, in any case. PDF conversion to other formats is but one of these options.

Cheers!
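To give an idea of the regex step, here is a small Python sketch of the same kind of search-and-replace. It assumes the running header shows up as a paragraph holding only the book title and the footer as a paragraph holding only a page number; that layout, the function name, and the sample title are guesses, so inspect your own HTML before trusting the pattern.

```python
import re


def strip_running_lines(html, running_title):
    """Remove <p> elements that hold only the running title or a page number.

    Caveat: any paragraph consisting solely of a number (a bare
    chapter number, say) is removed as well.
    """
    title = re.escape(running_title)
    pattern = re.compile(
        r"<p\b[^>]*>\s*(?:" + title + r"|(?:Page\s+)?\d+)\s*</p>\s*",
        re.IGNORECASE,
    )
    return pattern.sub("", html)
```

For example, `strip_running_lines(html, "My Book")` drops paragraphs such as `<p>My Book</p>` and `<p>12</p>` while leaving ordinary text paragraphs alone. The same pattern should also work in a text editor that supports regular expressions.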
#6 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
From my personal experience, unless the book is really oddly formatted, or the scans amazingly clean, it's just not worth OCRing a scanned book if all I plan to do is read it on my Sony myself, especially with all the manual correction that would need to be done after the OCR.
(If I plan to distribute the book to others, that's another ball of wax.) What I do instead: scan the book; if they're available on your OS, run the scans through something like unpaper or Scan Tailor; and finally process the resulting image-based PDF with something like PDFRead or pdflrf to divide the pages into Sony-reader-sized chunks.