MobileRead Forums > E-Book Formats > Workshop
Old 11-19-2011, 05:54 PM   #1
sovre
Connoisseur
sovre began at the beginning.
Posts: 86
Karma: 10
Join Date: Dec 2010
Location: California
Device: iPod Touch; PRS-950
great looking pdf ebooks?

I'm curious about how the great-looking PDF ebooks I have seen are created. Are all of these done using an ADF (automatic document feeder)? Or is there third-party software that can help you get this result with a flatbed scanner by cleaning up your document and making it look more professional?

I use ABBYY FineReader for my scanning. When I scan a book and save it in PDF format, I end up with a very rough-looking document which I would not want to put on my eReader. Everything is visible: page shadows, slope and bend of scanned pages, page edges.

It seems that if you save in PDF format, ABBYY insists on saving the entire page image. Why isn't there a way to save just the "text" part of the page in PDF format? Or is there a way, and I don't know about it?

There are some things I would actually much rather save and read in PDF format because they are too hard to work with and clean up in RTF format, but I wish I knew of a way to make a more presentable-looking PDF.

Last edited by sovre; 11-19-2011 at 06:01 PM.
Old 11-20-2011, 07:27 AM   #2
DSpider
Evangelist
DSpider ought to be getting tired of karma fortunes by now.
Posts: 427
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
"Text under image" is the default option in FineReader. Check the settings for PDF output. Changing it will reduce the file size considerably.

Here are some quick tips:
  • After you proof-read the entire book in FineReader, export to .docx as "Formatted Text" with line breaks intact.
  • (Optional) Font matching. I'm not gonna go into details here. This thing needs a wiki page of its own or at least a sticky topic or something.
  • Edit the .docx file with Word 2010 SP1, using macros to set character spacing whenever a line doesn't fit. Turn on hidden characters for this; it's the button with the reversed "P" (¶).
  • Export to PDF using Adobe Acrobat X.
  • (Optional) Vectorize the covers and any other graphics such as diagrams, etc., with Vector Magic and edit them with Inkscape.
  • (Optional) Try to start topics with capital letters next time.

Book scanning "the quality way" is more or less an art form. It takes patience and time to track down fonts, proof-read, vectorize covers, diagrams, etc. But it's totally worth it if done right, especially if it's a good book. It's a pleasure to read such a book. The "quick and dirty" way is to simply run the scans through Scan Tailor, wrap them up in a PDF, and that's it.

Last edited by DSpider; 11-20-2011 at 07:52 AM.
Old 11-23-2011, 05:05 PM   #3
sovre
DSpider, thanks for your helpful reply. Experimenting with the PDF settings, I changed the FineReader option from "text under image" to "text and pictures only", and this alone produced a much more readable document. One thing I noticed with my test scan, though, is that the blocks of text do not stay aligned from page to page, and now and then two facing pages come out together as one page, even though I have FineReader set to split pages.

Is there a tutorial you could point me to illustrating your third suggestion? I would like to give this a try, but it is unknown territory to me, because I don't have any experience using macros or advanced features in MS Word.

Last edited by sovre; 11-23-2011 at 05:12 PM.
Old 11-24-2011, 04:39 AM   #4
DSpider
FineReader sometimes doesn't split pages. It's very rare, but it happens. So you'll need to click the "Edit Image" button and split it from there.

Blocks of text fail to stay aligned only if you export as "Exact Copy". This happens because the scanned material isn't properly aligned either: some pages are scanned lower on the glass, some higher(*). But if you export to .docx as "Formatted Text", the text will be aligned to the margins and page size of your choosing.
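
To make the drift concrete, here is a minimal Python sketch. It is not what FineReader or Scan Tailor actually do internally; it assumes each scanned page is a tiny grid of 0/1 pixels (1 = ink), and `top_margin` and `align_to_margin` are hypothetical helper names invented for this illustration. The idea is only to measure where the text block starts on each scan and shift it to a common margin.

```python
def top_margin(page):
    """Return the index of the first row containing ink, or None for a blank page."""
    for row_index, row in enumerate(page):
        if any(row):
            return row_index
    return None


def align_to_margin(page, target_margin):
    """Shift the page content vertically so the first inked row lands on target_margin.

    Rows pushed off the page are dropped and blank rows are added on the other
    side, so the page keeps its original height.
    """
    margin = top_margin(page)
    if margin is None:
        return [row[:] for row in page]  # blank page: nothing to align
    height, width = len(page), len(page[0])
    shift = target_margin - margin  # positive = move content down
    aligned = []
    for row_index in range(height):
        source = row_index - shift
        if 0 <= source < height:
            aligned.append(page[source][:])
        else:
            aligned.append([0] * width)
    return aligned


# Two scans of the same layout; the second was placed lower on the glass.
page_a = [[0, 0, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 0, 0]]
page_b = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [1, 1, 1], [1, 0, 1]]

aligned_a = align_to_margin(page_a, target_margin=1)
aligned_b = align_to_margin(page_b, target_margin=1)

print(top_margin(page_a), top_margin(page_b))  # different starting rows before alignment
print(aligned_a == aligned_b)                  # identical once both share a margin
```

A real tool would also have to deal with skew and horizontal offset, which this sketch ignores.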

If some lines don't fit the set margin and continue on the next line, using macros would save a lot of time. Instead of editing each one by going through the laborious task of right clicking, selecting Font, selecting Advanced, and adding 0.05 font spacing increments (0.05, 0.1, 0.15, 0.2, etc) for dozens or perhaps hundreds of such lines, you simply set up macros to activate when you press a key combination (or a "hotkey").
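
The arithmetic behind those increments can be sketched in Python. This is not a Word macro and does not touch a .docx file; it is a rough model that assumes the increments are applied as condensed spacing and that condensing by s points shortens a line by about (number of characters × s) points. The function name `condense_to_fit` and all the widths below are made up for illustration.

```python
def condense_to_fit(line_width_pt, char_count, max_width_pt,
                    step_pt=0.05, max_condense_pt=0.5):
    """Return the smallest condensing amount (a multiple of step_pt) that makes the line fit.

    Returns 0.0 if the line already fits, and None if even max_condense_pt is
    not enough. Assumes condensing by s points shortens the line by roughly
    char_count * s points, which is only an approximation.
    """
    max_steps = round(max_condense_pt / step_pt)
    for steps in range(max_steps + 1):
        condense = round(steps * step_pt, 2)
        if line_width_pt - char_count * condense <= max_width_pt:
            return condense
    return None


# A 70-character line that overflows a 340 pt text column by 5 pt.
print(condense_to_fit(line_width_pt=345.0, char_count=70, max_width_pt=340.0))

# A line that already fits needs no change.
print(condense_to_fit(line_width_pt=330.0, char_count=70, max_width_pt=340.0))

# A line that is far too long cannot be rescued by spacing alone.
print(condense_to_fit(line_width_pt=400.0, char_count=70, max_width_pt=340.0))
```

The cap on condensing is a design choice in this sketch: past a certain point the text looks cramped, and rewording or re-breaking the paragraph is the better fix.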

My advice is that if you're gonna use Word, you may as well learn how to use it properly. There are some very good tutorials from TotalTraining, Lynda.com, etc.


* If you really fancy "Exact Copy", you could use Scan Tailor to align the pages before importing them into FineReader. For OCR, it's recommended that you output to grayscale instead of black and white for better accuracy (FineReader has its own filtering method). Personally, I think "Exact Copy" isn't worth it. It produces a lot of "fixed objects", and outputting as tagged PDF would be useless.
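
A toy Python example of why grayscale can help OCR: once a page is reduced to black and white, faint strokes that fall on the wrong side of the threshold are gone for good. The pixel values and the fixed threshold of 128 are assumptions made up for this sketch; neither Scan Tailor nor FineReader is claimed to work this way.

```python
def binarize(pixels, threshold=128):
    """Global threshold: values darker than threshold become black (0), the rest white (255)."""
    return [0 if value < threshold else 255 for value in pixels]


# One scan line across a character: paper (200), a faint stroke (140), a solid stroke (40).
scan_line = [200, 140, 200, 40, 200]

black_and_white = binarize(scan_line)
print(black_and_white)

# Contrast between the paper and the faint stroke, before and after thresholding.
faint_contrast_gray = scan_line[0] - scan_line[1]
faint_contrast_bw = black_and_white[0] - black_and_white[1]
print(faint_contrast_gray, faint_contrast_bw)
```

In the grayscale data the faint stroke still stands out from the paper, so an OCR engine with its own filtering has something to work with; in the thresholded data it is indistinguishable from the background.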

Last edited by DSpider; 11-24-2011 at 07:05 AM.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.