Old 01-14-2012, 11:52 AM   #1
wastewater
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2012
Device: android
ebook workflow advice please

I had an idea of taking a textbook for work and converting it to an eBook and then perhaps into an audio book. Of course this is not as easy as I had hoped it would be (the learning curve is steep) but I am having fun learning how to do it.

I first scanned the pages with the Xerox copier at work to TIFF files at a resolution of 400. I think this is a good format to use; however, I can also scan the pages to JPEG, XPS or PDF. Which format would you recommend? Keep in mind that I don't have the Adobe Acrobat editing software or any XPS editing software, just the viewers and any freeware I might be able to use. However, the PDF and XPS formats in the copier do have an option for OCR output, which would eliminate that extra step later on.
I scanned the first chapter as a test in TIFF format and used Scan Tailor to split and align the pages. I then tried the Cuneiform OCR software on the TIFF files, and it seems to work well.
I have also learned that I can combine the single TIFF files into one multi-page TIFF file using IrfanView if I need to.
However, if I do use the Cuneiform OCR software at all, what format should the OCR output be in to make the process as simple as possible?
The choices are formatted text (*.txt), HTML (*.htm), Interior Format (*.fed), Rich Text Format (*.rtf), table text in DBF format (*.dbf), table text (*.txt) or, finally, unformatted text (*.txt).
How do I combine the many pages of output text, regardless of the format, into a usable eBook?
EPUB would be a nice final format, but any format that I could convert to other formats using Calibre would be good.
Please keep in mind that the textbook has some photos and a few graphs and tables but is mostly text. My OS is Windows 7. I am fairly tech savvy, but software programming is beyond my realm of knowledge (I am a wastewater treatment plant operator by profession), and this is a low-budget project, so I don't want to buy any expensive software like Adobe Acrobat to get this accomplished.

I am sure there are many paths I could take to get this accomplished. Any thoughts or advice that would make my process as simple as possible would be much appreciated.

Thanks

Life is good!
Old 01-14-2012, 02:13 PM   #2
DSpider
Evangelist
Posts: 413
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
So what you're basically asking is where can you get free software. Hmmm... For OCR-ing, ABBYY FineReader 11 is currently the best but you could also take a look at Tesseract 3.01 (see the "User interfaces" section as well).

"Tesseract is considered one of the most accurate free software OCR engines currently available."

For layout and post-processing there's also:
  • LibreOffice (for ODT files, RTF, etc) with the Writer2ePub plugin (for exporting as ePub)
  • Sigil (for editing ePub files)
  • Calibre (for converting various formats, including ePub to Mobi) ...

For PDF I don't know any open source software but I'm sure there are plenty. Calibre is one of them. Note that PDF is Adobe's baby, always has been. They invented it. While it was released as an open standard in 2008, Adobe Acrobat is still their sugar daddy.
Old 01-14-2012, 04:51 PM   #3
wastewater
It is not so much free software I need as workflow advice. Should I scan to TIFF files, or to PDF, XPS, JPEG or some other format first? What process flow leads to good results with the fewest steps?
Old 01-15-2012, 02:53 PM   #4
pholy
Booklegger
Posts: 1,790
Karma: 7999034
Join Date: Jun 2009
Location: Toronto, Ontario, Canada
Device: BeBook(1 & 2010), PEZ, PRS-505, Kobo BT, PRS-T1, Playbook, Kobo Touch
My workflow scans to .png files, then creates an rtf file. The .tiff files would also be good, but .jpg files tend to lose their sharp edges, making the OCR more difficult and prone to errors. As I understand it, the jpeg compression was intended for photos from nature, where there aren't so many sharp edges.
I do my major corrections to the rtf file in OpenOffice, then output to html files, which I clean up with HTML-Tidy and various scripts. The toc and ncx files are mostly boilerplate, and then I zip it all into an epub file. The proofreading and corrections take the most time, and I do them with the rtf files, with the html files, and again with the supposedly final epub file.
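For anyone who wants to script the final "zip it into an epub" step, here is a minimal sketch in Python. It is not the script used in the workflow above, only an illustration of the packaging rules: the `mimetype` entry has to be the first file in the archive and stored uncompressed, and `META-INF/container.xml` points the reader at the package file. The function name `build_epub` and the `OEBPS/content.opf` path are assumptions made up for this example; the opf, ncx and cleaned-up html files still have to be supplied by you.

```python
import zipfile

# Standard container file; it tells the reading system where the package (opf) file lives.
# The "OEBPS/content.opf" location is an assumption chosen for this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(epub_path, files):
    """Write an EPUB archive (hypothetical helper for illustration).

    `files` maps archive paths (e.g. "OEBPS/chapter01.html") to str or bytes.
    The caller is expected to include the opf and ncx files themselves.
    """
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        # Everything else (opf, ncx, html, images) can be compressed normally.
        for name, data in files.items():
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A call would look something like `build_epub("book.epub", {"OEBPS/content.opf": opf_text, "OEBPS/toc.ncx": ncx_text, "OEBPS/chapter01.html": chapter_html})`, where the three variables hold text you have already prepared. The result still needs proofreading, and running it through an EPUB validator is a good idea.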

Hope this helps you somewhat.
Old 01-16-2012, 10:59 AM   #5
wastewater
scan tailor issue

Thanks, but now I am having a problem loading the raw TIFFs into Scan Tailor for post-processing. I did a test and played around before with a previous set of scans of the first chapter, and everything seemed OK. Now I have scanned the whole book to TIFFs and I can't load the images. I get the message "the following file could not be uploaded" and some of the blank thumbnail images on the right start to turn black. I can open the TIFF images in other programs, like Windows Live Photo Gallery, and they look fine. I tried uninstalling and reinstalling Scan Tailor, but that did not help. Any ideas?

I think the black images are just the thumbnails in the out file, I guess?

Last edited by wastewater; 01-16-2012 at 11:19 AM.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.