MobileRead Forums - View Single Post

DDHarriman · 10-11-2008, 11:50 AM

That depends on what final result you want.

If you just want a quick and dirty way of reading the book and can accept the errors that are still in the file, choose text, Word or HTML.

If you want a tidy, well-formatted eBook with errors corrected, badly broken lines fixed, missing letters and words identified and restored, and so on, you need to enter the “purgatory” part of the process, called “proofreading”.
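For the broken lines in particular, a small script can do a first pass before you start proofreading by hand. Here is a minimal sketch in Python, assuming the OCR output was saved as plain text with a blank line between paragraphs. The function name `unwrap_ocr_text` is my own, not part of any OCR program. Be aware that it also glues together genuine hyphenated words that happen to fall at a line end, so the result still needs a human eye.

```python
import re

def unwrap_ocr_text(text: str) -> str:
    """Rejoin hard-wrapped OCR lines into paragraphs (blank line = paragraph break)."""
    # Normalise Windows/Mac line endings to "\n"
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Rejoin words split by a hyphen at the end of a line: "correc-\nted" -> "corrected"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Split on blank lines, then collapse every paragraph onto a single line
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)

if __name__ == "__main__":
    sample = "The quick brown fox jum-\nped over\nthe lazy dog.\n\nSecond para-\ngraph here.\n"
    print(unwrap_ocr_text(sample))
```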

Here you have to correct all the errors and format the book so that it looks good when read: inserting page breaks before the beginning of each new chapter (if you use the one-book-one-file method), formatting the chapter names properly so that a table of contents is generated when you convert, and so on…
You can do part of the corrections in the OCR program itself, or part/all of them outside, for example in Word.
If you choose this latter way, I advise you to save the output as plain text, so you can format everything from the beginning. If you save as Word, parts of the text often come out of the OCR in different fonts, different sizes, bold, you name it, and you can sometimes spend hours just trying to un-format the text.
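To give an idea of what formatting "from the beginning" can look like when you start from plain text, here is a rough Python sketch that turns the text into simple HTML, with a heading and a page break for each chapter. The assumptions are mine: chapter titles sit in their own paragraph, are short, and look like "Chapter 12" or "CHAPTER IV", and your converter builds its table of contents from heading tags. The names `CHAPTER_RE` and `text_to_html` are made up for the example.

```python
import html
import re

# Assumption: a chapter title looks like "Chapter 12" or "CHAPTER IV"
CHAPTER_RE = re.compile(r"^\s*chapter\s+(\d+|[ivxlcdm]+)\b.*$", re.IGNORECASE)

def text_to_html(text: str) -> str:
    """Turn blank-line-separated plain text into simple HTML with chapter headings."""
    parts = ["<html><body>"]
    for block in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(block.split())  # collapse line breaks inside the paragraph
        if not block:
            continue
        # Only short blocks count as titles, so a paragraph that merely
        # starts with "Chapter 1 ..." stays a normal paragraph
        if len(block) <= 60 and CHAPTER_RE.match(block):
            # Page break before the chapter, heading tag for the table of contents
            parts.append('<h2 style="page-break-before: always">%s</h2>' % html.escape(block))
        else:
            parts.append("<p>%s</p>" % html.escape(block))
    parts.append("</body></html>")
    return "\n".join(parts)

if __name__ == "__main__":
    print(text_to_html("Chapter 1\n\nIt was a dark night.\n\nChapter II\n\nMorning came."))
```

Books with other heading styles ("Part One", bare numbers, titles only) would need a different pattern, which is exactly the kind of thing you find out while proofreading.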

To convert to the final format, you have Calibre for the Sony, or Mobipocket Creator for the Mobipocket format, etc…
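If you prefer to script the conversion step, current Calibre installs ship a command-line tool called `ebook-convert`, which picks the output format from the extension of the target file. A minimal Python sketch, assuming Calibre is installed and on your PATH; the file names and the two helper functions are placeholders of mine:

```python
import shutil
import subprocess

def build_convert_command(source: str, target: str) -> list[str]:
    """Return the ebook-convert command line; the output format follows the target extension."""
    return ["ebook-convert", source, target]

def convert(source: str, target: str) -> None:
    """Run the conversion, failing early if Calibre's CLI tool cannot be found."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed and on PATH?")
    subprocess.run(build_convert_command(source, target), check=True)

if __name__ == "__main__":
    # Hypothetical file names: the proofread HTML in, an EPUB out
    print(" ".join(build_convert_command("book.html", "book.epub")))
```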

One more thing: the proofreading part is always the most time-consuming, irritating and difficult part of any format-shifting workflow.
A normal book - as you say, 400 pages - can take you from hours to days!!!
That’s why, for example, Project Gutenberg uses volunteers to collectively proofread a book - let’s say 10 people for a 400-page book!

Good luck,

Post #15 · DDHarriman · Guru · Posts: 860 · Karma: 4380 · Join Date: Feb 2008 · Location: Almada, Portugal · Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note