MobileRead Forums - View Single Post

pholy · 01-15-2012, 02:53 PM

My workflow scans to .png files, then runs OCR on them to create an rtf file. The .tiff format would also work well, but .jpg files tend to lose their sharp edges, which makes the OCR more difficult and error-prone. As I understand it, JPEG compression was designed for photographs of natural scenes, where there aren't so many sharp edges.
I do my major corrections to the rtf file in OpenOffice, then export to html files, which I clean up with HTML-Tidy and various scripts. The toc and ncx files are mostly boilerplate, and then I zip everything into an epub file. The proofreading and corrections take the most time; I do them on the rtf files, on the html files, and again on the supposedly final epub file.
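
For anyone wanting to script the final zip step, here is a minimal sketch in Python. It is not my actual script, just an illustration. The function name build_epub and the folder layout (one directory holding META-INF/container.xml plus the html, toc and ncx files) are assumptions for the example. The one detail that matters is that the epub container expects the "mimetype" entry to be the first file in the archive and stored without compression.

```python
import zipfile
from pathlib import Path


def build_epub(src_dir, epub_path):
    """Zip the files under src_dir into an epub container at epub_path.

    Hypothetical helper: src_dir is assumed to already hold the finished
    book files (META-INF/container.xml, html, toc, ncx, and so on).
    """
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as zf:
        # The mimetype entry goes in first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src).as_posix()
            # Skip directories and any mimetype file already on disk,
            # since that entry was written above.
            if path.is_file() and arcname != "mimetype":
                zf.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
    return epub_path
```

Calling build_epub("book", "book.epub") would pack the contents of the book folder. Keep the output file outside the source folder, otherwise the half-written epub gets picked up as one of its own entries.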

Hope this helps you somewhat.
