01-13-2011, 07:56 AM #1
Croker
Scanning/OCR going okay...converting to a final format isn't

Hello!

I'm just part way through scanning my first book, and I'm having some issues. I've got the scanning and OCR worked out now, to the point that I can get the text into a Word document with the formatting I like.

However, here's where I run into trouble. The ultimate intended destination for these files is my Kindle. I have tried importing the Word file directly into Mobipocket Creator, but that doesn't seem to work as I'd like. Page breaks seem to be recognised, but line breaks don't. So, if I've put a heading like "Chapter One", I leave a blank line underneath it. When created by MobiCreator, though, the blank line has vanished. Also, one paragraph (and a sentence in another paragraph) decided to display as bold, for no apparent reason.

Should I be trying to send the text from Word into another format first, and then convert to mobi later? Should I even be sending the text from the OCR program to Word?

Any advice warmly received!
01-14-2011, 01:17 AM #2
Darqref
I haven't done too many, and my formatting tastes are simple, but here's what I did.

1. Get it into a *reasonable* format in Word. Don't try to be too specific about things; it's best if you can be both simple and specific about formats such as quoted text and chapter headings.

2. Download HarryT's procedures for using BookDesigner and Mobipocket Creator. These tools are available from MobileRead (MR), and questions posted here tend to get some help.

3. Save the Word file as RTF. (Incidentally, my OCR program can save to more than one version of RTF. Always choose the simplest version. I use the one aimed at WordPad, not Word, since WordPad has a more limited RTF feature set.)

4. Import the file into BookDesigner and clean up everything: make your chapter headings, build your table of contents, and search for all the silly little places where a space got set to a different format, and so on.

5. Save the file from BookDesigner as HTML0 (the tool's native format). Don't try to have BookDesigner build the .mobi file.

6. Following Harry's instructions, use Mobipocket Creator to generate the .mobi file.

7. Archive the html0 and mobi files. If you find you want to tweak things in a simple manner, go back to BookDesigner. If you have complicated formatting requirements, ask for more help, 'cause you're beyond me.
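For step 4, one way to hunt down stray formatting (like the paragraph that came out bold in the first post) is to scan the HTML you end up with for bold runs before building the .mobi. This is a minimal sketch of a hypothetical helper, not a BookDesigner or Mobipocket Creator feature. It assumes the file is ordinary HTML that marks bold with `<b>` or `<strong>` tags; bold applied through CSS `font-weight` is not detected. The names `find_bold_runs` and `suspicious_runs` and the three-word threshold are made up for illustration.

```python
from html.parser import HTMLParser


class BoldRunFinder(HTMLParser):
    """Collects the text inside <b>/<strong> elements, in document order."""

    # Assumption: bold is marked with these tags, not with CSS font-weight.
    BOLD_TAGS = {"b", "strong"}

    def __init__(self):
        super().__init__()
        self.depth = 0      # number of bold tags currently open
        self.current = []   # text pieces of the bold run being collected
        self.runs = []      # finished bold runs

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <B> is caught too.
        if tag in self.BOLD_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:  # outermost bold tag closed: the run is complete
                self.runs.append("".join(self.current))
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def find_bold_runs(html):
    """Return the text of every top-level bold run in an HTML string."""
    finder = BoldRunFinder()
    finder.feed(html)
    finder.close()
    return finder.runs


def suspicious_runs(runs, max_words=3):
    """Flag runs that are only whitespace, or longer than a heading-sized phrase.

    The max_words threshold is an arbitrary illustrative value.
    """
    return [run for run in runs if not run.split() or len(run.split()) > max_words]


if __name__ == "__main__":
    sample = (
        "<p><b>Chapter One</b></p>"
        "<p>An ordinary<b> </b>sentence.</p>"
        "<p><strong>A whole paragraph that came out bold for no apparent reason.</strong></p>"
    )
    for run in suspicious_runs(find_bold_runs(sample)):
        print(repr(run))
```

On the sample, the short "Chapter One" heading passes, while the whitespace-only bold run and the whole bold paragraph are flagged for a manual look. Flagging rather than auto-fixing is deliberate: some long bold runs are intentional, so the final call stays with you.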
01-14-2011, 08:12 AM #3
Croker
No, my formatting requirements aren't any more complicated than that.

To be honest, I don't know why I didn't think of using BD - I did use it a few times, ages ago, when I first got my Sony Reader, but I eventually drifted over to Sigil for ePub stuff.

I'll definitely give the above a whirl, though - thanks!
01-14-2011, 01:17 PM #4
Croker
I've been testing this method today, and I'm very happy with my initial results. Soon I'll have an entirely digital collection of books, something I've wanted for a very long time!

Thanks again for the assistance, Darqref!
01-15-2011, 01:56 AM #5
Darqref
My pleasure. My problem is at the pre-OCR stage. I'm using a digital camera, and the hassle of setting up the book to photograph each page makes it more work than I want most of the time. I have a flatbed scanner, but no page feeder, and I find the camera gives me a more accurate scan anyway, since I don't have to bend the spine or risk a misaligned page.
All times are GMT -4.