MobileRead Forums > E-Book Formats > Workshop

06-22-2011, 12:29 AM   #1
taxi12
Junior Member
taxi12 began at the beginning.
Posts: 4
Karma: 10
Join Date: Jun 2011
Device: Nook
New to scanning and creating ebooks

Hi, I'm new to forums and to (trying to) make ebooks from books. I must say that I'm having quite a difficult time of it, so I decided to join this forum in the hope that all you more advanced and experienced people might be able to help me out!

I understand converting file formats in Calibre and rely on this program for that and for maintaining my library. I also use the Barnes and Noble Nook as my e-reader.

My quandary arises when I try to actually make the ebooks themselves. You see, I want to cut down on the amount of stuff I have, and books are a big part of that. I even bought a guillotine paper cutter so that I can cut off the spines of the books and feed the pages into an automatic document feeder, since I don't have the patience to scan each page by hand. I own a Neat Desk Scanner (which I haven't seen mentioned in the forums) and also an HP Office Jet Pro L7700 series, but I would prefer to use the Neat Desk Scanner if I can, as it scans both sides of the page at the same time whereas the HP doesn't.

Now comes the confusing part for me: getting the document into a form an ebook reader can use. What comes out of the scanner is a PDF, though I think I can convert it to a TIFF or JPEG image. I've tried various converters like Free OCR, AVS Document Converter and Nuance, but none of them seem to produce what I think I'm looking for... not that I'm completely sure about that anyway. Finally, I've ordered Abbyy FineReader 10 and expect it sometime this week or next, hoping that it is the answer to my dilemma, but I worry that I am still missing large parts of the picture. I began to think that my Neat Desk Scanner might be the problem, but I suppose I'll have to determine that after I try the Abbyy software. I think the Neat Desk Scanner is similar to the Fujitsu scanners, though from reading through the forum and looking at the website, the Fujitsu seems to have more capabilities. I run Windows Vista on my laptop and some version of Windows XP Media Center on my desktop, and I prefer to use my laptop.

I've read through some of the forums already, and I must say that some of the technical jargon is somewhat intimidating... I'm hoping I can get help explained in terms a beginner can follow.

Well, that's about it... I'm surprised if you've made it this far with me. Sorry for the length, but others have said to include all these details if you really want help, and I REALLY want some help.
Thanks to all who work hard on this forum.
06-22-2011, 01:00 PM   #2
DaleDe
Grand Sorcerer
DaleDe ought to be getting tired of karma fortunes by now.
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Try our wiki; there is quite a bit of information on this topic there. You might want to start with the "Digitizing Paper Books to Ebooks" article. There are also articles on OCR and related tasks.

Dale
06-29-2011, 01:52 AM   #3
taxi12
Junior Member
taxi12 began at the beginning.
Posts: 4
Karma: 10
Join Date: Jun 2011
Device: Nook
Thanks DaleDe,

I actually started with the wiki pages before I got to MR... that's how I found MR. I just finished editing my first ebook today, though it's not perfect. Here were my steps.

What I did was cut off the spine, send the pages through my Neat Desk Scanner (as it scans both sides of the page at the same time), run the scan through Abbyy FineReader 10, and then send it to Word 2007 to catch any other errors and fix the margins. Then I ran it through Calibre to turn it into an EPUB file.
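For the last step, Calibre also ships a command-line converter, `ebook-convert`, which takes an input file and an output file and accepts metadata options such as `--title` and `--authors`. The minimal Python sketch below only builds that command; the file name, title and author are placeholders, and it assumes Calibre's command-line tools are on your PATH and that your Calibre version accepts the format you saved from Word (older releases may not read DOCX, in which case RTF is a common fallback).

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def build_convert_command(source: str, title: Optional[str] = None,
                          authors: Optional[str] = None) -> List[str]:
    """Build a Calibre ebook-convert call that writes an EPUB next to `source`."""
    target = str(Path(source).with_suffix(".epub"))
    cmd = ["ebook-convert", source, target]
    if title:
        cmd += ["--title", title]        # metadata shown in the reader's library
    if authors:
        cmd += ["--authors", authors]
    return cmd


def convert(source: str, **metadata: str) -> None:
    """Actually run the conversion; needs Calibre's command-line tools installed."""
    subprocess.run(build_convert_command(source, **metadata), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Calibre.
    print(shlex.join(build_convert_command("my_book.docx", title="My Book",
                                           authors="Jane Doe")))
```

Calling `convert("my_book.docx", title="My Book", authors="Jane Doe")` would then perform the conversion itself.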
I'm having trouble with Abbyy not recognizing the I's correctly: it often puts a 1 in place of the I. I had to correct that error by hand throughout the whole file because the Replace All button would not light up. It also had trouble with the H, another one I had to fix throughout the whole document. Are there any tips for editing?
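The 1-for-I misread can be split into a part that is safe to batch-fix and a part that needs a human. In the hypothetical Python sketch below, a `1` glued to a contraction such as `1'm` or `1’ll` is replaced automatically, while any other stand-alone `1` followed by a lowercase word is only reported, because text like "chapter 1 ends" would match as well. The patterns are an illustration, not what FineReader or Word do internally, and they only cover the I/1 confusion.

```python
import re

# A "1" glued to a contraction ("1'm", "1’ll") can only be a misread capital I,
# so it is replaced automatically. Straight and curly apostrophes are covered.
CONTRACTION_1 = re.compile(r"\b1(?=['’](?:m|ve|ll|d)\b)")

# Any other stand-alone "1" followed by a lowercase word is *probably* "I",
# but "page 1 of" or "chapter 1 ends" match too, so these are only reported.
STANDALONE_1 = re.compile(r"(?<![\w.,])1(?=\s+[a-z])")


def fix_one_for_i(text):
    """Return (fixed_text, positions_to_review) for the common OCR 1-for-I error."""
    fixed = CONTRACTION_1.sub("I", text)
    suspects = [m.start() for m in STANDALONE_1.finditer(fixed)]
    return fixed, suspects


if __name__ == "__main__":
    sample = "1'm sure 1 think so. 1’ll read chapter 1 tonight."
    fixed, suspects = fix_one_for_i(sample)
    print(fixed)
    for pos in suspects:
        print("check:", fixed[max(0, pos - 15):pos + 15])
```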
Also, should I use that option to get rid of headers and footers? I ended up taking them out individually in Word anyway, and I wondered whether that option in Abbyy would save me that step.
And what about getting rid of the formatting? Should I do that for the whole document? I don't really know anything about Book Designer... would it add to what I'm already doing or would it just be another step?
I spent days working on this and I know that there has to be a faster way. I'm trying to get rid of my books and turn them into ebooks but this process is kinda slow. I figure I must be doing something wrong.
Do you have any ideas or know of someone who can steer me in the right direction?
Thanks,
Taxi12
07-04-2011, 12:16 PM   #4
Michael Grossman
I Michael Grossman
Michael Grossman began at the beginning.
Posts: 8
Karma: 10
Join Date: Jun 2011
Location: Rhode Island
Device: ipad
taxi12

When you used the Neat Desk Scanner, Taxi12, did you find that it did an okay job and that it wasn't the problem you referred to in your first post? Thanks, Michael
07-26-2011, 12:17 AM   #5
taxi12
Junior Member
taxi12 began at the beginning.
Posts: 4
Karma: 10
Join Date: Jun 2011
Device: Nook
Hi Michael,

It's hard to say... I don't have much to compare it to. The scanner's ability to scan both sides at the same time is great, and it goes pretty fast. I'm still having trouble with the OCR part, though. I just don't know how much editing of the OCR output is normal and how much is too much. Do you have any tips on working with the OCR software? I'm just curious how some people are able to put out so many books... Do they have some awesome OCR software I don't know about, or what are they doing to make the process go faster? I've only done two books so far and they took quite a bit of editing, and given the number of books I want to do, I'm feeling kind of overwhelmed about even starting until I get some better information to help me work faster.
07-26-2011, 05:21 AM   #6
DSpider
Evangelist
DSpider ought to be getting tired of karma fortunes by now.
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
I'm not very fond of automatic processing, so that "remove headers and footers" option in FineReader doesn't sound very wise to me. For instance, if the bottom line of a page is part of the text but the author chose a smaller font for it, the program will probably see it as a footer and remove it, which I really, really DON'T want. I don't know, maybe I'm just paranoid... but I usually select the text areas manually. It doesn't matter if the page has two or more text areas, as long as the text you're trying to extract is selected.

Batch replacing isn't a very good idea unless you know exactly what you're doing, for instance when replacing every hyphen surrounded by spaces (" - ") with an en dash surrounded by spaces (" – "). You could also use the "Replace" button (not "Replace All"), which automatically takes you to the next instance so you can decide whether it should be replaced or left alone. It's a pretty neat feature. Use it with caution, though...
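The "Replace" versus "Replace All" distinction is easy to reproduce in a script if you ever do clean-up outside Word. In this minimal Python sketch, every match is shown to an `approve` callback along with some surrounding text, and only approved matches are replaced; `replace_with_review`, `ask_user` and the 20-character context window are illustrative choices, not part of any existing tool.

```python
import re
from typing import Callable


def replace_with_review(text: str, old: str, new: str,
                        approve: Callable[[str], bool], context: int = 20) -> str:
    """Visit every occurrence of `old` in turn, like "Replace" rather than "Replace All".

    `approve` receives the text around one match and returns True to replace
    that occurrence or False to leave it alone.
    """
    pieces, pos = [], 0
    for m in re.finditer(re.escape(old), text):
        snippet = text[max(0, m.start() - context):m.end() + context]
        pieces.append(text[pos:m.start()])
        pieces.append(new if approve(snippet) else m.group())
        pos = m.end()
    pieces.append(text[pos:])
    return "".join(pieces)


def ask_user(snippet: str) -> bool:
    """Interactive approver: prompt on the console for each match."""
    return input(f"Replace in ...{snippet}...? [y/n] ").strip().lower() == "y"


if __name__ == "__main__":
    answers = iter([True, False])          # scripted stand-in for ask_user
    print(replace_with_review("Wait - no. See pages 10 - 12.", " - ", " – ",
                              lambda snippet: next(answers)))
```

With the scripted answers (yes for the first dash, no for the page range) this prints `Wait – no. See pages 10 - 12.`; passing `ask_user` instead prompts for each match.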


Regarding better OCR software... there is none, and there probably never will be, because a lot of books have various printing imperfections. It's true that newer publications use better printing methods, but then there's the occasional typo, mistranslation, etc. So there will always be some proofreading to do once you're done, at least one pass.

I usually proofread in FineReader first (so the original scan is available right under the extracted text) and a second time on my device or computer screen, depending on the book or output format. During that second pass I highlight typos that were missed in the first one (with a yellow background) and then correct them in the source document. I do this for the entire book, which means I (casually) read the book a second time. It's a chore, I know, but the end result is a very high-quality e-book.

Last edited by DSpider; 07-26-2011 at 05:27 AM.
07-26-2011, 07:10 AM   #7
wayrad
Fanatic
wayrad ought to be getting tired of karma fortunes by now.
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
I always use the "remove headers and footers" option when saving to Word in FineReader 9. It often misses one or two headers or footers in a book, but that's better than removing them all manually. Never had any trouble with it removing anything it shouldn't.
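How FineReader decides what counts as a header or footer isn't described in this thread, but the general idea behind such an option, and one way to address the small-font concern above, can be sketched in Python. This is a hypothetical heuristic, assuming the OCR text is available as one list of lines per page: only the first and last line of a page are candidates, and one is removed only if it is a bare page number or if, with digits masked, it repeats at the top or bottom of at least half of the pages. Everything removed is returned so it can be checked by hand.

```python
import re
from collections import Counter
from typing import List, Tuple


def _key(line: str) -> str:
    # Compare case-insensitively with numbers masked, so "12  THE BOOK TITLE"
    # and "14  THE BOOK TITLE" count as the same running header.
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_running_lines(pages: List[List[str]],
                        min_share: float = 0.5) -> Tuple[List[List[str]], List[str]]:
    """Drop running headers/footers and bare page numbers from OCR'd pages.

    A one-off last line (e.g. body text in a smaller font) is kept, because
    it neither repeats across pages nor consists of a number alone.
    Returns (cleaned_pages, removed_lines).
    """
    edge_counts = Counter()
    for page in pages:
        if page:
            edge_counts.update({_key(page[0]), _key(page[-1])})
    threshold = max(2, int(len(pages) * min_share))

    cleaned, removed = [], []
    for page in pages:
        lines = list(page)
        for idx in (0, -1):                  # top line first, then bottom line
            if not lines:
                break
            k = _key(lines[idx])
            if k == "#" or edge_counts[k] >= threshold:
                removed.append(lines.pop(idx))
        cleaned.append(lines)
    return cleaned, removed
```

Chapter openings at the top of a page survive because they are rare relative to the page count, but a running header that changes with every chapter may also fall below the threshold and stay; lowering `min_share` trades that against the risk of false removals.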

That said, after scanning some 50 books I'm still learning. It is a heck of a lot of work, most of it occurring after the actual scanning.
07-26-2011, 09:59 AM   #8
DSpider
Evangelist
DSpider ought to be getting tired of karma fortunes by now.
Posts: 450
Karma: 343115
Join Date: Nov 2009
Location: Romania
Device: PW2 2014
Really? How do you know it didn't remove anything else? Did you memorize the entire book? Unless you did memorize it, you probably won't even know parts are missing.

In my experience, scanning isn't that bad. Processing, however, takes a lot more time and effort. And believe me, once 95% of the content is done, that last 5% takes a heck of a lot more than you'd expect. I'm a perfectionist at heart, and let me tell you, it can get pretty frustrating. But the reward afterwards... priceless. There's just something about taking pride in a job well done.

Last edited by DSpider; 07-26-2011 at 10:03 AM.
07-26-2011, 11:38 AM   #9
wayrad
Fanatic
wayrad ought to be getting tired of karma fortunes by now.
Posts: 551
Karma: 1121392
Join Date: May 2008
Location: USA
Device: HTC One M8
Quote:
Originally Posted by DSpider
Really? How do you know it didn't remove anything else? Did you memorize the entire book? Unless you did memorize it, you probably won't even know parts are missing.
Er...I compared the ebook with the hard copy, of course!

edited to add: Oh, I see, you don't look at the dead-tree book again once you've scanned it? I normally proofread with it in front of me. Even so, removing headers and footers when saving to Word doesn't remove them from the FineReader file, at least not in FR9, so one could do the comparison equally well on screen.
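Since the FineReader document keeps the headers, DSpider's worry can also be checked mechanically: save the text once with headers and footers and once without, then list exactly which lines disappeared. A minimal Python sketch using the standard `difflib` module, assuming both versions are saved as plain text with the same line breaks (reading the files is left out, and the sample strings are made up):

```python
import difflib
from typing import List


def removed_lines(original: str, exported: str) -> List[str]:
    """Lines present in the full OCR text but missing from the exported copy."""
    diff = difflib.ndiff(original.splitlines(), exported.splitlines())
    # ndiff marks deleted lines with "- ", added ones with "+ ",
    # unchanged ones with "  " and intraline hints with "? ".
    return [line[2:] for line in diff if line.startswith("- ")]


if __name__ == "__main__":
    full = "THE BOOK TITLE\nIt was late.\n7\nShe left."
    export = "It was late.\nShe left."
    for line in removed_lines(full, export):
        print("removed:", line)
```

If every reported line is a header, footer or page number, nothing else was lost.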

Last edited by wayrad; 07-26-2011 at 11:53 AM.
07-27-2011, 10:14 AM   #10
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
I never set it to process automatically. After loading the PDF, I let FineReader do its automatic pass over it, and then I check each page to see if the text areas are correct. Sometimes (especially with older books) some parts are detected as images.
Then I *always* use training patterns: I train on about two pages and then let Abbyy process the rest. After finishing, I transfer the lot to Word (without headers and footers; that option has never missed a beat for me, except for the occasional page number) and run some macros there to correct a lot of recurring errors. Then I run the spell check and check the layout. I don't worry too much about the layout, since I will use stylesheets anyway.
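Toxaris's macros aren't shown in the thread. As a hypothetical example of the kind of check that complements a spell check, this Python sketch lists words that mix letters and digits, such as `1eft` or `H0w`, which are typical OCR misreads; ordinals and decades like `2nd` or `1960s` are skipped. The regular expressions are assumptions and only consider ASCII letters.

```python
import re
from typing import List

# A token containing both a letter and a digit ("1eft", "H0w") is a typical OCR misread.
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")
# ...but ordinals and decades such as "2nd", "19th" or "1960s" are legitimate.
ALLOWED = re.compile(r"\d+(?:st|nd|rd|th|s)", re.IGNORECASE)


def ocr_suspects(text: str) -> List[str]:
    """List letter/digit mixtures worth checking against the scan."""
    return [m.group() for m in MIXED.finditer(text)
            if not ALLOWED.fullmatch(m.group())]
```

For example, `ocr_suspects("He 1eft on the 2nd of May in the 1960s. H0w odd.")` returns `['1eft', 'H0w']`.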
When this is finished, I turn it into an HTML file, either with a macro or via Word's 'filtered HTML' option, and load it into Sigil. There I do my final work.
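For anyone who would rather script the HTML step than go through Word, here is a hypothetical Python sketch of a very bare result: one `<p>` per paragraph, no inline formatting, and all styling left to a linked stylesheet. The `style.css` name and the page title are placeholders. Unlike Word's filtered HTML, this drops italics and bold, so it only suits plain text.

```python
import html
from typing import List


def paragraphs_to_xhtml(paragraphs: List[str], title: str,
                        css_href: str = "style.css") -> str:
    """Wrap plain paragraphs in a bare XHTML page; all styling lives in the CSS file."""
    body = "\n".join(f"<p>{html.escape(p, quote=False)}</p>" for p in paragraphs)
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{html.escape(title, quote=False)}</title>\n"
        f'<link rel="stylesheet" type="text/css" href="{html.escape(css_href)}"/></head>\n'
        f"<body>\n{body}\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(paragraphs_to_xhtml(["It was late & dark.", "She left."], "Chapter One"))
```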
Depending on the book, a normal novel usually takes me about 4-6 hours. When I read the book, I proofread it and fix the final issues, usually about 4-10 per novel.

Most of the things you mention are 'normal', expected OCR faults. Another one is incorrectly identified paragraphs. That is why I export to Word with line breaks intact; the macro I use transforms them back into paragraphs. When I check the layout, I also look for sentences within a paragraph that just happen to end exactly at the end of a line.
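Toxaris's macro isn't shown, so the following Python sketch is only a guess at the general technique: lines are joined until one ends a sentence or a blank line appears, words hyphenated across a line break are glued back together, and every paragraph break that follows a nearly full-width line is reported, since that is exactly where a sentence may simply have ended at the margin. The 90% width threshold is an arbitrary assumption.

```python
import re
from typing import List, Tuple

# A line that ends a sentence: ., ! or ?, optionally followed by closing quotes/brackets.
SENTENCE_END = re.compile(r"[.!?][\"'”’)\]]*$")


def rejoin_paragraphs(lines: List[str],
                      full_ratio: float = 0.9) -> Tuple[List[str], List[int]]:
    """Rebuild paragraphs from OCR text that was exported with its line breaks.

    Returns (paragraphs, suspects), where `suspects` holds the indices of
    paragraphs that ended on a nearly full-width line and need a manual look.
    """
    width = max((len(line.rstrip()) for line in lines), default=0)
    paragraphs: List[str] = []
    suspects: List[int] = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:                        # a blank line always closes a paragraph
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and line[:1].islower():
            # "exam-" + "ple" -> "example"; also turns "well-" + "known" into "wellknown"
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
        if SENTENCE_END.search(line):
            if len(raw.rstrip()) >= full_ratio * width:
                suspects.append(len(paragraphs))
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs, suspects
```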

Tags
ebook creation, ocr software, scanners, scanning books





All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.