Old 05-09-2013, 12:05 PM   #1
noork85
Junior Member
noork85 began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Sep 2012
Device: iPad 4
scanned my first book...now what?

hello everyone

OK, so yesterday I scanned my first book with the Brother ADS-2000. The whole process was a success and took me roughly two hours. Most of that time was spent cutting the book and separating the pages.

Now, my concerns are:

The PDF is searchable because I ran it through OCR, but it still looks like scanned pages. I want to make it look like an ebook with a white background. What do I do?

Secondly, I would like to reduce the file size without compromising the quality of the scanned text.

And lastly, I'm a complete noob to this process, so don't assume anything. Please explain in detail what I'm supposed to do.

thank you!!!
Old 05-09-2013, 01:41 PM   #2
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Search around; this question has been asked many, many times. In short, bring the OCR text into the word processor of your choice and fix all the OCR errors. Convert that to HTML and use it as the base for your ebook.
Brush up on your HTML/CSS skills.
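
For anyone who would rather script the "convert that to HTML" step than export from a word processor, here is a minimal sketch. It assumes the corrected text is plain text with a blank line between paragraphs; the function name `paragraphs_to_html` is made up for illustration and is not from any tool mentioned in this thread.

```python
from html import escape


def paragraphs_to_html(text: str, title: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> tags inside a bare HTML page."""
    # Assumption: paragraphs are separated by exactly one blank line.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # escape() turns &, < and > into entities so the text cannot break the markup.
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html>\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{body}\n"
        "</body>\n</html>\n"
    )


print(paragraphs_to_html("One & two.\n\nThree.", "My Book"))
```

This only gives you bare paragraphs. Italics, headings and styling still have to be added by hand or with CSS afterwards.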
Old 05-09-2013, 06:12 PM   #3
noork85
What kind of OCR errors am I looking for?
Old 05-09-2013, 06:50 PM   #4
PeterT
Grand Sorcerer
PeterT ought to be getting tired of karma fortunes by now.
Posts: 12,145
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
Quote:
Originally Posted by noork85 View Post
What kind of OCR errors am I looking for?
Any / all.
Punctuation
Words not recognized correctly
Double letters such as ll
Formatting issues
Old 05-13-2013, 03:35 PM   #5
mrmikel
Color me gone
mrmikel ought to be getting tired of karma fortunes by now.
 
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
If the book had columns, or pictures inset into the page, watch particularly for sentences that start in one column and end with text from another. You will have to locate the missing chunks, probably later in the text, and reunite them with the rest by referring to your book or PDF.

Letter substitutions happen all the time. R becomes E, etc.

Footnotes, which may be a good idea in a printed book, are much less so in an epub... endnotes at the end of each chapter are better because they are quicker to reach in the same section. This may mean renumbering the notes and adding links to them and back from them.
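
As a rough illustration of the renumbering and two-way linking, here is a small sketch that builds the HTML for one chapter's endnotes. The id scheme (`ch1-ref1`, `ch1-note1`) and the function name `build_endnotes` are assumptions made up for this example; any scheme works as long as the ids are unique within the file.

```python
from html import escape


def build_endnotes(notes, chapter_id):
    """Return (reference_links, endnote_html) for one chapter, numbered from 1."""
    refs, items = [], []
    for n, note in enumerate(notes, start=1):
        ref_id = f"{chapter_id}-ref{n}"    # anchor placed in the body text
        note_id = f"{chapter_id}-note{n}"  # anchor placed on the endnote itself
        # The link in the body jumps to the note...
        refs.append(f'<a id="{ref_id}" href="#{note_id}"><sup>{n}</sup></a>')
        # ...and the number in front of the note jumps back to the body.
        items.append(f'<p id="{note_id}"><a href="#{ref_id}">{n}.</a> {escape(note)}</p>')
    return refs, "\n".join(items)


refs, endnotes = build_endnotes(["First note.", "Second & last."], "ch1")
print(refs[0])
print(endnotes)
```

Each entry of `refs` goes where the footnote marker stood in the text, and `endnotes` goes at the end of the chapter.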

Also, if a footnote continues from one page to the next, its second part can end up as nonsense text in the middle of the body text.

Even after you think you are done, save the thing, put it on your device and read it again. You will probably be sick of it by then, but you will find yet more errors.
Old 05-13-2013, 06:55 PM   #6
rkomar
Wizard
rkomar ought to be getting tired of karma fortunes by now.
 
Posts: 2,982
Karma: 18343081
Join Date: Oct 2010
Location: Sudbury, ON, Canada
Device: PRS-505, PB 902, PRS-T1, PB 623, PB 840, PB 633
A decent spellchecker helps a lot in finding OCR errors. You still have to go over everything afterwards, but it helps to get rid of most of the errors easily first so that you don't become numb from a flood of them and start missing them as you read.
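
If the text is available as a plain file, the same idea can be roughed out in a few lines of script. This is only a sketch, not a real spellchecker: the `known` word set below is a tiny stand-in for a proper dictionary file, and `find_suspect_words` is a name made up for the example.

```python
import re
from collections import Counter


def find_suspect_words(text, known_words):
    """Count words in `text` that are not in `known_words` (case-insensitive)."""
    suspects = Counter()
    for word in re.findall(r"[A-Za-z']+", text):
        key = word.lower().strip("'")
        if key and key not in known_words:
            suspects[key] += 1
    return suspects


# Stand-in dictionary; in practice load a full word list from a file.
known = {"the", "ski", "bums", "went", "up", "hill"}
text = "The ski bnms went up the hiII. The bnms went up."

# Most frequent suspects first, so systematic OCR misreads surface early.
for word, count in find_suspect_words(text, known).most_common():
    print(count, word)
```

Sorting by frequency is the useful part: an error that shows up dozens of times is usually one systematic misread that a single search and replace can fix.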
Old 05-13-2013, 08:48 PM   #7
DaleDe
Grand Sorcerer
DaleDe ought to be getting tired of karma fortunes by now.
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Read https://wiki.mobileread.com/wiki/HowTo:_Create_an_eBook in our wiki.

Dale
Old 05-20-2013, 09:51 PM   #8
Stephanos
Connoisseur
Stephanos doesn't litter
 
Posts: 62
Karma: 133
Join Date: Oct 2007
Location: Minnesota, USA
Device: Kobo Aura Edition 2
As you indicated, your pdf has a text layer from the OCR. One of the more tedious things is getting the text into the word processor without all of the headers, page numbers, etc. that you would rather not have in your ebook. I've found that it is easier to just copy and paste page by page into the word processor. One tip is to hold down the ALT key while you select the text on each page so that you don't get the undesired bits of text.
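
If you end up with the text of each page in a script instead of selecting by hand, something like the following sketch can drop the obvious clutter. It assumes you already have one string per page; `strip_running_text` and its `min_share` parameter are made up for this example. It will miss headers that the OCR read differently from page to page.

```python
import re
from collections import Counter


def strip_running_text(pages, min_share=0.5):
    """Remove bare page numbers and lines repeated on at least `min_share` of pages."""
    line_pages = Counter()
    for page in pages:
        # Count each distinct non-empty line once per page.
        for line in {l.strip() for l in page.splitlines() if l.strip()}:
            line_pages[line] += 1
    # A line must appear on at least two pages to count as a running header.
    threshold = max(2, min_share * len(pages))
    repeated = {line for line, n in line_pages.items() if n >= threshold}

    cleaned = []
    for page in pages:
        kept = [
            l for l in page.splitlines()
            if l.strip() not in repeated and not re.fullmatch(r"\s*\d+\s*", l)
        ]
        cleaned.append("\n".join(kept))
    return cleaned


pages = ["MOBY DICK\nCall me Ishmael.\n1", "MOBY DICK\nSome years ago.\n2"]
print(strip_running_text(pages))
```

Check the result against the scan: a short line of dialogue that recurs on many pages would be removed too.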

If you use MS Word, there is a clipboard feature that will collect up to 24 pieces of text for pasting into Word, so you don't have to keep flipping back and forth between the PDF reader and the word processor.

When you get the text into the word processor, it will probably have hard line breaks. This means you have to look at the original scan and add an extra paragraph marker at the end of each paragraph. Then search for double paragraph marks and replace them with placeholder characters like "~!". Next, search for all remaining paragraph markers and replace them with spaces. Finally, search for your placeholder characters and replace them with paragraph markers.
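
The same three replacements can be sketched in a short script, in case the text is in a plain text file instead of Word. This assumes the real paragraph ends have already been marked with a blank line, exactly as described above; `unwrap_paragraphs` is a name made up for the example.

```python
import re

PLACEHOLDER = "~!"  # the placeholder from the post; any string absent from the text works


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph breaks."""
    # Normalise Windows/Mac line endings so "\n" is the only break character.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Step 1: protect real paragraph breaks (two or more newlines in a row).
    text = re.sub(r"\n{2,}", PLACEHOLDER, text)
    # Step 2: every remaining newline is a hard line break; turn it into a space.
    text = text.replace("\n", " ")
    # Step 3: put the paragraph breaks back.
    text = text.replace(PLACEHOLDER, "\n\n")
    # Collapse doubled spaces left behind by lines that ended with a space.
    return re.sub(r" {2,}", " ", text).strip()


sample = "It was a dark and\nstormy night.\n\nThe rain fell in\ntorrents."
print(unwrap_paragraphs(sample))
```

The order matters: if the single breaks were replaced first, the paragraph breaks would be lost along with them.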

Then you are ready to start the corrections, add back italics, bold, etc. and format to your liking.

Hope this will help you get started. Good luck.
Old 05-21-2013, 06:28 PM   #9
mrmikel
If it has columns you might do better selecting one column at a time in the pdf reader program. That will keep the columns right. This is time consuming.

Be aware the purpose of creating the text layer is often only for indexing, not to provide the text. For this reason, it is not so trustworthy.
Old 05-24-2013, 01:08 AM   #10
noork85
I think I'm going to have to use the copy and paste approach.

I'm not sure I understand the rest... but I guess I'll think about that after I do the copying and pasting...

Thank you all for your suggestions. Keep them coming!

Old 05-24-2013, 08:02 AM   #11
mrmikel
What I mean by the rest is that the text layer which is put into some PDFs is there only so they can be searched, and it is done in a slapdash manner, since anything will do to make the file searchable. So it may be full of mistakes, missing sections, and sections out of order. This is why, if you have any choice, anything is generally better than a PDF as a source. But often you have no choice.

As has been pointed out before, you will have to proofread line by line to make sure that this has not happened. I have found this in the book I am working on now, where an entire paragraph was jumbled... it was all there, but required 5-10 text moves to put it in order.
Old 06-03-2013, 06:21 PM   #12
Notjohn
mostly an observer
Notjohn ought to be getting tired of karma fortunes by now.
 
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
The problem may be your OCR software. I bought ABBYY FineReader for about $100 and scanned a novel into Word 2000 in fairly large gulps, say 10-20 pages at a time. To the best of my recollection, FineReader simply ignored the headers, page numbers, etc.

The biggest problem I had was the combination UM in lowercase. I can't remember how FineReader translated it, but it had nothing to do with UM. Since the novel was about ski-bums, I finally had to do a search & replace to fix them all.
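
For text outside Word, a search & replace of this kind might look like the sketch below. The post does not say what the OCR actually produced for "um", so the misreading "uni" used here is purely a hypothetical stand-in, and `apply_fixes` is a made-up name.

```python
import re

# Hypothetical misreadings for illustration only; put in whatever your OCR produced.
FIXES = {
    "buni": "bum",
    "bunis": "bums",
}


def apply_fixes(text, fixes):
    """Replace whole words only, so a match inside a longer word is left alone."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, fixes)) + r")\b")
    return pattern.sub(lambda m: fixes[m.group(1)], text)


print(apply_fixes("The ski-bunis met another buni.", FIXES))
# prints: The ski-bums met another bum.
```

Matching whole words is the cautious choice: a blind replace of the letter pair everywhere would also damage words that were read correctly.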
Old 06-07-2013, 12:56 AM   #13
noork85
Quote:
Originally Posted by Notjohn View Post
The problem may be your OCR software. I bought ABBYY FineReader for about $100 and scanned a novel into Word 2000 in fairly large gulps, say 10-20 pages at a time. To the best of my recollection, FineReader simply ignored the headers, page numbers, etc.

The biggest problem I had was the combination UM in lowercase. I can't remember how FineReader translated it, but it had nothing to do with UM. Since the novel was about ski-bums, I finally had to do a search & replace to fix them all.

The thing is, when I copy and paste into Word, everything gets pasted to the side, in short lines. I don't know if I'm explaining this properly. I basically have to press backspace to fill up each line. It's very tedious to do that for a whole novel. Maybe there's a shortcut, I don't know. I'm a novice even when it comes to Microsoft.
Old 06-07-2013, 02:05 AM   #14
Toxaris
You can use some intelligent S&R actions. You could also try my add-in, that might help you out.
Also, don't copy paste. Just save as docx from ABBYY (workable copy) and load that into Word.
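
One possible reading of "intelligent S&R" is a search and replace driven by patterns (regular expressions) instead of fixed text. That reading is an assumption, and the sketch below is not the add-in mentioned above. It joins the short pasted lines by two rules of thumb: a word hyphenated across a line break is glued back together, and a line that does not end in sentence punctuation is joined to the next one.

```python
import re


def join_wrapped_lines(text: str) -> str:
    """Heuristically rejoin hard-wrapped lines; review the result by eye."""
    # Rule 1: "con-\ntinued" -> "continued". This also merges genuine hyphenated
    # compounds that happened to break at the hyphen, so spot-check them.
    text = re.sub(r"(?<=[a-z])-\n(?=[a-z])", "", text)
    # Rule 2: turn a newline into a space unless the line ends with sentence
    # punctuation or a closing quote, or the newline is part of a blank line.
    text = re.sub(r'(?<![.!?:"”\n])\n(?!\n)', " ", text)
    return text


sample = "He went to the\nstore and con-\ntinued home.\nThen he slept."
print(join_wrapped_lines(sample))
```

A line that happens to end with a full stop in the middle of a paragraph is left unjoined, so this cuts down the backspacing but does not replace a check against the scan.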
Old 06-07-2013, 08:21 PM   #15
noork85
Quote:
Originally Posted by Toxaris View Post
You can use some intelligent S&R actions. You could also try my add-in, that might help you out.
Also, don't copy paste. Just save as docx from ABBYY (workable copy) and load that into Word.
What are intelligent S&R actions?

And I'll get ABBYY today... and try it out... BTW, which one should I get? There seem to be multiple ABBYY products.

Last edited by noork85; 06-07-2013 at 08:24 PM.
All times are GMT -4.