08-30-2008, 12:48 PM   #16
desertgrandma
Enjoying the show....
Posts: 14,270
Karma: 10462841
Join Date: Jun 2008
Location: Arizona
Device: A K1, Kindle Paperwhite, an Ipod, IPad2, Iphone, an Ipad Mini & macAir
Quote:
Originally Posted by DaleDe
There is quite a bit of information on the wiki. It is hard to specify step by step because it is so dependent on the source and user methods. However, I may create a simplified wiki how-to page if you think it will help. There are pages on how to use Book Designer, OCR, and many other things.
Thank you sir, may I have some more please.......
08-30-2008, 02:55 PM   #17
RWood
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
Elsi, as Dale points out, there are many possible methods to accomplish the goal of getting from a pbook/image file to a finished file ready for conversion to an ebook.

I just looked at the Wiki thread on Digitizing Paper Books to Ebooks and it is a good start. I will augment it with some of the techniques and tools that I have used.

Some books scan so cleanly that there is very little touch-up required after the OCR conversion. Others are in such bad shape that it is easier to retype the book than it is to clean up the OCRed text. With very few exceptions, the books I have converted this way used the basic US English language and character set: no accented characters or extra "u"s in "color", for example.

It is a long weekend; I'll see what I can get finished then.
08-30-2008, 04:49 PM   #18
Patricia
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
How I work

A quick summary of how I clean up the files from The Internet Archive or Google Books (or any PDF from an image file):

1. Go to the source file, say the Internet Archive. Download the text file and a PDF.
2. Try saving the PDF as a text file and see whether it is better than the downloaded text file. Usually there's not much difference.
3. Paste the text file into a doc.
4. Remove all headers and footers.
5. Run a heap of 'Find and Replace' commands to overcome common mistakes:
viz:
(a) find ' ?' replace with '?'.
(b) find ' !' replace with '!'.
repeat for every punctuation mark: many of these have an unnecessary space in front of them.
(c) find 'hyphen space paragraph mark' replace with 'paragraph mark' to remove unnecessary hyphenation at the hard line breaks. It's best to check each one individually, because some hyphens are meant to be there.
(d) find 'paragraph mark " space' replace with 'paragraph mark "': a stray space after an opening quote is a common OCR error.
6. Run Stingo's macro.
7. Check every instance of 'space"space' and correct them.
8. Check every instance of 'space'space' and correct them.
9. If you like curly quotes, then do a find and replace of " with " and ' with ' (in Word, if the smart-quotes AutoFormat option is switched on, this replaces straight quotes with curly ones). Then check each instance where they occur after a dash of any sort. Also check each instance of space' (because of contractions like 'em, 'tis, 'twas, etc.).
10. Now open both the PDF and the Doc. Adjust the page sizes so that each takes up half the screen. Read them side by side. You will have to add dashes (OCR often misses them out) and italics. Focus mostly on these, and on obvious spelling errors. Also add any missing accents.
11. Now run a spell-check on the doc. Note all dubious cases and check them.
12. If the source was very poor then repeat step 10. I often check every instance of ' and ", because these often get missed out.
13. Centre the chapter headings and make them bold. Ditto the author and title.
14. Insert any pictures.
15. Move any footnotes to the end of sections, the end of the book or wherever you want them.
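For anyone who would rather script some of these passes than run them one at a time in a word processor, here is a minimal Python sketch of steps 5(a)-(d), 7, 8 and 9. It assumes the OCR text is a plain string in which each paragraph mark is a newline; the function names are made up for illustration, it does not reproduce Stingo's macro, and full stops are deliberately left out of the punctuation fix so that spaced ellipses survive. As in the steps above, anything it flags still needs a human eye.

```python
import re


def fix_spaced_punctuation(text: str) -> str:
    """Steps 5(a)-(b): remove the stray space OCR leaves before ? ! ; : and ,"""
    return re.sub(r" +([?!;:,])", r"\1", text)


def hyphen_breaks_to_review(text: str) -> list[str]:
    """Step 5(c), review half: list each 'word- <newline>word' split."""
    return re.findall(r"\S+- \n\S+", text)


def drop_hyphen_breaks(text: str) -> str:
    """Step 5(c), apply half: 'hyphen space newline' becomes 'newline'.
    Run it only after reviewing, because some hyphens are meant to be there."""
    return text.replace("- \n", "\n")


def fix_opening_quote_space(text: str) -> str:
    """Step 5(d): a paragraph that starts with '" ' loses the stray space."""
    return re.sub(r'^" ', '"', text, flags=re.M)


def spaced_quotes_to_check(text: str) -> list[str]:
    """Steps 7-8: every " or ' with a space on both sides, plus some context."""
    return [text[max(m.start() - 15, 0):m.end() + 15]
            for m in re.finditer(" [\"'] ", text)]


def curl_quotes(text: str) -> str:
    """Step 9: straight quotes to curly ones by a simple position rule.
    A quote at the start of a line or after whitespace, ( or [ opens;
    any other quote closes. Quotes after dashes and words such as 'tis
    may come out wrong and still need checking by hand."""
    opener = r"(?:^|(?<=[\s(\[]))"
    text = re.sub(opener + '"', "\u201c", text, flags=re.M)
    text = text.replace('"', "\u201d")
    text = re.sub(opener + "'", "\u2018", text, flags=re.M)
    return text.replace("'", "\u2019")


if __name__ == "__main__":
    page = '" Is it true ?" she asked.'
    page = fix_opening_quote_space(fix_spaced_punctuation(page))
    print(curl_quotes(page))  # “Is it true?” she asked.
```

The review and apply halves of step 5(c) are kept apart on purpose, mirroring the advice to check each hyphen individually before removing it.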

Now the text should be ready for either Book Designer or Calibre, or your favoured conversion program.
The good news is that a conversion in Book Designer can now be done in less than 5 minutes. The bad news is that you will have spent many hours tidying up the original text. (I really daren't count.)

Obviously, you can do this in stages, a few minutes at a time. Take notes, so that you know what you have already done.

I hope this helps.
08-30-2008, 07:49 PM   #19
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Thanks Patricia. I will use this as a start for a wiki page.

Dale
08-30-2008, 08:35 PM   #20
Patricia
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
I forgot: at some point, check for common OCR errors such as 'lie' for 'he'.

Edited to add:

And run a search for each numeral in turn: zero and O are often confused.
And 1, I and l are often all over the place.
" can appear as 4 or 66.
08-31-2008, 07:32 PM   #21
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Captured Patricia's discussion in the wiki as How-To: Create an eBook. It has been modified a bit and I expect it to grow as more contribute.

Dale
12-23-2009, 02:13 PM   #22
Badandy
Connoisseur
Posts: 71
Karma: 13366
Join Date: Dec 2008
Location: Terminus
Device: Kindle 3, iPhone
I know I'm resurrecting this thread from the dead, but I wanted to get one thing straight:

It's better to download books from mobileread as opposed to books from PG or feedbooks because mobileread books have gone through one more pass of typo and error correction? If so, that's great!

Feedbooks and manybooks have mobi files for download as well so I'm wondering which are better to stick on my reader before I start reading them.

Last edited by Badandy; 12-23-2009 at 02:23 PM.
12-23-2009, 04:08 PM   #23
DaleDe
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by Badandy
I know I'm resurrecting this thread from the dead but I wanted to get one thing straight:

It's better to download books from mobileread as opposed to books from PG or feedbooks because mobileread books have gone through one more pass of typo and error correction? If so, that's great!

Feedbooks and manybooks have mobi files for download as well so I'm wondering which are better to stick on my reader before I start reading them.
It is almost always better to go with the handcrafted MobileRead eBooks rather than the machine-converted ones. Occasionally someone will post a book that has not had much review, but most books from MobileRead have been carefully hand-checked and improved with better fonts, special characters, etc.

Dale
12-23-2009, 05:54 PM   #24
Patricia
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
Quote:
Originally Posted by Badandy
It's better to download books from mobileread as opposed to books from PG or feedbooks because mobileread books have gone through one more pass of typo and error correction? If so, that's great!
I do run a spellcheck on PG books. The main PG site is definitely improving. PG Canada usually does very good books. However, PG Australia often has errors.
When working from scans, I try my best to make a good version. If people PM me with corrections then I will do a revised version, though I've a small backlog at the moment. I will try and clear it soon.
12-23-2009, 10:46 PM   #25
alecE
Evangelist
Posts: 412
Karma: 546196
Join Date: Mar 2009
Location: UK canal boat
Device: sony prs505, prs650, kobo Glo HD liseuses
Quote:
Originally Posted by vivaldirules
...There are lots of things at Google Books, the Internet Archive, and elsewhere that are in the public domain but only in PDF format or some online viewable flip thing. I'd love to be able to seamlessly (hah!) select, copy, paste, and OCR the text from such images. But I don't know the best way to go and would hate to spend time and money on the wrong thing. Any advice?
This may be a very elementary response, but fwiw, I generally export PDF items to text via the File > Save As Text menu option. Then comes the tedium of eliminating page headings etc.: I use NotetabLite to clean up, remove hard line endings, etc., and sort out accents and diphthongs before transferring the text to Sigil. (Incidentally, depending on your intentions, it may or may not be ethical to unlock secured PDFs so you can do this format shifting.)
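For the page-heading and hard-line-ending clean-up, here is a minimal Python sketch as an alternative to doing it by hand in a text editor. It assumes the exported text marks paragraph breaks with blank lines and that running heads are short lines recurring at least three times; both the threshold and the function names are hypothetical. It will also drop a chapter heading that is just a bare number, and a word hyphenated at a line end comes out as "well- known", so the result still needs a read-through.

```python
import re
from collections import Counter


def strip_running_heads(text: str, min_repeats: int = 3) -> str:
    """Drop bare page numbers and short lines that recur on many pages."""
    lines = text.splitlines()
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = []
    for line in lines:
        s = line.strip()
        if re.fullmatch(r"\d+", s):
            continue  # a line holding only a page number
        if s and len(s) < 60 and counts[s] >= min_repeats:
            continue  # a running head or foot, e.g. the book title
        kept.append(line)
    return "\n".join(kept)


def unwrap_hard_line_endings(text: str) -> str:
    """Join the lines inside each paragraph; blank lines mark paragraph breaks."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text):
        joined = " ".join(ln.strip() for ln in block.splitlines() if ln.strip())
        if joined:
            paragraphs.append(joined)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    exported = ("THE TITLE\nIt was a dark\nand stormy night.\n\n12\n"
                "THE TITLE\nThe rain fell\nhard.\n\nTHE TITLE\nEnd.")
    print(unwrap_hard_line_endings(strip_running_heads(exported)))
```

Stripping headings before unwrapping matters: otherwise a running head sitting between two halves of a paragraph gets glued into the middle of a sentence.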
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.