Further thoughts on scanning
Further thoughts on book scanning.
It was Bob Russell's fine piece on the Optic Book 3600+ which got me started. Now I am trying to carry the flag.
Forgive me, but I don't have Bob's computing skills to present this in the same style as he did.
I have a collection of pbooks which I decided to digitise (or is that digitalise?) as an aid to exercise my brain on the path to old age. One gets fed up with crosswords and Sudoku.
I bought my 3600+ last year and to date have produced 38 ebooks, as ePub.
I have become unhappy with my output, as their appearance on my reader (PRS505) is not yet as good as the store-bought ebooks I have. (You have to have standards)
For some reason I wanted to justify my expenditure and thought I should produce 100 books each year and set about it. Now I have revised that aim to 60 quality books per year.
To achieve that quality I must learn more skills in the programs I am using to edit and finish my books. As I have no experience in the word-smithing professions, WordPad is as much as I ever needed. I will be producing a plea for help after finishing this.
The report!
Phase 1
I unpacked and set up the 3600 (why does that remind me of a Harry Potter movie?) according to Bob's instructions. The only difference, in my experience, was that the carriage lock/unlock was glaringly obvious. A spring-loaded peg within the base of the device IS the lock.
Peg in - unlocked, Peg out - locked. A slide on the base of the device can fix the lock on or fix the lock off.
Leave the peg to move as it may, the device will be unlocked when placed on a flat surface and the moving parts locked when it is lifted.
Sitting, staring at the scanner, I thought of all the information I had gathered while dredging the Forums to develop a plan of work.
My first thought was:- WHAT IS A BOOK?
My books are novels only! No pdfs, no images, no tables; the sort of book one picks up when passing through an airport and finds good enough to keep and read again and again. No text books, I am beyond studying. (If I want to know something important, I can ask my wife - she knows everything!)
What will be the end product?:- I decided on ePub, with Calibre one size fits all.
What approach to use?:- K.I.S.S., R.T.F.M., Practice, Practice, Practice.
What text editing program to use?:- Open Office is the best I can afford.
What shall I do next?:- Get stuck in!!
Phase 2 Scanning:-
Set up DigiBook according to Bob. Selected grey scale, page image, rotate on even numbers, all of that, booktitle.
I chose a large hardback novel, 450+ pages (for an auspicious beginning). Present page 1, press button, lift and turn book, present page 2 (it is upside down on the platen), press button, zip, it appears on screen right way up. Great.
Carry on doing this for two and a half hours. Easy? Not really.
The blurb says No Spine Shadow, just lay it on the platen. Not exactly true! One must hold the spine quite firmly, and it becomes very tiring. I should have chosen a much smaller book to start with.
The worst type of book to scan is the omnibus type edition, very thick, tightly bound, narrow spine side margins. This needs a lot of push and shove. Relax and it will spring out!
Nevertheless I achieved the desired result.
Next step:- Click Transfer button.
The default Page Image is BMP, on anecdotal evidence I have chosen TIF as the means to carry on with the process.
DigiBook now converts BMP to TIF so that the OCR can take place using SprintExpress.
A small window opens, flashing to show progress of the transfer. Halfway through, a Windows declaration indicates that there is trouble and DigiBook must close, which it does.
No intermediate 'save' steps!!!
All of this 2½ hours of effort was held in RAM. Pfft, it is gone.
Good Heavens I say, Heck I say, or words to that effect, disappointment reigns. I find out, the hard way, that DigiBook is very good but a bit flaky.
One thought I had was that my computer was not robust enough. It is quite old, has only 1Gig of RAM which is cluttered up with - well - clutter. (Task Manager shows 380MB usage at idle)
Can 1Gig RAM hold 450+ Page Images in BMP and convert them to TIF Page Images?
As a result of this I now scan no more than 50 pages at a time and if I have been scanning photos prior to book scans, I start those with 20, 30, 50, 50... etc and get good results from this.
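The question about 1Gig of RAM can be put to a rough sum. The sketch below is only an estimate, and every figure in it is my assumption rather than anything DigiBook publishes: an A4-sized scan area, 300 dpi, 8-bit grey scale (one byte per pixel), and every page image held uncompressed in memory at the same time. The function name is made up for illustration.

```python
def page_bytes(width_in, height_in, dpi, bytes_per_pixel=1):
    """Uncompressed size in bytes of one scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed values, not taken from the scanner documentation.
A4_INCHES = (8.27, 11.69)
DPI = 300

one_page = page_bytes(*A4_INCHES, dpi=DPI)

for pages in (450, 50):
    total_mb = one_page * pages / 1_000_000
    print(f"{pages} pages: about {total_mb:.0f} MB uncompressed")
```

On those assumptions one page comes to roughly 8.7 MB, so 450 pages would want close to 4 GB, while a batch of 50 needs about 435 MB. If the real program works anything like this, a 1Gig machine with 380MB already in use would struggle with a whole book but might cope with 50 pages at a time.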
I find that I can average 200 pages per hour easily, I don't have to rush. I don't have to make it a chore.
The chores come later!
As that conversion is completed, another window opens to show the OCR progress.
This has a countdown, with which I can check that I have scanned all of the required pages.
I have a tendency to start reading the pages as they are shown on screen and often miss a turn or scan twice (I haven't read some of these books in years)
The OCR converts to WordPad RTF only - as a file BookTitle 0001, 0002, 0003 etc. 50 pages per file. In a folder one has previously chosen.
I always check each 50 in WordPad so that I can correct before I carry on. It only takes a minute.
Some thoughts on this phase:-
DigiBook is the management program for scanning. One ends up with an RTF WordPad file.
Page management is by DigiBook.
Abbyy Fine Reader is used as part of the process by DigiBook.
It is a fait accompli.
One does text editing with a word processor of one's own choosing - NOT Abbyy Fine Reader.
All in all this is a very good and simple program for producing ebooks.
When I have completed all my books I can go back to store bought or rummage through second hand book stores to find 'out of print' stuff.
Phase 3 Compile the book
I always start with WordPad - select 0001, change title to the book title, numbering has been checked - check again (measure twice, cut once).
Reduce screen to half width. Alongside, open WordPad again with 0002. 'Select all', 'copy' and 'paste' to the bottom of Book Title, save and repeat with 0003, 0004 etc until complete.
I then gather all loose 000x's into a folder which can be dumped when the ebook is complete and back-up copies made.
Open Book title and tidy up. I use WordPad for this because of its simplicity, there is nothing extraneous.
Scroll through the book removing headers and footers, usually just page numbers, join top of page to bottom of previous page.
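For anyone who would rather not scroll, the tidy-up step can be sketched in a few lines of Python. This is a minimal sketch and not what I actually do: it assumes each page has been saved as plain text (not RTF), that the only headers and footers are lines holding nothing but a page number, and that a page which does not end in sentence punctuation runs straight on to the next page. The names PAGE_NUMBER and join_pages are made up for illustration.

```python
import re

# A line that holds nothing but a number of up to four digits.
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")

# Characters taken to mean the page ended on a complete sentence.
SENTENCE_END = (".", "!", "?", '"', "'")


def join_pages(pages):
    """Strip page-number lines and join pages into one text.

    pages is a list of strings, one string per scanned page.
    """
    book = ""
    for page in pages:
        lines = [ln for ln in page.splitlines() if not PAGE_NUMBER.match(ln)]
        text = "\n".join(lines).strip()
        if not text:
            continue  # blank page, or a page holding only its number
        if not book:
            book = text
        elif book.endswith(SENTENCE_END):
            # Heuristic: previous page ended a sentence, start a new line.
            book += "\n" + text
        else:
            # Sentence was split across the page break, so run it on.
            book += " " + text
    return book


sample = [
    "He opened the door and\n12",
    "13\nwalked in.\nShe looked up.",
    "14\nNext paragraph here.",
]
print(join_pages(sample))
```

Be warned that a chapter number sitting on a line of its own would be stripped as well, and a page that happens to end on a full stop is not always the end of a paragraph, so the result still needs a read through.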
I have read on Forum that people use macros to do all of this. Well I know nothing of macros, I wouldn't recognise a macro if it bit me on the backside, but that is my cross to bear. As I said earlier I just get on with it.
Many of my books are from the 1940s and 1950s and so have very poor quality paper and ink, together with odd and crudely sized fonts, which, plus age give OCR a very hard time.
To correct these problems I have to fire up Open Office to change the font size and set the line spacing to 'single' to suit that font.
Phase 4 The Editing
Now comes the chore!
I don't think that I can present a straightforward timeline for the editing.
This is where my lack of experience of editing shows up.
I edited and learned how to do it at the same time. A mishmash of trial and error.
I fired up Open Office and opened Book Title, in ODT. Good grief look at all those wriggly red lines. Phew!
A careful perusal of the problems will show which one must correct and which one can disregard.
Many errors are those which OCR has mis-spelt, such as di for th. Thus die/the, dian/than.
These will be marked and obvious, unless the error happens to produce a properly spelt word!!
Those will not be marked or obvious, such as Mr Home/Mr Horne, and will only be found during proof reading.
There will be many, maybe dozens or hundreds, of the same error. These can be corrected by using 'Find and replace'. One must ensure that this is used in conjunction with 'Whole words only' and 'Match case'.
Otherwise you will produce an equal number of errors which will not be high-lighted and must be searched for individually.
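To show why those two tick boxes matter, here is a small Python sketch of the same idea. It is an illustration only, not how Open Office does it internally; the function name replace_whole_word is made up, and it uses the standard re module, where \b marks a word boundary and matching is case sensitive by default.

```python
import re


def replace_whole_word(text, wrong, right):
    """Replace 'wrong' with 'right' only where it stands as a whole word.

    Matching is case sensitive. Returns the new text and the number
    of replacements made.
    """
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
    # A function is used as the replacement so that 'right' is inserted
    # exactly as typed, with no special handling of backslashes.
    return pattern.subn(lambda match: right, text)


line = "Die cat sat on die mat with the soldier. Diet is fine."

# Plain replacement: also damages 'soldier'.
careless = line.replace("die", "the")

# Whole words only, match case: leaves 'soldier', 'Die' and 'Diet' alone.
careful, count = replace_whole_word(line, "die", "the")

print(careless)
print(careful, count)
```

The careless version turns 'soldier' into 'solther', which no red wriggly line will point to as the cause. The careful version leaves the capitalised 'Die' for a second pass, and neither version can tell a genuine 'die' from an OCR slip, so proof reading is still needed.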
I believe that many errors which are blamed on the OCR program are really due to the quality of the source book. The difference of result between my old books and my newer books is huge, which simplifies the editing processes.
I hope no one will take the idea that I am 'teaching the Vicar to suck eggs'. But there appear to be many others who, like me, are starting out on this road to ebooks. I wouldn't want them to make the same errors as I did.
Other errors:-
Open Office has an American/English dictionary as default, I have English/English books and live in an Australian/English world. The dictionary is the minimum of staple words and all other words not in the dictionary are marked as wrong.
I call up English/UK dictionary but it flips back to default.
A few of my books are reprints of American authors of the 30s, such as Damon Runyon, James Thurber, Ogden Nash, Anita Loos and many more. Many of the words used in the 30s don't appear in a modern American dictionary, nor do those in an English/English book.
A short story by Milt Gross written in the dialect of New York's East Side (Noo Yoik aw'reddy) would have only about 40% of its content appear in any dictionary.
But I digress.
If you are satisfied that the 'errors' are actually proper words, they can be ignored. They are of no consequence in the conversion to ePub, Mobi etc. (Check! Is this statement correct?)
Personally I have been loading my dictionaries with everything I can, this has proved to speed my work later on, especially when scanning a series of books. Fewer high-lighted words to linger on!
In spite of my desire to achieve high quality, I cannot stop the occasional spelling error creeping into the finished item. So what!
In any event I am the only one going to read them. They are not for publication or dissemination.
I have found that proof reading for long periods spoils my enjoyment of leisure reading.
Too critical an eye picks up spelling and grammatical errors which I have missed previously.
eg "A lone sentry standing virgil at the graveside"
"The coding is secure, we have a new logarithm"
I smell Spell Checker! (and these by a very respected publisher, well, big! )
So, editing is a chore. I find that I go through a book many times to edit and correct, and also to try to maintain the flow, the look and the feel of the original work. That is what makes for such enjoyable reading and is the reason I have kept these books all these years, to re-read them again and again.
Phase 5 The Layout
I really do not know where to start with this. I have tried many layouts and the difference between the ebook on computer screen and the reader is quite large.
The main difference, after all of these trials, is that I want justified pages throughout but my reader (PRS 505) presents left alignment!
I may start rambling here. This is where my lack of experience is beginning to show!
It would seem that it does not matter what other attributes one requires, font, size etc. What 505 wants, 505 gives! (Is this a valid statement?)
I have finalised on A4, default style, Arial or Times New Roman 12 font size. Everything else, justification etc, is applied as an attribute, not incorporated as a defined style.
Therefore, I believe, it is not incorporated in the final construct data carried to the reader.
If this is correct then I need to learn how to apply styles to my layouts and form a template to use as the basis of my layouts.
Thus the construct data of the template will carry through to the reader!
I am treating 'construct data' as I believe 'meta data' works.
Does this sound real to anyone?
If so, then I need to learn how to determine my layout and save it as a book template proper!
Try as I might I cannot do this properly with 'File', 'Templates', 'Save' buttons. (probably doing it all wrong)
Nevertheless, I have my ebooks and am carrying on with the remainder.
My next problem is :- many of them are collections of short stories and will need TOC's.
Try as I might, I cannot fathom out how to do those either.
I have tried many instruction sets to do this but they are too involved, and assume that I know much about word processing or the program 'Word'. I need step by step (baby steps) instructions.
Experts on the Forum, you know who you are, please look critically at your postings. Most of you post welcome positive knowledge but it is not all there. You know what you are saying but gloss over many of the minor details because those details are so obvious to you, but it is those details which are needed to fill the voids in my knowledge.
Just a thought! (This is probably my cry for help)
I am trying very hard to complete this project and do it correctly (as is my wont) but am starting to believe I have become an old fart.
I was discussing prostate problems with my GP and said to him, “I have become a classic Grandad.” “Define 'classic Grandad',” said he.
A classic Grandad is stooped, portly, jolly, silver haired, smells of pi**.
I was going to discuss copyright but I thought - Don't get me started!!!
By the way, I can recommend the Optic Book 3600 (I believe now 3800) for any home. If you have 100+ pbooks to keep and carry into retirement, this is a reasonable investment and a worthwhile hobby.