I want to digitize my paper books.


llwwss
01-11-2008, 02:25 AM
I have many thick academic books in paper form that I want to read anywhere.
Unfortunately, these books are not available as ebooks, and I doubt they will be any time soon.
So I want to make my own ebooks out of the paper books I already have, so that I can read them on the go and in bed on ebook readers like the PRS-505 or Cybook Gen3.

Is it possible for an individual to digitize books into ebook files,
or should I contact a company that does book digitization?
If I can do this myself, what equipment do I need?

DMcCunney
01-11-2008, 02:51 AM
As an individual, is it possible to digitize books into ebook files or should I contact a company which does book digitization?
If I can do this myself, what equipment do I need?

Forget it.

It's very difficult and time consuming, even if you already have the required equipment and skill in using it. If you don't have the equipment or the skill, it will be close to impossible.

And a company that does this won't help you either. Their first question will be "Do you have the right to do this?". You don't. Someone else holds the rights and would have to approve it. They would be in violation of the law doing it without that approval, and they won't touch the job.

And even if you got the approval, it would be far more expensive than it was worth. I can easily see a charge of thousands of dollars per book.

If the books you want to read don't exist in electronic format, resign yourself to paper editions. Seriously.

(For an idea of what has to be done to make a paper book into an electronic version, visit the Distributed Proofreader's site, at http://www.pgdp.net/c/default.php . They do the proofing on the files that become Project Gutenberg titles.)
______
Dennis

slayda
01-11-2008, 07:56 AM
I have many thick academic books in paper form that i want to read anywhere.


Unfortunately, as stated, the typical academic book is quite difficult due to such things as images, figures, charts & equations, unless you can be satisfied with the PDF files that result from the scanning. These can be very large files, slow to read on the typical ebook reader as well as being too small to view adequately.

I have done volunteer work for Project Gutenberg and they don't even have the proof readers bother with these. They have what they call formatters do this work.

On the other hand, if it is purely text, then it is a job you can handle yourself with an appropriate scanner and OCR software plus some sort of editing software such as MS Word. It is still a lot of work. How much work partly depends on the equipment and SW and your experience.

vivaldirules
01-11-2008, 08:11 AM
I've tried doing a few chapters of a few books and it's really tough going. The best net rate that I could do was about one minute per page with the result being a PDF file that I had to run through PDFLRF to put on my Sony Reader. At that rate, I'd need to add a few decades to my life expectancy to do this for my library. Ain't happenin'.

recycledelectron
01-11-2008, 08:47 AM
Forget it.

It's very difficult and time consuming, even if you already have the required equipment and skill in using it. If you don't have the equipment or the skill, it will be close to impossible.

Nonsense. $60 and a weekend will get the first one, if you have a decent digital camera. A few hours per book after that.

Don't let these illegitimi carborundum you.

And a company that does this won't help you either. Their first question will be "Do you have the right to do this?". You don't. Someone else holds the rights and would have to approve it. They would be in violation of the law doing it without that approval, and they won't touch the job.

You are ASS-U-ME ing that the books have copyright notices that prevent them from being copied into digital form. That's absurd. Many books have copyright notices that allow you to convert them to another format for your personal use.

And even if you got the approval, it would be far more expensive than it was worth. I can easily see a charge of thousands of dollars per book.

LMAO! I rip several books a day, it's easy.

Here's the hardware you need:

(1) digital camera, preferably SLR, in the 5 MP or better range for most books. Academic books can be large (8.5" x 11" pages) with small text. In that setting, more MP is better. I'm currently using a ($80) 6.2MP Samsung S630, point-and-shoot. It works for text-based hard covers, but it sucks for huge, college math books. I'm saving for a 10-12MP DSLR.

(2) tripod (mine was $18.88 at Wal-Mart.)

(3) book cradle - search this site and Google - I got all my ideas from a few searches. Mine cost $40 in parts. See http://www.mobileread.com/forums/showthread.php?t=13848&highlight=cradle

Unfortunately, as stated, the typical academic book is quite difficult due to such things as images, figures, charts & equations, unless you can be satisfied with the PDF files that result from the scanning. These can be very large files, slow to read on the typical ebook reader as well as being too small to view adequately.


BS

The correct solution is to snap photos of the books, and use these photos (JPEGs.) Yes, a 500 page math book can take over a gig, but it's usable. I use a 2GB SD card in my PRS-505 to store a book, and flip through the images. As processing power increases, we'll be able to use them even more easily. As OCR software improves, we may (one day) be able to OCR the equations.
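
For a rough sense of scale, the figures in that post can be turned into a back-of-envelope estimate. This is only a sketch: it assumes exactly 1 GB for 500 pages (the post says "over a gig", so real files would be somewhat larger), it counts gigabytes in decimal the way card makers do, and the function names are made up for illustration.

```python
def average_page_bytes(total_bytes: int, pages: int) -> float:
    """Average size of one page image."""
    return total_bytes / pages


def pages_per_card(card_bytes: int, page_bytes: float) -> int:
    """How many page images of the given size fit on a card (ignores file system overhead)."""
    return int(card_bytes // page_bytes)


GB = 10**9  # decimal gigabyte (assumption)

# Figures taken from the post: a 500-page book at roughly 1 GB, stored on a 2 GB card.
per_page = average_page_bytes(1 * GB, 500)
capacity = pages_per_card(2 * GB, per_page)
print(f"{per_page / 1e6:.1f} MB per page, about {capacity} pages on a 2 GB card")
```

Under those assumptions a 2 GB card holds roughly two such books at most, which fits the one-book-per-card habit described above.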

Andy

JSWolf
01-11-2008, 08:50 AM
Moved since this is a general purpose thread and not just for the 500/505.

vivaldirules
01-11-2008, 09:35 AM
Don't let these illegitimi carborundum you.

My apologies! I think your efforts are valiant, recycledelectron, but I don't think this is an activity for the faint at heart. If a 6.2 Mpixel image is not good enough for a textbook page, my heart flutters to imagine what is. And "paging" through a book by flipping between jpegs that total 1 Gbyte or more has me swooning. I'm glad this works for you but this won't for me - ever. I need a process that is a lot less intense and time-consuming.

I have images of you in a dark basement frantically turning pages. Camera flashes lighten the room every few seconds. The flipping of pages and the whir of fans from a couple of PCs accumulating the photos is the only sound. This goes on from early morning until late at night for days at a time. You've given up work and family and the only time anyone sees you is when the pizza guy shows up. He sees the evil grin on your sweating face as you tell him about how quickly you digitized the complete Oxford English dictionary last week. He happily leaves quickly with a small tip.

Sorry for having fun with you but I couldn't help it. I certainly hope the image I have is wrong!:)

dcalder
01-11-2008, 02:10 PM
If you're looking for "readable" rather than archive-quality, all you need is a decent scanner and VueScan. Heck, even "archive-quality" is do-able, as long as your original book isn't too fragile to handle being opened out flat on the scanner bed.

Using VueScan Professional and my beloved old AcerScan 610ST (with the Adaptec USBXchange SCSI-to-USB adapter), I've done a few doujinshi for scanlation projects and a few out-of-print fanzines for friends who had material published in them but couldn't afford the zine at the time (not all fanzines can afford to give free copies to contributors). A 100+ page fanzine/doujinshi scanned on the "magazine" setting ends up somewhere around 2GB as raw DNG files. Either save as more than one file-type during the scanning process or you can later point VueScan back at those DNG files and re-scan them to a multi-page TIFF or PDF. With a size-reduction setting of 3, you end up with a single file in the neighbourhood of 150+MB - averaging just over 1MB per page for text pages. I think that file with size-reduction setting of none ended up around 330+MB. Keep in mind, of course, that these are pure image PDFs, not text! VueScan can also output as JPEGs, with both the size-reduction setting and file compression setting being configurable.

VueScan can do OCR as well, so text PDFs should be possible, but I have yet to attempt it (though if I do decide to, I can work directly with the DNG files and don't need to re-scan the original). The image PDFs are more than clear enough for the purposes that they're being used for. I highly recommend that anyone who's ever cursed their scanning software take a good long look at VueScan. It's reasonably priced and infinitely better than any other scanner software that I've ever checked out. It can "scan" from disk, scanner, digital camera, etc., and can be set up to do automatic scans at regular intervals, batch scans, etc. A very useful, versatile 'tool' for anyone's software 'toolkit'.

Edit:

Just took a few minutes to toss one of the afore-mentioned scan-generated PDFs on my Cybook. Considering all the previous comments on the complete unsuitability of any ebook reader other than the iLiad for reading PDFs, I hadn't bothered before. But, in the interests of research, I thought it was worth a shot.

The book in question is 109 printed pages, mainly text but with a few drawings and comics; only a couple of pages are full-colour. The PDF file is 168MB and the Cybook handled it with ease. There was a slight delay in turning pages, but then, there's also a slight delay in paging through it on the computer (much like 90% of the PDFs I've ever viewed), so that's rather a moot point. I was able to read it in portrait mode fit to page (yes, really!) but then I run my computer monitor at a resolution that makes other people squint and reach for a magnifying glass, so... *shrug* In landscape mode, fit to width, it was perfectly readable for the average person - probably comparable to the text in the average mass market paperback. And the original of this is an 8 1/2" x 11" fanzine, with text in two columns. So, the answer to the question "is scanning a book as PDF for viewing on the Cybook possible" is definitely a resounding "yes" - at least for someone with reasonably good vision (I wear glasses for distances but not for reading and usually not for the computer monitor either).

I'd suggest, in future, that the more reasonable response to questions about PDFs in general on the Cybook be less of an immediate "no, they're not any good" because, frankly, I think that's a rather inaccurate answer and won't necessarily hold true for everyone. They're not necessarily unreadable, even if they haven't been optimized for viewing on such a small screen. If I were really planning to use this particular file on the Cybook, I'd probably run VueScan back through the raw DNG files, crop out the excess margins to improve display size, maybe play a bit with the file-size reduction settings in hopes of improving display speed, and then generate a new PDF, at which point I would probably be comfortable reading the whole thing in portrait mode.

Note: As far as scanning time goes, this particular 109 page zine took two-three hours to scan - in part because, while I was doing that on the desktop, I was playing a game and browsing the web on the laptop. Theoretically, it should be possible to get the scanning done much more quickly, if that was the only task being carried out.

HarryT
01-12-2008, 03:22 AM
Note: As far as scanning time goes, this particular 109 page zine took two-three hours to scan - in part because, while I was doing that on the desktop, I was playing a game and browsing the web on the laptop. Theoretically, it should be possible to get the scanning done much more quickly, if that was the only task being carried out.

If you're willing to destroy the book, removing the binding and using a scanner with a sheet feeder will get the job done in minutes. That's how DP get their page scans, I believe.

Sparrow
01-12-2008, 01:16 PM
I'd suggest, in future, that the more reasonable response to questions about PDFs in general on the Cybook be less of an immediate "no, they're not any good" because, frankly, I think that's a rather inaccurate answer and won't necessarily hold true for everyone. They're not necessarily unreadable, even if they haven't been optimized for viewing on such a small screen.

This is a good point :2thumbsup.
I've only recently tried PDFs on my CyBook because I'd seen the negative reports here - but was surprised to find that they're actually perfectly readable (for me - I'm nearsighted and can read PDFs on my CyBook without my specs).
I can appreciate some people might have problems; but everyone should see for themselves - they may be pleasantly surprised. :)

RWood
01-12-2008, 02:30 PM
I did some scan/OCR work for the Harvard Classics series. It is not hard. Depending upon your ability at editing, it can be a nightmare or something less. (Years of editing helped for me.)

The only way to know for yourself is to try it for yourself. Don't let any of us stand between you and your goal. We all have experience, but not your experience.

slayda
01-12-2008, 02:35 PM
Yes PDF "images" can be readable, depending on the original size. If scanned from a book with pages near the size of the Cybook screen then there should be no trouble reading it. However you will not have a book, only a series of pictures of pages stuck together. It will not reflow, you won't be able to use a dictionary on the images of the words, etc. In addition, most academic books have larger pages, some even larger than 8.5 x 11. Even with young eyes, this will be difficult reading.

What I spoke of is creating editable text from scanned books. And it is true that eventually we will have equation editors, etc. but we don't now, at least in general. (There are some very specialized equation editors.)

As a comparison, I recently scanned a paperback book with over 1000 pages. Scanned at 600DPI, it took almost an hour to scan. Then I spent about 4 hours cleaning up the OCR errors. This was a good quality printed book. Cheaper quality usually generates more errors.

This experience, including one that had a half dozen equations that I kept as JPEGs in the text, is what I based my previous statements on. BTW the scanned PDF file was about 122.6 MB but the final RTF was only about 3.4 MB. IMO a significant reduction.:bookworm:

-Thomas-
01-12-2008, 07:04 PM
I'm a student at a German university, and we have integrated copying and scanning devices all over the campus. With these devices you can scan your books very fast in a readable quality (even for figures) and send the resulting PDF directly via email. Very convenient! They even have those devices in the reference library :2thumbsup

I already scanned a 300 page paperback (-> 150 scans), it took about 15 minutes. Maybe you have something similar nearby?

Patricia
01-12-2008, 07:26 PM
We aren't allowed to do this in the UK. At my university there are signs above the photocopiers saying that we are only allowed to copy one chapter from a book for copyright reasons. And students aren't given access to scanners without the material being checked for copyright by a librarian.

tompe
01-12-2008, 07:47 PM
We aren't allowed to do this in the UK. At my university there are signs above the photocopiers saying that we are only allowed to copy one chapter from a book for copyright reasons.

Is this restriction for copying to yourself or copying to the class? Our rules are that a teacher can copy a whole book for himself. For the class you can copy a maximum of 15% and 15 pages and distribute in the class. If you want to copy more you have to ask for permission.

recycledelectron
01-13-2008, 12:04 AM
My apologies! I think your efforts are valiant, recycledelectron,

When I referred to illegitimi, I was referring to the copyright mafiAA. I hate it when someone is told they can not legally do something.

The only law should be to not deprive anyone of their life, liberty, or property, except in self defense or in the defense of an innocent person. Ripping a book that is not available as an eBook is NOT wrong, as it does not deprive anyone of life, liberty, or property.

Telling someone to give up is very distasteful, as it discourages innovation. Innovation is what allows me to live in an air conditioned home, use PCs, and go hunting instead of getting eaten by big predators.

but I don't think this is an activity for the faint at heart. If a 6.2 Mpixel image is not good enough for a textbook page, my heart flutters to imagine what is.

Actually, the 6.2MP camera works fine when correctly focused, but the auto-focus causes me problems. It will get 2/3 of the page fine, but the print near the edge is a problem when taking in a large page. Therefore, a 6MP DSLR or a 9MP point-and-shoot should work on the worst text books.
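
To compare camera megapixels with the dpi figures that scanner users quote elsewhere in this thread, one can estimate the best-case resolution of a photographed page. The sketch below is an illustration, not a measurement: it assumes a 4:3 sensor, a portrait page that exactly fills the frame, and no loss from lens softness or focus, and `effective_dpi` is a hypothetical helper name.

```python
import math


def effective_dpi(megapixels, page_w_in, page_h_in, aspect=4 / 3):
    """Best-case dots per inch when a portrait page just fills the frame.

    aspect is the sensor's long side divided by its short side (assumed 4:3).
    """
    pixels = megapixels * 1_000_000
    short_px = math.sqrt(pixels / aspect)
    long_px = short_px * aspect
    # The page has to fit in both directions, so the tighter one sets the limit.
    return min(short_px / page_w_in, long_px / page_h_in)


for mp in (6.2, 10, 12):
    dpi = effective_dpi(mp, 8.5, 11)
    print(f"{mp} MP on an 8.5 x 11 in page: about {dpi:.0f} dpi")
```

On these assumptions a 6.2MP shot of a letter-size page lands somewhat below 300 dpi, while 10MP and up clears it, which is consistent with the experience described above.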

Digital cameras are dropping in price so fast, that if the camera's price fazes you, wait a semester and they will be cheaper.

And "paging" through a book by flipping between jpegs that total 1 Gbyte or more has me swooning. I'm glad this works for you but this won't for me - ever. I need a process that is a lot less intense and time-consuming.

I can photo 500 pages an hour. Then, they copy at a rate of several thousand pages an hour to my PC via USB from a card reader. During that time, I can rename them. This is necessary because I snap pics of the odd pages first, and then do the even pages. After I rename them with the page number as the name, they fall in alphabetical order.
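
The renaming step can be scripted. What follows is a minimal sketch rather than the poster's actual procedure: it assumes the camera's file names sort in shooting order, that both passes were shot in ascending page order (odd pages first, then even), and that names are zero-padded so alphabetical order matches page order. The file names and function names are hypothetical.

```python
from pathlib import Path


def page_numbers(total_shots: int, odd_shots: int) -> list[int]:
    """Page number for each shot, in shooting order.

    The first odd_shots photos are pages 1, 3, 5, ...; the rest are 2, 4, 6, ...
    """
    odds = [2 * i + 1 for i in range(odd_shots)]
    evens = [2 * i + 2 for i in range(total_shots - odd_shots)]
    return odds + evens


def rename_plan(files: list[str], odd_shots: int, width: int = 4) -> dict[str, str]:
    """Map each camera file name to a zero-padded page-number name."""
    ordered = sorted(files)
    pages = page_numbers(len(ordered), odd_shots)
    return {
        name: f"{page:0{width}d}{Path(name).suffix.lower()}"
        for name, page in zip(ordered, pages)
    }


def apply_plan(folder: Path, plan: dict[str, str]) -> None:
    """Rename the files on disk; refuses to overwrite an existing file."""
    for old, new in plan.items():
        target = folder / new
        if target.exists():
            raise FileExistsError(target)
        (folder / old).rename(target)


# Hypothetical camera file names: three odd pages, then two even pages.
shots = ["IMG_0001.JPG", "IMG_0002.JPG", "IMG_0003.JPG", "IMG_0004.JPG", "IMG_0005.JPG"]
for old, new in rename_plan(shots, odd_shots=3).items():
    print(old, "->", new)
```

Without the zero padding, a file named 10 would sort before 2. If the even pages were shot while flipping back through the book, the even list would have to be reversed first.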

It takes a day or two over a weekend to digitize all my text books for that semester, so count off maybe 2 weekends a year to relieve myself of carrying a dozen text books at a time. Instead, I'm the one with the tiny notepad-sized case.

My personal library beats the university library, and fits in the passenger seat of my pickup.

As for the GB size, my PRS-505 changes to the next pic as quickly as it flips between pages in a PDF. The zoom works MUCH better on JPEGs than it does on PDFs. I like JPEGs better than PDFs on the PRS-505.

I have images of you in a dark basement frantically turning pages.

Good lighting is essential to good book ripping ;o)

Sorry for having fun with you but I couldn't help it. I certainly hope the image I have is wrong!:)

You are very wrong. I spend 2 weekends a year digitizing my text books, and am the only person on the faculty who does not drag home massive bags of books. I grab my eBook reader, and a note pad in a small case, and go with that. I've got everything I need right there.

Last semester, during finals week, a student walked up to me while I was eating lunch on campus and asked if I had graded his paper. I had previously scanned it and the other papers with an ADF, and saved it on an SD card. While he watched, I dropped the right card in my eBook, graded it, and recorded the grade on a note pad to mark in my online grade book later. He was astounded that I had everything right there in a 1-pound package.

Andy

P.S.

Most people don't read or study.

What would happen if you always had access to every book ever written, and could instantly switch from reading to listening to the audio book at that exact word? (When you get bored, when your eyes get tired, or when you have to drive somewhere, you switch seamlessly.)

Could a bright, self-motivated kid get an education in the world's least competent school?

What if that reader did not depend on any outside technology? (i.e., it was solar powered, and rugged like a tennis shoe.)

Think of the regimes that have burned books. Could a government keep its people ignorant?

What would happen to the self reliance of individuals, when they can bring up a manual on auto repair on the side of the road?

How much better off would a patient be, if they could pull up a beginner's medical text when trying to understand a life-changing diagnosis? I've driven to the hospital, and would have liked to find the passage, then ask the eBook to read it to me.

vivaldirules
01-13-2008, 09:42 AM
Well, recycledelectron, I'm very impressed. My apologies, again! A day or two to do several textbooks might be acceptable even for me. Also, using JPEGs instead of PDFs put me off but I agree with you that the zooming and panning works fine and I wish Sony supported that for PDFs. But how do you deal with accessing page 123 and then flipping to page 812? Do you advance ten pages (images) at a time from the menu or do you use a hack? Also, I assume there's no linkable table of contents. Does that slow you down or do you have a solution for that, too?

shousa
01-19-2008, 07:16 AM
I have a number of books I am going to convert using recycledelectron's method of camera and tripod.

Any suggestions or tips recycledelectron over and above what you have written so far? eg how close should the camera be, you know the "finer" points.

Like the above question, can you access page 300 then go back to 200? (Not that that would be a deal breaker for me, just wondering.)

This seems good?
http://www.wikihow.com/Scan-a-Book-With-a-Digital-Camera.

jackbrown
01-22-2008, 12:10 PM
A cheap scanner at 300 dpi (black and white!) and software like ABBYY FineReader is all you need for this. Scanning, OCRing and PDFing a book takes a couple of hours. I do it all the time; you can read something else while you do it.

If you're going to use recycledelectron's method, try to figure out a way to quickly turn the images black and white (not grayscale!) as early in the process as possible, and turn the autofocus off; I used a setup like the one he describes for scanning a rare book, and took color pictures (big mistake); also didn't have good enough lighting for a really high contrast ratio. The resulting images basically sucked and I had a nightmarish time making the ebook. It'd be great if your camera could capture in black and white, but it almost certainly can't, so make sure you white balance it against a blank page in the room you are capturing in, then transform the captured files into bw before you OCR. Good luck, and like I said, I think a cheapo scanner is more practical, unless you need really large format captures.
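
The "transform the captured files into bw" step amounts to picking a brightness cut-off and forcing every pixel to pure black or pure white. The sketch below uses Otsu's method to pick the cut-off; that is one common choice and not necessarily what the poster used. It works on plain lists of 0-255 grayscale values so it stays self-contained; loading real photos would need an imaging library, which is not shown here, and the sample "page" is made up.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the 0-255 cut that best separates dark ink from light paper (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark = 0
    sum_dark = 0
    for t in range(256):
        w_dark += hist[t]
        if w_dark == 0:
            continue  # no pixels this dark yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance, up to a constant factor.
        between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def to_black_and_white(rows: list[list[int]]) -> list[list[int]]:
    """Turn a grayscale image (rows of 0-255 values) into pure 0/255."""
    flat = [p for row in rows for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in rows]


# Tiny made-up "page": grayish paper (around 200) with darker ink (around 40-60).
page = [
    [200, 198, 60, 205],
    [210, 40, 45, 199],
    [202, 201, 50, 207],
]
for row in to_black_and_white(page):
    print(row)
```

A single global cut-off like this depends on even lighting. With shadows near the spine it may fail, which is another reason the lighting advice above matters.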

philodox
01-22-2008, 01:14 PM
I've got a couple old books that are nearly falling apart... might be fun to try a scanner with auto feed. Destroying the books wouldn't be a problem at this point. Are there any decent and cheap ones that will take a scan of each side and keep the pages in the right order?

Once I have the images it would be easy enough [though perhaps time consuming] to reformat them as a PDF and use the built in OCR in Adobe Acrobat. Are there PDF to mobi convertors?

Even though each step may take a long time, if I can get a system working that only requires a small amount of user input between these large steps, it might be worth my while. :)

yvanleterrible
01-22-2008, 01:34 PM
I've got a couple old books that are nearly falling apart... might be fun to try a scanner with auto feed. Destroying the books wouldn't be a problem at this point. Are there any decent and cheap ones that will take a scan of each side and keep the pages in the right order?

Once I have the images it would be easy enough [though perhaps time consuming] to reformat them as a PDF and use the built in OCR in Adobe Acrobat. Are there PDF to mobi convertors?

Even though each step may take a long time, if I can get a system working that only requires a small amount of user input between these large steps, it might be worth my while. :)

Tried that with a circa-sixties book. The paper was so bad that the first page actually got shredded in the scanner, causing a paper jam and requiring me to dismantle the device to get at the pieces.
The software included with the machine can take care of the order the pages come out, provided you don't make mistakes in feeding.
Do you have Acrobat Pro? I didn't know it did OCR!?!

aru
01-22-2008, 02:44 PM
Don't forget the Plustek OpticBook 3600, which takes 10-20 sec per page and then, if you want, OCRs it for you. If not, it still gets the orientation for even and odd pages right. It has a big button for the next page on the scanner itself, so you don't have to go back and forth to your computer. It scans paperbacks and bound books without problems due to the binding. You only have to open the book 90 degrees. This makes all the difference. In my opinion, it is better than taking pictures with an SLR.
It takes me about an hour to get a reasonable sized book into my PC.

AnemicOak
01-22-2008, 06:01 PM
Here's an automatic book scanner made with legos...

http://www.geocities.jp/takascience/lego/fabs_en.html

slayda
01-22-2008, 06:23 PM
Are there any decent and cheap ones that will take a scan of each side and keep the pages in the right order?



Check out the ScanSnap S510 by Fujitsu for a little over $400. (You can check it out on Amazon, though you won't get the best price there, or try the Fujitsu site.) I have the S500. It works very well and comes with good software. It scans two sides at once & you can load up to 50 pages of 20# paper. The better the paper quality (and the larger), the better the final results. It can scan up to 1200 DPI in B&W, but I've found that 600 DPI is the best compromise between scan quality & speed.

When not in use it has a very small foot print. It is not TWAIN compliant. Output (as I use it) is searchable PDF. I use Nuance's PDF Converter Assistant to create a RTF file for editing.

The only problem I have had was with very poor paper quality in some cheap paperbacks. That resulted in multiple page feeds on a few occasions but mainly it had numerous OCR errors due to the ink bleeding during the printing process.

If you don't mind destroying the book (i.e. taking the pages apart), I highly recommend it.

Gideon
01-22-2008, 11:58 PM
Aru makes a great point, the OpticBook may be a bit of a unitasker, but it's brilliant for scanning books.

In preparation for getting my Sony Reader I went ahead and scanned one of my books. I used to do this when I had a tablet PC, so I had some experience. Moving them from OCR'd PDFs to a text format was really where the hard bit came in.

If you can afford to spend the money on the OpticBook (http://www.amazon.com/gp/redirect.html?ie=UTF8&location=http%3A%2F%2Fwww.amazon.com%2FPlustek-Opticbook-3600-Scanner-Conversion%2Fdp%2FB00065KA72%3Fie%3DUTF8%26s%3Delectronics%26qid%3D1200931677%26sr%3D8-2&tag=cityofdoors-20&linkCode=ur2&camp=1789&creative=9325) (a bit under 300 at Amazon, I believe) it is the single best investment you can make in this area - you can, as someone mentioned, scan very quickly and watch a movie at the same time.

The next part is the OCR. This is where it gets tricky, as most OCR programs will absolutely make a wreck of things. I would use greyscale here, btw... in my experience, it comes out better than black and white. Your mileage may vary.

Depending on your platform, you'll have a few options available to you. Most of the free ones I've tried are crap. The one that comes with Adobe Acrobat is average, and the best I've used is OmniPage Pro (but it is hard for an individual to get hold of and very expensive; maybe your school or business has it). Once you OCR it into text, the laborious part is going through and cleaning it all up.

The book I made took me about 5 hrs all around, I'd say - but this was a test run, and so there were lots of false starts. I imagine it'd take me about 2-3 hrs now, for an average sized book, and I'd call it worth it.

I plan on writing a tutorial about this once I nail down some fine points. In the meantime, I suggest you look here - it's aimed at Tablet PC users, but there is an enormous amount of useful material here on the subject.

OpticBook Tutorial (http://www.studenttabletpc.com/2005/01/opticbook_3600_and_scanning.html#more)(other methods are mentioned as well on other pages here)

aru
01-23-2008, 04:28 AM
Hi Gideon, my OpticBook 3600 came with a complete software suite including OCR, effectively a turnkey system including ABBYY FineReader Sprint, Presto Page Mgr etc. After I installed the software, everything else was automated (except the proofreading :) ).

There is a post already that describes the scanner (which btw enticed me to buy it) http://www.mobileread.com/forums/showthread.php?t=9666&highlight=opticbook
You may want to build on that.

stxopher
01-23-2008, 08:11 AM
One thing to remember if you are looking at the Plustek scanners is not to confuse the OpticBook with their new Book Reader. It looks exactly the same, but there's a $300 price difference. If you didn't know there were two appliances from the same company with the same case, photos and basic purpose (scanning books), you might freak slightly and stop looking.

The new Book Reader focuses more on saving the pages as txt, PDFs, PDF text and audio files. (Yeah, that last one was audio files. MP3 and WAVs to be precise.) It seems to have been designed more for keeping the printed word readable for those of us with failing sight, whereas the OpticBook's mission is saving and shifting printed information.

Between the two, the OpticBook series is still the best bet for most of us scanning books. It's fairly fast, easy and simple at what needs to be done. Still, I sure would like to see the Book Reader in action. Ummmm, making my own audio books for the commute. (No, no, NO! Shut up, little voice in my head with no financial sense and a high gadget lust! Shut up! Need more coffee to drown out the voice!)

philodox
01-23-2008, 09:10 AM
The paper was so bad that the first page actually got shredded in the scanner, causing a paper jam and requiring me to dismantle the device to get at the pieces.

Yikes, that is something to keep in mind then. :eek:

Do you have Acrobat Pro? I didn't know it did OCR!?!

I'm actually not sure of the exact version that I have, but I can check when I'm at home. It does have OCR though, that I'm sure of.

Don't forget the Plustek OpticBook 3600.

Never heard of it, I'll do a search and see what I find. Thanks. :)

Check out the ScanSnap S510 by Fujitsu for a little over $400.

Cool, I'll check that out. :cool:

Thanks for the info and tutorial for the Opticbook Gideon. ;)

Gideon
01-23-2008, 11:18 AM
Aru-
I forgot about the OCR support it came with. I always used Acrobat Reader, so the only software I used was the actual scanning software. I may need to give it a go though; perhaps it's better than OmniPage (and doesn't involve me hauling my stuff to someone with that program!).

snookums
01-29-2008, 01:05 AM
I hear a lot of people here saying that OCR isn't that good. I've found that OCR can be brilliant if you know what you are doing. I feel that OCR gets a bad rep because people don't realize the real magic is in the scanning.

Tip: Scan in RAW format. When you normally scan, the data from the scanner is processed with your settings and excess data is discarded. RAW saves all of the data that the scanner gathered. Afterwards you can change settings and see what the result would have been if you had scanned with them. This is especially useful for the first few images where you are trying to find the ideal color balance.

Tip: Scan in Black and White and find the ideal color balance before starting. The color balance is very important. You don't want too much contrast from your scan because that will bring out speckles in the paper that will throw off the OCR software. This is counter-intuitive because you probably want to jack up the resolution and contrast to catch all of the detail in the book. Don't. Scan at 300 dpi and set the color or white balance so that you are only getting the text and not the texture of the page.

Tip: Make it straight. OCR software is built to handle horizontal lines of text. If there is more than a moderate slant in the way that you were holding the page over the scanner, it will spit out garbled text. Some of the more expensive OCR packages offer the ability to rotate text, but it's best just to hold the paper as straight as possible when you are scanning. That can be harder than you think when you are scanning a bound book.

mphuie
01-29-2008, 05:50 PM
As for the GB size, my PRS-505 changes to the next pic as quickly as it flips between pages in a PDF. The zoom works MUCH better on JPEGs than it does on PDFs. I like JPEGs better than PDFs on the PRS-505.


You don't even OCR the pictures, you actually view them on your Sony? Is it even possible to read textbook-sized pages scaled down to an ebook screen? You'd have to manually zoom in and pan around to read anything :blink:

Execution sounds highly flawed.

Gladtobemom
02-05-2008, 11:38 PM
I've put about 30 of my technical references on my tablet PC.

We prepared a little room by installing two daylight ceiling fixtures (each with 4x4ft. daylight bulbs). Then DH put hooks on the ceiling and grommets on a king sized white sheet and slung it up as a tent under the lights.

He deconstructs the books for me by taking the spines off and trimming out the signatures and the sewing. He tries to cut the pages as close to the center of the book as possible.

Then I photograph them with my Pentax K100D (I bought this camera because it takes all my old pentax lenses).

DH and I can do the photography on a 1700 page text in about 8 hours. Yes, it's time consuming. Then I make an html web page of them and turn them into a PDF or a Mobi book. It works great.
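
The post does not say how the web page is laid out, so the following is only one guess at a workable shape: a single HTML file with a row of page links at the top and one image per page, which a separate converter could then turn into a PDF or Mobi book (that conversion is not shown). The file names and the `build_book_html` helper are hypothetical.

```python
from html import escape


def build_book_html(title: str, image_names: list[str]) -> str:
    """Return one HTML document with a jump list and one image per page."""
    names = sorted(image_names)  # relies on zero-padded names sorting in page order
    links = "\n".join(
        f'<a href="#p{i}">{i}</a>' for i in range(1, len(names) + 1)
    )
    pages = "\n".join(
        f'<h2 id="p{i}">Page {i}</h2>\n'
        f'<img src="{escape(name, quote=True)}" alt="Page {i}">'
        for i, name in enumerate(names, start=1)
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n<h1>{escape(title)}</h1>\n<p>{links}</p>\n{pages}\n</body></html>\n"
    )


# Hypothetical image names, deliberately out of order to show the sorting.
html_text = build_book_html("My Reference Text", ["0002.jpg", "0001.jpg", "0003.jpg"])
print(html_text)
```

The jump list also gives a crude answer to the earlier question about getting from page 123 to page 812, at least on devices that follow in-page links.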

I have all the texts I need for reference and teaching in my tablet PC.

I also have them in my little VAIO TR2A.

Total outlay in money: about $50 for the fixtures and lightbulbs, maybe $10 for the hooks and grommets (had the sheet). The camera was about $500, but I bought it for other reasons.

It is an investment in time. I am NOT distributing these and I own multiple copies. One advantage: I took pictures of the ones with my notes in the margin and linked each page of the clean version with its annotated version.

I've also put the 3 textbooks that I wrote on Mobi and freely offer the copies to students (after they've bought a copy) in class. I just note it on the copyright page of their copy.

Yep, I destroy the books, so far I've been keeping the copyright pages, pages 16, 99, and the cover. Just to prove that I "own" a legal copy.