Old 10-05-2007, 05:07 PM   #14
ereszet
Zealot
ereszet has a complete set of Star Wars action figures.
Posts: 118
Karma: 306
Join Date: Sep 2007
Device: Sony PRS-500 Archos 704 wifi
[QUOTE=Steve Jordan;103351]
Excuse me... I thought we were talking about text. [/QUOTE]

Books come with images, photos and maps. A disadvantage of Project Gutenberg is that it is limited to text only. I have a collection of thousands of PDF/DjVu books and maps from free digital libraries that look exactly like the originals. I do the same with my own documents, books, business cards, magazines, newspaper clippings, etc. by photoscanning them. Then I have to process the images to fix whatever went wrong because I was not careful enough at the photoscanning stage, and OCR them so they can be indexed.
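
To illustrate the last step, here is a minimal sketch of what indexing OCR'd text can look like. It is not the tool I actually use; the file names, the sample text and the `build_index` / `search` helpers are made up for the example, and a real indexer does far more (stemming, ranking, storing positions).

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    # .get() avoids creating empty entries for words that were never indexed
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits)


# Hypothetical OCR output, keyed by file name (assumed data, not real files)
docs = {
    "atlas_1890.djvu": "Map of the river valley and the old town",
    "clipping_03.pdf": "The town council approved the new map",
}
index = build_index(docs)
print(sorted(search(index, "town map")))
print(sorted(search(index, "river")))
```

The point is only that once the page images have been OCR'd, a word lookup goes through the small index and never has to touch the large scanned files.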

For your info: just one of my folders contains over 5 thousand documents with a total word count of over 5 million. The folder is 30 GB and its index is 500 MB. In total, my collection of indexed books is close to 100 GB.

Text alone is too easy to scan or photocopy to worry about much. In practice there are no lighting problems; all you need is a steady hand and good focus.