Old 02-09-2012, 02:36 PM   #1
TechSarge
Junior Member
TechSarge began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Feb 2012
Location: Florida USA
Device: Kindle 4 SO (Died), Kindle Fire HD 7"
Best format for scanned books?

I'm a bit new to the ebook world. I'm trying to make a few ebooks from some obscure, out-of-print books. The original formatting runs the gamut: all text; mostly text with some B&W line art (a map, etc.); greyscale photos and text; and a coffee-table book with lots of colour photos set within the text. The text is usually a standard single column, but at least two books have double-column text and one has triple-column! This is a nightmare.

The books were all scanned on a flatbed scanner a few years ago and saved as TIFF files. Some scans are single page, some are double page. Some pages turned out OK, but some are horrible and require manual tweaking.

I discovered Scan Tailor a few days ago and have run it on a few books which are good examples of the headaches above. I LOVE Scan Tailor! What I had started out trying to do a page at a time in Photoshop 5 a few years ago, with no experience, ST did in minutes across an entire book.

So, a few problems now, though.

ST's output TIFFs for the colour photo coffee table book were over 10 times the file size of the originals (originals about 8 MB each; output ~80-100 MB or more per file). Everything was set to 600 dpi, as that was what they were scanned at, IIRC. I had to run them through PIXresizer to get them to a manageable size again. The B&W output files were wonderful, though.

Not sure where to go from here. I thought so long on how to get the photos retouched that I never considered what to do once they were done! The obvious thing to do is to PDF them at this point, but I'm unsure about that. I would really like to read some of these books on my Kindle 4, so immediately going to PDF now isn't the best option. Shall I OCR, and spend a month proofreading? I'm also very concerned about being able to use the new ebooks in the future with little to no additional manual labour done to them. I'm also trying to get this done as quickly as possible, with as few steps as possible, but with good quality - archive quality not necessary, but close to it is the goal.

I am using a Win7 PC to do this, with Scan Tailor "enhanced" 0.9.11pre, Adobe Acrobat 9 Pro Extended, and I have the latest Calibre 0.8.3x.
Old 02-09-2012, 04:51 PM   #2
frostschutz
Linux User
frostschutz ought to be getting tired of karma fortunes by now.

Posts: 729
Karma: 2010145
Join Date: Sep 2010
Device: iriver Story HD
Quote:
Originally Posted by TechSarge View Post
Shall I OCR, and spend a month proofreading?
If you want the best possible result - yes.

But if the scans are very good and you have decent OCR software (like ABBYY FineReader) you probably won't need all that much proofreading. If the scans are horrible, obtaining better scans might actually be less work than fixing the errors that are caused by bad quality images...

Also I proofread while I read the books. I have an old-fashioned pencil and paper next to me and when I spot an error I just write it down. So next time I'm on the PC I can just correct those errors I found (and if it's a recurring error, correct all others like it too, if I can find a search&replace regexp that works).

Only worthwhile if it's a book that you might read again after some time though.

For royalty free books you could just share and ask others to report errors they find back to you ...
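
A recurring OCR error of the kind mentioned above can often be fixed in one pass with a small table of regular expressions. The following is a minimal Python sketch, not anyone's actual workflow: the `CORRECTIONS` table and the sample errors (`tbe`, a digit `1` inside a word) are assumptions chosen for illustration, and a real table would have to be built from the errors actually seen in a given scan.

```python
import re

# Hypothetical correction table: (compiled pattern, replacement).
# Each rule should be narrow enough that it cannot damage correct text.
CORRECTIONS = [
    (re.compile(r"\btbe\b"), "the"),               # "h" misread as "b"
    (re.compile(r"\bTbe\b"), "The"),
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),    # digit 1 between lowercase letters
]


def apply_corrections(text):
    """Apply every rule in CORRECTIONS and return (fixed_text, replacement_count)."""
    total = 0
    for pattern, replacement in CORRECTIONS:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total


if __name__ == "__main__":
    fixed, count = apply_corrections("Tbe wor1d and tbe sea")
    print(fixed, count)
```

Counting the replacements makes it easier to spot a rule that fires far more often than expected, which usually means the pattern is too broad.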
Old 02-10-2012, 04:11 AM   #3
DSpider
Addict
DSpider ought to be getting tired of karma fortunes by now.

Posts: 399
Karma: 326969
Join Date: Nov 2009
Location: Romania
Device: iPod touch 2G (16 GB)
Pencil and paper? My god... Use the reader's highlight function, for crying out loud!


Scan Tailor outputs uncompressed TIFFs and that's why they take up so much space. But don't resize them! Resizing can add antialiasing (blurriness), which doesn't compress well. If you run them through Acrobat (reduced PDF then optimized PDF) they will get compressed. Just be careful not to choose lossy compression because TIFF can support either lossless or lossy.
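
For a sense of scale, the size of an uncompressed image can be worked out from its dimensions alone. Here is a rough Python sketch, assuming a letter-size page (8.5 x 11 inches, which nobody in the thread actually states) and ignoring the small TIFF header overhead; the function name is made up for this example.

```python
import math


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw pixel data size for a page, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


if __name__ == "__main__":
    # Assumed page size: 8.5 x 11 inches at the 600 dpi mentioned in the thread.
    for label, bits in [("24-bit colour", 24), ("8-bit greyscale", 8), ("1-bit B&W", 1)]:
        size = uncompressed_bytes(8.5, 11, 600, bits)
        print(f"{label}: {size:,} bytes ({size / 2**20:.1f} MiB)")
```

For the assumed page this gives about 96.3 MiB for 24-bit colour, 32.1 MiB for 8-bit greyscale and 4.0 MiB for 1-bit black and white, which is in the same range as the 80-100 MB colour files reported in the first post.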

Quote:
I'm also very concerned about being able to use the new ebooks in the future with little to no additional manual labour done to them. I'm also trying to get this done as quickly as possible, with as few steps as possible, but with good quality - archive quality not necessary, but close to it is the goal.
Ha! Good luck with that. What I found is that not one format converts 100% accurately. We're in 2012 and you'd think HTML is probably the safest bet. But it's NOT the format itself, it's the software that reads and interprets it! For instance, Sigil has ePub validation built-in. Your ePub could pass and still look completely different on some e-readers! We're back in 1999 where you could look at the same website with a different browser and everything would look different.

Last edited by DSpider; 02-10-2012 at 06:16 AM.
Old 02-10-2012, 09:52 AM   #4
Keroberos
Zealot
Keroberos can program the VCR without an owner's manual.

Posts: 127
Karma: 194002
Join Date: Aug 2009
Device: Kobo Mini (4GB), Nook Classic wi-fi, iPod Touch (Bluefire Reader)
Quote:
Pencil and paper? My god... Use the reader's highlight function, for crying out loud!
Actually I've also found it to be much quicker to use pencil and paper when proofreading (don't need to mess around with clicking through menus to take notes or while making the corrections).

For archiving I would use XHTML. Yeah, OCRing can be a pain, but it only takes me a couple of hours of work to get the OCR'd text fairly clean on the PC (it would probably be less if I took the time to build a decent scanning cradle), and a quick read-through on my reader to catch the rest.
Old 02-13-2012, 08:44 PM   #5
TechSarge
Junior Member
TechSarge began at the beginning.
 
Posts: 7
Karma: 10
Join Date: Feb 2012
Location: Florida USA
Device: Kindle 4 SO (Died), Kindle Fire HD 7"
I was not aware of the issues with resizing the TIFFs. Thanks for the tip - I'll refrain in future, but what's done is now done, as I had to delete the large files due to hard drive space - or lack thereof.

Yeah, it's an all-new and improved "format wars" all over again. At least the VHS/Beta war, the Netscape/IE war and the Blu-Ray/HD-DVD war were each only a two-player war. This one is multiplayer, and if you choose wrong you're screwed good.

I personally am starting to HATE PDF. It's SO easy to corrupt the files by accident, and they're unusable then. I have so many PDFs that are nothing but graphics and photos inside; they'll be a major pain to convert to anything readable.

I'd seriously consider archiving with XHTML, but I know less about that than other formats. There are so many HTML derivations today that I can't keep up. Would XHTML be able to handle the line art and other non-text problems? What do I use to create/edit/view XHTML files? Do these convert to ePub and Mobi well?
Old 02-13-2012, 09:06 PM   #6
DaleDe
Grand Sorcerer
DaleDe ought to be getting tired of karma fortunes by now.

Posts: 9,395
Karma: 4531756
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
ePub is nothing more than XHTML files, image files and some metadata inside a zip container. It is a good format for keeping your archives since it is compressed and contains all the needed items in one file. It can be edited by taking it apart or directly, using a suitable ePub editing program. See the MobileRead wiki for technical details on all things ebook.
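
To make the "zip container" point concrete, here is a small Python sketch using only the standard `zipfile` module. It builds a skeletal ePub-like archive in memory and lists what is inside. The file names under `OEBPS/` and the placeholder contents are assumptions for illustration; the result is not a complete, validated ePub (a real one needs a proper package file listing every item).

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_demo_epub():
    """Return the bytes of a skeletal ePub-like zip (placeholder contents only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The container format expects 'mimetype' as the first entry, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        # Hypothetical file names; the contents are stand-ins, not real book data.
        zf.writestr("OEBPS/content.opf", "<package/>", compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/chapter1.xhtml", "<html/>", compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/images/map.png", b"", compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()


def classify(name):
    """Sort an archive member into a rough category by file extension."""
    lower = name.lower()
    if lower.endswith((".xhtml", ".html", ".htm")):
        return "text"
    if lower.endswith((".png", ".jpg", ".jpeg", ".gif", ".svg")):
        return "image"
    return "metadata/other"


def list_epub(data):
    """Return (member name, category) pairs in archive order."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, classify(info.filename)) for info in zf.infolist()]


if __name__ == "__main__":
    for name, kind in list_epub(build_demo_epub()):
        print(f"{kind:15} {name}")
```

The same `list_epub` function should work on the bytes of a real `.epub` file, since any zip tool can open one; that is what "taking it apart" amounts to.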
Old 02-13-2012, 09:22 PM   #7
DaleDe
Grand Sorcerer
DaleDe ought to be getting tired of karma fortunes by now.

Posts: 9,395
Karma: 4531756
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2
By the way, the format wars are pretty well settled down to two basic formats: Amazon and everybody else. Basically Mobi vs. ePub, with PDF still used when a fixed format is needed, although several people are using ePub variants even for that. Of course there is still the problem with DRM, as there is no standard way to do that, unlike video formats.
Old 02-14-2012, 09:49 AM   #8
GeoffC
Chocolate Grasshopper ...
GeoffC ought to be getting tired of karma fortunes by now.

Posts: 26,895
Karma: 16968764
Join Date: Mar 2008
Location: Scotland
Device: Cybook Gen3 , Pocketbook 302 (Black) , Nexus 10: wife has PW
Of course, just scanning and saving as PDF files would save an awful lot of work.
Old 02-25-2012, 07:57 PM   #9
NASCARaddicted
Addict
NASCARaddicted herds cats with both ease and grace
 
Posts: 308
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Quote:
Originally Posted by DSpider View Post
Pencil and paper? My god... Use the reader's highlight function, for crying out loud!
I also use pencil and paper to write down the errors that I find. There are still many readers that don't have a highlight function.

My Bebook One is already a few years old and slow compared to current e-readers, but it is still good enough for reading.

By the way, many people are surprised when they see me writing down something from an ebook. Some even make fun of me, saying that I need as much paper as for a printed book.
Old 02-29-2012, 04:37 AM   #10
kbaerwald
Der Leser
kbaerwald has become a pillar of the MobileRead community

Posts: 249
Karma: 15682
Join Date: Apr 2009
Location: Germany
Device: Diverse
Depends on your objective:

- if you have an old (really old) book and want to conserve it this way -> TIFF, or PDF with facsimile-like images embedded: readable on PC/tablet/10" e-reader. Chances are good that you can still read it in a few years.
- if you want to read immediately -> cleaning and a fast screening for errors -> searchable PDF: readable on PC/tablet/10" e-reader (or 6" if it can't be avoided).
- if you are the "editor" type of person with a sense for aesthetics -> cleaning, thorough editing, layout and publishing to ePub or whatever: readable on all current e-readers. What happens in a few years?

This sequence also represents a scale of increasing effort.

Klaus
Old 03-03-2012, 09:26 AM   #11
NASCARaddicted
Addict
NASCARaddicted herds cats with both ease and grace
 
Posts: 308
Karma: 43106
Join Date: Apr 2009
Location: Germany
Device: BeBook One, Pocketbook Touch
Personally, I keep all the books that I scanned as xhtml. I think it is very unlikely that xhtml-format will cease to work from one day to another.
Old 03-04-2012, 06:37 AM   #12
HarryT
eBook Enthusiast
HarryT ought to be getting tired of karma fortunes by now.

Posts: 61,012
Karma: 38191453
Join Date: Nov 2006
Location: UK
Device: Kindle PW2, iPad Retina Mini, iPhone 4, MS Surface Pro
Quote:
Originally Posted by NASCARaddicted View Post
Personally, I keep all the books that I scanned as xhtml. I think it is very unlikely that xhtml-format will cease to work from one day to another.
It's a very good idea to keep the original page scans - eg as PDF - so that they can be used for proofing against. OCR is far from perfect.
Old 03-04-2012, 04:06 PM   #13
jmaejr
Banned
jmaejr ought to be getting tired of karma fortunes by now.
 
Posts: 132
Karma: 566638
Join Date: Aug 2011
Location: Wouldn't you like to know.
Device: Sony PRS-350:Sony PRS-T1:Rooted Nook Tablet
Based on the consensus of this board, if you own the tree copy of the book it is okay to download a copy of the e-book...regardless of the source. So why go to the trouble of scanning, proofing, and formatting when the book is probably already out there somewhere? Just a question...
Old 03-04-2012, 04:10 PM   #14
HarryT
eBook Enthusiast
HarryT ought to be getting tired of karma fortunes by now.

Posts: 61,012
Karma: 38191453
Join Date: Nov 2006
Location: UK
Device: Kindle PW2, iPad Retina Mini, iPhone 4, MS Surface Pro
Quote:
Originally Posted by jmaejr View Post
Based on the consensus of this board, if you own the tree copy of the book it is okay to download a copy of the e-book...regardless of the source.
I really don't think you're right in saying that that's a "consensus"? Most people would, I think, call that "piracy". Owning a paper book certainly doesn't entitle you to a "free" eBook, any more than owning a hardback entitles you to a free paperback.
Old 03-04-2012, 07:56 PM   #15
jmaejr
Banned
jmaejr ought to be getting tired of karma fortunes by now.
 
Posts: 132
Karma: 566638
Join Date: Aug 2011
Location: Wouldn't you like to know.
Device: Sony PRS-350:Sony PRS-T1:Rooted Nook Tablet
Quote:
Originally Posted by HarryT View Post
There would be only one set of circumstances in which I'd consider it ethically justifiable to pirate a book:

1. If no commercial eBook was available.

and:

2. I'd bought the paper book.

In those circumstances, I'd have no qualms about downloading a pirated eBook. However, if a commercial eBook did then become commercially available, I'd buy it.

So I'd have to answer "on occasions". Very rare occasions.


In one particular poll almost 65% of the people said they 'pirate' books they currently have in tree format.
Quote:
I knew someone who downloaded a book that they already owned in paper format. (Format-shifting.)
Almost 60% of the people said they would 'pirate' a book if it was not available in electronic format.
Quote:
I knew someone who downloaded a book because the book didn't legally exist in electronic form. (Unavailability.)
That SEEMS to be the consensus...at least the majority opinion here.

Those two reasons apply to the OP: they have a book that A) they own and B) is not available in electronic format.

Using that as a basis for my question, I merely asked 'why go to the trouble of scanning, proofing, and formatting when the book is probably already out there somewhere?'

All times are GMT -4.