| 
			
			 | 
		#1 | 
| 
			
			
			
			 Enthusiast 
			
			Posts: 43 
				Karma: 28554 
				Join Date: Mar 2013 
				
				
				
				Device: Kindle Keyboard, KPW2 
				
				
				 | 
	
	
	
		
		
			
			 
				
				Converting a scanned book from 1DollarScan to ePub
			 
			
			
			Hello guys, 
		
	
		
		
			This is a sample page from one of the books scanned using 1DollarScan (600 dpi): https://www.dropbox.com/s/j18r16ed7t...0Page.pdf?dl=0

			I was thinking of trying Custom Book Scanning for the following reasons:
			1. They offer ePub/MOBI for $10 more.
			2. Their PDF scan is supposedly 1200 dpi.

			I saw posts from users here who convert their PDFs to ePub by first converting them to HTML with ABBYY FineReader. Here is that page converted to HTML (please refer to the attachment).
			1. Based on the results, I feel that an ePub would be terrible for my book.
			2. Also, I read that scanning at a higher DPI hurts OCR. Is that true?

			The main use of these eBooks is just text searching; I will have hard copies of the same books. I would really appreciate any comments on this. Sorry for the long post!
		Last edited by adrenaline; 09-29-2014 at 02:48 AM.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#2 | 
| 
			
			
			
			 Resident Curmudgeon 
			
			Posts: 80,782 
				Karma: 150249619 
				Join Date: Nov 2006 
				Location: Roslindale, Massachusetts 
				
				
				Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			No matter how you go from PDF > ePub, you have to A/B compare the PDF to the ePub. You have to A/B compare every character, every space, every punctuation mark, EVERYTHING, in order to make 100% sure your ePub has no errors added by the conversion.

			I've seen too many PDF > ePub conversions where you can tell the source was a PDF and the errors are due to the conversion.  | 
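For anyone who wants a machine to help with part of that comparison, here is a minimal sketch using Python's standard `difflib` module. It assumes you already have two plain-text versions of the same passage (for example the PDF's text layer and the text pulled out of the ePub); the function name `diff_report` and the sample strings are made up for illustration. It only flags where the two texts disagree, so it cannot catch an OCR error that is present in both, and it does not replace checking against the page image.

```python
import difflib


def diff_report(reference: str, converted: str):
    """Return (operation, reference_fragment, converted_fragment) for
    every span where the two texts differ, character by character."""
    matcher = difflib.SequenceMatcher(None, reference, converted, autojunk=False)
    return [
        (tag, reference[i1:i2], converted[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical sample: typical OCR confusions such as "rn" read as "m".
reference = "The modern reader turned the corner."
converted = "The modem reader tumed the comer."

differences = diff_report(reference, converted)
for operation, expected, found in differences:
    print(f"{operation}: expected {expected!r}, found {found!r}")
```

Every reported span still has to be checked by eye against the scan, because the diff cannot tell which of the two versions is the correct one.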
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#3 | 
| 
			
			
			
			 Enthusiast 
			
			Posts: 43 
				Karma: 28554 
				Join Date: Mar 2013 
				
				
				
				Device: Kindle Keyboard, KPW2 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Thanks a lot JSWolf. 
		
	
		
		
		
		
		
		
		
		
		
		
	
	What do you think about the 1200 dpi scanning compared to 600? Thanks again.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#4 | |
| 
			
			
			
			 Wizard 
			
			Posts: 1,358 
				Karma: 5766642 
				Join Date: Aug 2010 
				
				
				
				Device: Nook 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 Don't most book scanning services include an OCR option? It's a heck of a lot easier to convert a Word file to ePub than a PDF.  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#5 | 
| 
			
			
			
			 eBook Enthusiast 
			
			Posts: 85,560 
				Karma: 93980341 
				Join Date: Nov 2006 
				Location: UK 
				
				
				Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Moved to the "Workshop" forum.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#6 | 
| 
			
			
			
			 Wizard 
			
			Posts: 4,520 
				Karma: 121692313 
				Join Date: Oct 2009 
				Location: Heemskerk, NL 
				
				
				Device: PRS-T1, Kobo Touch, Kobo Aura 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Sorry, not completely true. For most books 300 dpi would be sufficient, but it really depends on the source. I scan at 400 dpi and get far fewer OCR errors, especially with older pocket books and paperbacks. Anything over 600 dpi would be overkill. The downside is the decrease in scanning speed. I find that 400 dpi is a good trade-off.
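
To put rough numbers on that trade-off, here is a small sketch of how the pixel count per page grows with resolution. The 6 x 9 inch page size is an assumption picked for illustration, and `page_pixels` is a hypothetical helper, not part of any scanning software. The point is only that pixel count grows with the square of the dpi, which is one reason higher settings scan more slowly and produce bigger files.

```python
def page_pixels(dpi: int, width_in: float = 6.0, height_in: float = 9.0) -> int:
    """Number of pixels in one scanned page at the given resolution.

    The default 6 x 9 inch page is an assumed trade-paperback size.
    """
    return round(width_in * dpi) * round(height_in * dpi)


baseline = page_pixels(300)
for dpi in (300, 400, 600, 1200):
    pixels = page_pixels(dpi)
    ratio = pixels / baseline
    print(f"{dpi:>5} dpi: {pixels:>11,} pixels ({ratio:.1f}x the 300 dpi page)")
```

Doubling the dpi quadruples the pixels, so a 1200 dpi page holds 16 times as many pixels as the same page at 300 dpi. Actual file sizes depend on colour depth and compression, which this sketch ignores.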
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#7 | 
| 
			
			
			
			 I am what I am 
			
			Posts: 6,625 
				Karma: 62235665 
				Join Date: Sep 2011 
				
				
				
				Device: iPad3, Voyage 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			Hi adrenaline,

			I routinely buy out-of-print books and send them for scanning. Based on my own experience:
			1. No scanning service can convert a PDF into a decent ePub/MOBI. There are too many OCR errors in scanning to even consider this, so save the $10.
			2. A 1200 dpi PDF scan for an ebook is overkill and just produces a monstrously large file that will choke most programs.
			3. I buy the books to read, so my workflow is to convert the PDF to HTML in ABBYY FineReader and then convert the HTML to ePub using Sigil. I have a pretty good idea of what to look for now, so the whole process is not that tedious or time-consuming. My accuracy rate is about 95%, which is sufficient for me since I do the conversion for my own use only (I'd rather spend the time reading than comparing every single character).

			If you only need the ebooks to search text, wouldn't a simple scan to PDF with OCR work? Why would you need to convert them further to ePub/MOBI?  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#8 | |
| 
			
			
			
			 eBook Enthusiast 
			
			Posts: 85,560 
				Karma: 93980341 
				Join Date: Nov 2006 
				Location: UK 
				
				
				Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#9 | 
| 
			
			
			
			 I am what I am 
			
			Posts: 6,625 
				Karma: 62235665 
				Join Date: Sep 2011 
				
				
				
				Device: iPad3, Voyage 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			1 character in 20? I think not. Anyway, 95% was just a guess, because I rarely have more than 20 errors in an entire book. I just wanted to stress that I'm happy with the results (and the time saved) from not comparing character for character.
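
The gap between "95%" and "20 errors in an entire book" is easy to see with a quick calculation. This is a rough sketch: the 400,000-character book length is an assumed figure for a typical novel, not something measured in this thread, and `char_accuracy` is a hypothetical helper.

```python
def char_accuracy(errors: int, total_chars: int) -> float:
    """Fraction of characters that are correct."""
    return 1 - errors / total_chars


BOOK_CHARS = 400_000  # assumption: rough character count of a novel

# About 20 wrong characters in the whole book.
accuracy = char_accuracy(20, BOOK_CHARS)

# What a literal 95% character accuracy would mean for the same book.
errors_at_95 = round(BOOK_CHARS * (1 - 0.95))

print(f"20 errors in {BOOK_CHARS:,} characters = {accuracy:.3%} accurate")
print(f"95% accuracy would mean about {errors_at_95:,} wrong characters")
```

Under that assumption, 20 errors per book works out to well above 99.9% character accuracy, while a true 95% would leave thousands of wrong characters.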
		 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#10 | |
| 
			
			
			
			 eBook Enthusiast 
			
			Posts: 85,560 
				Karma: 93980341 
				Join Date: Nov 2006 
				Location: UK 
				
				
				Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
  .
		 | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#11 | 
| 
			
			
			
			 Color me gone 
			
			Posts: 2,089 
				Karma: 1445295 
				Join Date: Apr 2008 
				Location: Central Oregon Coast 
				
				
				Device: PRS-300 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			The main problem I see with super-high dpi is that every smudge, dot, etc. on the page becomes a character you have to get rid of. The only benefit at all might be for pictures, if there are lots of them and they are very high quality. I don't think this is the common situation for out-of-print books, and if you are talking about e-ink, it is a colossal waste of time since the resolution is so low.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#12 | 
| 
			
			
			
			 Fanatic 
			
			Posts: 563 
				Karma: 403106 
				Join Date: Aug 2014 
				
				
				
				Device: PRS-T1 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I've seen a lot of scanned books in my life. Frankly, I would rather type them by hand than correct their spelling mistakes and/or pagination. I believe a lot of the people who answered are native English speakers. Well, any OCR software can be trained to recognize 26 letters, but for non-ASCII users (like Bangla above) the errors increase tenfold. With diacritics, scanning errors (like random black dots) may even create a new character. A good example of what I mean can be found on archive.org: compare the PDF files (scanned, but with a text layer) and the EPUB files.  | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#13 | |
| 
			
			
			
			 Bookmaker & Cat Slave 
			
			Posts: 11,503 
				Karma: 158448243 
				Join Date: Apr 2010 
				Location: Phoenix, AZ 
				
				
				Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
 I certainly would not consider typing a book instead of scanning it. No offense, but I find the idea crazy. Take a high-quality scan, a good A/B, run it through Toxaris' program, and you have a very, very high quality starting place. The problem we see on these forums--all the time--is that nobody ever wants to do the "grunty" work of correcting the scanned material. Everybody wants a magic bullet. It doesn't exist. Hitch  | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#14 | 
| 
			
			
			
			 Color me gone 
			
			Posts: 2,089 
				Karma: 1445295 
				Join Date: Apr 2008 
				Location: Central Oregon Coast 
				
				
				Device: PRS-300 
				
				
				 | 
	
	
	
		
		
		
		
		 
			
			I have seen it over and over again that a mistaken scan will produce perfectly plausible and grammatically correct, but wrong, output.
		 
		
	
		
		
		
		
		
		
		
		
		
		
	
	 | 
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 
| 
			
			 | 
		#15 | |
| 
			
			
			
			 eBook Enthusiast 
			
			Posts: 85,560 
				Karma: 93980341 
				Join Date: Nov 2006 
				Location: UK 
				
				
				Device: Kindle Oasis 2, iPad Pro 10.5", iPhone 6 
				
				
				 | 
	
	
	
		
		
		
		
		 Quote: 
	
   ) is to do a word by word manual comparison of the original document with the OCR'd text. This is extremely labour-intensive: I've had years of practice at it, and I reckon I can proof-read around about 15 pages an hour with a typical novel, so that would be about 33h work for a 500-page book.
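
The time estimate above is simple division, sketched here so you can plug in your own numbers. The 15 pages per hour and the 500-page book come from the post; the function name `proofreading_hours` is made up for illustration.

```python
def proofreading_hours(pages: int, pages_per_hour: float = 15.0) -> float:
    """Hours needed to proof-read a book at a steady pace."""
    return pages / pages_per_hour


hours = proofreading_hours(500)
print(f"{hours:.1f} hours")  # prints "33.3 hours"
```

A slower pace on difficult material changes the total quickly: at 10 pages an hour the same 500-page book takes 50 hours.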
		 | 
|
| 
		 | 
	
	
	
		
		
		
		
			 
		
		
		
		
		
		
		
			
		
		
		
	 | 