|  08-04-2013, 11:19 PM | #1 | 
| Junior Member  Posts: 2 Karma: 10 Join Date: Aug 2013 Device: Nook | 
				
				Illegible EPUB Text
			 
			
			I have a problem where the text of some EPUBs renders incorrectly in Nook PC (as well as in the calibre E-book reader). I have seen the same issue with books downloaded from B&N as well as from Gutenberg.

			As an example, take the book "A Greek-English lexicon of the New Testament: being Grimm's Wilke's Clavis ..."

			From Content.opf: Book digitized by Google and uploaded to the Internet Archive by user tpb.

			From the HTML page metadata: <meta content="abbyy to epub tool, v0.2" name="generator"/> <meta content="application/xhtml+xml; charset=utf-8" http-equiv="Content-Type"/>

			Example rendering: Ιι 1ΐ35 κιίΓνϊνβϋ 1οη§ βηοιι^ ίοΓ Λε ςορ>τϊ§1ιΙ Ιο οχρϊτο 3ΐΐ(3 ΐΗο Ιχ)ο1; Ιο οηΙΟΓ ΐΗο ριιΒΠς »1οπΐ3Ϊη. Α

			B&N tells me that the file is corrupt, but I've seen this in many different EPUBs. It seems to be an issue with rendering Unicode characters (there is a mixture of Greek and English in the above example). Any ideas? | 
|   |   | 
|  08-05-2013, 01:05 AM | #2 | 
| Grand Sorcerer            Posts: 5,763 Karma: 24088559 Join Date: Dec 2010 Device: Kindle PW2 | 
			
			The garbage text is most likely caused by automated OCR of non-Latin text. I'd recommend downloading this very similar PG Greek-English NT lexicon instead. (To read this book on your Nook, you'll most likely have to embed a Greek font, e.g. Galatia SIL, which you can embed automatically with Calibre or manually with Sigil.) | 
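			One rough way to tell an OCR problem from a font problem is to look at which characters are actually stored in the file. If English words are stored as Greek code points, the wrong letters are in the text itself and no font will fix them; a missing font usually shows up as boxes or question marks instead. The following is only a minimal sketch of that check: the helper names `script_counts` and `mixed_script_words` are made up for illustration, and it assumes you have already pasted some text from the EPUB's XHTML into a Python string.

```python
import unicodedata
from collections import Counter


def _script(ch):
    """Approximate a letter's script by the first word of its Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_counts(text):
    """Count the letters in text by approximate script (LATIN, GREEK, ...)."""
    return Counter(_script(ch) for ch in text if ch.isalpha())


def mixed_script_words(text):
    """Return the words whose letters come from more than one script.

    Words like this are a common sign of OCR confusion between
    look-alike Latin and Greek letters.
    """
    mixed = []
    for word in text.split():
        scripts = {_script(ch) for ch in word if ch.isalpha()}
        if len(scripts) > 1:
            mixed.append(word)
    return mixed


if __name__ == "__main__":
    # Fragment of the garbled rendering quoted in the first post.
    sample = "Ιι 1ΐ35 κιίΓνϊνβϋ 1οη§ βηοιι^ ίοΓ Λε ςορ>τϊ§1ιΙ"
    print(script_counts(sample))
    print(mixed_script_words(sample))
```

			If nearly all of the letters in a passage that should be English are counted as Greek, the file content is wrong (bad OCR), not just the display.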
|   |   | 
|  08-06-2013, 12:01 AM | #3 | 
| Junior Member  Posts: 2 Karma: 10 Join Date: Aug 2013 Device: Nook | 
			
			Thank you, Doitsu. Your answer is the first meaningful one I have been given, and it is clear that you took great care to understand my problem. I believe you are correct about the cause of the garbage text in the ebooks, since Google has removed them from the Google Books site. I had previously downloaded your suggestion from PG, and installing Galatia SIL fixed the font issue I was having. However, there are still problems with line feeds, so I may just convert the UTF-8 version. Thanks again. | 
|   |   | 
|  08-06-2013, 05:27 AM | #4 | |
| Wizard            Posts: 2,306 Karma: 13057279 Join Date: Jul 2012 Device: Kobo Forma, Nook | 
			
			As Doitsu mentioned, if the text is outside of the Latin character set, the OCR is likely to be of much lower quality. Then you take into account markings, scanning artifacts, water damage, and aging of the book, and the automatic OCR becomes even worse. Converting images to text is an incredibly hard problem for algorithms to get right without lots of human assistance. Project Gutenberg books are fed through multiple rounds of human-assisted checking and editing to get as accurate a conversion as possible. So if possible, look to Project Gutenberg first. A lot more information on Project Gutenberg's process can be found here: http://www.pgdp.net/c/faq/ProoferFAQ.php Last edited by Tex2002ans; 08-06-2013 at 05:33 AM. | |
|   |   | 