Old 03-11-2014, 01:41 AM   #1
marst
Connoisseur
Posts: 72
Karma: 10
Join Date: Mar 2011
Location: Left coast, USA
Device: Kobo Forma; Android tablet w/Mantano Reader
AZW1/HTMLZ-related errors

This newbie's attempt to convert an AZW1 file to EPUB isn't working out so well. Or rather, though the conversion works OK, what happens to the book just after it's imported into the calibre library isn't so good -- and the errors are passed along into the EPUB file. After doing a few searches here for "AZW1" I have the feeling that the format is troublesome in the best of circumstances, but perhaps I've misread...

Opened in calibre, the book appears not as AZW or AZW1 but as HTMLZ. I can convert it successfully to EPUB format, but the problems have already occurred. Examples:

Some numerals are coming out incorrectly. Text such as Essay 10 comes out as Essay io. The text August 22 and 29, 2005 is rendered as August 22 and zz, zoos. All italics are lost. Those are the errors I spotted just within the first couple of pages. There must be a ton of errors throughout the book.

Is there something I might do to improve this? Thanks in advance...
Old 03-11-2014, 01:49 AM   #2
eschwartz
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
.azw1 is the Topaz format. From the Wiki: Topaz:

Quote:
Most agree that Topaz is a collection of glyphs arranged on pages, along with an unproofed OCR text version. It is used to make older books and foreign language available quickly, since conversion is essentially automatic from scans of the pages of a book, but it reflows very well.
Those problems are very much to be expected, and you will have to proofread the book yourself.
Old 03-11-2014, 04:22 AM   #3
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by marst
After doing a few searches here for "AZW1" I have the feeling that the format is troublesome in the best of circumstances, but perhaps I've misread...
The azw1 format or topaz is an amazing format used by Amazon for automatically scanning in old books. When read by a Kindle app or device the result is usually an excellent representation of the original book.

You can read up on the creation of topaz from the person (screen-name: Fluffy) who created it in posts 800-812 and on his blog, archived here. Very interesting read.

Quote:
Originally Posted by marst
Opened in calibre, the book appears not as AZW or AZW1 but as HTMLZ.
I'm sure the plugin you got from Apprentice Alf packaged it this way when you added it to calibre.
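
For anyone who wants to look at the text before converting: an HTMLZ file is a zip archive wrapped around an HTML document, so it can be opened with ordinary zip tooling. The sketch below is only an illustration, not part of calibre or of any plugin. The helper names `TextExtractor` and `htmlz_text` are made up for this example, and the member name `index.html` is an assumption about how the archive is laid out; check the archive listing if your file differs.

```python
import zipfile
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects non-empty text nodes (this includes any inline style or script text)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        stripped = data.strip()
        if stripped:
            self.parts.append(stripped)


def htmlz_text(path, member="index.html"):
    """Return the text nodes of one HTML member of an HTMLZ archive, one per line.

    `member` defaults to "index.html", which is an assumption about the layout.
    """
    with zipfile.ZipFile(path) as archive:
        html = archive.read(member).decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Calling `htmlz_text("book.htmlz")` (the file name is a placeholder) returns the raw OCR text, which makes it easier to judge how much proofreading a book needs before spending time on a conversion.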

Quote:
Originally Posted by marst
I can convert it successfully to EPUB format, but the problems have already occurred. Examples:

Some numerals are coming out incorrectly. Text such as Essay 10 comes out as Essay io. The text August 22 and 29, 2005 is rendered as August 22 and zz, zoos. All italics are lost. Those are the errors I spotted just within the first couple of pages. There must be a ton of errors throughout the book.
The errors you are seeing are the result of the OCR scan of the book when it was created. Since the book is essentially a series of glyphs or images, which when viewed on Amazon devices or apps look very close to the original, the only thing the OCR text was used for (according to its creator) was search functionality. Amazon made no attempt to proof any of this text. The text from some topaz books is great and the text from others is chock-full of errors.

Quote:
Originally Posted by marst
Is there something I might do to improve this? Thanks in advance...
As already suggested, if you want an epub, your only choice is to manually proof the book yourself.
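
Manual proofing can at least be pointed at the likeliest trouble spots. The sketch below is a rough heuristic, not a fix: it flags short letter runs that could be misread numbers when they appear shortly after a number or a word that usually precedes one. The confusion table is inferred only from the examples in this thread ("io" for 10, "zoos" for 2005), and the cue-word list, the skip list, and the function name `flag_suspects` are assumptions made up for this example.

```python
import re

# Assumed letter-for-digit confusions, inferred from the examples in this thread.
CONFUSABLE = {"i": "1", "l": "1", "o": "0", "z": "2", "s": "5"}

# Hypothetical cue words that are often followed by a number.
CUES = {
    "essay", "chapter", "page", "january", "february", "march", "april",
    "may", "june", "july", "august", "september", "october", "november",
    "december",
}

# Tiny illustrative skip list of real words built only from confusable letters.
COMMON_WORDS = {"is", "so"}

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def flag_suspects(text, window=3):
    """Return (token, guessed_digits) pairs for letter runs that may be misread numbers."""
    suspects = []
    since_cue = None  # tokens seen since the last cue word or digit-bearing token
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        lower = token.lower()
        if lower in CUES or any(ch.isdigit() for ch in token):
            since_cue = 0
            continue
        if since_cue is None:
            continue
        since_cue += 1
        if since_cue > window:
            since_cue = None
            continue
        if lower not in COMMON_WORDS and all(ch in CONFUSABLE for ch in lower):
            guess = "".join(CONFUSABLE[ch] for ch in lower)
            suspects.append((token, guess))
    return suspects
```

On the examples from the first post, "Essay io" yields a guess of 10 and "zoos" a guess of 2005, but "zz" yields 22 where the original read 29. The guesses are only hints about where to look, so each flagged spot still has to be checked against the page images.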

Your other choices are to read them on your Kindle or Kindle app, find the standalone tools and read the extracted glyphs in their original book form via Firefox, or convert those glyphs to PDF.

Last edited by DoctorOhh; 03-11-2014 at 04:40 AM.
Old 03-12-2014, 03:09 AM   #4
marst
Thanks very much for these replies. I might well just go ahead and proof it myself.

I made the time-wasting mistake of trying to cope with the book's iBooks version (which is EPUB, but what a bizarre way Apple has of putting that together) -- and gave up, the reason having to do with that "protection"-related topic we are discouraged from discussing here. It seemed a punishing prospect...

It's strange that the publisher opted for both Kindle and iBooks format, but didn't release an EPUB file of the Barnes and Noble variety. (Those, I can open in my Android e-reader directly following a short login procedure.)

I've had a few Kindle books that must have been done using this Topaz format -- judging by the above descriptions. One book had so many bizarre and ghastly typos (clearly OCR-related) that it was painful to read. Amazon, to its credit, was willing to take the thing back and refund the purchase price. I was a typesetter, back in the days when dinosaurs roamed the earth. We were required to proof our stuff before sending it out the door -- before the customer's own proofreaders started in on it. Standards appear to have changed...

Thanks again.
Old 03-12-2014, 05:41 AM   #5
DoctorOhh
Quote:
Originally Posted by marst
I've had a few Kindle books that must have been done using this Topaz format -- judging by the above descriptions. One book had so many bizarre and ghastly typos (clearly OCR-related) that it was painful to read. Amazon, to its credit, was willing to take the thing back and refund the purchase price. I was a typesetter, back in the days when dinosaurs roamed the earth. We were required to proof our stuff before sending it out the door -- before the customer's own proofreaders started in on it. Standards appear to have changed...
I'm glad Amazon refunded your purchase cost, but just to be clear the topaz format when viewed using a Kindle will not show any of the OCR flaws we're discussing. Since it is image-based, it will appear almost exactly as the originally published book from which it was scanned. The OCR text inside topaz books is not meant for reading; it is there to aid in searching the book by keyword.

Additionally if you used the stand alone "tools" you could view a perfect copy of the original book in Firefox which would assist you greatly in proofing the book.
Old 03-12-2014, 03:25 PM   #6
eschwartz
Indeed, if you saw errors, it will be because the publishers use OCR to create a regular Kindle mobi7/8-formatted book. It happens a lot with backlisted titles, because they don't bother to proofread, which truly is a travesty.
Old 03-13-2014, 04:46 AM   #7
marst
Quote:
Originally Posted by DoctorOhh
just to be clear the topaz format when viewed using a Kindle will not show any of the OCR flaws we're discussing.
As it happened, the errors appeared when I viewed the book on a Kindle 3.

Quote:
Additionally if you used the stand alone "tools" you could view a perfect copy of the original book in Firefox which would assist you greatly in proofing the book.
With this most recent book, it appears to be just fine on the Kindle 3 -- which would also help me proofread it.
Old 03-13-2014, 04:49 AM   #8
marst
Quote:
Originally Posted by eschwartz
Indeed, if you saw errors, it will be because the publishers use OCR to create a regular Kindle mobi7/8-formatted book. It happens a lot with backlisted titles, because they don't bother to proofread, which truly is a travesty.
Failure to proofread seems to be all the rage these days with print books and e-books alike, and yes, it's a travesty. I've often wondered how that service is priced out. What does it cost a publishing company to have someone proofread, say, a 250-page book? Do proofreaders charge by the hour? Per 'x' words?

Publishers don't care to respond to e-mail messages pointing out when a book is riddled with typos. Well, I did get a reply once, which amounted to: Well, you know, we've been kind of busy. Not terribly convincing.

Last edited by marst; 03-13-2014 at 04:51 AM.
Old 03-13-2014, 05:15 AM   #9
DoctorOhh
Quote:
Originally Posted by marst
As it happened, the errors appeared when I viewed the book on a Kindle 3.
This confirms that it wasn't a topaz formatted book, just very sloppy work by whoever created or scanned it.
Tags
azw1, htmlz

