.azw1 or .tpz, antiquated Topaz sample file

PoP · 10-11-2025, 01:59 PM

Searched to no avail for a sample ebook in Topaz format.

Is there anyone who could make one available? The content is not important, it's just for testing. The sample could be scrambled but would need to be DRM free.

(Alternately, a reference to an Amazon sample *in that format* that I could download and transfer to my Kindle Keyboard 3 would be adequate -- I couldn't locate one).

Quoth · 10-13-2025, 07:40 AM

Topaz is a special kind of OCR where instead of trying to convert to text, each glyph is stored and similar ones replaced by one in a table.
Unlike a scan encapsulated as a PDF, the table of glyphs is used to build the reflowable page. The advantage over full OCR is that if the matching is set so there are almost no false positives (a setting that would be abysmal for regular OCR), there is no need for human proofing.

It will result in a huge table compared to proofed full OCR with a subsetted Unicode character set, but it will work with no training and with any font or alphabet.
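To make the trade-off concrete, here is a minimal sketch of the glyph-table idea. It is not the actual Topaz encoder: the tiny 3x3 bitmaps, the pixel-difference (Hamming) comparison and the names `build_glyph_table` and `max_diff` are all assumptions made for illustration.

```python
from typing import List, Tuple

Bitmap = Tuple[str, ...]  # rows of '#' and '.' characters (illustrative format)


def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def same_size(a: Bitmap, b: Bitmap) -> bool:
    return len(a) == len(b) and len(a[0]) == len(b[0])


def build_glyph_table(glyphs: List[Bitmap], max_diff: int) -> Tuple[List[Bitmap], List[int]]:
    """Return (table, indices): every scanned glyph becomes an index into the table.

    A glyph reuses an existing entry only if it differs from it by at most
    max_diff pixels; otherwise it is stored as a new entry.
    """
    table: List[Bitmap] = []
    indices: List[int] = []
    for g in glyphs:
        for i, entry in enumerate(table):
            if same_size(entry, g) and hamming(entry, g) <= max_diff:
                indices.append(i)
                break
        else:  # no close enough entry was found
            table.append(g)
            indices.append(len(table) - 1)
    return table, indices


# Hypothetical scanned shapes: a clean 'o', a noisy 'o', and a 'c'.
O_CLEAN = ("###", "#.#", "###")
O_NOISY = ("###", "#.#", "##.")
C_SHAPE = ("###", "#..", "###")

page = [O_CLEAN, C_SHAPE, O_NOISY, O_CLEAN]

strict_table, strict_idx = build_glyph_table(page, max_diff=0)
loose_table, loose_idx = build_glyph_table(page, max_diff=1)

print(len(strict_table), strict_idx)  # strict matching: bigger table, no wrong merges
print(len(loose_table), loose_idx)    # loose matching: smaller table, 'c' merged into 'o'
```

With the strict setting the noisy 'o' gets its own entry, so the table grows, but nothing is ever drawn as the wrong letter. With the loose setting the table shrinks to a single entry, and the 'c' is silently rendered as an 'o', which is exactly the kind of error that would need human proofing.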

Quality is poor compared to human-proofed OCR or a decent scan, but it worked on the 167 dpi 6" Vizplex 4-level screen on the K1.

So the only viable conversion is to an image (TIFF or image-based PDF); calibre won't do it. The image might then be OCRed.

The K3 can probably read Topaz.

I've no idea how the DRM worked. It may have used Mobipocket DRM or, more likely, Amazon's Mobi DRM.

This (with loads of nasty third-party JavaScript) lists Topaz titles. I suspect Amazon has removed most by now. https://www.kboards.com/threads/the-...az-format.171/

I doubt the Calibre scramble plugin works with Topaz. Topaz has no text.

EDIT
see https://wiki.mobileread.com/wiki/Topaz

PoP · 10-13-2025, 08:42 AM

^ @Quoth Thanks for the research.

Quote:

Originally Posted by Quoth

Topaz is a special kind of OCR where instead of trying to convert to text, each glyph is stored and similar ones replaced by one in a table.
Unlike a scan encapsulated as a PDF, the table of glyphs is used to build the reflowable page. The advantage over full OCR is that if the matching is set so there are almost no false positives (a setting that would be abysmal for regular OCR), there is no need for human proofing.

It will result in a huge table compared to proofed full OCR with a subsetted Unicode character set, but it will work with no training and with any font or alphabet.

Quality is poor compared to human-proofed OCR or a decent scan, but it worked on the 167 dpi 6" Vizplex 4-level screen on the K1.

So the only viable conversion is to an image (TIFF or image-based PDF); calibre won't do it. The image might then be OCRed.

Yep.

Quote:

Originally Posted by Quoth

The K3 can probably read Topaz.

Correct.

Quote:

Originally Posted by Quoth

I've no idea how the DRM worked. It may have used Mobipocket DRM or, more likely, Amazon's Mobi DRM.

[EDIT 2025-10-13 14:05]
"The Tool" can apparently remove Topaz DRM reliably, but the Calibre plugin then converts the file to HTMLZ and does not preserve the AZW1 format.

Quote:

Originally Posted by Quoth

I doubt the Calibre scramble plugin works with Topaz. Topaz has no text.

Correct, the scramble plugin only accepts AZW3, EPUB and KEPUB.

Quoth · 10-14-2025, 04:47 AM

I think that Amazon has removed most or all of the Topaz titles. There used to be some that were also "free".

I think it was a brilliant idea, and maybe it could be done better today to convert PDFs or other fixed-layout content that isn't comics to reflowable. Except you'd need a custom app; you wouldn't actually create a Topaz file. Such a custom app is obviously simple for iOS & Android, but it is also possible on Kobo without a jailbreak.

I've not seen Topaz, but from what I've read, it wasn't implemented well, or didn't work as well as I'd have expected.

DiapDealer · 10-14-2025, 03:23 PM

Quote:

Originally Posted by PoP

"The Tool" can apparently remove Topaz DRM reliably, but the Calibre plugin then converts the file to HTMLZ and does not preserve the AZW1 format.

At the time, it was deemed rather pointless for the plugin to preserve a drm-free version of the original topaz format. Only Kindle devices could render them (and they could never be converted into other formats without the mess that resulted from the conversion to HTMLZ).

The plugin never even made an attempt to convert the topaz format itself. It wasn't at all feasible. It only used the underlying (and usually quite terrible and unformatted) original (sometimes OCRed) text that was only included to make it possible for the Kindle search function to still work.

I had several of them at one time. Most of them rendered quite beautifully on Kindles. You just couldn't really do anything with them other than read them.

Quoth · 10-14-2025, 03:38 PM

Quote:

Originally Posted by DiapDealer

I had several of them at one time. Most of them rendered quite beautifully on Kindles. You just couldn't really do anything with them other than read them.

Of course actually reading is the main point of ebooks on an ereader.

I'm glad to hear that most worked well on the Kindle. The idea behind it also seems to be part of DjVu (1998), though those files are fixed layout and can be converted to image-based PDF (1993, and an open standard since 2008).

It seems a shame it's gone as the idea of scanning pages and having reflowable content on a smaller screen with no human proofing seems useful.

Quoth · 10-14-2025, 03:55 PM

This is the bit where Topaz and DjVu are similar:

Quote:

The JB2 encoding method identifies nearly identical shapes on the page, such as multiple occurrences of a particular character in a given font, style, and size. It compresses the bitmap of each unique shape separately, and then encodes the locations where each shape appears on the page. Thus, instead of compressing a letter "e" in a given font multiple times, it compresses the letter "e" once (as a compressed bit image) and then records every place on the page it occurs.

https://en.wikipedia.org/wiki/DjVu

In contrast a scanned image in a PDF might simply be encapsulated TIFF.

Quote:

A PDF file is often a combination of vector graphics, text, and bitmap graphics. The basic types of content in a PDF are:

Typeset text stored as content streams (i.e., not encoded in plain text);
Vector graphics for illustrations and designs that consist of shapes and lines;
Raster graphics for photographs and other types of images; and
Other multimedia objects.

In later PDF revisions, a PDF document can also support links (inside document or web page), forms, JavaScript (initially available as a plugin for Acrobat 3.0), or any other types of embedded contents that can be handled using plug-ins.

PDF combines three technologies:

An equivalent subset of the PostScript page description programming language but in declarative form, for generating the layout and graphics.
A font-embedding/replacement system to allow fonts to travel with the documents.
A structured storage system to bundle these elements and any associated content into a single file, with data compression where appropriate.

https://en.wikipedia.org/wiki/PDF

DjVu beats PDF if the source is only scanned from paper, though like PDF it can have an OCR text layer to help search.

PDF beats DjVu for output from maths typesetting, vector art, word processing, DTP, etc.

Both show a WYSIWYG rendering of what might be printed on paper, and for DjVu the intention was a same-size paper source. Normally the actual page size is pre-encoded into both.

Topaz takes the compression/encoding idea of DjVu (and the OCR overlay for search that scanned to PDF + OCR also has), but instead of replicating the original layout it reflows and re-paginates for the actual screen.
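The reflow step can be sketched as ordinary greedy line breaking applied to glyph images instead of text. This is only an illustration and not how the Kindle renderer is known to work: it assumes the creator's tools have already grouped glyphs into word boxes with known pixel widths, and the names `reflow`, `word_widths` and `gap` are hypothetical.

```python
from typing import List


def reflow(word_widths: List[int], screen_width: int, gap: int) -> List[List[int]]:
    """Greedy line breaking: return each line as a list of word indices.

    word_widths holds the pixel width of every word box in reading order;
    gap is the space inserted between two words on the same line.
    """
    lines: List[List[int]] = []
    current: List[int] = []
    used = 0  # pixels used on the current line
    for i, width in enumerate(word_widths):
        needed = width if not current else used + gap + width
        if current and needed > screen_width:
            lines.append(current)  # the line is full, start a new one
            current = [i]
            used = width
        else:
            current.append(i)
            used = needed
    if current:
        lines.append(current)
    return lines


# Hypothetical word-box widths in pixels, taken from one scanned line of print.
words = [40, 25, 60, 30, 15, 50]

print(reflow(words, screen_width=200, gap=10))  # wider screen: fewer lines
print(reflow(words, screen_width=100, gap=10))  # narrower screen: more lines
```

The same word boxes yield two lines on the 200-pixel screen and three on the 100-pixel one, with no text recognition involved. Pagination would be the same loop run over line heights instead of word widths.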

EDIT:
Also, of course, with Topaz and DjVu the work is done by the creator's tools (the reader's rendering is simple), whereas non-scanned PDF (with PostScript from LaTeX, vector art, etc.), azw3/KF8, KFX, epub2 and especially epub3 (with JavaScript, reflowable and fixed layout) require more work from the renderer. The mobi/KF7 format is HTML3 with no CSS, so it is pretty simple to render, especially as it only has three font faces (serif, sans and monospace), each in normal, bold, italic and bold italic.

Quoth · 10-14-2025, 04:13 PM

I find only one mention of Topaz on Wikipedia

Quote:

Format support by device
Main article: Kindle File Format

The first Kindle could read unprotected Mobipocket files (MOBI, PRC), plain text files (TXT), Topaz format books (TPZ) and Amazon's AZW format.

The Kindle 2 added native PDF capability with the version 2.3 firmware upgrade.[28] The Kindle 1 could not read PDF files, but Amazon provides experimental conversion to the native AZW format,[29] with the caveat that not all PDFs may format correctly.[30] The Kindle 2 added the ability to play the Audible Enhanced (AAX) format. The Kindle 2 can also display HTML files.

The fourth and later generation Kindles, Touch, Paperwhite (all generations), Voyage and Oasis (all generations) can display AZW, AZW3, TXT, PDF, unprotected MOBI, and PRC files natively. HTML, DOC, DOCX, JPEG, GIF, PNG, and BMP are usable through Amazon's conversion service. The Keyboard, Touch, Oasis 2 & 3, Kindle 8 & 9, and Paperwhite 4 can also play Audible Enhanced (AA, AAX). All Kindle models from the Kindle Paperwhite 2 and newer can display KFX files natively. KFX is Amazon's successor to the AZW3 format.

Kindles cannot natively display EPUB files.

https://en.wikipedia.org/wiki/Amazon...port_by_device
The K3 also supports azw3/KF8 if the firmware is updated.
The international DXG (B009) actually was released after the B008 K3, but it's simply the gen 2 DX with the screen upgraded from Vizplex to Pearl, so it never got azw3/KF8 support.

Quoth · 10-14-2025, 04:24 PM

And of course the PDF spec later added JBIG2 compression as an option. JBIG2 is the successor to JBIG and works on the same shape-matching principle as DjVu's JB2. It can be about 4x better than TIFF for 2-level images like text or fax.

However I think most scan to PDF software free on Windows doesn't use it, judging from the size.

If we were re-inventing Topaz, then JBIG2 is freely available.

PoP · 10-15-2025, 08:23 AM

@DiapDealer, history from the horse's mouth, thanks!

Quote:

Originally Posted by DiapDealer

At the time, it was deemed rather pointless for the plugin to preserve a drm-free version of the original topaz format. Only Kindle devices could render them (and they could never be converted into other formats without the mess that resulted from the conversion to HTMLZ).

I looked at the HTMLZ. I find the conversion of Topaz image blocks to .svg quite ingenious and remarkable. Kudos for the "mess".

Quote:

Originally Posted by DiapDealer

The plugin never even made an attempt to convert the topaz format itself. It wasn't at all feasible. It only used the underlying (and usually quite terrible and unformatted) original (sometimes OCRed) text that was only included to make it possible for the Kindle search function to still work.

Too bad the plugin can't just remove the DRM and optionally skip the conversion. Good point: Topaz contained "hidden" text after all, to accommodate the search.

Quote:

Originally Posted by DiapDealer

I had several of them at one time. Most of them rendered quite beautifully on Kindles. You just couldn't really do anything with them other than read them.

I was given one by a friend, along with the serial number of his now-defunct K3; the converted HTMLZ reads perfectly. I would have liked to compare on a real device, and maybe test annotations, bookmarks, notes, etc.
