#16
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
The only thing I know for sure is that converting a file in calibre will not induce random spelling errors. True, the encoding can cause problems, but I've never mistaken bad encoding for spelling errors. I have had older versions of the DeDRM tools fail to fully or correctly remove the DRM, and the result was a book with what looked like garbled text intermittently throughout. Updating the tools corrected this problem.
#17
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
So a Topaz book has two layers: the visible text, which is actually images, and a hidden version used for indexing and searches, which is the result of an OCR process applied to the images.
The calibre viewer converts and displays the latter; Kindle for PC presumably displays the former? And Amazon doesn't tell us which format we're buying?
#18
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Yes, but only after a DeDRM tool/plugin converts the OCR portion to htmlz first. Yes, the azw you open in Kindle for PC displays the glyphs. Amazon tells you that you're buying a DRM book that will work on your Kindle or Kindle for ... application. If you download the sample of the book, you can open it in a text editor to see what it is under the covers. Below are the first lines of two purchased books viewed in a text editor.
TPZ0 = Topaz
BOOKMOBI = Mobi
Last edited by DoctorOhh; 09-17-2011 at 02:41 AM.
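The text-editor check can also be scripted. What follows is only a minimal sketch, assuming the Topaz magic `TPZ0` sits at the very start of the file and that a Mobi file is a Palm database whose type/creator field `BOOKMOBI` sits at byte offset 60. The names `sniff_kindle_format` and `sniff_file` are made up for illustration and are not part of calibre or any DeDRM tool.

```python
def sniff_kindle_format(data: bytes) -> str:
    """Guess the container format from the first bytes of a Kindle file.

    Hypothetical helper for illustration only.
    """
    # Assumption: Topaz files begin with the 4-byte magic "TPZ0".
    if data[:4] == b"TPZ0":
        return "Topaz"
    # Assumption: Mobi files are Palm databases, with the 8-byte
    # type/creator pair "BOOKMOBI" stored at offset 60 of the header.
    if data[60:68] == b"BOOKMOBI":
        return "Mobi"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to cover both signatures."""
    with open(path, "rb") as f:
        return sniff_kindle_format(f.read(68))
```

Running `sniff_file` on a downloaded sample should agree with what you see on the first line in a text editor, as long as the assumptions above hold for that file.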
#19
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2009
Device: Sony PRS600
Okay, I'm not at home at the moment to check the file headers, but why would a brand-new, just-published book be in Topaz format? It seems unlikely. I'll check it out when I'm home.
#20
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Topaz is a newer format than MOBI... Why wouldn't a new, just-published book use the latest ebook format Amazon is pushing?
#21
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
If anything, Topaz has been getting more popular as Amazon's success increases.
Edit: It probably also doesn't bother Amazon or the publisher that attempts to strip the DRM result in a sub-par user experience, e.g. this thread...
Last edited by ldolse; 09-23-2011 at 12:27 PM.
#22
Junior Member
Posts: 5
Karma: 10
Join Date: Sep 2011
Device: Kindle 3G
Here is how I convert Topaz books to PDFs. It's much cleaner, and avoids the OCR which your drm tool is introducing.
https://www.mobileread.com/forums/sho...65#post1759765
#23
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
I know you are aware of this, but for the record, it's not calibre's DRM tool or MobileRead's DRM tool, so saying "your drm tool" is a little ambiguous. Also, the DRM tool doesn't introduce OCR; the OCR data is part of the original Topaz book created by Amazon. That said, what is the final size of your converted PDF using your method? Also, is the EPUB created through Sigil still in the 180 meg range?
#24
Sigil Developer
Posts: 8,837
Karma: 6120478
Join Date: Nov 2009
Device: many
Hi,
I tried his approach using prince but simplified it by slightly modifying the tools to *not* output the arrows and zoom info (an easy change, btw) and then simply did the following:

    prince *.svg -o mybook.pdf

The prince program will properly merge the pages into one PDF very nicely (so no need for a separate PDF merge program). I also cropped it with BRISS, which works very nicely too.

The problem is, as you guessed, the resulting file sizes.

1. The original Topaz ebook was only 4.1 meg in size.
2. After unpacking, you can see the original xml files (text-based) and image folder; together they take up only 14.9 meg, and after zipping just 5.8 meg. This is the raw xml (text) description of the ebook that the svg images are built from.
3. The folder of svg files and images was over 59 meg. If you zipped it up (and .svgz is an allowed format for svg files) you end up with 17.8 meg. Not too bad in comparison to the original 4.1 meg.

The problem is that in PDF form (after using prince and briss) the book required over 101 meg! So converting simple text-based drawing commands into images and storing the images actually takes up much, much more space than the text which describes how to draw the page images themselves!

In addition, you lose all of the OCR information, which means you can't search it, and of course, as a set of images, it cannot be reflowed.

Too bad other ebook readers do not simply draw each page on the fly (pretty much what the Amazon e-reader does) from text-based svg info. Or even better, if we could get the calibre program to grok the text-based xml files, then no growth in file sizes would be necessary and the output (svg, versus OCR, versus PDF) could be generated directly from the true xml files that describe the ebook.

It also gives you an appreciation of just how well designed the Topaz format really is in comparison to the PDF format for e-book applications.

KevinH

Last edited by KevinH; 09-28-2011 at 11:55 AM. Reason: updated with more info and fixed typos
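On the .svgz remark in point 3: an .svgz file is a gzip-compressed .svg (not a zip archive), so that step can be sketched in a few lines of Python. This is a sketch under assumptions, not the method used for the sizes quoted in the post. The function name `gzip_svgs` and the flat folder layout are hypothetical, and the savings will vary from book to book.

```python
import gzip
import shutil
from pathlib import Path


def gzip_svgs(src_dir: str, dst_dir: str) -> int:
    """Write a gzip-compressed .svgz copy of every .svg found in src_dir.

    Returns the total size in bytes of the .svgz files written.
    Hypothetical helper; not part of prince, BRISS, or the Topaz tools.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    total = 0
    for svg in sorted(Path(src_dir).glob("*.svg")):
        target = out / (svg.stem + ".svgz")
        # Stream through gzip so large pages are never fully held in memory.
        with open(svg, "rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        total += target.stat().st_size
    return total
```

Comparing the returned total with the size of the source folder gives a rough idea of how much the text-based page descriptions shrink. Any raster images in the folder are not touched by this sketch.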
#25
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
There are definitely ways to get the PDF size down, but it probably requires using some other packages. Check this thread on diybookscanner.org:
My workflow for almost djvubind-equivalent PDFs...
I think you could skip the OCR part in that workflow (or possibly leverage the existing OCR text somehow), but convert the SVG to black-and-white TIFF, stick that in the PDF, and then run pdfsizeopt. I'm guessing this process would give you a 10-20 meg PDF.
Edit: didn't realize pdfsizeopt is linux/mac only...
Last edited by ldolse; 09-29-2011 at 03:58 AM.
Tags: calibre, epub conversion errors, mobi conversion