Old 09-17-2011, 12:55 AM   #16
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by ldolse View Post
Unless they changed the drm plugin to do something radically different it doesn't create a .mobi file from topaz books, it creates a .zip or .htmlz file, depending on the plugin version. That's why I suggested the OP check the edit metadata screen. So to view it in the Calibre viewer it would be converted from one of these to ePub to view it.
You are absolutely correct.

The only thing I know for sure is that converting a file in calibre will not induce random spelling errors. True, the encoding can cause problems, but I've never mistaken bad encoding for spelling errors.

I have had older versions of the DeDRM tools fail to fully or correctly remove the DRM, and the result was a book with what looked like garbled text intermittently throughout. Updating the tools corrected this problem.
Old 09-17-2011, 02:11 AM   #17
cybmole
Wizard
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
so a topaz book has 2 levels - the visible text - which is actually images - and a hidden version, for indexing/searches, which is the result of an OCR process applied to the images.
calibre viewer converts and displays the latter; Kindle for PC presumably displays the former ?

and Amazon doesn't tell us what format we're buying?
Old 09-17-2011, 02:38 AM   #18
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by cybmole View Post
so a topaz book has 2 levels - the visible text - which is actually images - and a hidden version, for indexing/searches, which is the result of an OCR process applied to the images.
Correct. I link to the history of Topaz in this post.

Quote:
Originally Posted by cybmole View Post
calibre viewer converts and displays the latter;
Yes, but only after a DeDRM tool/plugin converts the OCR portion to htmlz first.

Quote:
Originally Posted by cybmole View Post
Kindle for PC presumably displays the former ?
Yes, the azw you open in Kindle for PC displays the glyphs.

Quote:
Originally Posted by cybmole View Post
and Amazon doesn't tell us what format we're buying?
Amazon tells you that you're buying a DRM book that will work on your Kindle or Kindle for ... application.

If you download the sample of the book you can open it up in a text editor to see what it is under the covers. Below are the first lines of two purchased books viewed in a text editor.
  • TPZ0 cdictšVFPcdkey3
  • Picking_Cotton KnoKnp BOOKMOBI
The pertinent part of each line (TPZ0 in the first, BOOKMOBI in the second) tells you which format the underlying book is created in.

TPZ0 = Topaz
BOOKMOBI = Mobi
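
For anyone who would rather not open a binary file in a text editor, the same check can be sketched in a few lines of Python. This is only a rough sketch based on the two markers shown above: the function names are made up for illustration, and scanning the first 128 bytes for BOOKMOBI is an assumption chosen to be forgiving about where exactly the marker sits.

```python
def sniff_kindle_format(header: bytes) -> str:
    """Guess the underlying format from the first bytes of a Kindle file."""
    # Topaz files begin with the marker TPZ0.
    if header.startswith(b"TPZ0"):
        return "Topaz"
    # Mobi files carry BOOKMOBI shortly after the book name.
    # Searching the first 128 bytes is an assumption, not a spec.
    if b"BOOKMOBI" in header[:128]:
        return "Mobi"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file to run the check above."""
    with open(path, "rb") as f:
        return sniff_kindle_format(f.read(128))
```

Calling sniff_file() on a downloaded sample should report "Topaz" or "Mobi" for the two kinds of file shown above, and "unknown" for anything else.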

Last edited by DoctorOhh; 09-17-2011 at 02:41 AM.
Old 09-23-2011, 07:04 AM   #19
dawnybros
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2009
Device: Sony PRS600
Okay, I'm not at home at the moment to check the file headers, but why would a brand new just published book be in topaz format? It's unlikely. I'll check it out when I'm home.
Old 09-23-2011, 07:07 AM   #20
user_none
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
Quote:
Originally Posted by dawnybros View Post
Okay, I'm not at home at the moment to check the file headers, but why would a brand new just published book be in topaz format? It's unlikely.
Topaz is a newer format than MOBI... Why wouldn't a new just published book use the latest ebook format Amazon is pushing?
Old 09-23-2011, 12:20 PM   #21
ldolse
Wizard
Posts: 1,337
Karma: 123457
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by dawnybros View Post
Okay, I'm not at home at the moment to check the file headers, but why would a brand new just published book be in topaz format? It's unlikely. I'll check it out when I'm home.
Topaz actually gives a publisher that started with a print book a faster avenue to market than Mobi. I see lots of 'new' ebooks on Amazon start as Topaz. I've seen a number of users report that these books get converted to a proper mobi ebook months after the initial publishing, but you won't get the newer format unless you complain to Amazon.

If anything Topaz has been getting more popular as Amazon's success increases.

edit: It probably also doesn't bother Amazon or the publisher that attempts to strip the DRM result in a sub-par user experience - e.g. this thread...

Last edited by ldolse; 09-23-2011 at 12:27 PM.
Old 09-27-2011, 07:35 PM   #22
Fschumaur
Junior Member
Posts: 5
Karma: 10
Join Date: Sep 2011
Device: Kindle 3G
Here is how I convert Topaz books to pdfs. It's much cleaner, and avoids the OCR which your drm tool is introducing.

https://www.mobileread.com/forums/sho...65#post1759765
Old 09-27-2011, 09:16 PM   #23
DoctorOhh
US Navy, Retired
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by Fschumaur View Post
Here is how I convert Topaz books to pdfs. It's much cleaner, and avoids the OCR which your drm tool is introducing.

https://www.mobileread.com/forums/sho...65#post1759765
Thanks very much for the link.

I know you are aware of this, but for the record it's not calibre's DRM tool or MobileRead's DRM tool, so saying "your drm tool" is a little ambiguous. Also, the DRM tool doesn't introduce OCR; the OCR data is part of the original Topaz book created by Amazon.

That said, what is the final size of your converted PDF book using your method? Also, is the Epub created through Sigil still in the 180 meg range?
Old 09-28-2011, 11:11 AM   #24
KevinH
Sigil Developer
Posts: 9,069
Karma: 6361556
Join Date: Nov 2009
Device: many
Hi,

I tried his approach using prince but simplified it by slightly modifying the tools to *not* output the arrows and zoom info (an easy change, btw) and then simply did the following:

prince *.svg -o mybook.pdf

The prince program will properly merge the pages into one pdf very nicely (so no need for a separate pdf merge program). I also cropped it with BRISS which works very nicely too.
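
One thing to watch with the *.svg glob is page order: the shell expands it alphabetically, so if the pages were named page1.svg, page2.svg, ... page10.svg (hypothetical names; zero-padded names would already sort correctly), page10 would land before page2. A minimal Python sketch that builds the same prince command with the pages in numeric order, using only the -o option shown above:

```python
import re
from pathlib import Path


def natural_key(name: str):
    """Split digits from text so 'page10' sorts after 'page2'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def prince_command(svg_dir: str, output_pdf: str) -> list[str]:
    """Build the prince argument list with the svg pages in numeric order."""
    pages = sorted(Path(svg_dir).glob("*.svg"),
                   key=lambda p: natural_key(p.name))
    return ["prince", *[str(p) for p in pages], "-o", output_pdf]
```

The resulting list can be handed to subprocess.run(..., check=True) to actually run prince, assuming it is on the PATH.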

The problem is as you guessed ... the resulting file sizes.

1. The original topaz ebook was only 4.1 meg in size.

2. After unpacking, you can see the original xml files (text-based) and image folder and it takes up only 14.9 meg and after zipping just 5.8 meg. This is the raw xml (text) description of the ebook that the svg images are built from.

3. The folder of svg files and images was over 59 meg. If you zipped it up (and .svgz is an allowed format for svg files) you end up with 17.8 meg. Not too bad in comparison to the original 4.1 meg.

The problem is in pdf form (after using prince and briss) the book required over 101 meg!
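
To put the numbers above side by side, here is a quick sketch that expresses each stage as a multiple of the original 4.1 meg file. The sizes are the ones quoted in this post ("over 59" and "over 101" are treated as lower bounds); the stage labels and the growth() helper are just names made up for illustration.

```python
ORIGINAL_MEG = 4.1  # the original Topaz ebook

# Sizes reported in the post, in meg.
stages = {
    "unpacked xml + images": 14.9,
    "unpacked, zipped": 5.8,
    "svg folder": 59.0,            # "over 59", so a lower bound
    "svg folder, zipped": 17.8,
    "pdf (prince + briss)": 101.0,  # "over 101", so a lower bound
}


def growth(size_meg: float, base_meg: float = ORIGINAL_MEG) -> float:
    """How many times larger a stage is than the original file."""
    return size_meg / base_meg


for name, size in stages.items():
    print(f"{name:24s} {size:6.1f} meg  {growth(size):5.1f}x")
```

By these figures the zipped svg folder is a bit over 4 times the original, while the pdf is roughly 25 times the original.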

So converting simple text-based drawing commands into images and storing the images actually takes up much, much more space than the text that describes how to draw the page images themselves! In addition, you lose all of the OCR information, which means you can't search it, and of course, as a set of images, it cannot be reflowed.

Too bad other ebook readers do not simply draw each page on the fly (pretty much what the Amazon e-reader does) from text based svg info. Or even better, if we could get the Calibre program to grok the text-based xml files, then no growth in file sizes would be necessary and the output (svg, versus ocr, versus pdf) could be generated directly from the true xml files that describe the ebook.

It also gives you an appreciation of just how well designed the topaz format really is in comparison to the pdf format for e-book applications.

KevinH

Last edited by KevinH; 09-28-2011 at 11:55 AM. Reason: updated with more info and fixed typos
Old 09-29-2011, 03:55 AM   #25
ldolse
Wizard
Posts: 1,337
Karma: 123457
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
There are definitely ways to get the pdf size down, but it probably requires using some other packages. Check this thread on diybookscanner.org:
My workflow for almost djvubind-equivalent PDFs...


I think you could skip the OCR part in that workflow (or possibly leverage the existing ocr text somehow), but convert the SVG to Black & White TIFF, stick that in the pdf, and then run pdfsizeopt.

I'm guessing this process would give you a 10-20 meg pdf. edit: didn't realize pdfsizeopt is linux/mac only...

Last edited by ldolse; 09-29-2011 at 03:58 AM.
Tags
calibre, epub conversion errors, mobi conversion
