03-20-2011, 09:56 AM | #1 |
Member
Posts: 12
Karma: 10
Join Date: Mar 2011
Device: PC (Linux)
|
How can I convert a Topaz ebook from multiple XHTML files (SVG) to a single PDF?
I purchased a Kindle ebook that turned out to be in Topaz format. I don't like reading Kindle ebooks in Kindle for PC (running under Wine on Linux), so I passed this book through the KindleBooks.pyw program from DRM_tools_v3.7 and then used Calibre to convert the resulting .zip file (a single html file plus .css, .opf and images) into a Mobi ebook. However, some formatting is lost in the process, and the text is corrupted by Amazon's OCR errors.
KindleBooks.pyw also produced a ...SVG.zip file that contains an "img" folder with many .jpg and .svg images, a folder "svg", and a file index_svg.xhtml. The "svg" folder holds all of the book's pages as individual xhtml files that I can inspect with my web browser (and navigate through with javascript); they represent the original scanned images of all the book's pages. I would now like to convert and assemble all these individual pages into a single pdf file that I can read with Adobe Reader, e.g. as "single page continuous". How can I do that? Many thanks, Rob
03-25-2011, 07:32 AM | #2 | |
Member
Posts: 12
Karma: 10
Join Date: Mar 2011
Device: PC (Linux)
|
I haven't found a satisfactory solution to this problem. I'd also posted my query to Apprentice Alf's blog, and some_updates responded as follows:
Quote:
Thanks, some_updates, for your good suggestions.
1. I was able to import the SVG data by adding index_svg.xhtml to Calibre and then converting the resulting zip to pdf. After 40 min of grinding away, Calibre produced a single 210 MB pdf of the 300-page book. It did contain the original scanned images of all the pages (before OCR), but also the javascript navigation triangles and zoom buttons, plus a third of a blank page inserted after every book page. That’s not really what I wanted.
2. The …SVG.zip output from KindleBooks.pyw (in the SVG folder) contained xhtml images of all book pages, not svg images, and Inkscape couldn’t handle these. To crop these images and remove the javascript code, white space, etc., I would have had to edit every xhtml page file with an html editor. I played around with this a bit in Mozilla Seamonkey Composer but then gave up; I just couldn’t handle it.
3. Spellchecking and fixing the html file that Amazon produced through OCR also wasn’t feasible, as the text contains numerous Sanskrit and Tibetan terms (transliterated into Roman script), many of which had been corrupted by the OCR process and would have to be fixed by hand.
So thanks again for your help, but I haven’t found a satisfactory solution to this problem. I’ll be very leery of purchasing another Kindle book that’s Topaz DRM’ed if that restricts me to reading it only in Kindle apps such as Kindle for PC. But then, how does one know beforehand whether a given Kindle book is Topaz-encrypted?
Last edited by rglk; 03-25-2011 at 07:37 AM.
03-25-2011, 01:38 PM | #3 | |
Wizzard
Posts: 11,517
Karma: 33048258
Join Date: Mar 2010
Location: Roundworld
Device: Kindle 2 International, Sony PRS-T1, BlackBerry PlayBook, Acer Iconia
|
Quote:
Also, for step 2), it sounds like the cruft you have to strip out is auto-generated and probably fairly uniform, so if you have any scripting skills, you might be able to whip up an auto-converter to make your life easier.
Hope this helps, and welcome to MobileRead!
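For what it's worth, an auto-converter along those lines could start out something like the Python sketch below. It is only a rough starting point and makes several assumptions that nobody in this thread has confirmed: that each page file in the "svg" folder is well-formed XML without HTML-only named entities, that the javascript lives in `<script>` elements and `on...` event attributes, and that the pages use the standard XHTML namespace. The function names `strip_scripting` and `clean_folder` are made up for illustration. The navigation triangles and zoom buttons would need their own rule once you have looked at the actual markup.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: pages use the standard XHTML namespace. Registering it as the
# default keeps ElementTree from rewriting every tag as "ns0:...".
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)


def strip_scripting(xhtml):
    """Return one page with <script> elements and on* attributes removed.

    Accepts str or bytes. Text that directly follows a removed <script>
    element (its "tail") is dropped as well, which is fine for a sketch.
    """
    root = ET.fromstring(xhtml)
    for parent in list(root.iter()):
        for child in list(parent):
            # Tags look like "{namespace}script"; compare the local name only.
            if child.tag.rsplit("}", 1)[-1] == "script":
                parent.remove(child)
        # Event handlers such as onload/onclick are plain (un-namespaced) attributes.
        for name in [a for a in parent.attrib if a.lower().startswith("on")]:
            del parent.attrib[name]
    return ET.tostring(root, encoding="unicode")


def clean_folder(src_dir, dst_dir):
    """Write a cleaned copy of every .xhtml page in src_dir into dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for page in sorted(Path(src_dir).glob("*.xhtml")):
        # Read bytes so the XML parser can honor the file's declared encoding.
        cleaned = strip_scripting(page.read_bytes())
        (dst / page.name).write_text(cleaned, encoding="utf-8")
```

You would call it as `clean_folder("svg", "svg_clean")` from inside the unzipped folder, then check a few cleaned pages in a browser before feeding them to any PDF conversion.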
|
11-28-2011, 04:33 PM | #4 |
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2011
Location: London, UK
Device: eReader
|
This seems to work...
I know this is an old thread, but here is something relevant that seems to work, so that I can use purchased Kindle content on either my Kindle or my eReader:
1) following instructions in Apprentice Alf's tools_v4.8.zip/ DeDRM_for_Mac_and_Win / WinApp_2.8 / ReadMe_DeDRM_WinApp.txt, using:
a) ActiveState ActivePython-2.7.2.5-win32-x86 . Community Edition
b) pycrypto-2.3.win32-py2.7
c) DeDRM_WinApp_2.8 out of Apprentice Alf's tools_v4.8
d) Calibre 0.8.28
2) following instructions and specifying the encrypted Topaz file X.azw in DeDRM produced 3 outputs:
a) X_SVG (zipped folder)
b) X_XML (zipped folder)
c) X_nodrm.htmlz (unzipped file)
3) As reported in this thread, progress with the SVG and XML folders was tedious, but reimporting the nodrm-htmlz file into Calibre allows easy exporting (e.g. a PDF file with just adequate layout or an ePub file with reasonable layout and all figures and the search functionality intact).
K.
Last edited by knever; 11-28-2011 at 04:35 PM. Reason: correction
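The Calibre export in step 3 can probably also be run without the GUI, through Calibre's `ebook-convert` command-line tool, which picks the output format from the target file's extension. The Python sketch below just wraps that call. It assumes `ebook-convert` is on your PATH and that your Calibre version accepts .htmlz as input; the helper names `build_convert_command` and `convert` are invented for this example, and no conversion options are passed because none were mentioned above.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target):
    """Return the argument list for Calibre's ebook-convert CLI.

    The output format is chosen by the target's extension (.pdf, .epub, ...).
    """
    return ["ebook-convert", str(source), str(target)]


def convert(source, target):
    """Run the conversion, failing early if Calibre's CLI is not installed."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is Calibre installed?")
    # check=True raises CalledProcessError if Calibre reports a failure.
    subprocess.run(build_convert_command(source, target), check=True)
```

For the file named in step 2 that would be `convert(Path("X_nodrm.htmlz"), Path("X.pdf"))`, or the same call with `X.epub` as the target for the ePub version.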
|