09-27-2011, 07:31 PM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Sep 2011
Device: Kindle 3G
|
Guide for converting Kindle Topaz (xhtml with svg) to PDF
So, recently I was converting a book I purchased for my Kindle into PDF so I could read it on my computer without Kindle for PC. I used an automagic tool to strip the DRM and convert it to HTMLZ, but the OCR looked like crap, even though the file looked all right on my Kindle.
Fortunately, the unDRM tool also produced many xhtml pages (one for each page of the book), and the xhtml used SVGs (Scalable Vector Graphics) to display the letters and images. I could view the book just fine in Firefox, but that was clumsy and required me to port the 300+ files around. Trying to use Calibre failed miserably (I got an error a quarter of the way through, and even then the book was already 180MB in size). So I messed around with many different tools and finally arrived at a solution. It's not the most elegant solution in the world, but it is (roughly) cross-platform.

Note: This guide assumes that you have already (and ethically) stripped the DRM from the Topaz book in question. This thread is not meant to discuss said stripping; there are other forums for that. Say it with me now: I will NOT strip DRM from a book I do not own, nor will I "liberate" books for the intent of piracy. Got it? Good.

Requirements and things you need:
-The book. My tool outputs the following message and leaves three formats.
Code:
Book Successfully generated
Creating NoDRM HTMLZ Archive
Creating SVG HTMLZ Archive
Creating XML ZIP Archive

-An operating system that is either Windows XP, Ubuntu 10.04, or Mac OS X 10.6 or older. One of the programs (Prince) does not like Windows 7, so if that's all you've got, pray it works in compatibility mode or use a VM. Mac people, I have not tested your system, so cross your fingers.
-Notepad++ (Open Source)
-Prince (Free for non-commercial use). This is the most finicky program of the bunch.
-PDF Split and Merge, a.k.a. PDFSAM (Open Source)
-Java JRE (for example from an unattended Ninite installer/updater)
-Briss (Open Source, requires JRE)

Step 0: Install all of the above programs and verify that they run.

Step 1: Open the index_svg.xhtml page in the root of the unzipped book in Firefox or Safari. The book should look like it does on your ereader. Verify that the directory containing index_svg.xhtml also holds at least two folders, svg and img. You may see more.

Step 2: Looking at your book in Firefox, you may notice some artifacts that are not part of your book, mainly the go forward and go back buttons and a zoomin zoomout sort of thing. Let's get rid of those using Notepad++ (inspired by this site). Open up page0000, page0001 and whatever your last page is in Notepad++. Hopefully your page0000 and page0001 are not very busy (maybe a title page or something), because that makes identifying the artifacts easier. Between your documents' "body tags" (i.e. <body>[lots and lots of stuff]</body>) you should see at least 4 "a" tags. One looks like this:
Code:
<a href="javascript:ppage();"><[more stuff]</a>

one looks like this:
Code:
<a href="javascript:npage();"><[more stuff]</a>

and two sit inside the zoom block:
Code:
<div><a href="javascript:zoomin();">zoom in</a> - <a href="javascript:zoomout();">zoom out</a></div>

We want to remove all of the ppage() scripts and all of the zoom scripts, so let's do that now. You may notice that page0000 has a different ppage() than page0001, page0002 and all the rest of the pages. That's because at the beginning of the book you can't go any further backward. Remove the entire ppage() script from page0000 manually and check in Firefox that you didn't delete something important. Then copy the ppage() script from page0001, go to Search>Find in Files, paste the script into Find, blank out Replace, and choose the directory holding all the individual pages. You can optionally set a filter of *.xhtml. Open up some of the pages in Firefox and you should notice that the arrow to go backwards is gone. Do the same thing for the zoom in and zoom out.

You may want to change the background color to white. To do that, replace
Code:
<body onLoad="setsize();" style="background-color:#777;

with
Code:
<body onLoad="setsize();" style="background-color:#FFF;

The next-page link is removed the same way. The one to remove looks like this:
Code:
<a href="javascript:npage();"><svg id="nextsvg" viewBox="0 0 100 300" xmlns="http://www.w3.org/2000/svg" version="1.1" style="background-color:#777"><polygon points="5,5,5,295,95,150" fill="#AAAAAA" /></svg></a>

If you choose the wrong one to remove, your page will not be displayed at all! Check in Firefox that your book looks like you want it to.

Step 3: After that doozy of a step 2, it gets easier from here. Open up Prince, add all of your pages to the queue and convert them. This may take a little while. Check that the output rendered correctly (don't worry about the margins if you have them, we'll banish those later). You should now have n separate PDFs, where n is the number of pages in your book.

Step 4: Let's merge all the single-page PDFs into one big one! Open up PDFSAM, select the merge/extract option, load all of the single PDFs into it, type in a file name and click Run. This shouldn't take too long.

Step 5: Adjust the margins. Prince tends to print your xhtml onto a letter-sized (8.5 x 11 inch) piece of "paper", and that tends to leave us with big margins. Open up Briss and load the single PDF from step 4 into it. You will see all of your even pages and all of your odd pages overlaid on top of each other, and you can adjust the cropping margins to whatever you want by dragging the upper left and lower right corners of the shaded "1" area. The preview function here is your friend. Tell it to crop and you are done!

Other things to do:
-Change the PDF metadata with BeCyPDFMetaEdit.
-Take your output from step 2, load it into Sigil and make an epub! Note: this does not translate well into other formats, but it appears that epubs don't mind SVGs.
|
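For anyone who would rather script the Step 2 clean-up than click through Find in Files, here is a rough Python sketch of the same idea. The regular expressions are assumptions based only on the snippets quoted in the guide, and the function names are made up for illustration, so try it on a copy of your pages first and check the result in Firefox.

```python
import re
from pathlib import Path

# Patterns modelled on the markup quoted in the guide (assumptions: your
# pages may differ, especially the ppage() link on page0000).
NAV_LINK = re.compile(r'<a href="javascript:[pn]page\(\);">.*?</a>', re.DOTALL)
ZOOM_DIV = re.compile(
    r'<div>\s*<a href="javascript:zoomin\(\);">.*?</a>\s*-\s*'
    r'<a href="javascript:zoomout\(\);">.*?</a>\s*</div>',
    re.DOTALL,
)
GREY_BODY = '<body onLoad="setsize();" style="background-color:#777;'
WHITE_BODY = '<body onLoad="setsize();" style="background-color:#FFF;'


def clean_page(markup: str) -> str:
    """Drop the back/forward links and the zoom block, whiten the background."""
    markup = NAV_LINK.sub("", markup)
    markup = ZOOM_DIV.sub("", markup)
    return markup.replace(GREY_BODY, WHITE_BODY)


def clean_directory(page_dir: str) -> int:
    """Rewrite every page*.xhtml in page_dir in place; return how many changed."""
    changed = 0
    for path in sorted(Path(page_dir).glob("page*.xhtml")):
        original = path.read_text(encoding="utf-8")
        cleaned = clean_page(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

Calling clean_directory on the folder that holds the individual pages does the whole book in one go. If a page stops displaying afterwards, one of the patterns matched more than the navigation markup and needs tightening for your book.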
09-28-2011, 05:54 PM | #2 |
Guru
Posts: 850
Karma: 2641698
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
|
Note that PrinceXML can create PDFs of a custom size and with custom margins. You would need to create a custom CSS file to do this.
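As a rough illustration of such a CSS file, the Python sketch below just writes a one-line @page rule to disk. The function names, the file name and the 6in x 9in size are made-up examples, not values from this thread; pick a size that matches your book's pages.

```python
def page_css(width: str, height: str, margin: str = "0") -> str:
    """Return a CSS @page rule with an explicit page size and margin."""
    return f"@page {{ size: {width} {height}; margin: {margin} }}\n"


def write_page_css(path: str, width: str, height: str, margin: str = "0") -> None:
    """Save the rule to a stylesheet file that can be handed to Prince."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(page_css(width, height, margin))
```

For example, write_page_css("topaz-page.css", "6in", "9in") produces a stylesheet that can be passed to Prince on the command line with its -s option.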
|
01-02-2012, 12:06 PM | #3 |
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2008
Device: none
|
Thanks for the instructions.
I tried one Topaz ebook and the images were missing from the final PDF file. The Prince log says "\svg\????.img can't open input file:No such file or directory". The solution is to copy the images into the same folder as the .xhtml files and change the code in the .xhtml files: replace (xlink:href="../img/) with (xlink:href="). Instead of Notepad++, I recommend ultrareplace. |
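The same fix can be scripted. This is only a sketch in Python under the assumptions described in this post (images in one folder, pages in another, links written as xlink:href="../img/..."); the function names are invented for the example.

```python
import shutil
from pathlib import Path


def fix_image_links(markup: str) -> str:
    """Point image links at the page's own folder instead of ../img/."""
    return markup.replace('xlink:href="../img/', 'xlink:href="')


def flatten_images(page_dir: str, img_dir: str) -> None:
    """Copy every image next to the pages, then rewrite the links in each page."""
    pages = Path(page_dir)
    for image in Path(img_dir).iterdir():
        if image.is_file():
            shutil.copy2(image, pages / image.name)
    for page in pages.glob("*.xhtml"):
        text = page.read_text(encoding="utf-8")
        fixed = fix_image_links(text)
        if fixed != text:
            page.write_text(fixed, encoding="utf-8")
```

Run flatten_images with the pages folder and the img folder of the unzipped book, then convert with Prince as before.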
02-17-2012, 12:35 PM | #4 |
Junior Member
Posts: 1
Karma: 10
Join Date: Feb 2012
Device: iPod touch
|
Converting TOPAZ ebook to PDF
Here's my workflow for creating a PDF ebook:
OS: Fedora 14
Tools: Calibre, Foxit Phantom
1. Install the latest version of Calibre (0.8.40)
2. Install Foxit Phantom through Wine
3. Unzip the SVG HTMLZ Archive (rename .HTMLZ -> .ZIP)
4. Add the ebook to Calibre by opening index_svg.xhtml
5. Convert the ebook to PDF (the final PDF output may be 100MB)
6. Split the final PDF into one file per page (Open Foxit Phantom > Tools > Split)
7. Delete the blank pages and the "zoom in-zoom out" pages (1/2 of the total number of PDF files)
8. Use Foxit Phantom to merge all the split PDF files into 1 final PDF file
9. Crop the next and previous buttons out of the PDF file.
And now you have your PDF file from the SVG files. Compare its size with the original TOPAZ file !!!
|
06-12-2012, 04:32 AM | #5 |
Connoisseur
Posts: 57
Karma: 230
Join Date: Sep 2011
Device: Boox M90, Sony PRS-300, PB360+, HP Touchpad
|
Thank you Fschumaur and Sherman. It works!
If you use Fschumaur's method above and find that the output from PrinceXML in step 3 does not render correctly, you can change the size of the pdf pages in the output: http://www.princexml.com/doc/8.0/page-size. Instead of creating a separate css file, I added @page { size: A3 } to C:\Program Files\Prince\Engine\style\common.css. Now the contents of the resulting pdf files do not spill over into the next page. |
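If you would rather not edit Prince's own common.css, the same @page rule can live in a separate stylesheet that is passed on each run. The Python sketch below builds one Prince command line per page; it assumes the prince executable is on your PATH and uses Prince's -s (stylesheet) and -o (output) options. The function names and the a3.css file name are made up for the example.

```python
import subprocess
from pathlib import Path


def prince_command(page, out_dir, stylesheet=None, prince="prince"):
    """Build the argument list that converts one xhtml page to one PDF."""
    command = [prince]
    if stylesheet:
        command += ["-s", str(stylesheet)]
    output = Path(out_dir) / (Path(page).stem + ".pdf")
    return command + [str(page), "-o", str(output)]


def convert_all(page_dir, out_dir, stylesheet=None):
    """Run Prince once per page*.xhtml file, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for page in sorted(Path(page_dir).glob("page*.xhtml")):
        subprocess.run(prince_command(page, out_dir, stylesheet), check=True)
```

With a file a3.css containing @page { size: A3 }, calling convert_all("svg", "pdf", "a3.css") would leave one PDF per page in the pdf folder, ready for merging.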
03-27-2015, 01:35 AM | #6 |
Junior Member
Posts: 1
Karma: 10
Join Date: Mar 2015
Device: android
|
Here is a bash script that does a direct conversion without having to 'print' them. It relies on hxselect, inkscape, and pdfunite which are commonly found in most Linux repos. Use it by passing the directory containing index_svg.xhtml as the first argument and it will generate a pdf subdirectory.
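Code:
#!/bin/bash
# Usage: pass the directory containing index_svg.xhtml as the first argument.
book="$1"
mkdir -p "$book/pdf"
for f in "$book"/svg/page*html
do
  file=${f##*/}
  shortname=${file%%.xhtml}
  echo "processing $shortname"
  # Pull the element with id "svgimg" out of the page and export it as a PDF.
  # -f and -A are Inkscape 0.x options; newer releases use different flags.
  (cd "$book/pdf" ; hxselect '#svgimg' < "../svg/$file" | inkscape -f /dev/stdin -A "$shortname.pdf" 2>/dev/null)
done
# Join only the per-page PDFs so an old joined.pdf is not merged in again.
pdfunite "$book"/pdf/page*.pdf "$book/pdf/joined.pdf"
|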
Tags |
htmlz, index_svg.xhtml, pdf, svg, topaz |