Old 09-27-2011, 07:31 PM   #1
Fschumaur
Junior Member
Fschumaur began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Sep 2011
Device: Kindle 3G
Guide for converting Kindle Topaz (xhtml with svg) to PDF

So, recently I was converting a book I purchased for my Kindle into a PDF so I could read it on my computer without Kindle for PC. I used an automagic tool to strip the DRM and convert it to HTMLZ, but the OCR looked like crap, even though the file looked alright on my Kindle.

Fortunately, the unDRM tool also output many xhtml pages (one for each page of the book), and the xhtml used SVGs (Scalable Vector Graphics) to display the letters and images. I could view the book just fine using Firefox, but that was clumsy and required me to port the 300+ files around. Trying to use Calibre failed miserably (I got an error a quarter of the way through, and even by then the book was 180MB in size).

So, I messed around with many different tools and finally arrived at a solution. It's not the most elegant solution in the world, but it is (roughly) cross platform.

Note: This guide assumes that you have already (and ethically) stripped the DRM from the Topaz book in question. This thread is not meant to discuss said stripping; there are other forums for that.

Say it with me now: I will NOT strip DRM from a book I do not own, nor will I "liberate" books with the intent of piracy. Got it? Good.

Requirements and things you need:
-The book. My tool outputs the following message and leaves three formats.
Code:
Book Successfully generated
Creating NoDRM HTMLZ Archive
Creating SVG HTMLZ Archive
Creating XML ZIP Archive
We need the SVG (HTMLZ) archive. Extract it. It's just a renamed zip file.
-An operating system that is either Windows XP, Ubuntu 10.04, or Mac OS X 10.6 or older. One of the programs (Prince) does not like Windows 7, so if that's all you've got, pray it works in compatibility mode or use a VM. Mac people, I have not tested your system, so cross your fingers.
-Notepad++ (Open Source)
-Prince (Free for non-commercial use). This is the most finicky program of the bunch.
-PDF Split and Merge, a.k.a. PDFSAM (Open Source)
-Java JRE (e.g. via an unattended Ninite installer/updater)
-Briss (Open Source, requires the JRE)
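
For anyone working from a shell instead of a file manager, the "extract the SVG HTMLZ archive" requirement above can be sketched like this. It is only a minimal sketch: it assumes the unzip tool is installed, and the function name extract_htmlz and the file names in the example are made up for illustration.

```shell
#!/bin/bash
# extract_htmlz ARCHIVE DEST
# Unpacks an HTMLZ archive (a renamed zip file) into the DEST directory.
# Assumes the unzip tool is installed; the function name is illustrative.
extract_htmlz() {
    local archive=$1 dest=$2
    mkdir -p "$dest" || return 1
    # unzip does not care about the file extension, so no rename is needed
    unzip -q -o "$archive" -d "$dest"
}

# Example (hypothetical file names):
#   extract_htmlz mybook_SVG.htmlz mybook
#   ls mybook    # should list index_svg.xhtml plus the svg and img folders
```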

Step 0: Install all of the above programs and verify that they run.

Step 1: Feel free to open the index_svg.xhtml page that is in the root of the unzipped book in Firefox or Safari. The book should look like it does on your e-reader. Verify that the directory containing index_svg.xhtml also has at least two folders, svg and img. You may see more.

Step 2: Looking at your book in Firefox, you may notice some artifacts that are not part of your book, mainly the go forward and go back buttons and a zoom in/zoom out control. Let's get rid of those using Notepad++ (inspired by this site).
Open up page0000, page0001 and whatever your last page is in Notepad++. Hopefully, your page0000 and page0001 are not very busy (maybe a title page or something), because that makes identifying the artifacts easier. In between your document's body tags (i.e. <body>[lots and lots of stuff]</body>) you should see at least 4 "a" tags, one that says
Code:
<a href="javascript:ppage();"><[more stuff]</a>
Two that say
Code:
<a href="javascript:npage();"><[more stuff]</a>
and one that says
Code:
<div><a href="javascript:zoomin();">zoom in</a> - <a href="javascript:zoomout();">zoom out</a></div>
For me the zoomin, zoomout one was very near the bottom of the text.
We want to remove all of the ppage() scripts and all of the zoom scripts, so let's do that now. You may notice that page0000 has a different ppage() script than page0001, page0002 and all the rest of the pages; that's because at the beginning of the book you can't go any further back. Remove the entire ppage() script (the whole <a>...</a> element) from page0000 manually and check in Firefox that you didn't delete something important. Then copy the ppage() script from page0001.

Go to Search>Find in Files, paste the script into "Find what", leave "Replace with" blank, and choose the directory containing all the individual pages. You can optionally set a filter of *.xhtml. Click "Replace in Files", then open up some of the pages in Firefox; the arrow to go backwards should be gone. Do the same thing for the zoom in/zoom out div. You may also want to change the background color to white. To do that, replace
Code:
<body onLoad="setsize();" style="background-color:#777;
with
Code:
<body onLoad="setsize();" style="background-color:#FFF;
Since there are two npage() scripts (one for clicking the forward arrow and one for clicking the page itself), we need to remove the right one. It should be the same in page0000 as it is in page0001 as it is in page0024 ... Recall that the last page has its own unique one as well. For reference, mine looks like
Code:
<a href="javascript:npage();"><svg id="nextsvg" viewBox="0 0 100 300" xmlns="http://www.w3.org/2000/svg" version="1.1" style="background-color:#777"><polygon points="5,5,5,295,95,150" fill="#AAAAAA" /></svg></a>
The id="nextsvg" is a dead giveaway that this is the next-page arrow and not this page's main content.
If you choose the wrong one to remove, your page will not be displayed at all!

Check in Firefox that your book looks like you want it to.
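
If you would rather script Step 2 than click through Find in Files, something along these lines should do the same job. This is a sketch, not part of the original workflow: the function names remove_snippet and whiten_background are made up, it assumes bash and sed are available, and it assumes the pages live in the svg folder. Like Find in Files, it matches the pasted text literally, so copy the snippet exactly as it appears in your own pages, and work on a copy of the book.

```shell
#!/bin/bash
# remove_snippet SNIPPET FILE...
# Deletes every literal occurrence of SNIPPET from each FILE, in place.
# This mirrors Find in Files with an empty "Replace with" box.
remove_snippet() {
    local snippet=$1 file content
    shift
    for file in "$@"; do
        [ -f "$file" ] || continue
        # Append a sentinel so $(...) does not strip trailing newlines
        content=$(cat "$file"; printf x)
        content=${content%x}
        # Quoting the pattern makes bash treat it as plain text, not a glob
        content=${content//"$snippet"/}
        printf '%s' "$content" > "$file"
    done
}

# whiten_background FILE...
# Changes the grey page background (#777) to white (#FFF) in the body tag.
whiten_background() {
    local file
    for file in "$@"; do
        sed 's/<body onLoad="setsize();" style="background-color:#777;/<body onLoad="setsize();" style="background-color:#FFF;/' \
            "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}

# Example: paste the exact text copied from your own page between the quotes.
#   remove_snippet '<div><a href="javascript:zoomin();">zoom in</a> - <a href="javascript:zoomout();">zoom out</a></div>' svg/page*.xhtml
#   whiten_background svg/page*.xhtml
```

Run remove_snippet once per snippet (the ppage() link, the zoom div, and the nextsvg npage() link), then check a few pages in Firefox as described above.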

Step 3: After that doozy of a step 2, it gets easier from here. Open up Prince and add all of your pages to the queue and convert them. Check that the output rendered correctly (Don't worry about the margins if you have them, we'll banish those later). This may take a little bit. You should now have n separate pdfs, where n is the number of pages in your book.

Step 4: Let's merge all the single page pdfs into one big one! Open up PDFSAM and select the merge/extract option and load all of the single pdfs into it, type in a file name and click Run. This shouldn't take too long.

Step 5: Adjust the Margins. Prince tends to print your xhtml into a letter sized 8.5 x 11 piece of "paper" and that tends to leave us with big margins. Open up Briss and load your single pdf from step 4 into it. You will see all of your even pages and your odd pages overlaid on top of each other and you can adjust the cropping margins to whatever you want by dragging the upper left and lower right hand corners of the shaded "1" area. The preview function here is your friend. Tell it to crop and you are done!

Other things to do:
Change the PDF metadata with becypdfmetaedit.
Take your output from step 2, load it into Sigil, and make an epub! Note: this does not translate well into other formats, but it appears that epubs don't mind SVGs.
Old 09-28-2011, 05:54 PM   #2
sherman
Guru
sherman ought to be getting tired of karma fortunes by now.
 
Posts: 850
Karma: 2641698
Join Date: Aug 2008
Location: Taranaki - NZ
Device: Kobo Aura H2O, Kobo Forma
Note that PrinceXML can create PDFs of a custom size and with custom margins. You would need to create a custom CSS file to do this.
Old 01-02-2012, 12:06 PM   #3
charles5410
Junior Member
charles5410 began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Nov 2008
Device: none
Thanks for the instructions.
I tried one Topaz ebook and the images were missing from the final PDF file.
The Prince log says "\svg\????.img can't open input file:No such file or directory".

The solution is to copy the images into the same folder as the .xhtml files and change the code in the .xhtml files: replace (xlink:href="../img/) with (xlink:href=").
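
On a system with bash and sed, that fix could be scripted roughly as follows. This is a sketch under assumptions: the function name fix_image_links is made up, and it assumes the unzipped book has the pages in svg and the images in img next to each other. Try it on a copy of the book first.

```shell
#!/bin/bash
# fix_image_links BOOKDIR
# Copies BOOKDIR/img/* next to the pages in BOOKDIR/svg and rewrites
# xlink:href="../img/NAME" to xlink:href="NAME" in every page file.
fix_image_links() {
    local book=$1 page
    cp "$book"/img/* "$book/svg/" || return 1
    for page in "$book"/svg/*.xhtml; do
        # Write to a temp file first; in-place sed flags differ between systems
        sed 's|xlink:href="\.\./img/|xlink:href="|g' "$page" > "$page.tmp" \
            && mv "$page.tmp" "$page"
    done
}

# Example (hypothetical directory name):
#   fix_image_links mybook
```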

Instead of Notepad++, I recommend ultrareplace.
Old 02-17-2012, 12:35 PM   #4
kid1412_net
Junior Member
kid1412_net began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Feb 2012
Device: iPod touch
Converting TOPAZ ebook to PDF

Here's my workflow for creating a PDF ebook:

OS: Fedora 14
Tools: Calibre, Foxit Phantom

1. Install the latest version of Calibre (0.8.40)
2. Install Foxit Phantom through Wine
3. Unzip the SVG HTMLZ archive (rename .HTMLZ -> .ZIP)
4. Add the ebook to Calibre by opening index_svg.xhtml
5. Convert the ebook to PDF (the final PDF output may be 100MB)
6. Split the final PDF into one file per page (Open Foxit Phantom > Tools > Split)
7. Delete the blank pages and the "zoom in-zoom out" pages (1/2 of the total number of PDF files)
8. Use Foxit Phantom to merge all the remaining single-page PDF files into 1 final PDF file
9. Crop the next and previous buttons out of the PDF file.

And now you have your PDF file from the SVG files. Compare its size with the original Topaz file!
Old 06-12-2012, 04:32 AM   #5
BCotton
Connoisseur
BCotton doesn't litter
 
Posts: 57
Karma: 230
Join Date: Sep 2011
Device: Boox M90, Sony PRS-300, PB360+, HP Touchpad
Thumbs up

Thank you Fschumaur and Sherman. It works!

If you use Fschumaur's method above and find that the output from PrinceXML in step 3 does not render correctly, you can change the size of the pdf pages in the output: http://www.princexml.com/doc/8.0/page-size. Instead of creating a separate css file, I added @page { size: A3 } to C:\Program Files\Prince\Engine\style\common.css. Now the contents of the resulting pdf files do not spill over into the next page.
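
If you prefer not to edit common.css, the separate stylesheet route can be sketched from the command line as below. Treat it as a sketch under assumptions: the function names write_page_css and convert_pages are made up, it assumes the prince command is on your PATH and accepts -s for an extra stylesheet and -o for the output file (check prince --help for your version), and the optional margin argument is my addition following sherman's note about custom margins.

```shell
#!/bin/bash
# write_page_css FILE [SIZE] [MARGIN]
# Writes a one-rule stylesheet that sets the PDF page size (default A3)
# and, only if MARGIN is given, the page margin.
write_page_css() {
    local file=$1 size=${2:-A3} margin=$3
    if [ -n "$margin" ]; then
        printf '@page { size: %s; margin: %s }\n' "$size" "$margin" > "$file"
    else
        printf '@page { size: %s }\n' "$size" > "$file"
    fi
}

# convert_pages CSS PAGE...
# Runs Prince once per page, writing pageNNNN.pdf next to pageNNNN.xhtml.
# Assumes the prince command is on the PATH.
convert_pages() {
    local css=$1 page
    shift
    for page in "$@"; do
        prince -s "$css" "$page" -o "${page%.xhtml}.pdf" || return 1
    done
}

# Example:
#   write_page_css page.css A3
#   convert_pages page.css svg/page*.xhtml
```

Keeping the rule in its own file means a Prince reinstall or upgrade will not silently undo it.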
Old 03-27-2015, 01:35 AM   #6
gmer
Junior Member
gmer began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Mar 2015
Device: android
Here is a bash script that does a direct conversion without having to 'print' the pages. It relies on hxselect, inkscape, and pdfunite, which are commonly found in most Linux repos. Use it by passing the directory containing index_svg.xhtml as the first argument; it will generate a pdf subdirectory holding one PDF per page plus joined.pdf.

Code:
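#!/bin/bash
# Usage: pass the directory containing index_svg.xhtml as the first argument
dir=$1
mkdir -p "$dir/pdf"

for f in "$dir"/svg/page*.xhtml
do
	file=${f##*/}
	shortname=${file%.xhtml}
	echo "processing $shortname"
	# Extract the page's SVG element and let Inkscape export it as a PDF.
	# Note: -f and -A are the older Inkscape options; newer releases changed
	# the command line, so check inkscape --help if this step fails.
	(cd "$dir/pdf" && hxselect '#svgimg' < "../svg/$file" | inkscape -f /dev/stdin -A "$shortname.pdf" 2>/dev/null)
done

# Join only the per-page PDFs so a joined.pdf from an earlier run is not included
pdfunite "$dir"/pdf/page*.pdf "$dir/pdf/joined.pdf"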
All times are GMT -4.