01-25-2010, 08:48 AM | #346 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2010
Device: Kindle DX
|
This method has worked beautifully for me. I was able to generate an html version of one of my books. Given that I am an academic, I was happy to see that the format actually maintains the original structure of the book (pages, I mean). One of my main sources of unhappiness with Kindle-formatted books has been that reading them on my DX does not maintain the original page layout, which means I cannot cite from them -- there is no way to know which page I'm on.
I noticed an svg folder filled with xhtml files, which render fantastically in Firefox (I'm on Ubuntu). How can I combine those xhtml files into a single PDF? There must be a way to do it, and file size does not matter to me for now. Any suggestions on how to convert the individual files containing SVG into a single PDF? |
01-25-2010, 10:07 AM | #347 | |
Addict
Posts: 357
Karma: 1112
Join Date: Oct 2008
Location: Euroland
Device: PocketBook 360°, BeBook (Hanlin V3), iRex DR1000S, iPad
|
Quote:
Acrobat can also take multiple files and create a single PDF, so you can select all the little xhtml files, put them in the correct order (if they are not already named so that they sort in order) and then generate the PDF. Since you have (I think) what is essentially a collection of pages (one per xhtml file) from the Topaz, the PDF created should match the pagination of the original. If I were to create a PDF from a set of ePub xhtml files, the pagination within chapters (or whatever defines the xhtml file splits) would probably not match. |
|
01-25-2010, 10:43 AM | #348 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2010
Device: Kindle DX
|
Quote:
|
|
01-25-2010, 11:31 AM | #349 | |
Addict
Posts: 241
Karma: 2617
Join Date: Mar 2009
Location: Greenwood, SC
Device: Kindle 2
|
Quote:
This is the only way I know how:
1) Use the "-r" flag on gensvg.py to generate the raw SVG images (without the xhtml/javascript wrapper).
2) Use Illustrator to batch convert the SVG files into PDF files.
3) Use Acrobat Pro to combine the PDF files into one.
There are major drawbacks to this, however:
1) This requires a Mac or PC and very expensive copies of Illustrator and Acrobat Pro.
2) Illustrator sucks at rendering SVG correctly, and many of the pages are poor looking.
3) The filesize (even though you stated that you didn't care) is outrageous. I converted 65 pages into a 38Meg PDF file.
An Open-Source alternative would be to use Inkscape to render the SVG files into PDF. I don't have Inkscape installed on any of my machines, so I don't know how good the output is. However, I do know that SVG is Inkscape's default file format, so it ought to be reasonably good. This page has a tutorial on using Inkscape and pdftk to create a pdf from multiple SVG images (and since it's command-line-based instead of GUI, this would be much quicker than the above). |
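For anyone wanting to try the Inkscape and pdftk route, here is a minimal, untested sketch in Python. It assumes the raw pages are named page*.svg (the pattern used later in this thread), that `inkscape` and `pdftk` are on the PATH, and that Inkscape accepts `--export-pdf` (the 0.4x-era option; 1.x releases use `--export-filename` instead). The function names and the `book.pdf` output name are made up for illustration.

```python
import re
import subprocess
from pathlib import Path


def page_number(path):
    """Return the first integer in the file name, so page10 sorts after page9."""
    match = re.search(r"\d+", Path(path).stem)
    return int(match.group()) if match else 0


def inkscape_command(svg_path, legacy=True):
    """Build the Inkscape call that renders one SVG page to a PDF beside it."""
    pdf_path = Path(svg_path).with_suffix(".pdf")
    if legacy:
        # Inkscape 0.4x syntax (assumed to match the installed version)
        return ["inkscape", f"--export-pdf={pdf_path}", str(svg_path)]
    # Inkscape 1.x syntax
    return ["inkscape", str(svg_path), f"--export-filename={pdf_path}"]


def pdftk_command(pdf_paths, output="book.pdf"):
    """Build the pdftk call that concatenates the page PDFs in page order."""
    ordered = sorted((str(p) for p in pdf_paths), key=page_number)
    return ["pdftk", *ordered, "cat", "output", output]


def build_book(svg_dir, output="book.pdf"):
    """Render every page*.svg in svg_dir, then merge the results into one PDF."""
    svgs = sorted(Path(svg_dir).glob("page*.svg"), key=page_number)
    for svg in svgs:
        subprocess.run(inkscape_command(svg), check=True)
    pdfs = [svg.with_suffix(".pdf") for svg in svgs]
    subprocess.run(pdftk_command(pdfs, output), check=True)
```

Calling `build_book("svg")` would render each page and then merge them. Sorting by page number keeps page10 from landing before page2 when the file names are not zero-padded.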
|
01-25-2010, 11:37 AM | #350 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2010
Device: Kindle DX
|
Quote:
Perfect. I'm so pleased. This actually allows me to get around a serious issue (for academics, at least) with textbooks, since I can now get to a point where I can transform bought books so that they become 'citeable'. |
|
01-25-2010, 11:51 AM | #351 | |
Addict
Posts: 241
Karma: 2617
Join Date: Mar 2009
Location: Greenwood, SC
Device: Kindle 2
|
Quote:
|
|
01-25-2010, 12:30 PM | #352 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Clarknova,
Would it be any help (PDF file size-wise) to start with the html version of the book, with only the critical areas converted to svg and the main part of the book left as straight html? For example, a new version of flatxml2html, using the code written for the ornate letter A issue, can automatically create svg images for just the "fixed" regions on the page and put img src style links to them right into the html, while letting the bulk of the document remain html. This greatly reduced the need to hand edit anything in my book, but at the expense of more svg images and less ability to search for things (since they might be in images). The question is: would this result in a significantly smaller PDF (once converted)? Or would this buy us nothing? Thanks, KevinH |
01-25-2010, 12:48 PM | #353 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2010
Device: Kindle DX
|
Quote:
First I'm going to take a look at the PDFs we can produce, and then move on from there. There's nothing to stop me from feeding that PDF back through OCR and outputting a text-based PDF with images.
Edit: using Ubuntu, I installed the librsvg2-bin package, which I used for the conversion. The command line I used -- in the svg directory -- was:
for i in page*.svg; do rsvg-convert -a -f pdf $i -o `echo $i | sed -e 's/svg$/pdf/'`; done
This created an individual PDF for each page: 305 pages in total, at 197 megabytes. I combined those using Acrobat, and then ran 'optimize for OCR'. The resulting file is beautiful and smooth, with all images, and weighs in at 3407K. Awesome.
Last edited by Coconut; 01-25-2010 at 02:02 PM. |
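The same loop can be sketched in Python, which sidesteps shell quoting and makes the page order explicit for the later merge. This is an illustration, not what was actually run: it assumes `rsvg-convert` is on the PATH, and `natural_key`, `rsvg_command` and `convert_pages` are hypothetical helper names.

```python
import re
import subprocess
from pathlib import Path


def natural_key(name):
    """Split digits from text so 'page10' sorts after 'page9'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def rsvg_command(svg_name):
    """Mirror the one-liner: rsvg-convert -a -f pdf IN -o OUT."""
    pdf_name = re.sub(r"svg$", "pdf", svg_name)  # same rewrite as the sed call
    return ["rsvg-convert", "-a", "-f", "pdf", svg_name, "-o", pdf_name]


def convert_pages(svg_dir="."):
    """Convert every page*.svg in svg_dir and return the PDF names in page order."""
    names = sorted((p.name for p in Path(svg_dir).glob("page*.svg")),
                   key=natural_key)
    pdfs = []
    for name in names:
        cmd = rsvg_command(name)
        # Run inside svg_dir, as the original loop was run from the svg directory
        subprocess.run(cmd, cwd=svg_dir, check=True)
        pdfs.append(cmd[-1])
    return pdfs
```

The returned list is already in page order, so it can be handed straight to whatever tool does the combining.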
|
01-25-2010, 03:18 PM | #354 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi Coconut,
Exactly what is "optimize for OCR"? Is this an Acrobat Pro function? Is there open source software that can do the same thing? KevinH |
01-25-2010, 03:37 PM | #355 | |
Addict
Posts: 241
Karma: 2617
Join Date: Mar 2009
Location: Greenwood, SC
Device: Kindle 2
|
Quote:
Unfortunately, I have no use for PDFs (since PDF isn't an ebook format). But for people who do, this is certainly an option, provided they have Acrobat Pro. (Acrobat's OCR is neat. There are a few more OCR errors than in the original Topaz file, but it attempts to preserve style -- though not very well...)
Kevin: All of the open source OCR stuff is pretty obsolete and useless. The errors tend to be far more numerous than in the Topaz file or even in Adobe's OCR. Personally, I find the genhtml output to be the most usable. I just have to convert (using imagemagick or illustrator or inkscape or whatever) the Monogram and Table svgs that get generated into PNG/JPEG so I can create an ePub out of the data. |
|
01-25-2010, 03:48 PM | #356 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2010
Device: Kindle DX
|
Quote:
For OCR I actually used Finereader, which does a great job. The PDF I end up with is essentially error free. Finereader can also export to a variety of formats (paged and non-paged). I would not be surprised if the html it outputs surpasses what we've been able to produce, since it retains formatting. I'll try that later. Do we have a standard text to use for conversion and comparison of different methods? It's really the only way to determine what works best. Last edited by Coconut; 01-25-2010 at 03:51 PM. |
|
01-25-2010, 03:53 PM | #357 |
Addict
Posts: 241
Karma: 2617
Join Date: Mar 2009
Location: Greenwood, SC
Device: Kindle 2
|
You must have a different version than me. CS4 has OCR functions, and then just the "Optimize Scanned PDF" which shrinks down my giant raster images (after I've done OCR) into something more manageable.
|
01-25-2010, 06:07 PM | #358 |
Sigil Developer
Posts: 7,645
Karma: 5433388
Join Date: Nov 2009
Device: many
|
> I just have to convert (using imagemagick or illustrator or inkscape or whatever) the Monogram and Table svgs that get generated into PNG/JPEG so I can create an ePub out of the data.
I thought that svg was part of the epub spec? As long as they are not animations, I thought svg graphics did not need to be converted to png or jpeg when used in epub? At least that is what the Mobileread Wiki says. I will make one and see if it works on my Sony reader. Thanks, Kevin |
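For reference, the EPUB 2 package format (OPF 2.0) lists image/svg+xml among its core media types, so an SVG can be declared in the manifest like any other image; whether a given reader actually renders it is a separate question. A minimal sketch of generating such manifest entries, where the file names and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Media types for the file kinds discussed in this thread
MEDIA_TYPES = {
    ".svg": "image/svg+xml",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".xhtml": "application/xhtml+xml",
}


def manifest_item(href):
    """Build one OPF <item> element for a file inside the EPUB."""
    path = PurePosixPath(href)
    return ET.Element("item", {
        "id": path.stem,  # assumes file stems are unique and start with a letter
        "href": href,
        "media-type": MEDIA_TYPES[path.suffix.lower()],
    })


def manifest(hrefs):
    """Return a <manifest> element, serialized as text, covering all hrefs."""
    root = ET.Element("manifest")
    for href in hrefs:
        root.append(manifest_item(href))
    return ET.tostring(root, encoding="unicode")


print(manifest(["book.xhtml", "images/table1.svg", "images/monogram1.png"]))
```

Declaring the SVG this way only covers the packaging side, so a test on the actual device is still the deciding factor.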
01-25-2010, 09:20 PM | #359 |
Junior Member
Posts: 7
Karma: 10
Join Date: Jan 2010
Device: Kindle DX
|
|
03-19-2010, 02:33 PM | #360 |
Addict
Posts: 265
Karma: 89314
Join Date: Nov 2009
Location: Southern Illinois
Device: eSlick, Pocketbook IQ, iPad, Kobo Aura, Kobo Aura ONE
|
feeling stupid
So, I have some experience stripping DRM. I've done it from PDB, MOBI, and EPUB. I'm working now on Kindle Topaz. I do not understand the directions that I have found. Specifically what to do with this line:
cmbtc_dump.py -d -o TARGETDIR [-p pid] YOURBOOKNAMEHERE
Do you do this in the command prompt, just like for pdb or mobi books? I'm not getting it to work at all. Not even an error message. It just takes me back to my command prompt: c:\python26
Any help would be appreciated. |
|