Convert PDFs WITH formatting


njustn
12-03-2007, 02:56 PM
So, want to convert a PDF for Kindle reading with its formatting intact? I certainly did, and this post http://www.mobileread.com/forums/showthread.php?t=16636 on these fine forums showed me the way. All that needs to happen is to make the PDF image based rather than text based. This does have a cost: the output is not searchable, and there are no links for a table of contents. But for scientific papers and other documents with highly complex formatting, I think it's worth it until they get us something better.

Now, unfortunately, most PDF tools don't support making non-searchable PDFs, because that's the reverse of what one normally wants a PDF for. Luckily there are tools meant for other things :D .

ImageMagick can convert from PDF to just about anything, and from just about anything to PDF, but because its internals use rasterized images... you got it, every PDF that comes out is image based! So

convert -density 120x120 <source>.pdf <output>.pdf

generates a nice image-based PDF file in a few seconds. Send that through the conversion email address for your Kindle and even the most complex (correct) PDF will look just right on your Kindle.
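
For anyone with a whole folder of papers to convert, here is a minimal Python sketch that wraps the same command. The function names, the "-image" output suffix, and the folder layout are made up for illustration; it assumes ImageMagick's `convert` is on your PATH (newer ImageMagick 7 installs call the program `magick` instead).

```python
import subprocess
from pathlib import Path


def rasterize_command(source, density=120, suffix="-image"):
    """Build the argv for: convert -density DxD <source>.pdf <source>-image.pdf"""
    source = Path(source)
    # Hypothetical naming scheme: paper.pdf -> paper-image.pdf, next to the source
    output = source.with_name(source.stem + suffix + ".pdf")
    return ["convert", "-density", f"{density}x{density}", str(source), str(output)]


def rasterize_folder(folder, density=120, suffix="-image"):
    """Run the conversion on every PDF in `folder`, skipping earlier outputs."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith(suffix):
            continue  # already an image-based copy from a previous run
        # check=True raises CalledProcessError if convert fails on a file
        subprocess.run(rasterize_command(pdf, density, suffix), check=True)
```

Calling `rasterize_folder("papers")` would leave the originals alone and write an image-based copy beside each one, ready to send to the Kindle conversion address.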

Now, the qualification in the previous sentence is there because ImageMagick has a bug related to certain unsupported forms of compression, so every once in a blue moon this can fail. For the most part, though, it produces a beautiful (though unfortunately large) output.

Also, if you find that the text is a little faint or blurry after the conversion, try upping the density. It will go as high as you want, but increasing it also makes the file larger, so keep that in mind. (1200x1200 took 4GB of RAM before the conversion crashed... :oops2: so really, be careful.)
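
To get a feel for why high densities blow up, here is a rough back-of-the-envelope sketch in Python. It assumes US Letter pages (8.5 x 11 inches) and 8 bytes per pixel in memory (four 16-bit channels, as in a typical Q16 ImageMagick build); both numbers are assumptions, so adjust them for your own documents and install.

```python
def page_pixels(density, width_in=8.5, height_in=11.0):
    """Pixel dimensions of one page rendered at `density` dots per inch."""
    return round(width_in * density), round(height_in * density)


def page_megabytes(density, bytes_per_pixel=8, width_in=8.5, height_in=11.0):
    """Approximate uncompressed size of one rendered page, in megabytes."""
    width, height = page_pixels(density, width_in, height_in)
    return width * height * bytes_per_pixel / 1e6


for dpi in (120, 300, 1200):
    w, h = page_pixels(dpi)
    print(f"{dpi} dpi: {w}x{h} px, about {page_megabytes(dpi):.0f} MB per page")
```

Pixel count grows with the square of the density, so under these assumptions going from 120 to 1200 means roughly 100 times the memory per page, which is consistent with a multi-page paper eating gigabytes.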

tklaus
12-03-2007, 11:16 PM
How do image PDFs look on the Kindle? Does it just shrink them down to fit a page on the Kindle's 6" screen? If so, are they still readable?

njustn
12-04-2007, 12:21 AM
I would say they look pretty good. It splits the page in half horizontally if it can. That's the cool part: it splits as near to half as it can without splitting through anything, so it never cuts lines or pictures in half. Really neat, actually. I have an example azw/source PDF file of this if anyone's interested.

Anyway, once it's split, it displays it rotated 90 degrees to the left on the screen, so it's kinda like viewing a PDF rotated on the Sony Reader. I find the text large enough for me to read (though not by much) and the images overall quite crisp. The only thing I would be wary of is that the source PDF needs to be as high resolution and as sharp as possible. If it's overly smoothed or low res (anything under 100dpi), it really ceases to be readable. That said, I have now read about 30 pages of formatted scientific papers this way, and I think it works great.

For image PDFs of things like manga or comics I think it might be a different story though. Since the system seems unwilling to split through a non-blank part of a page (which is pretty cool), it might just leave the whole page alone, which would make it too compressed to read reasonably given the resolution of the screen. Which brings me to the two big gripes with this method: no zoom and pan, and no text, so no re-rendering, just progressive quality loss.

So you can see for yourself, here's a photo of a particularly complicated piece of a scientific paper I was reading the other day, rendered on the Kindle. http://people.cs.vt.edu/~njustn/DSC00355.JPG

njustn
12-04-2007, 12:37 AM
Actually, a picture is worth a thousand words, right? Here's a photo of a page that gave other methods no end of grief (the graphs are EPS, and thus every label is text... yeah...) rendered on the Kindle with this method. It's a link because the image is rather large. The page section shown is slightly more than half, for the reason mentioned in my last post.

http://people.cs.vt.edu/~njustn/DSC00355.JPG

ajju
12-07-2007, 08:01 AM
NJustN: with some double-column academic papers the image is too small for me to be able to read the text. I am guessing reducing the density should fix this. Will report back.

catsittingstill
12-08-2007, 08:14 PM
You have a good way to convert complex pdfs :)
The results aren't searchable :(

Can they be annotated? You could still put a bunch of quick tags in the annotations, along the lines of: "Purine receptor paper, contains x-ray structure, first author Harrison, journal Cell." If I understand correctly, the annotations *would* be searchable, which would be useful if you want to hunt up all the Purine receptor papers on short notice.

njustn
12-14-2007, 12:41 PM
ajju: the density parameter is the resolution, per inch of source material, that gets rendered into the result image. It's the same as the dots-per-inch value on a printer, so a higher value will be easier to read because it will be less pixelated; lowering it won't make anything bigger. If you need it physically larger, the options unfortunately are physical magnification or doing what I tried first for reading PDFs: you can split the document into 4 sections per page and send them as separate images. Unfortunately each of those is counted as a document on the Kindle, and they don't get split as intelligently as with the Amazon tool. I can tell you how to do that, but it's really not a good solution at all, especially for 3-column journals ><.
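
The exact command for the 4-sections-per-page approach isn't given in this thread. One way it could be done with ImageMagick is sketched below in Python; this is an assumption for illustration, not necessarily what was used. It relies on the `-crop 2x2@` tile syntax and `+repage` from newer ImageMagick releases, and the function name and output pattern are made up.

```python
import subprocess


def quarter_command(source, density=150, pattern="quarter-%03d.png"):
    """Build an argv that renders each PDF page and cuts it into a 2x2 grid.

    -crop 2x2@ asks ImageMagick for four equal tiles per page, +repage drops
    the leftover page offsets, and %03d numbers the output images in order.
    """
    return ["convert", "-density", str(density), str(source),
            "-crop", "2x2@", "+repage", pattern]


def quarter_pdf(source, density=150, pattern="quarter-%03d.png"):
    """Run the split; raises CalledProcessError if convert fails."""
    subprocess.run(quarter_command(source, density, pattern), check=True)
```

Note that the tiles are cut blindly at the page midlines, so lines of text can be sliced in half, which is exactly the weakness described above compared with the Amazon tool's splitting.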

catsittingstill: You can add notes and bookmarks, but because there is no text, they are only specific to half-page granularity. It actually still works well for me that way, but it's not anywhere near perfect.

I'm trying to find a way to subdivide the PDF before sending it in to make this work a little better, but nothing seems to be working thus far. I'll post again if I find anything promising.

kriz
06-05-2008, 02:10 PM
I've only recently gotten my Kindle, with the primary purpose of using it to store and view journal articles. This way I don't have to print them out or keep switching windows from Acrobat to Word while writing articles. I have tried most of the methods listed on this page with varying results. So far the best method I have come up with to convert journal articles, which are usually 2-column with images, to a Kindle-readable format is via Acrobat Professional. By performing a "Save As" to HTML format and then using the MobiReader on the output HTML file, the articles get converted to single column, retain their images (although they can be a bit small), and are still searchable. Right now I'm looking for a free command-line tool that performs the PDF -> HTML conversion as well as Acrobat does, so that I can batch convert my files.