09-05-2006, 10:25 PM   #3
vranghel
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
Quote:
Originally Posted by Antartica
I've been looking for an easy way to convert PDFs. Until now I was using a pdf2html program and post-processing its output, with mixed success. For the curious, this is what I used to convert some PDFs so they are nice to read on the Iliad (11 cm x 15 cm, etc.):
pdftohtml (http://pdftohtml.sourceforge.net), some ad-hoc scripts, tidy (http://tidy.sourceforge.net/), gnuhtml2latex (http://packages.debian.org/unstable/text/gnuhtml2latex) and LyX (http://www.lyx.org). The results are acceptable, but it's a lengthy process (about an hour per book, mostly spent adapting the ad-hoc scripts so they join lines correctly and detect chapter headings).
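The ad-hoc scripts themselves aren't shown. As a rough illustration of the line-joining and chapter-heading step, here is a minimal Python sketch. It assumes the pdftohtml output has already been reduced to plain text lines, that paragraphs are separated by blank lines, and that chapters start with lines like "Chapter 3" or "Part IV". `join_lines` and the `HEADING` pattern are hypothetical, not Antartica's actual scripts.

```python
import re

# Hypothetical, crude heading pattern: "Chapter 3", "CHAPTER IV", "Part 2", ...
HEADING = re.compile(r"^(chapter|part)\s+([0-9]+|[ivxlc]+)\b", re.IGNORECASE)


def join_lines(lines: list[str]) -> list[str]:
    """Rejoin hard-wrapped lines into paragraphs; prefix detected headings with '# '."""
    blocks: list[str] = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current paragraph.
            if current:
                blocks.append(current)
                current = ""
            continue
        if HEADING.match(line):
            # Flush any open paragraph, then emit the heading as its own block.
            if current:
                blocks.append(current)
                current = ""
            blocks.append("# " + line)
            continue
        if not current:
            current = line
        elif current.endswith("-"):
            # Word split across lines: drop the hyphen and join without a space.
            # Known weakness: genuine compounds ("well-" + "known") get merged too.
            current = current[:-1] + line
        else:
            current += " " + line
    if current:
        blocks.append(current)
    return blocks
```

In practice the pattern and the joining rules would need tuning for each book, which is where most of the per-book time goes.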

I've found an alternative: a plug-in for AbiWord (a lean, portable word processor) that imports PDFs using some heuristics (and the heuristics seem well chosen, general enough to apply to most documents). It supports styles, multiple columns, etc.

It's incredible. As an example, the author posted some before (PDF) and after (imported into AbiWord) images; see the attached images.

For a description of what it does:
http://www.abisource.com/twiki/bin/v...luginWithStyle

To download the sources of the pdf import plug-in and try it:
http://jauco.nl/blog/

Caution: I've just found it, so I have not tested it yet. As I have some spare time, I'll try it ;-).

Tell me what you think about it ;-).

It seems my programming illiteracy is quite advanced: how the hell am I supposed to install the patch?

http://www.jauco.nl/SoC/abiword-pdf-style-0.3.patch
http://www.jauco.nl/SoC/poppler-pdf-style-0.3.patch

Those two are supposed to be the plugins, but when I click on them they open as text files. There's no .dll, no .exe, no nothin'.

I'd really appreciate some help from someone more knowledgeable.
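For anyone with the same question: judging by their names, these .patch files are most likely unified diffs. A unified diff is a plain-text list of line additions and removals meant to be applied to the AbiWord and Poppler source code, which is why it opens as text. Applying a patch (normally with the standard Unix `patch` tool) only changes the sources, which then have to be compiled. There is nothing to run directly. To show what "applying" means, here is a minimal Python sketch of a single-file unified-diff applier. `apply_unified_diff` is a hypothetical helper written for this illustration. It assumes well-formed input and does none of the offset or fuzzy matching that real `patch` does.

```python
import re

# Hunk header of a unified diff, e.g. "@@ -12,7 +12,9 @@" (counts are optional).
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_diff(original: list[str], diff: str) -> list[str]:
    """Apply a single-file unified diff to `original` (lines without newlines).

    Illustrative only: assumes one well-formed file diff, no fuzz or offsets.
    """
    lines = diff.splitlines()
    result: list[str] = []
    src = 0  # index of the next unconsumed line of `original`
    i = 0
    # Skip the "--- old" / "+++ new" file header lines.
    while i < len(lines) and not lines[i].startswith("@@"):
        i += 1
    while i < len(lines):
        m = HUNK_HEADER.match(lines[i])
        if m is None:
            raise ValueError(f"expected hunk header, got {lines[i]!r}")
        old_start = int(m.group(1))
        old_count = int(m.group(2)) if m.group(2) is not None else 1
        # Line numbers are 1-based; a zero-length hunk means "insert after line old_start".
        start = old_start if old_count == 0 else old_start - 1
        if start < src:
            raise ValueError("overlapping or out-of-order hunks")
        result.extend(original[src:start])  # copy untouched lines before the hunk
        src = start
        i += 1
        while i < len(lines) and not lines[i].startswith("@@"):
            # Some tools strip the leading space from empty context lines.
            line = lines[i] if lines[i] else " "
            tag, text = line[0], line[1:]
            if tag in " -":
                # Context and removed lines must match the original exactly.
                if src >= len(original) or original[src] != text:
                    raise ValueError(f"patch does not apply at line {src + 1}")
                if tag == " ":
                    result.append(text)
                src += 1
            elif tag == "+":
                result.append(text)
            elif tag != "\\":  # "\ No newline at end of file" is ignored
                raise ValueError(f"unexpected diff line {lines[i]!r}")
            i += 1
    result.extend(original[src:])  # copy the tail after the last hunk
    return result


patch_text = "--- a/f.txt\n+++ b/f.txt\n@@ -1,4 +1,4 @@\n a\n-b\n+B\n c\n d\n"
print(apply_unified_diff(["a", "b", "c", "d"], patch_text))  # ['a', 'B', 'c', 'd']
```

The real patches change many source files at once, so in practice they are applied with `patch` inside the unpacked source trees, followed by a normal build.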