Old 09-05-2006, 03:27 AM   #1
Antartica
Howto: importing PDFs to a word processor

I've been looking for an easy way to convert PDFs. Until now I was using a pdf2html program and post-processing its output, with mixed results. For the curious, this is what I used to convert some PDFs so they are nice to read on the Iliad (11 cm x 15 cm, etc.):
pdftohtml (http://pdftohtml.sourceforge.net), some ad-hoc scripts, tidy (http://tidy.sourceforge.net/), gnuhtml2latex (http://packages.debian.org/unstable/text/gnuhtml2latex) and lyx (http://www.lyx.org). The results are acceptable, but it's a lengthy process (about an hour for each book, mostly spent adapting the ad-hoc scripts so they join lines correctly and detect chapter headings).
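
To give an idea of what such an ad-hoc script has to do, here is a minimal Python sketch. It is not one of the actual scripts mentioned above; the function name `reflow`, the heading pattern and the blank-line-ends-paragraph rule are all assumptions made for illustration. It joins hard-wrapped lines into paragraphs, glues words hyphenated across a line break, and flags lines that look like chapter headings.

```python
import re

# Assumed heading shape: "Chapter 3", "CHAPTER IV", "Part 2", optionally
# followed by a title. Real books usually need this pattern adapted.
HEADING_RE = re.compile(r"^(chapter|part)\s+([0-9]+|[ivxlcdm]+)\b.*$", re.IGNORECASE)


def reflow(lines):
    """Turn extracted text lines into ("heading" | "para", text) blocks."""
    blocks = []
    current = []  # pieces of the paragraph being assembled

    def flush():
        # Emit the pending paragraph, if any, and start a new one.
        if current:
            blocks.append(("para", " ".join(current)))
            current.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            # Assumption: a blank line separates paragraphs.
            flush()
            continue
        if HEADING_RE.match(line):
            flush()
            blocks.append(("heading", line))
            continue
        if current and current[-1].endswith("-"):
            # Word split across a line break: glue the halves together.
            # (This also merges genuinely hyphenated words, a known
            # weakness of such a simple heuristic.)
            current[-1] = current[-1][:-1] + line
        else:
            current.append(line)
    flush()
    return blocks


if __name__ == "__main__":
    sample = ["Chapter 1", "It was a dark and stor-", "my night.", "", "The rain fell."]
    for kind, text in reflow(sample):
        print(kind, "|", text)
```

Most of the per-book hour goes into tuning exactly these two things: the heading pattern and the rule for where a paragraph ends.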

I've found an alternative: a plug-in for Abiword (a lean and portable word processor) that imports PDFs using some heuristics (and the heuristics seem to be well chosen, so as to be generally applicable). It supports styles, multiple columns, etc.

It's incredible. As an example, the author has posted images of a document before importing (PDF) and after (Abiword); see the attached images.

For a description of what it does:
http://www.abisource.com/twiki/bin/v...luginWithStyle

To download the sources of the pdf import plug-in and try it:
http://jauco.nl/blog/

Caution: I've just found it, so I have not tested it yet. As I have some spare time I'll try it ;-).

Tell me what you think about it ;-).
Attached images: pdf.png, abw.png

Last edited by Antartica; 09-05-2006 at 03:29 AM.