09-05-2006, 03:27 AM   #1
Antartica
Evangelist
Antartica has learned how to read e-books
Posts: 415
Karma: 754
Join Date: Jun 2006
Location: Madrid, Spain
Device: iliad, onhandpc, newton, zaurus
Howto: importing PDFs to a word processor

I've been looking for an easy way to convert PDFs. Until now I was using a PDF-to-HTML converter and post-processing its output, with mixed results. For the curious, this is what I used to convert some PDFs so they become nice to read on the Iliad (11 cm x 15 cm, etc.):
pdftohtml (http://pdftohtml.sourceforge.net), some ad-hoc scripts, tidy (http://tidy.sourceforge.net/), gnuhtml2latex (http://packages.debian.org/unstable/text/gnuhtml2latex) and LyX (http://www.lyx.org). The results are acceptable, but it's a lengthy process (about an hour per book, mostly spent adapting the ad-hoc scripts so they join lines correctly and detect chapter headings).
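The ad-hoc scripts themselves aren't shown in this thread. As a rough illustration of the line-joining step only, here is a minimal sketch, assuming the text has already been extracted to plain text with one hard-wrapped line per PDF line and blank lines between paragraphs. The function name `join_lines` and its hyphen rule are hypothetical, not the scripts used above.

```shell
#!/bin/sh
# Hypothetical sketch of a line-joining step for PDF-extracted plain text.
# Assumptions: paragraphs are separated by blank lines, and a line ending in
# "-" is a word split across a line break. That rule also glues genuine
# compounds back together without the hyphen (e.g. "well-" + "known").
join_lines() {
    awk '
        # A blank line ends the current paragraph and is kept as a separator.
        /^[ \t]*$/ {
            if (para != "") print para
            print ""
            para = ""
            next
        }
        {
            line = $0
            sub(/^[ \t]+/, "", line)        # drop leading indentation
            sub(/[ \t]+$/, "", line)        # drop trailing spaces
            if (para == "") {
                para = line
            } else if (para ~ /-$/) {
                sub(/-$/, "", para)         # rejoin the split word
                para = para line
            } else {
                para = para " " line
            }
        }
        # Flush the last paragraph.
        END { if (para != "") print para }
    ' "$@"
}
```

Usage would be something like `join_lines book.txt > book-joined.txt`. Detecting chapter headings is the harder half of the job and is not attempted here.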

I've found an alternative: a plug-in for AbiWord (a lean and portable word processor) that imports PDFs using some heuristics (and the heuristics seem well chosen, so as to be generally applicable). It supports styles, multiple columns, etc.

It's incredible. As an example, the author posted before (PDF) and after (imported into AbiWord) screenshots; see the attached images.

For a description of what it does:
http://www.abisource.com/twiki/bin/v...luginWithStyle

To download the sources of the pdf import plug-in and try it:
http://jauco.nl/blog/

Caution: I've just found it, so I have not tested it yet. As I have some spare time I'll try it ;-).

Tell me what you think about it ;-).
Attached Thumbnails: pdf.png (134.7 KB), abw.png (151.2 KB)

Last edited by Antartica; 09-05-2006 at 03:29 AM.
09-05-2006, 02:04 PM   #2
TadW
Uebermensch
TadW ought to be getting tired of karma fortunes by now.
Posts: 2,582
Karma: 1094606
Join Date: Jul 2003
Location: Italy
Device: Kindle
If the images depict the general conversion quality of this plugin, then I am really impressed. It's better than most commercial solutions I've seen.

I am curious to hear how it works for you.
09-05-2006, 10:25 PM   #3
vranghel
Addict
vranghel began at the beginning.
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
Quote:
Originally Posted by Antartica
To download the sources of the pdf import plug-in and try it:
http://jauco.nl/blog/

Caution: I've just found it, so I have not tested it yet. As I have some spare time I'll try it ;-).

Tell me what you think about about it ;-).

Seems that my programming illiteracy is quite advanced: how the hell am I supposed to install the patch?

http://www.jauco.nl/SoC/abiword-pdf-style-0.3.patch
http://www.jauco.nl/SoC/poppler-pdf-style-0.3.patch

Those two are supposed to be the plugins, but when I click on them they open as text files. There's no .dll, no .exe, no nothin'.

I'd really appreciate some help from someone more knowledgeable.
09-06-2006, 12:10 PM   #4
Antartica
Evangelist
Antartica has learned how to read e-books
Posts: 415
Karma: 754
Join Date: Jun 2006
Location: Madrid, Spain
Device: iliad, onhandpc, newton, zaurus
Quote:
Originally Posted by vranghel
Seems that my programming illiteracy is quite advanced: how the hell am i supposed to install the patch?

Some background: patch(1) is a Unix utility usually used to merge modifications into the source code of a released version of a program. Those files (the "patches") are generated with the diff(1) utility, which is why they are called patches or diffs.

Patches are usually aimed at programmers or advanced users who are not afraid of downloading source code and compiling it themselves. It's really not very difficult if you have the right tools.

So this is a patch in the old Unix sense. On Windows, "patch" more commonly refers to a package of replacement files needed to upgrade a program.
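To make the diff/patch round trip concrete, here is a minimal self-contained sketch in a scratch directory: diff(1) records the change between an old and a new file as a plain-text patch (which is why clicking a .patch link just shows a text file), and patch(1) replays that change onto another copy of the old file. The file names are made up for illustration.

```shell
#!/bin/sh
# Demonstrate the diff(1) -> patch(1) round trip in a throwaway directory.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"

# The "released" file and a modified copy of it.
printf 'line one\nline two\n' > hello.txt.orig
printf 'line one\nline two, improved\n' > hello.txt.new

# diff exits with status 1 when the files differ; accept 1, fail on 2 (trouble).
diff -u hello.txt.orig hello.txt.new > hello.patch || [ $? -eq 1 ]

# A patch is just text describing removed (-) and added (+) lines.
cat hello.patch

# Apply the recorded change to a fresh copy of the original file.
cp hello.txt.orig hello.txt
patch hello.txt < hello.patch
cat hello.txt
```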

And more to the point: see below for detailed instructions on how to apply the patch and compile the program (on Linux, which is what I have installed; on Windows+Cygwin it should be slightly different)... but the instructions are incomplete right now, as I've found that the patched poppler library fails to compile with gcc 3.3.5 :-( .

Anyway, in the next message I describe how to reproduce that error.

Last edited by Antartica; 09-06-2006 at 12:20 PM.
09-06-2006, 12:16 PM   #5
Antartica
Evangelist
Antartica has learned how to read e-books
Posts: 415
Karma: 754
Join Date: Jun 2006
Location: Madrid, Spain
Device: iliad, onhandpc, newton, zaurus

(Partial and) detailed Debian GNU/Linux 3.1 "Sarge" instructions (what I've done):

For patching, compiling and installing the required poppler library:

$ su
# apt-get install cdbs gnome-pkg-tools libgtk2.0-dev libqt3-mt-dev automake1.9 dh-make build-essential dpkg-dev libjpeg62-dev libz-dev fakeroot libxml2-dev
# exit
$ mkdir src.poppler
$ cd src.poppler
$ wget http://poppler.freedesktop.org/poppler-0.5.3.tar.gz
$ wget http://www.jauco.nl/SoC/poppler-pdf-style-0.3.patch
$ tar -xvzf poppler-0.5.3.tar.gz
$ cd poppler-0.5.3
$ patch -p1 < ../poppler-pdf-style-0.3.patch
$ ln -s /usr/include/libxml2/libxml poppler/
$ echo "s" | dh_make
$ sed -i "s/configure /configure --enable-zlib --enable-xpdf-headers/g" debian/rules
$ chmod a+x debian/rules
$ fakeroot debian/rules binary
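The `-p1` in the `patch` step tells patch to strip the first path component from every file name in the patch. Assuming the patch was generated by diffing two top-level source directories (a common convention, not confirmed in this thread), the names inside it start with that extra directory, which does not exist once you `cd` into the unpacked tarball. The sketch below reproduces this with made-up files and uses GNU patch's `--dry-run` to check that a patch applies before modifying anything; the same check could be run as `patch -p1 --dry-run < ../poppler-pdf-style-0.3.patch` inside poppler-0.5.3.

```shell
#!/bin/sh
# Sketch of why "patch -p1" is used, with made-up files in a scratch directory.
set -e
work=$(mktemp -d)
cd "$work"

# Two copies of a tiny "source tree": pristine (a/) and modified (b/).
mkdir -p a/src b/src
printf 'int x = 1;\n' > a/src/demo.c
printf 'int x = 2;\n' > b/src/demo.c

# Recursive unified diff; file names inside look like "a/src/demo.c".
# diff exits 1 when differences are found, which is expected here.
diff -ruN a b > demo.patch || [ $? -eq 1 ]

# Stand-in for the unpacked tarball you cd into before patching.
mkdir tree && cp -R a/src tree/
cd tree

# -p1 strips the leading "a/" or "b/", so "src/demo.c" is found from here.
# --dry-run (GNU patch) only reports what would happen; nothing is modified.
patch -p1 --dry-run < ../demo.patch
patch -p1 < ../demo.patch
cat src/demo.c
```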

This should have generated a .deb file that you can install, but it failed to compile, with the following error:

g++ -DHAVE_CONFIG_H -I. -I. -I.. -I. -I.. -I../goo -I/usr/include/freetype2 -Wall -Wno-unused -g -O2 -MT ABWOutputDev.lo -MD -MP -MF .deps/ABWOutputDev.Tpo -c ABWOutputDev.cc -fPIC -DPIC -o .libs/ABWOutputDev.o
ABWOutputDev.cc: In member function `void ABWOutputDev::ATP_recursive(xmlNode*)':
ABWOutputDev.cc:804: error: declaration of `void ABWOutputDev::cleanUpNode(xmlNode*, bool)' outside of class is not definition
It seems to be some construct that is not legal in gcc 3.3.5... I hope to have some time tomorrow to debug the offending file, but don't count on it :-(
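In a long build like `fakeroot debian/rules binary`, the first compiler error is usually the one worth reading. A minimal sketch, assuming the build output was saved to a file first (for example with `fakeroot debian/rules binary 2>&1 | tee build.log`); the helper name `first_error` is hypothetical. It prints a little context because g++ 3.x wraps long diagnostics over several lines, as in the output above.

```shell
#!/bin/sh
# Hypothetical helper: show the first gcc "error:" line in a saved build log,
# plus the line before it (e.g. "In member function ...") and up to 2 lines
# after it, each prefixed with its line number in the log.
# Returns 1 if the log contains no "error:" line.
first_error() {
    awk '
        /error:/ && !found {
            found = NR
            if (NR > 1) print NR - 1 ": " prev
        }
        found && NR <= found + 2 { print NR ": " $0 }
        found && NR >= found + 2 { exit }
        { prev = $0 }
        END { if (!found) exit 1 }
    ' "$1"
}
```

After a failed build, `first_error build.log` would show the `ABWOutputDev.cc:804` message together with the member function it was reported in.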

Once the poppler library compiles, the same has to be done with the AbiWord sources... so there is quite a bit of work left to do.

BTW: Maybe this post should be in hacks/devel :-?

Last edited by Antartica; 09-06-2006 at 12:25 PM.
09-06-2006, 02:20 PM   #6
vranghel
Addict
vranghel began at the beginning.
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
Quote:
Originally Posted by Antartica
Some background : patch(1) is an UNIX utility usually used to merge some modifications into the source code of a released version of a program. And those files ("patches") are generated with the diff(1) utility. So the files are named patches or diffs.

Thanks, Antartica, for taking the time to explain.
Unfortunately, you lost me at the second post... all that code is Chinese to me.

So there seems to be some kind of error in the patch, since it will not compile.
Hopefully it is not a big issue, because the idea of the plugin is wonderful and I'd really like to see it in action.
09-06-2006, 02:23 PM   #7
DHer
Addict
DHer doesn't litter
Posts: 261
Karma: 156
Join Date: Jul 2006
Device: iliad
That sounds great.

Well, if someone manages to compile it with the patch for Windows, *please* upload the executable. I didn't manage to.
09-06-2006, 07:17 PM   #8
vranghel
Addict
vranghel began at the beginning.
Posts: 285
Karma: 10
Join Date: Apr 2006
Location: Vancouver, Canada
Device: Proud Iliad owner
Quote:
Originally Posted by DHer
That sounds great.

Well, if someone manages to compile it with the patch for Windows, *please* upload the executable. I didn't manage to.

I second that!
11-06-2006, 03:42 AM   #9
Jauco
Junior Member
Jauco began at the beginning.
Posts: 2
Karma: 10
Join Date: Nov 2006
Device: HTC Wizard
How to install the pdf to abiword processor

Hey, I didn't see this thread earlier, but I like the positive tone.

I'm the guy trying to write the PDF plugin. ATM, if you can't install the patch, you probably don't want to, because the program is as buggy as some infernal place.

The past 2 months were incredibly busy for me, so I didn't do much work on it, but once I get most of the bugs out of the code, I will try to get it released with the Windows version of AbiWord.

Greets,

Jauco
11-06-2006, 04:22 AM   #10
Antartica
Evangelist
Antartica has learned how to read e-books
Posts: 415
Karma: 754
Join Date: Jun 2006
Location: Madrid, Spain
Device: iliad, onhandpc, newton, zaurus
Hi Jauco!

Thanks for taking the time to register here and replying :-)

Quote:
Originally Posted by Jauco
Hey, I didn't see this thread earlier, but I like the positive tone.

I'm the guy trying to write the PDF plugin. ATM, if you can't install the patch, you probably don't want to, because the program is as buggy as some infernal place.
Mmm... anyway, I would greatly appreciate the information needed to compile it and experiment with the buggy version O:-).

I only need a bit of information:

1. The linux distribution/linux version that you're using to compile
2. The compiler version
3. The libpoppler and abiword version

I hope that with that information I will be able to replicate your compilation success ;-)
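The three details requested above can be gathered in one go. A minimal sketch, assuming a Debian-style system where poppler installed its pkg-config file (`poppler.pc`) and that the installed AbiWord accepts `--version`; the helper names `probe` and `report_build_env` are hypothetical. Every probe is guarded, so a missing tool prints "unknown" instead of stopping the script.

```shell
#!/bin/sh
# Sketch: print the build details asked for above (distribution, compiler,
# libpoppler and AbiWord versions), one "Label: value" line each.

probe() {
    # $1 = label; the remaining arguments are the command to run.
    label=$1
    shift
    value=""
    if command -v "$1" >/dev/null 2>&1; then
        # Keep only the first line of output; discard error messages.
        value=$("$@" 2>/dev/null | head -n 1)
    fi
    printf '%s: %s\n' "$label" "${value:-unknown}"
}

report_build_env() {
    # Debian-based systems have this file; on Ubuntu it shows the underlying
    # Debian release string rather than the Ubuntu version itself.
    if [ -r /etc/debian_version ]; then
        probe "Distribution" cat /etc/debian_version
    else
        # Fall back to the kernel name and release.
        probe "Distribution" uname -sr
    fi
    probe "Compiler (gcc)" gcc -dumpversion
    # Assumes poppler's pkg-config file is installed.
    probe "libpoppler" pkg-config --modversion poppler
    # Assumes this AbiWord build accepts --version.
    probe "AbiWord" abiword --version
}

report_build_env
```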

Quote:
Originally Posted by Jauco
The past 2 months were incredibly busy for me, so I didn't do much work on it, but once I get most of the bugs out of the code, I will try to get it released with the Windows version of AbiWord.
Great! Thanks for taking the time to do such a needed plugin :-)

Antartica
11-06-2006, 11:03 AM   #11
Jauco
Junior Member
Jauco began at the beginning.
Posts: 2
Karma: 10
Join Date: Nov 2006
Device: HTC Wizard
I'm using a vanilla Ubuntu Linux "Dapper Drake".

Compiler: whichever came with Dapper Drake (4.0.3, I think)
Poppler source: CVS from back then; I'd suggest using the latest release
AbiWord source: doesn't matter; the latest release will be fine.
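Since the patched poppler failed under gcc 3.3.5 while Jauco built it with the gcc from Dapper Drake (4.0.3, by his recollection), a quick compiler check before starting a long build may save time. A minimal sketch, assuming (the thread does not confirm this) that any gcc with major version 4 or later is good enough; `gcc_major_at_least` and `check_compiler` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical pre-build check. The "major version >= 4" threshold is an
# assumption based on Jauco using gcc 4.0.3; the thread does not establish
# which gcc versions actually compile the patched poppler.

gcc_major_at_least() {
    # $1 = version string, e.g. "3.3.5", "4.0.3" or "12"
    # $2 = minimum major version
    # Returns 0 if new enough, 1 if older, 2 if the version is unparsable.
    major=${1%%.*}                  # everything before the first dot
    case $major in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$major" -ge "$2" ]
}

check_compiler() {
    if ! command -v gcc >/dev/null 2>&1; then
        echo "gcc not found" >&2
        return 2
    fi
    version=$(gcc -dumpversion)
    status=0
    gcc_major_at_least "$version" 4 || status=$?
    case $status in
        0) echo "gcc $version: major version 4 or later" ;;
        1) echo "gcc $version: older than 4.x; the ABWOutputDev.cc error may appear" >&2 ;;
        *) echo "gcc $version: could not parse the version" >&2 ;;
    esac
    return "$status"
}
```

Running `check_compiler` right before `fakeroot debian/rules binary` would flag a 3.x compiler such as Sarge's 3.3.5 up front.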
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.