Converting PDFs


macrotor
03-29-2003, 12:07 PM
Okay, I'm just going to get this thread started before I forget. Currently, I'm trying to get pdftohtml to work on Mac OS X. I'm getting assistance from their SourceForge forum. I have one more glitch to nail and it will be done. The basic idea is that it will convert PDFs to HTML, then iSilo converts the files (with pictures) to an ebook. Good stuff if I can get it to work.

Jim

daught
03-30-2003, 05:10 PM
Jim,

Where did you get this software? Version Tracker OS X or from one of the Palm download sites? If it works, it sounds like a great alternative to Acrobat Reader for Palm. Thanks.

Gary

macrotor
03-30-2003, 08:37 PM
It's actually unix software. You can find it at:
http://www.sourceforge.net/projects/pdftohtml

As I said, I can't seem to get it to compile just yet. Once I do, I'll post instructions.

I'm trying to replace the Adobe Palm Reader because:
1) It is SLOW
2) It crashes my Tungsten W every time I try to view an image or quit the application.

Anyhow, if anybody else finds a solution, let me know! I'll keep working on this.

Alexander Turcic
03-31-2003, 02:46 PM
Originally posted by macrotor
It's actually unix software. You can find it at:
http://www.sourceforge.net/projects/pdftohtml

As I said, I can't seem to get it to compile just yet. Once I do, I'll post instructions.

I'm trying to replace the Adobe Palm Reader because:
1) It is SLOW
2) It crashes my Tungsten W every time I try to view an image or quit the application.

Anyhow, if anybody else finds a solution, let me know! I'll keep working on this.

There is also a graphical Windows frontend for pdftohtml at http://www.wminds.com/downloads/pdf2htmlgui/ . You're absolutely right, using iSilo for reading converted .pdf files would be the dream solution.

Alexander Turcic
03-31-2003, 02:52 PM
There is also a Windows shareware app called PDF2HTML at

http://www.verypdf.com/pdf2htm/index.html

Curious how that one performs though.

Alexander Turcic
03-31-2003, 04:13 PM
Forget my last post. This tool by Verypdf seems to only create jpg files of the pdf. Not useful at all for our purposes.

Alexander Turcic
04-04-2003, 06:35 AM
Today I stumbled over a very good PDF -> HTML convertor called Txt2Latex :D

You can find version 0.49 here (http://people.freenet.de/smaarlin/). Unfortunately, the site is in German; if you don't understand it, try to read the babelfish'ed version (http://babelfish.altavista.com/babelfish/urltrurl?lp=de_en&url=http%3A%2F%2Fpeople.freenet.de%2Fsmaarlin).

Download here (http://people.freenet.de/smaarlin/txt2latexv049.rar).

Requirements: Windows, Java

Alexander Turcic
04-04-2003, 06:39 AM
Here is another PDF -> HTML which I haven't tried myself:

http://ctdeveloping.com/ctdeveloping/products/pdfconverter_info.asp

Alexander Turcic
04-04-2003, 06:46 AM
Gemini Solo is supposed to be one of the best PDF -> HTML converters. You can find info here: http://www.iceni.com/soloSet.html

Alexander Turcic
04-04-2003, 10:12 AM
Gemini Solo looks REALLY good. There is a crippled demo available at http://www.iceni.com/downloadSet.html

What is really nice is that Gemini automatically identifies multicolumn text and generates a single flow of text from it. Very nice for reading on the PDA.

Gemini also seems very fast from what I can tell after my first few tests.

macrotor
04-04-2003, 02:37 PM
Got pdftohtml to work on MacOS X! And it's free! I'll need a little time to set up the instructions. Wow, it does the graphics and everything. Woo hoo!

macrotor
04-04-2003, 02:40 PM
Just a quick comment about PDF converter software: I think it would be best to just try to format the content as close as possible to the original PDF document. Let iSilo do all the re-formatting. That way you aren't trying to juggle reformatting between two pieces of software and getting unexpected results.

Oops! I need to change my sig. I finally got my Tungsten W!

Trucido
04-05-2003, 06:12 AM
Originally posted by Alexander
There is also a Windows shareware app called PDF2HTML at

http://www.verypdf.com/pdf2htm/index.html

Curious how that one performs though.

I have tried this one. The results are not that great, with lots of misalignment.

daught
04-05-2003, 05:29 PM
Originally posted by macrotor
Got pdftohtml to work on MacOS X! And it's free! I'll need a little time to set up the instructions. Wow, it does the graphics and everything. Woo hoo!

Way to go, Jim! Please keep us OS Xers informed on your progress!

Gary

BasilC
04-06-2003, 04:19 PM
Originally posted by Alexander
Gemini Solo looks REALLY good. There is a crippled demo available at http://www.iceni.com/downloadSet.html

What is really nice is that Gemini automatically identifies multicolumn text and generates a single flow of text from it. Very nice for reading on the PDA.

Gemini also seems very fast from what I can tell after my first few tests.

It ought to be really good - it costs £159!

BasilC
04-06-2003, 04:22 PM
Originally posted by BasilC
It ought to be really good - it costs £159!

Make that $159 - but it's still serious money!

BasilC
04-06-2003, 06:13 PM
Originally posted by Alexander
Gemini Solo is supposed to be one of the best PDF -> HTML converters. You can find info here: http://www.iceni.com/soloSet.html

I've just downloaded and tried this, but I can't get it to convert more than the first five pages, and even those didn't come out perfectly.

Am just trying out pdaConverter 1.1, which is available via PalmGear. Slightly puzzled so far. It looks as if this was basically intended as a GUI for converting web pages for viewing on Plucker, before Plucker Desktop was available. But it does claim to be able to convert PDF. We shall see... Has anyone had experience of this who can give some advice?

Alexander Turcic
04-06-2003, 06:30 PM
I have heard of pdaConverter, but I haven't tried it because I think it directly converts to palm .DOC format. Right?

Anyways, what problems did you have with Gemini Solo ("those didn't come out perfectly")?

BasilC
04-07-2003, 06:42 PM
Originally posted by Alexander
I have heard of pdaConverter, but I haven't tried it because I think it directly converts to palm .DOC format. Right?

It also claims to be able to convert to Plucker format, but I couldn't get it to work; I think it was written for an older version of Plucker. I was pretty mystified by how to use pdaConverter in general, but did get it to produce a text-only document that works well in iSilo (but presumably isn't compressed). I think that in view of your expertise in assessing iSilo vs. Plucker, you might want to have a look at it. It may just be a question of producing clearer instructions.

Originally posted by Alexander
Anyways, what problems did you have with Gemini Solo ("those didn't come out perfectly")?

I tried it with three random pdf files. All three were reduced to the first five pages (maybe you have to register to get full documents?), but in other respects the first two were rendered very well. The third was in Italian, and words were run together like this:

"“Afuriadidiscorrereediragionarcisu,queitremonellif inironoperpersuadersiche,avendereiloro libri di scuola, facevano un'operazione d'oro.Lostessogiorno,Cesare,conunfagottosottoilbra ccio,andòincercadiunrivenditoredilibriusati: e quand'ebbe in tasca le tre lire, gli parve di aver toccato il cielo con un dito.Laserachedovevanoandarealteatro,finserotuttie trediavereungransonno:ecomefecerobene la loro parte in commedia!...“Io non posso più tenere gli occhi aperti”, diceva Cesare.“Io dormo e cammino”, diceva Orazio.“Un sonno come stasera, non l'ho avuto mai”, diceva Pierino.“Seavetesonno”,disselaloromamma,“èunamalat tiachesiguariscepresto!Andatealettoenon se ne parli più.”I tre ragazzi non se lo fecero ripetere: presero il loro candeliere e si chiusero in camera.“È meglio che ci vestiamo subito”, disse Cesare.“E poi?”“E poi s'entra a letto.”“Equandovienelamammaadarciilsolitobacioditu ttelesere?...SecitrovavestitidaRigoletti?...”“Che discorsi! Prima di chiamar la mamma, si spenge la candela.”“E se la mamma entra in camera col suo bravo lume acceso?”“Hai ragione. Bisogna ricordarsi di star coperti perbene fino al collo...”"

OK, Italian has some quite long words, but not that long!

It could be that there was something wrong with the original pdf document. I'll try it out a few times more when I get a chance.

Alexander Turcic
04-07-2003, 06:57 PM
Yup, only the first 5 pages are converted in the unregistered version. And I admit, it is very pricey. My point is to find all available PDF -> HTML converters and then sum up the best ones.

Maybe you should really try to convert some other .pdf files as well. I've tried a couple of English ones and the outcome was very nice.

I am sure there are other, maybe better, alternatives out there. Maybe someone can find the time to compile this (http://www.turcic.com/forums/showthread.php?postid=1667#post1667) for the PC and Mac and make it available for everyone to try.

BasilC
04-07-2003, 07:28 PM
Here's a free one to try: PDFTEXTextractor (http://ctdeveloping.com/ctdeveloping/products/pdftextExt_info.asp)

It appears to be completely free, but only if you trial one of their other applications. Haven't downloaded or tried it yet.

Alexander Turcic
04-15-2003, 04:31 PM
As mentioned earlier, a good free solution to convert PDF to HTML is the open-source project pdftohtml (http://sourceforge.net/projects/pdftohtml/).

If you are a Windows user, you can get the Windows binaries from here (http://sourceforge.net/project/showfiles.php?group_id=45839&release_id=137910)
(you can try to download the current latest v0.35 directly from here (http://easynews.dl.sourceforge.net/sourceforge/pdftohtml/pdftohtml_0_35-win32.zip))

This version is console-based, i.e. text-only. If you prefer a nice and easy graphical interface for it, get it from here (http://www.wminds.com/downloads/pdf2htmlgui/) (note that the console version is still required).

Maybe someone feels like posting his/her impressions of this tool.

Alex

Pride Of Lions
04-15-2003, 07:23 PM
...PorDiBle (http://pordible.ethelthefrog.net) because it's easy and converts stuff to other stuff. PDB to HTML, TXT to PDB, this to that and that to this.

I use it whenever I download a TXT from Project Gutenberg and convert it to PDB so my DeepReader can find it.

Works wonders.
POL9A

Saud
04-21-2003, 05:08 AM
Originally posted by Alexander

Maybe someone feels like posting his/her impressions of this tool.



I posted on this issue here (http://discussion.brighthand.com/palmhandhelds/showthread.php?threadid=27111),
though it's a bit of a repetition of what has already been suggested in this thread.

Alexander Turcic
04-21-2003, 05:51 AM
Originally posted by Saud
I posted on this issue here (http://discussion.brighthand.com/palmhandhelds/showthread.php?threadid=27111),
though it's a bit of a repetition of what has already been suggested in this thread.

Ah, thanks for the link. I saw you posted something about IntraPDF (http://www.intrapdf.com/), which also looks very promising. It costs money, but at least there is a trial version to test.

Alexander Turcic
04-23-2003, 07:11 AM
This one looks *very* promising:

http://www.scansoft.com/PDFConverter/Beta/

and

http://www.scansoft.com/pdfconverter/beta/download/

Description:
Unlock the information trapped in PDF files! The ScanSoft PDF Converter is a new plug-in for Microsoft Office 2003 that allows you to open and convert PDF files from within Microsoft Word. It automatically converts PDF files into documents that you can edit, maintaining column flow while separating text from graphics and tables. Instantly re-purpose entire documents, or cut/paste charts and graphs into Microsoft Word - quickly and easily.

sUnShInE
04-23-2003, 04:58 PM
Wow. That sounds a bit disturbing in some ways. I know a lot of people (myself included) who use the Adobe Acrobat *.pdf format for things like legal documents or other materials which need to be transmitted electronically, but cannot be altered. The whole point of putting them into that format was so they can't be deconstructed.

hesire
04-26-2003, 11:15 PM
To everyone who may find it useful:

I'm no expert, but I know there's an app called Iceni Solo from http://www.iceni.com

It's a tool for Adobe Acrobat that converts PDFs to HTML/DOC/TXT, and once you have done that you can convert the result with iSiloX.

There are only two bits of bad news:
1. I think it is German, so all the manuals are in German.
2. I haven't found a way to get a free copy.


anyway hope someone finds this useful

hesire
cya :D

Alexander Turcic
04-27-2003, 06:09 AM
(Hesire, I moved your posting to this thread where we discuss PDF->HTML conversions)

Indeed, Gemini Solo by Iceni is a great tool to convert PDF files and you can find a demo as mentioned here (http://www.turcic.com/forums/showthread.php?postid=1772#post1772). Unfortunately, it is not free and the demo you download is crippled (unless you have the appropriate registration number).

You should also have a look at the new product by ScanSoft (http://www.turcic.com/forums/showthread.php?postid=2199#post2199), which looks very promising from its description.

Cheers!!

hesire
04-27-2003, 12:15 PM
hehehe sorry

I should have read this one first


cya

macrotor
04-27-2003, 02:54 PM
Okay, I got pdftohtml to work on Mac OS X. I was hoping to take a little more time and make it work directly from the Print window using the PDF scripting feature, but I just haven't had the time. In any case, here is how to compile it:

Get the source code from:
http://pdftohtml.sourceforge.net/

It was at version 0.35 when I did this. If it is now a later version, then the following fix will not be necessary:
In the "src" directory, you will find a file called HtmlOutputDev.cc. Open it in a text editor and go to line 791. Remove the "= 1" from the part that reads "firstPage = 1". Save and close.

Okay, open up your terminal. We have to set OS X to use the older gcc compiler. Type the following command:
sudo gcc_select 2

Now it's time to move into the pdftohtml directory and compile it using "make all".

Reset your compiler to the new version:
sudo gcc_select 3

There should be a new "pdftohtml" executable file. That file is all you need, so put it somewhere in your path. Type "pdftohtml -help" to get basic instructions on how to use it.

I always use the "-c" option so that I have a near exact replica of the formatted PDF. I then let iSilo do all the stripping. It depends on how fast you want it to work.

At this point, you'll have to help me experiment to find the best settings. Exact replicas don't always fit well on a Palm screen. I hope this works for you!
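
For anyone who would rather script the line-791 fix than do it by hand, here is a rough sketch in Python. It is only an illustration: `remove_on_line` is a made-up helper name, and the two-line sample text is invented for the demo (it is not the real content of HtmlOutputDev.cc). The helper refuses to change anything if the given line does not contain the fragment, which should protect a later version where the fix is no longer needed.

```python
def remove_on_line(text, line_no, fragment):
    """Remove the first occurrence of `fragment` from line `line_no` (1-based).

    Raises ValueError if that line does not contain the fragment, so a
    file that does not match the description is never modified silently.
    """
    lines = text.split("\n")
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"file has no line {line_no}")
    if fragment not in lines[line_no - 1]:
        raise ValueError(f"line {line_no} does not contain {fragment!r}")
    lines[line_no - 1] = lines[line_no - 1].replace(fragment, "", 1)
    return "\n".join(lines)


# Invented sample text, NOT the real file contents.
sample = "void f(\n  int firstPage = 1)"
patched = remove_on_line(sample, 2, "= 1")
print(patched)

# For the real file you would read src/HtmlOutputDev.cc, call
# remove_on_line(contents, 791, "= 1"), and write the result back.
```

Keep a backup copy of the original file before writing anything back.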

daught
05-08-2003, 07:23 PM
Jim,

Is there any way you can provide us very, extremely, terrifyingly timid terminal users with your compiled OS X compatible version of pdftohtml?

Gary

macrotor
05-09-2003, 07:35 PM
Okay, I'll post the binary here. However, I tested it on a clean system and found that you still need to install GhostScript. The absolute best way to do this is by using fink (http://fink.sourceforge.net).
You can use the Fink Commander application so that you can avoid the terminal. Just install the BINARY version of Ghostscript 8.00, then put this file somewhere in your path (like /sw/bin). I tried one standalone ghostscript installer, but it lacked png support, so you have to use Fink to get a full install.

I'm afraid this doesn't get any easier if you want it for free. The open-source community requires a little elbow grease from its users!

Here is an example command to get an accurate page for iSilo. Mind you, if you don't care about the tables and fixed formatting, then remove the '-c' option.

pdftohtml -c -noframes example.pdf example.html

This will create the example.html files and all the png graphics in the current directory. It would be great to wrap this in an installer with a GUI, but I have a pregnant wife that believes I have better things to do! At least I can finally carry all my tech manuals in my pocket with a searchable index. I hope this gets you started!
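
If you convert a lot of files, the example command above can be wrapped in a small script. This is a minimal sketch in Python, assuming pdftohtml is already on your path; the function name `pdftohtml_command` and the `exact_layout` switch are made up for illustration, and only the `-c` and `-noframes` options from the example command are used.

```python
import shlex


def pdftohtml_command(pdf_path, html_path, exact_layout=True):
    """Build the argument list for the example command in this post."""
    args = ["pdftohtml"]
    if exact_layout:
        args.append("-c")       # keep tables and fixed formatting
    args.append("-noframes")    # same option as in the example command
    args += [pdf_path, html_path]
    return args


cmd = pdftohtml_command("example.pdf", "example.html")
# Show the command as it would be typed in the terminal.
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (requires pdftohtml to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing `exact_layout=False` drops the `-c` option for documents where the tables and fixed formatting don't matter.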

Alexander Turcic
06-16-2003, 01:38 AM
If you can afford it: Adobe Acrobat 6.0 allows you to export any PDF file to HTML 3.2, HTML 4.01 with CSS, DOC, RTF, TXT, XML 1.0. Very cool!

cbarnett
06-16-2003, 05:42 AM
I have access to Acrobat v5 at work, if I want it. Can you export like that in v5?

Alexander Turcic
06-16-2003, 08:51 AM
Nope, the great export functionality is new in Acrobat 6.0. We have it here at work and I just tested a few PDF files... the outcome is amazing and I don't have to tell you how much I enjoy reading them with iSilo now :)

BasilC
06-16-2003, 07:09 PM
Originally posted by Alexander
If you can afford it: Adobe Acrobat 6.0 allows you to export any PDF file to HTML 3.2, HTML 4.01 with CSS, DOC, RTF, TXT, XML 1.0. Very cool!


The free Adobe Reader 6 allows you to save a pdf as a text file. It seems to work OK, except that it leaves in page headers and footers and page numbers. Adobe Reader 6 will also read pdf files out loud (in an American accent, naturally)!

Incidentally, how much does Acrobat cost? Does the standard version do the conversions to HTML or just the Professional version?

BasilC
06-16-2003, 07:13 PM
Originally posted by BasilC
Incidentally, how much does Acrobat cost?

I just found out the price on the Adobe website. Standard is £287, Professional £440. Forget it!

wumpi
08-07-2003, 04:31 AM
Do you guys know Raphael Fetzer's pdaConverter (http://www.freewarepalm.com/utilities/pdaconverter.shtml)?

pdaConverter simplifies the creation of documents for PalmOS based handhelds and works only under MS Windows. The daily synchronisation of webpages/channels can be done, too.

pdaConverter supports the following filetypes: jpg, gif, png, pdf, html, rtf, hlp, wpd, txt, Aportis Doc
And can produce these filetypes: Plucker, Aportis Doc, zTXT

Really useful if you use Plucker!

BasilC
08-07-2003, 06:57 PM
I downloaded pdaConverter, but it won't install properly on my computer, for some reason. Pity, it looks interesting.

macrotor
08-12-2003, 04:33 PM
Well, pdftohtml has been less than wonderful. If you want formatted text, it's fine. However, it converts all graphics on a page into a single background graphic. Not very good for iSilo.

Oh well, I'll keep looking.

vitalyb
09-26-2003, 11:17 AM
Where is the option in Acrobat 6?

Alexander Turcic
01-11-2004, 04:52 PM
PDF Plain Text Extractor (http://www.retsinasoftware.com/extract-convert-pdf-to-text.htm) is another tool that can extract plain text from PDF files without any PDF SDK or other third party lib's help.

You don't need any products from Adobe (neither Adobe Acrobat Reader nor Adobe Acrobat) installed on your computer. P2T focuses on text extraction from PDF files. It analyzes the raw PDF file directly and extracts plain text from it. The layout of the document is preserved.

Haven't tested it yet. A trial is available, though; the full version costs $59.95 :(

BasilC
01-12-2004, 05:49 PM
I finally managed to install pdaConverter. I can get it to convert pdf text to Plucker format pretty well, and it's then much quicker to read than using Adobe Reader for Palm. However, I can't get it to convert images, even though I tick the option to do this. In general, it looks like a very useful program, the only trouble is that there isn't a usable manual. Anyone figured out how to make full use of it?

sas
01-14-2004, 10:53 AM
.... However, I can't get it to convert images, even though I tick the option to do this.

You mean pictures included in the original PDF, or any picture? I was never able to convert images from the PDF either, but for JPGs and GIFs it works fine both with add file / from clipboard and through system integration.

If you did not find enough information in the help file, why not e-mail Raphael, or post here?

Enjoy

BasilC
01-14-2004, 07:23 PM
You mean pictures included in the original PDF, or any picture? I was never able to convert images from the PDF either, but for JPGs and GIFs it works fine both with add file / from clipboard and through system integration.

If you did not find enough information in the help file, why not e-mail Raphael, or post here?

Enjoy

Yeah, you're right, I was just being lazy, I'll get in touch with him.

I did mean pictures embedded in pdf files. When it comes to just moving jpgs or gifs, I use HandStory, which is great in this respect. You just right click, then you get a whole host of resizing and other options.

rsprim
01-14-2004, 09:58 PM
Nope, the great export functionality is new in Acrobat 6.0. We have it here at work and I just tested a few PDF files... the outcome is amazing and I don't have to tell you how much I enjoy reading them with iSilo now :)

Alexander how are you converting them to read in iSilo? Are you saving them as html and converting them to a .pdb using iSiloX?

Robert

Alexander Turcic
01-15-2004, 03:20 AM
Exactly. However, make sure you have installed the full Adobe Acrobat (which is not free); the Adobe Acrobat Reader cannot export to .html (at least to my knowledge).

Alexander Turcic
01-27-2004, 01:47 PM
I just stumbled over another "PDF->many formats" convertor.

ABC Amber PDF Converter (http://www.thebeatlesforever.com/processtext/abcpdf.html) allows you to convert PDF to any document format (HTML, CHM, RTF, HLP, TXT, DOC, DBF, XML, CSV, XLS, MCW, WPS, SAM, RFT, WS4, WS7, WRI, etc.) easily and quickly. You can export all pages or just selected pages, as plain text or as preview pictures.

Haven't installed it yet; a demo version is available (limited to 30 days and 5 pages per export). The full version costs $12.95.

ignatz
03-14-2004, 11:19 PM
There is also a graphical Windows frontend for pdfTohtml at http://www.wminds.com/downloads/pdf2htmlgui/
This link appears to be toast. But it has reappeared at http://guiguy.wminds.com/downloads/pdf2htmlgui/. Haven't tried it yet, but I have a couple of pdfs that I'd like to convert. I'll report back.

ignatz
03-15-2004, 12:28 AM
Hmmm, pdf2html works okay I reckon, but it doesn't perform quite as I'd like. It created <br>'s at the end of each line, and I'd much rather those went away so that the formatting could work itself out a little better. And there's no easy way to pull those out without losing the relevant ones (is there?). Didn't have any images in my document, so no issues there. I also didn't like the horizontal line between each physical page, though those are easy to find/replace away. But hey, it did the job for free. Love that. Now I might have to see if I can get my boss to get us Acrobat 6. Hee hee...
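
One possible way to pull out most of those line-end <br>'s while keeping the likely "real" ones is a simple heuristic, sketched here in Python. It rests on assumptions that are not from the post: a <br> is kept when the text before it ends in sentence punctuation (or the line is otherwise empty), every other line-end <br> is treated as a soft wrap and joined to the next line, and the page separator is assumed to be an `<hr>` tag on a line of its own. The name `unwrap_breaks` is made up. Expect it to guess wrong sometimes, for example when a sentence happens to end exactly at a wrapped line.

```python
import re

BR_AT_EOL = re.compile(r"<br\s*/?>\s*$", re.IGNORECASE)
HR_LINE = re.compile(r"^\s*<hr[^>]*>\s*$", re.IGNORECASE)
SENTENCE_END = (".", "!", "?", ":")


def unwrap_breaks(html):
    """Join soft-wrapped lines; keep breaks that follow a finished sentence."""
    out = []
    for line in html.splitlines():
        if HR_LINE.match(line):
            continue  # drop the assumed page-separator rule
        body = BR_AT_EOL.sub("", line)
        had_break = body != line
        text = body.rstrip()
        if had_break and text and not text.endswith(SENTENCE_END):
            out.append(text + " ")   # soft wrap: continue on the same line
        else:
            out.append(line + "\n")  # keep the line (and any <br>) as it was
    return "".join(out)


sample = (
    "The quick brown<br>\n"
    "fox jumps.<br>\n"
    "<hr>\n"
    "Next page<br>\n"
    "starts here.<br>"
)
print(unwrap_breaks(sample))
```

Run it on a copy of the converted file and compare the result in a browser before feeding it to iSiloX.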

cawinters
03-16-2004, 12:58 PM
I know this is slightly off topic as it doesn't convert to HTML, but I have numerous very large PDFs that were just too large to use with Acrobat Reader on PPC. I converted them with Repligo and it works well. The files are 1/3 the size, contain pics and are well formatted.

cbarnett
03-16-2004, 05:17 PM
Repligo is very nice. I will be registering the PPC version when v2 is released in the next month or so, as it finally does all I want; text selection, searching, and BOOKMARKS!!! :D

Craig.

Bob Russell
07-27-2004, 11:50 PM
Based on the info above, I tried PDF2HTMLgui. I used it on the Pimlico Datebook5 manual, which is fairly complicated, and has lots of pictures with text around them (screenshots mostly).

It works great using the method I describe in what follows. But as you can see at the bottom of this post, I can't convert it into a nice iSilo document.
Any ideas? (Hope I didn't repeat too much from the rest of the thread... I did read it but too quickly.)

The download I used was at: http://guiguy.wminds.com/downloads/pdf2htmlgui/down.html

The idea is that this program is just a gui shell, and the guts are contained in two additional required programs (as indicated on that page) that must also be downloaded:
GhostScript (v8.14) and PDFtoHTML

You need to be sure to grab the Windows binary executables, not the source files.

To install, you install ghostscript, and extract PDFtoHTML files into a directory of your choice (that's the command line program). Then extract the gui program .exe into the same directory as the command line program.

When you run it, you'll need to tell it where to find the ghostscript executable. There were two to choose from, and I took the first alphabetically. I think it was without the "c", but I can't remember.

You then give it the source file .pdf location (make sure the directory names are not too long because it will choke if the command line is more than 255 chars, which includes the full path and filename of the source and destination, plus more).
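
To check ahead of time whether a conversion is likely to run into that 255-character limit, you could estimate the command length first. This is a rough sketch in Python: how the GUI actually assembles its command line is not known here, so the space-joined layout, the helper names `command_length` and `fits_limit`, and the sample option list are all assumptions. Treat the result as an estimate and leave some margin.

```python
MAX_COMMAND_CHARS = 255  # limit reported in this post


def command_length(program, options, source_pdf, dest_html):
    """Estimate the length of a space-separated command line."""
    parts = [program, *options, source_pdf, dest_html]
    return len(" ".join(parts))


def fits_limit(program, options, source_pdf, dest_html):
    """True if the estimated command line stays within the limit."""
    length = command_length(program, options, source_pdf, dest_html)
    return length <= MAX_COMMAND_CHARS


short_ok = fits_limit("pdftohtml", ["-c"], "a.pdf", "a.html")
long_path = "C:\\" + "x" * 250 + ".pdf"
long_ok = fits_limit("pdftohtml", ["-c"], long_path, "a.html")
print(short_ok, long_ok)
```

If the check fails, moving the PDF to a folder with a shorter path is the simplest fix.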

It supplies a candidate destination. You can choose a bunch of options I don't understand, but the key option is the "generate complex document" which is visible as a conversion checkbox after you choose "more options". When I didn't choose it, the conversion was just text and didn't accomplish much for me at all.

The complex option gives me a "framed" page HTML result with index on left side and single pages on the right side, and it looks gorgeous. You might want to choose another folder for the result because there can be a LOT of files.

The conversion went pretty quickly, and was very effective. Note also that the complex option requires the ghostscript executable.

THE REMAINING PROBLEM...
I haven't figured out the remaining trick of getting it into iSilo... when I do a straight conversion of the index page, I thought I'd get a nice document. But all I got was pages with the pictures and separate text. Not pictures on a page with text surrounding it at the right location.

Interestingly, it opens and displays fine with Internet Explorer, but with MS Word, it shows improperly just like in the converted iSilo document.

One would think that the HTML conversion works just like any other iSiloX conversion with images (after all that work to get images in the document, you probably don't want to lose them now!) And I always convert to a memory card destination. But the results were not right.

Help?

ignatz
07-28-2004, 12:04 AM
Bob, have you had a look at the HTML? There must be something strange in there... Perhaps you can manually edit it?

Colin Dunstan
07-28-2004, 03:03 AM
Bob, can you attach your html file (perhaps zipped) here?

macrotor
09-03-2004, 01:27 PM
I have returned from active duty! Whoa, this thread is STILL going? Wow.
First off, pdftohtml will never do graphics well for iSilo because it converts all the graphics into a merged background picture with text placed over it. iSilo will either ignore the background, or separate the text from the graphics. Adobe Acrobat is much more intelligent (you get what you pay for, I guess).

It's good to be back. Got iSilo 4.1 loaded on my shiny new Tungsten|C! How may I be of service?

macrotor
09-03-2004, 01:28 PM
Yikes, I guess I better change my sig. Got the Tungsten|W. Used the heck out of it. Now using Tungsten|C. Heh.

hacker
09-03-2004, 03:57 PM
The output of Adobe 6.0 to HTML 3.2 isn't all that it is cracked up to be. Try it on a very complicated PDF, and take a look at the results.

The first page or two might look good, but when you get down to page 300, 500, 800, and later... things get really ugly. It starts to break up words and text, reflows paragraphs incorrectly, and does lots of other weird things. It also seems to ignore some graphics in complicated PDF documents, claiming that they are missing a required attribute (which I find odd, since Adobe Acrobat created the original PDF in the first place!)

The best approach is to use a combination of tools, and then go in and hand edit the whole thing back again to a usable format for reading on a PDA or handheld device.

BasilC
09-03-2004, 08:09 PM
I'm on the verge of registering Repligo 2. It's the best solution to the pdf problem that I've found to date, including Adobe Reader, which is pretty poor.

macrotor
09-04-2004, 12:19 AM
If you seriously work with PDF files on your Palm, then you really can't go wrong with Repligo. It really is the best at PDF conversion. However, I don't use PDF enough to make it worth the cost.

hacker
09-04-2004, 05:54 PM
I just downloaded pdaConverter's latest beta, and it appears he STILL isn't complying with the licenses of quite a bit of the software he includes in there. He's got Plucker, Python, and quite a few other Free Software and Open Source products and projects in there, but he neglects to include the proper COPYING file and other related license files, as required by those licenses.

He's been out of compliance since at least July of 2002, maybe earlier, when we talked to him about this exact issue.

I'll raise it again with him and see what he "forgot" this time.

abk8445
08-14-2011, 07:10 PM
I am a new member, very illiterate in conversion mechanisms, HTML, EPUB, etc., and I am learning fast. What I am reading here seems to be helpful in my case. I have a few technical and scientific books in PDF format. The TOC and figures (some with chemical formulas) seem to be very stubborn about getting converted to EPUB (by Calibre). I will try the instructions in this thread and see if I can get anywhere. It is wonderful to be among experts. Of course, any advice is greatly appreciated.