Quote:
Originally Posted by fjtorres
Maybe you can.
Have you written a decompiler before? Or a BASIC interpreter? Or a program to convert, say, FORTRAN to C++?
That is the scope of the problem.
|
I guess we'll have to agree to disagree. I've converted PDFs to editable formats quite readily and easily. The only problem is when the PDF is composed of images, and even then it's easy enough to convert to HTML and from there, I assume, to EPUB. I still haven't looked into a suitable way to create an EPUB from the command line.
Quote:
Originally Posted by fjtorres
See, pdf is not a simple data format. It is a full programming language, derived/extended from Postscript. pdf files are software, not bitmaps or encoded text blocks. You can write games and malware in pdf.
|
None of this is news. Nor is it really relevant.
As an aside, you can't write malware in pdf. Nor do I find it likely you can write games in "PDF". You can encapsulate code but I would like to see malware **written** in *pdf*, whatever that means. I'm aware of PostScript being a programming language but not PDF.
Quote:
Originally Posted by fjtorres
Converting pdf to an editable format is one of the great challenges of the age. Many have tried, millions in currency have been spent, none have fully succeeded. All require extensive manual cleanup.
If you succeed, people will shower you with cash.
Good luck!
|
I'm assuming you've never used the Poppler tools then. That or your gift for hyperbole is unmatched.
pdftotext ring a bell? Although personally, I would probably try converting to XML instead, to have the best chance of preserving italics and bold.
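
A rough sketch of the XML route in Python, for illustration only. It assumes Poppler's pdftohtml is installed and on the PATH. The function names and the SAMPLE string are made up for this example: SAMPLE is hand-written to resemble the tool's XML output, not copied from a real run, and the `*` / `**` markers are just one arbitrary way to carry italics and bold forward.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET


def pdftohtml_xml_cmd(pdf_path):
    """Build the Poppler command that dumps a PDF as XML on stdout."""
    # -xml: XML output, -i: ignore images, -stdout: write to stdout
    return ["pdftohtml", "-xml", "-i", "-stdout", pdf_path]


def pdf_to_xml(pdf_path):
    """Run pdftohtml and return its raw XML output (bytes)."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml (Poppler) not found on PATH")
    result = subprocess.run(
        pdftohtml_xml_cmd(pdf_path), capture_output=True, check=True
    )
    return result.stdout


def inline_markup(elem):
    """Flatten an element to text, wrapping <b> in ** and <i> in *."""
    wrap = {"b": "**", "i": "*"}.get(elem.tag, "")
    parts = [elem.text or ""]
    for child in elem:
        parts.append(inline_markup(child))
        parts.append(child.tail or "")
    return wrap + "".join(parts) + wrap


def xml_to_lines(xml_data):
    """Return one marked-up string per <text> element, in document order."""
    root = ET.fromstring(xml_data)
    return [inline_markup(t) for t in root.iter("text")]


# Hand-written stand-in for pdftohtml output (an assumption, not a real dump).
SAMPLE = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="90" width="200" height="16" font="0">A <i>very</i> short <b>sample</b> line</text>
<text top="120" left="90" width="200" height="16" font="0"><b><i>Both</i></b> at once</text>
</page>
</pdf2xml>"""

if __name__ == "__main__":
    for line in xml_to_lines(SAMPLE):
        print(line)
```

On a real file, `xml_to_lines(pdf_to_xml("book.pdf"))` would be the equivalent call ("book.pdf" being a placeholder name). Paragraph reflow and scanned, image-only PDFs are not handled here.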