09-01-2018, 02:46 AM   #49
sealbeater
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
Quote:
Originally Posted by fjtorres View Post
Maybe you can.
Have you written a decompiler before? Or a BASIC interpreter? Or a program to convert, say, FORTRAN to C++?
That is the scope of the problem.
I guess we'll have to agree to disagree. I've converted PDF to editable formats quite readily. The only problem is when the PDF is composed of images, and even then it's easy enough to convert to HTML and from there, I assume, to EPUB. I still haven't looked into a suitable way to create an EPUB from the command line.
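For the command-line EPUB step, one option is to package the HTML yourself, since an EPUB is essentially a ZIP archive with a fixed layout: an uncompressed `mimetype` entry first, `META-INF/container.xml` pointing at an OPF package file, then the content documents. Below is a minimal Python sketch, assuming the input is already a well-formed XHTML body fragment (e.g. cleaned up from a PDF-to-HTML conversion). The function names, the `OEBPS/` file names and the single-chapter layout are my own hypothetical choices, not any particular tool's, and the output should be run through a validator such as epubcheck before you rely on it.

```python
import io
import uuid
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# META-INF/container.xml tells the reading system where the OPF package file lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def _xhtml_page(title, body):
    """Wrap an XHTML body fragment in a complete EPUB 3 content document."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml" '
        'xmlns:epub="http://www.idpf.org/2007/ops">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )


def make_epub(target, title, body_xhtml, lang="en"):
    """Write a one-chapter EPUB 3 to `target` (a path or binary file object).

    Assumes `body_xhtml` is already well-formed XHTML; nothing is validated here.
    """
    book_id = f"urn:uuid:{uuid.uuid4()}"
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    safe_title = escape(title)
    # EPUB 3 requires a navigation document; this one lists the single chapter.
    nav_body = (
        '<nav epub:type="toc"><ol>'
        f'<li><a href="chapter1.xhtml">{safe_title}</a></li>'
        "</ol></nav>"
    )
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{book_id}</dc:identifier>
    <dc:title>{safe_title}</dc:title>
    <dc:language>{escape(lang)}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # "mimetype" must be the first entry and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", _xhtml_page(title, nav_body))
        zf.writestr("OEBPS/chapter1.xhtml", _xhtml_page(title, body_xhtml))


def make_epub_bytes(title, body_xhtml, lang="en"):
    """Same as make_epub, but return the archive as bytes."""
    buf = io.BytesIO()
    make_epub(buf, title, body_xhtml, lang)
    return buf.getvalue()
```

For example, `make_epub("book.epub", "My Title", body)` (file name and title hypothetical) writes a one-chapter book; wrapping that in a small script gives you the command-line step.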

Quote:
Originally Posted by fjtorres View Post
See, pdf is not a simple data format. It is a full programming language, derived/extended from Postscript. pdf files are software, not bitmaps or encoded text blocks. You can write games and malware in pdf.
None of this is news. Nor is it really relevant.

As an aside, you can't write malware in pdf. Nor do I find it likely you can write games in "PDF". You can encapsulate code, but I would like to see malware **written** in *pdf*, whatever that means. I'm aware that PostScript is a programming language, but not PDF.


Quote:
Originally Posted by fjtorres View Post
Converting pdf to an editable format is one of the great challenges of the age. Many have tried, millions in currency have been spent, none have fully succeeded. All require extensive manual cleanup.

If you succeed, people will shower you with cash.

Good luck!


Then I assume you've never used the Poppler tools. Either that, or your gift for hyperbole is unmatched.

Does pdftotext ring a bell? Personally, though, I would probably try converting to XML, to have the best chance of preserving italics and bold.
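To illustrate the XML route: Poppler's `pdftohtml` has an `-xml` option, and in the output I'd expect each positioned run of text to sit in a `<text>` element, with bold and italic marked by nested `<b>` and `<i>` tags. Here is a minimal Python sketch that walks that output and keeps only the bold/italic markup, emitting one HTML line per `<text>` element. The `pdf2xml`/`page`/`text` layout is an assumption about typical output, so check it against your Poppler version; the function names are hypothetical, and it ignores positions and font specs, so joining lines into paragraphs is left to you.

```python
import xml.etree.ElementTree as ET
from html import escape


def _runs(elem, bold=False, italic=False):
    """Yield (text, bold, italic) runs from an element, in document order."""
    bold = bold or elem.tag == "b"
    italic = italic or elem.tag == "i"
    if elem.text:
        yield elem.text, bold, italic
    for child in elem:
        yield from _runs(child, bold, italic)
        # Text after a child's closing tag keeps this element's formatting.
        if child.tail:
            yield child.tail, bold, italic


def pdf2xml_to_html_lines(xml_data):
    """Turn `pdftohtml -xml` output (bytes) into one HTML line per <text> element.

    Assumes the layout <pdf2xml><page><text>...</text></page></pdf2xml>,
    with bold/italic as nested <b>/<i> tags.
    """
    root = ET.fromstring(xml_data)
    lines = []
    for page in root.iter("page"):
        for text_el in page.iter("text"):
            parts = []
            for chunk, bold, italic in _runs(text_el):
                frag = escape(chunk, quote=False)
                if italic:
                    frag = f"<i>{frag}</i>"
                if bold:
                    frag = f"<b>{frag}</b>"
                parts.append(frag)
            lines.append("".join(parts))
    return lines
```

Usage would be something like running `pdftohtml -xml input.pdf out` and passing the bytes of the resulting XML file to `pdf2xml_to_html_lines` (check which file name your version actually writes).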