08-31-2018, 07:54 PM | #46 | |
Wizard
Posts: 3,032
Karma: 52740263
Join Date: Feb 2012
Location: New England
Device: PW 1, 2, 3, Voyage, Oasis 2 & 3, Fires, Aura HD, iPad
|
Quote:
Shari |
|
08-31-2018, 08:35 PM | #47 | |
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
As for scripting it out in 20 minutes, I don't know *exactly* how long it would take for me to do it but I don't think it would be overly difficult. Take the 20 minute timeframe as an indicator of the relative difficulty of the problem, not an estimation of exact time spent. Most of it would be generating the epub file and how best to output the pdf for cleaning. |
|
08-31-2018, 10:22 PM | #48 | |
Grand Sorcerer
Posts: 11,732
Karma: 128354696
Join Date: May 2009
Location: 26 kly from Sgr A*
Device: T100TA,PW2,PRS-T1,KT,FireHD 8.9,K2, PB360,BeBook One,Axim51v,TC1000
|
Quote:
Have you written a decompiler before? Or a BASIC interpreter? Or a program to convert, say, FORTRAN to C++? That is the scope of the problem. See, pdf is not a simple data format. It is a full programming language, derived/extended from PostScript. pdf files are software, not bitmaps or encoded text blocks. You can write games and malware in pdf. https://security.stackexchange.com/q...ontain-a-virus This may help get you started: https://nubuntu.org/postscript-vs-pdf http://tailrecursive.org/postscript/postscript.html Converting pdf to an editable format is one of the great challenges of the age. Many have tried, millions in currency have been spent, none have fully succeeded. All require extensive manual cleanup. If you succeed, people will shower you with cash. Good luck! |
|
09-01-2018, 02:46 AM | #49 | |||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
Quote:
As an aside, you can't write malware in pdf. Nor do I find it likely you can write games in "PDF". You can encapsulate code but I would like to see malware **written** in *pdf*, whatever that means. I'm aware of PostScript being a programming language but not PDF. Quote:
I'm assuming you've never used the Poppler tools then. That or your gift for hyperbole is unmatched. Does pdftotext ring a bell? Although personally, I would probably try to convert to XML, to have the best chance of preserving italics and bold. |
|||
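For readers who have not used the Poppler tools mentioned above, here is a minimal sketch of driving `pdftotext` from a script. The helper names `pdftotext_command` and `extract_text` are made up for illustration and are not from the thread; the sketch assumes Poppler is installed and `pdftotext` is on the PATH. It only covers plain-text extraction, not the XML route suggested in the post.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf: Path, out: Path, keep_layout: bool = True) -> list[str]:
    """Build the argument list for Poppler's pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical layout of each page
        cmd.append("-layout")
    cmd += [str(pdf), str(out)]
    return cmd


def extract_text(pdf: Path, out: Path) -> None:
    """Run pdftotext; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(pdftotext_command(pdf, out), check=True)
```

Calling `extract_text(Path("book.pdf"), Path("book.txt"))` would write the extracted text next to the PDF. What comes out still contains page headers, page numbers and line-break hyphens, which is where the disagreement in this thread starts.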
09-01-2018, 07:00 AM | #50 | |
Grand Sorcerer
Posts: 11,732
Karma: 128354696
Join Date: May 2009
Location: 26 kly from Sgr A*
Device: T100TA,PW2,PRS-T1,KT,FireHD 8.9,K2, PB360,BeBook One,Axim51v,TC1000
|
Quote:
Not at all comparable to what was requested. |
|
09-02-2018, 01:32 AM | #51 | |
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
Also, who said anything about keeping all the imperfections introduced by the text extractor? Have you ever heard of ispell? I already have a Perl one-liner that can take care of hyphenated words. Do you even know how to chain commands together? Your thinking is extremely limited. Tell me, is the author of the Poppler tools your New Messiah? EDIT: Also, you never answered my twice-asked question. Have you ever used the Poppler tools? EDIT2: I guess you don't know what a "script" is either. Last edited by sealbeater; 09-02-2018 at 01:54 AM. |
|
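The Perl one-liner referred to above is not shown in the thread. As a rough illustration of what such a dehyphenation step might look like, here is a small Python sketch; the function name `join_hyphenated` and the exact rule are assumptions, not the poster's code. It joins a word split by a hyphen at a line break only when the continuation starts with a lowercase letter.

```python
import re

# Assumed rule: a word character, a hyphen, a newline, optional indentation,
# then a lowercase letter. Capitalised continuations are left alone.
_HYPHEN_BREAK = re.compile(r"(\w)-\n[ \t]*([a-z])")


def join_hyphenated(text: str) -> str:
    """Join words that were hyphenated across a line break (hypothetical helper)."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)
```

`join_hyphenated("conver-\nsion")` returns `"conversion"`, while `"Jean-\nPaul"` is left untouched. The same rule also turns `"well-\nknown"` into `"wellknown"`, so a step like this needs a dictionary check or manual review to separate real compounds from line-break hyphens.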
09-02-2018, 04:49 AM | #52 |
Wizard
Posts: 3,108
Karma: 60231510
Join Date: Nov 2011
Location: Australia
Device: Kobo Aura H2O, Kindle Oasis, Huawei Ascend Mate 7
|
@sealbeater. If it is that easy, do it. I won't bore you with the history but there are many books available only as PDFs. I avoid this format like the plague on e-ink readers. Your comment about ispell in your last post simply showcases your ignorance, as do many other comments in your posts on this subject. Spell checkers generally are zero help with things like homophones or many OCR errors or layout errors. Nor do they deal with things like page headers and footers, including page numbers, which are fine with a fixed layout but not with a reflowable epub. Putting it quite bluntly, I have never found a tool which takes a pdf as input and produces an intermediate format or an epub which does not require substantial manual editing. Results vary from readable to rubbish.
With a little coding knowledge I'm sure it is trivial to write a script that converts pdf to epub very badly. Do you think it is trivial to write such a script which reliably produces near-perfect results? How about even marginally acceptable results? If you do, it is time to put up or shut up. If not, then ..... |
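To make the header and footer problem concrete, here is a minimal Python sketch of one common heuristic. It is an illustration under stated assumptions, not a tool anyone in the thread wrote: the name `strip_running_lines` and the `min_share` threshold are hypothetical, and the input is assumed to be `pdftotext` output, where pages are separated by a form feed character.

```python
from collections import Counter


def strip_running_lines(text: str, min_share: float = 0.6) -> str:
    """Drop lines repeated on most pages and bare page numbers (hypothetical helper)."""
    # pdftotext separates pages with a form feed ("\f")
    pages = [p.splitlines() for p in text.split("\f") if p.strip()]
    counts = Counter()
    for lines in pages:
        # count each distinct non-blank line once per page
        counts.update({ln.strip() for ln in lines if ln.strip()})
    threshold = max(2, int(len(pages) * min_share))
    repeated = {ln for ln, n in counts.items() if n >= threshold}
    cleaned = []
    for lines in pages:
        kept = [
            ln for ln in lines
            if ln.strip() not in repeated and not ln.strip().isdigit()
        ]
        cleaned.append("\n".join(kept))
    return "\n".join(cleaned)
```

The heuristic handles a book whose every page carries the same title line and a bare page number. It misses headers that alternate between book title and chapter title, and it deletes any body line that consists only of digits, which is the kind of case-by-case failure described in the post above.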
09-02-2018, 05:34 AM | #53 | |||||||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
As I already said, I have neither the time nor the interest. I may tho, I just may...I'm curious to see how good xml to html would be.
Quote:
Quote:
You can think I'm ignorant if you like however. I don't mind. Quote:
Quote:
Quote:
Putting it quite bluntly, no matter what, I've managed to find ways. Quote:
Quote:
As for putting up or shutting up, I'm required to do neither. I don't jump through hoops for you. Even if I were to write such a script, I doubt you would be able to even run it. I doubt you have even heard of half of the tools I could use. |
|||||||
09-02-2018, 07:26 AM | #54 |
Wizard
Posts: 3,108
Karma: 60231510
Join Date: Nov 2011
Location: Australia
Device: Kobo Aura H2O, Kindle Oasis, Huawei Ascend Mate 7
|
@sealbeater. I'm very familiar with the problems involved in converting pdf to intermediate formats and epub, as I spent a little time on it some years ago before concluding it was simply not worth the effort in most cases. I'm quite familiar with all of the tools you are talking about. What you describe is a manual process of using these tools. Using sed to remove headers and footers is easy enough manually. But, as you should know, there is a world of difference between using these tools manually on a case by case basis, and incorporating their use into an effective "one size fits all" script.
You seem to have sufficient skills to accomplish the former but not the latter, despite your apparent high opinion of your skills and your apparent assumption that the rest of us don't have any. I look forward to seeing a working script from you, failing which: A man of words and not of deeds Is like a garden full of weeds Last edited by darryl; 09-02-2018 at 07:38 AM. |
09-02-2018, 08:22 AM | #55 | |||||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
Quote:
Quote:
Quote:
Quote:
That's a nice quote, although unattributed. Here is another one. Those who say it can't be done are usually interrupted by others doing it. -- James A. Baldwin |
|||||
09-02-2018, 08:44 AM | #56 |
Grand Sorcerer
Posts: 12,165
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
Why not just stop wasting time replying here and just code it?
Sent from my Nexus 7 using Tapatalk |
09-02-2018, 08:51 AM | #57 |
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Because I don't jump through hoops for Internet forum users and I am doing other things at present. Why don't you give it a shot? I've left enough clues for a good starting point. So have others in this thread, although their responses seem to have been ignored.
|
09-02-2018, 10:46 AM | #58 |
Wizard
Posts: 3,108
Karma: 60231510
Join Date: Nov 2011
Location: Australia
Device: Kobo Aura H2O, Kindle Oasis, Huawei Ascend Mate 7
|
@sealbeater
The particular software which was the subject of this particular thread was not exactly a resounding success, nor does any other product appear to exist which performs the task adequately. You obviously don't have the skills to do it. In fact some of your comments make me wonder if you have any meaningful coding skills at all. You clearly have no idea of the complexities involved in writing such a script, yet sneer at others who point it out to you. There is a world of difference between manually converting a few pdf's on a case by case basis and scripting a general solution which gives acceptable results on most occasions. |
09-02-2018, 04:11 PM | #59 | |||
Banned
Posts: 666
Karma: 1752814
Join Date: Jan 2008
Device: Sony Reader PRS-505 : Onyx Boox Max : Sony PRS-900 : Onyx Kepler Pro
|
Quote:
Do what? Write a script? Convert a pdf to epub? As I said, I've already assessed your ability. You are entirely welcome to hold any opinion on me you want. Quote:
Quote:
A world of difference? An entire world? See, it's comments like that that make me judge your ability. |
|||
09-02-2018, 08:07 PM | #60 | ||
Wizard
Posts: 3,108
Karma: 60231510
Join Date: Nov 2011
Location: Australia
Device: Kobo Aura H2O, Kindle Oasis, Huawei Ascend Mate 7
|
Quote:
Quote:
Good luck. |
||