10-08-2018, 09:07 AM #93
Vroni
Banned
Posts: 168
Karma: 10010
Join Date: Oct 2018
Device: Tolino/PRS 650/Tablet
Quote:
Originally Posted by sealbeater
Anything that can be done manually can be scripted.
Well, no. Or rather: not yet.

If you want to decide whether a number in the text is a leftover page number or something that belongs to the text, you need contextual information. You can't just delete it because it is a number. Maybe it's a page number that needs to go away. Maybe a paragraph ends with that page number and the next paragraph has to start on its own. Maybe the page number split a paragraph, and after removing it the two pieces have to be joined into one paragraph. Or it isn't a page number at all: it might be a year, a month, an age or whatever.
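To make this concrete, here is a minimal Python sketch of the kind of naive heuristic such a script might use. It assumes one paragraph per input line and that page numbers sit alone on their own line in strict sequence starting at `first_page`; the function name `strip_page_numbers` and these rules are hypothetical. A standalone number is removed only if it continues the page sequence, and the surrounding text is joined when the text before it does not end a sentence.

```python
import re

# Hypothetical rule: a page-number candidate is a line holding only an integer.
PAGE_LINE = re.compile(r"^\s*(\d{1,4})\s*$")
# Hypothetical rule: these endings mean the paragraph before the break was complete.
SENTENCE_END = (".", "!", "?", ":", '"', "\u201d")


def strip_page_numbers(lines, first_page=1):
    """Drop standalone numbers that continue the expected page sequence.

    Assumes one paragraph per input line. Returns the cleaned paragraphs.
    A number that breaks the sequence (e.g. a year like 1984) is kept as text.
    """
    expected = first_page
    paragraphs = []
    join_next = False  # True when a page break seems to have split a paragraph
    for line in lines:
        m = PAGE_LINE.match(line)
        if m and int(m.group(1)) == expected:
            expected += 1
            # If the text before the break did not end a sentence,
            # guess that the page number split one paragraph in two.
            join_next = bool(paragraphs) and not paragraphs[-1].rstrip().endswith(SENTENCE_END)
            continue
        text = line.strip()
        if not text:
            continue  # skip blank lines; join_next survives them
        if join_next:
            paragraphs[-1] = paragraphs[-1].rstrip() + " " + text
        else:
            paragraphs.append(text)
        join_next = False
    return paragraphs
```

For example, `["It was a dark and", "1", "stormy night.", "1984"]` yields `["It was a dark and stormy night.", "1984"]`. It already fails on the cases described above: a single missing page number breaks the sequence so every later one is kept, an age or month that happens to equal the expected page is deleted, and a headline without final punctuation is glued onto the next paragraph.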

I would really like to see a script that can make such decisions on its own with an accuracy of, let's say, 95%.

And this is only one of many issues you face when you try to make a good EPUB out of a PDF conversion.

As Darryl already mentioned, I have the same impression: you don't have any clue what PDF is. It's not a markup language. It does not distinguish between bold text and bold text that is a headline.

Quote:
Originally Posted by sealbeater
EPUB is just compressed HTML
It isn't. There are some other files around the content (the container and package files). The content is XHTML, and it allows only a subset of CSS 2.1, which makes it more complicated.
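As a rough illustration of why "just compressed HTML" falls short, the Python sketch below checks a few structural rules of the EPUB container: the ZIP must start with an uncompressed `mimetype` entry containing `application/epub+zip`, and `META-INF/container.xml` must point to the OPF package document, which in turn lists the XHTML files. The function name `check_epub` is hypothetical, and it only covers these container checks, not validation of the package or the content.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def check_epub(file):
    """Run basic container checks and return the OPF package path.

    `file` is a path or a binary file-like object. Raises ValueError on
    structural problems, KeyError if container.xml is missing.
    """
    with zipfile.ZipFile(file) as zf:
        entries = zf.infolist()
        # 'mimetype' must be the first entry and stored without compression.
        if not entries or entries[0].filename != "mimetype" \
                or entries[0].compress_type != zipfile.ZIP_STORED:
            raise ValueError("'mimetype' must be the first, uncompressed entry")
        if zf.read("mimetype") != b"application/epub+zip":
            raise ValueError("unexpected mimetype content")
        # container.xml names the OPF file (metadata, manifest, spine).
        root = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        if rootfile is None:
            raise ValueError("no rootfile in container.xml")
        return rootfile.get("full-path")
```

Called as `check_epub("book.epub")`, it returns whatever path the book declares for its package document (often something like `OEBPS/content.opf`, but that name is only a common convention).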

Last edited by Vroni; 10-09-2018 at 04:31 AM. Reason: typos