Quote:
Originally Posted by sealbeater
Anything that can be done manually can be scripted.
|
Well, no. Or better said: not yet.
If you want to decide whether a number in a text is a leftover page number or something that belongs to the text, you need contextual information. You can't just delete it because it is a number. Maybe it's a page number that needs to go away. Maybe a paragraph ends right before that page number and the next paragraph has to start on its own. Maybe the page number split a paragraph, and after removing it the two pieces have to be joined into one paragraph. Or it's not a page number at all: it might be a year, a month, an age or whatever.
I really would like to see a script that can make such decisions on its own with an accuracy of, let's say, 95%.
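To make that concrete, here is a rough Python sketch of the kind of heuristic such a script would need. Everything in it is my own assumption for illustration: the function name `classify_number_line`, the optional `expected_page` counter, the three return labels and the punctuation rules. It is a sketch of the idea, not a working solution.

```python
import re

def classify_number_line(lines, i, expected_page=None):
    """Guess what to do with lines[i] if it is a bare number.

    Returns 'keep', 'remove_and_join' or 'remove_and_break'.
    All rules here are illustrative assumptions, not a proven method.
    """
    text = lines[i].strip()
    # Only a line consisting of 1-4 digits is a page number candidate.
    if not re.fullmatch(r"\d{1,4}", text):
        return "keep"
    # If we track a running page counter, a mismatch suggests the number
    # belongs to the text (a year, an age, ...).
    if expected_page is not None and int(text) != expected_page:
        return "keep"

    prev = lines[i - 1].strip() if i > 0 else ""
    nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""

    ends_sentence = prev.endswith((".", "!", "?", '"', ":"))
    continues = bool(nxt) and nxt[0].islower()

    # The previous line stops mid-sentence and the next one starts in
    # lower case: the page number probably split a paragraph.
    if prev and not ends_sentence and continues:
        return "remove_and_join"
    # Otherwise assume the paragraph ended before the page number.
    return "remove_and_break"
```

Without a reliable page counter this already misfires: for the lines "He was born in", "1984", "and moved away." it answers `remove_and_join` and throws the year away. That is exactly the contextual information a simple script does not have.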
And this is only one of many issues you run into when you try to make a good EPUB out of a PDF conversion.
As Darryl already mentioned, I have the same impression: you don't have a clue what PDF is. It's not a markup language. It does not distinguish between text that is merely bold and bold text that is a headline.
Quote:
Originally Posted by sealbeater
EPUB is just compressed HTML
|
It isn't. There are several other files around the content. The content itself is XHTML, not HTML. And it allows only a subset of CSS 2.1, which makes it more complicated.
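If you want to see those extra files for yourself, here is a small Python sketch using only the standard `zipfile` module. The helper names `build_minimal_epub` and `inspect_epub` and the paths under `OEBPS/` are assumptions of mine for the demo; only the `mimetype` entry and `META-INF/container.xml` are fixed by the container format. The toy archive is not a valid book, it only mimics the layout.

```python
import io
import zipfile

def build_minimal_epub():
    """Build a toy EPUB-like archive in memory (placeholder contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # container.xml points to the package (OPF) file.
        zf.writestr(
            "META-INF/container.xml",
            '<?xml version="1.0"?>\n'
            '<container version="1.0" '
            'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
            '  <rootfiles>\n'
            '    <rootfile full-path="OEBPS/content.opf" '
            'media-type="application/oebps-package+xml"/>\n'
            '  </rootfiles>\n'
            '</container>\n',
        )
        # Hypothetical paths; real books choose their own names.
        zf.writestr("OEBPS/content.opf", "<package/>")
        zf.writestr("OEBPS/chapter1.xhtml", "<html/>")
    return buf.getvalue()

def inspect_epub(source):
    """Return the mimetype, all member names and the XHTML members.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        members = zf.namelist()
        mimetype = zf.read("mimetype").decode("ascii").strip()
    xhtml = [m for m in members if m.endswith((".xhtml", ".html"))]
    return {"mimetype": mimetype, "members": members, "xhtml": xhtml}

if __name__ == "__main__":
    info = inspect_epub(io.BytesIO(build_minimal_epub()))
    for name in info["members"]:
        print(name)
```

Pointing `inspect_epub` at a real .epub file on disk should list the same kind of structure: the content documents plus the packaging files that tell a reader how to put them together.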