MobileRead Forums - View Single Post

Vroni · 10-08-2018, 09:07 AM

Quote:

Originally Posted by sealbeater

Anything that can be done manually can be scripted.

Well, not quite. Or better said, not yet.

If you want to decide whether a number in a text is a leftover page number or something that belongs to the text, you need contextual information. Just because it is a number, you can't simply delete it. Maybe it's a page number that needs to go away. Maybe a paragraph ends at that page number and the next paragraph has to start on its own. Maybe the page number split a paragraph in two, and after removing it the two pieces have to be joined into one paragraph. Or it's not a page number at all: it might be a year, a month, an age or whatever.
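To make the problem concrete, here is a rough Python sketch of the kind of naive rule a script might use. Everything in it is my own assumption for illustration (the function names, the "line with only 1 to 4 digits" test, the "previous line ends with . ! ?" test). It is not taken from any existing tool.

```python
import re

# Assumption: a "page number candidate" is a line holding nothing but 1-4 digits.
STANDALONE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")
# Assumption: a line ending in . ! or ? closes a paragraph.
SENTENCE_END = re.compile(r"[.!?]\s*$")


def classify(prev_line, line):
    """Return 'keep', 'drop_and_split' or 'drop_and_join' for one line."""
    if not STANDALONE_NUMBER.match(line):
        return "keep"
    if SENTENCE_END.search(prev_line):
        # The paragraph ended right before the number: start a new paragraph.
        return "drop_and_split"
    # The number interrupted a sentence: glue the two halves together.
    return "drop_and_join"


def clean(lines):
    """Drop suspected page numbers and rebuild paragraphs from the rest."""
    paragraphs = []
    current = []
    for i, line in enumerate(lines):
        prev_line = lines[i - 1] if i > 0 else ""
        verdict = classify(prev_line, line)
        if verdict == "keep":
            current.append(line.strip())
        elif verdict == "drop_and_split" and current:
            paragraphs.append(" ".join(current))
            current = []
        # 'drop_and_join': skip the line and keep collecting.
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


page_break = ["He walked into the", "17", "room and sat down.", "18",
              "The next day was cold."]
print(clean(page_break))

# Same rule, wrong decision: here the number is a year that belongs to the text.
year = ["She was born in", "1984", "and never left."]
print(clean(year))
```

The first input comes out as two clean paragraphs. The second one loses the year and becomes "She was born in and never left.", because the rule has no way of knowing that this number is part of the sentence.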

I would really like to see a script that can make such decisions on its own with an accuracy of, let's say, 95%.

And this is only one of many issues you run into when you try to make a good EPUB out of a PDF conversion.

As Darryl already mentioned: I have the same impression that you don't have a clue what PDF is. It's not a markup language. It does not distinguish between text in bold and text in bold that is a headline.

Quote:

Originally Posted by sealbeater

EPUB is just compressed HTML

It isn't. There are some additional files around it. The content is XHTML, not HTML. And it allows only a subset of CSS 2.1, which makes it more complicated.
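For anyone who wants to see those extra files, here is a small Python sketch using only the standard `zipfile` module. It builds a stripped-down, EPUB-like archive in memory and lists everything that is not an XHTML page. The `OEBPS/` folder and the file names `content.opf` and `chapter1.xhtml` are just common conventions I picked for illustration, the OPF content is a stub, and the result is not a valid EPUB.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub_like_zip():
    """Return the bytes of a minimal EPUB-like archive (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The 'mimetype' entry comes first and is stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # container.xml tells a reader where the package (OPF) file lives.
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        # Stub only: a real OPF lists metadata, manifest and spine.
        zf.writestr("OEBPS/content.opf", "<package/>")
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html xmlns='http://www.w3.org/1999/xhtml'/>")
    return buf.getvalue()


def non_xhtml_entries(epub_bytes):
    """List archive members that are not (X)HTML content documents."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return [name for name in zf.namelist()
                if not name.endswith((".xhtml", ".html"))]


print(non_xhtml_entries(build_epub_like_zip()))
```

Even this toy archive has three entries besides the actual page, and a real book adds a table of contents, stylesheets, fonts and images on top of that.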