10-11-2015, 02:34 AM   #4
otherpasserby
Hi Kevin,

Thanks for your reply, I appreciate it.

When converting PDFs to HTML, a lot of the semantics get lost, and often one (or at least I) cannot distinguish between the different semantics by means of a regex. Only by comparing the original PDF with the resulting EPUB can one see what the original semantics were.

For example, my latest conversion was a PDF of verses. At the end of a verse there could be a short note in italics, and between verses there could be commentary in italics. Each note or commentary could contain multiple </i>xyz<i> breaks where words need to be in normal (non-italic) type. So far, I have not been able to write a regex that handles all of these situations.
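
To make the pattern concrete, here is a minimal Python sketch of the kind of "crude regex" pass described above. It treats consecutive italic runs separated by short non-italic gaps as a single unit and wraps the whole thing in a span. The class name `note`, the 40-character gap limit, and the sample markup are all assumptions for illustration, not taken from any real book. It cannot tell a verse note from commentary; that distinction still needs a human comparing against the PDF.

```python
import re

# An italic run: <i>...</i> with no nested tags inside.
ITALIC = r"<i>[^<]*</i>"
# A short stretch of plain (non-italic) text between two italic runs.
# The 40-character limit is an arbitrary assumption; tune it per book.
GAP = r"[^<]{1,40}"
# One or more italic runs joined by short non-italic gaps.
RUN = re.compile(rf"{ITALIC}(?:{GAP}{ITALIC})*")

def wrap_italic_runs(html, css_class="note"):
    """Wrap each italic run, including its non-italic interruptions, in a span."""
    return RUN.sub(
        lambda m: f'<span class="{css_class}">{m.group(0)}</span>', html
    )

# Hypothetical sample: a verse line followed by a note with one non-italic word.
sample = "<p>Verse line one. <i>See also </i>Genesis<i> for context.</i></p>"
print(wrap_italic_runs(sample))
```

Once the runs are wrapped, each one can be reviewed and reclassified by hand (note or commentary) instead of being hunted down tag by tag.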

After the crude regex work, I always find myself comparing the PDF and EPUB page by page and fixing differences that only the eye can detect. Mostly, different scenarios need different solutions.

So I was thinking about a plugin that could stay open and show a list of 'solutions' that could be applied after selecting text in "Book View" or "Code View". This would greatly ease the workflow of manually correcting 400 pages.
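
As a rough illustration of the 'solutions' list, here is a small Python sketch of a registry of named fixes that each take a selected fragment and return the corrected markup. It deliberately uses no Sigil plugin API: how the selection is obtained and written back is left open, and every function name, label, and class name below is hypothetical.

```python
import re

def mark_as_note(fragment):
    """Wrap the selection as an end-of-verse note (class name is an assumption)."""
    return f'<span class="note">{fragment}</span>'

def mark_as_commentary(fragment):
    """Wrap the selection as between-verse commentary (class name is an assumption)."""
    return f'<div class="commentary">{fragment}</div>'

def strip_italics(fragment):
    """Remove all <i> and </i> tags from the selection."""
    return re.sub(r"</?i>", "", fragment)

# The list a plugin window could display; labels are hypothetical.
SOLUTIONS = {
    "Mark as verse note": mark_as_note,
    "Mark as commentary": mark_as_commentary,
    "Remove italics": strip_italics,
}

def apply_solution(name, selected_text):
    """Apply the named fix to the selected text and return the result."""
    return SOLUTIONS[name](selected_text)

print(apply_solution("Remove italics", "<i>See also </i>Genesis<i> again</i>"))
```

Keeping each fix as a plain text-in, text-out function means new scenarios can be added to the list without touching the rest of the tool.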

Regards.