07-07-2020, 09:57 PM   #54
DNSB
Bibliophagist
Posts: 46,445
Karma: 169098492
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Mister L
But, am I wrong in thinking that you also were using, as your starting point, the html files? I think you're completely right, if you do that, there are too many different possibilities to handle and you'll never manage to make something that can deal with all of them, and it's very very likely you'll break something. Which is exactly why I am not trying to do this using regex. But, BUT! if there is a good TOC in the file already and there could be a way to do a "reverse create TOC" basically, instead of having to resolve all those tricky problems you just go around them. I really believe it must be possible to automate that. Everything you need is already in the toc; the text is there, all you have to do is copy-paste, the link is there, all you have to do is follow it... all the necessary elements are already in the file.
My code, in theory, could pull from an epub2 toc.ncx, an epub3 nav.xhtml document or an HTML table of contents. The problem was the sheer number of special cases. I was wasting more and more time modifying the code as the complexity increased, until I realized the whole thing was taking longer than my manual process. I also ran into too many issues where fixing the code to work with one ebook broke it for a previously working ebook. Regressions 'Я Us.
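To make the toc.ncx case concrete, here is a minimal sketch of the easy part, assuming a well-formed epub2 NCX. It is not the code described above; the function name `read_ncx_entries` is made up for illustration, and it only collects each entry's nesting depth, label text and link target.

```python
import xml.etree.ElementTree as ET

# Namespace used by epub2 NCX documents.
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}


def read_ncx_entries(ncx_text):
    """Return a list of (depth, label, src) tuples in reading order.

    Hypothetical helper: assumes ncx_text is a well-formed NCX document.
    """
    root = ET.fromstring(ncx_text)
    entries = []

    def walk(parent, depth):
        # navPoint elements nest to express sub-chapters.
        for point in parent.findall("ncx:navPoint", NCX_NS):
            label = point.findtext(
                "ncx:navLabel/ncx:text", default="", namespaces=NCX_NS
            )
            content = point.find("ncx:content", NCX_NS)
            src = content.get("src", "") if content is not None else ""
            # Collapse stray whitespace and line breaks inside the label.
            entries.append((depth, " ".join(label.split()), src))
            walk(point, depth + 1)

    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is not None:
        walk(nav_map, 1)
    return entries
```

Getting this list is the simple step. The special cases start when each `src` has to be matched to the right spot in the text files.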

Like most programming tasks, it is simple for the person who is not trying to implement it. If you are the person trying to implement it, you find yourself looking for a larger can so all the worms will fit back in.

"All the necessary elements are already in the file"? Bah, humbug. The issue is more that the structure differs from one epub to the next. Even things like where the files are stored in the epub can be a PITA, as in a recent epub I edited where the text files were stored partly in the root of the archive and partly in a text folder.
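As one illustration of the file-location problem, a TOC link is relative to wherever the TOC file itself sits in the archive, so the same href can point to different places in different books. A minimal sketch of resolving it, assuming the TOC's own archive path is already known; `resolve_toc_href` is a hypothetical name, not part of any existing tool.

```python
import posixpath
from urllib.parse import unquote, urldefrag


def resolve_toc_href(toc_path, href):
    """Map a TOC href to (archive member path, fragment id).

    Hypothetical helper: toc_path is the TOC's path inside the epub zip,
    e.g. "OEBPS/toc.ncx". Zip member names always use forward slashes,
    hence posixpath rather than os.path.
    """
    # Split "Text/ch1.xhtml#start" into the file part and the anchor.
    target, fragment = urldefrag(href)
    base_dir = posixpath.dirname(toc_path)
    # hrefs may be percent-encoded and may climb out with "../".
    member = posixpath.normpath(posixpath.join(base_dir, unquote(target)))
    return member, fragment
```

For example, `resolve_toc_href("OEBPS/toc.ncx", "Text/ch1.xhtml#start")` gives `("OEBPS/Text/ch1.xhtml", "start")`, while the same href in a book whose toc.ncx sits in the archive root resolves to `Text/ch1.xhtml`. This handles only the path arithmetic; it does not check that the file or the anchor actually exists.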