Old 07-07-2020, 03:53 PM   #52
Hitch
Bookmaker & Cat Slave
Posts: 11,503
Karma: 158448243
Join Date: Apr 2010
Location: Phoenix, AZ
Device: K2, iPad, KFire, PPW, Voyage, NookColor. 2 Droid, Oasis, Boox Note2
Quote:
Originally Posted by Mister L
Yes, depending on where your sources come from it can be a never-ending source of delight to discover all the ways people can "break" something as simple (in theory) as a chapter heading. Special mention for Gutenberg and their "an hr is as good as a new file" structure.

But, am I wrong in thinking that you also were using, as your starting point, the html files? I think you're completely right, if you do that, there are too many different possibilities to handle and you'll never manage to make something that can deal with all of them, and it's very very likely you'll break something. Which is exactly why I am not trying to do this using regex. But, BUT! if there is a good TOC in the file already and there could be a way to do a "reverse create TOC" basically, instead of having to resolve all those tricky problems you just go around them. I really believe it must be possible to automate that. Everything you need is already in the toc; the text is there, all you have to do is copy-paste, the link is there, all you have to do is follow it... all the necessary elements are already in the file.

I really do think it's as close to a perfect solution as it's possible to get to simply find a way to automate copying the original TOC titles back into the files they link to... if you copy the title into an html comment I cannot even see how you could break the file at all, and that would be one single operation so you don't even have to figure out multiple scenarios. Obviously a bit of work would still be required after that to stick this text into the proper tag or add the attribute or whathaveyou but the most fastidious and annoying part would already be done, no copying and pasting by hand between two files, no mucking about with regexes for various wEiRd CaSeS and random spans or one-to-three br's or a's or sup or anything else, and the whole process would be much smoother because you wouldn't have nearly as many variations to adjust for.

I guess I am going to have to do like you and use it to "learn a lot about Python" (lesson 1: apparently Python is what I'll have to learn if I want to make my own plugin). I already have learned the painful lesson about having backup copies during previous "experiments".

How hard is Python to learn? (serious question). I am completely comfortable with html and css but I don't know any programming languages.
As far as I know, more than one person, DNSB among them, has tried to automate that, and so far nobody has succeeded. You're a formatter; you know full well that the circumstances are always different. If you assume that every chapter title will be a chapter number in an H1 and a subtitle, in English (or whatever language), in an H2, then sure, you could probably engineer something, but I think others have tried that and given up.

But, hey, if you find a way, good on ya.
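
For what it's worth, the "reverse TOC" idea in the quote could be roughed out something like this in Python. This is only a sketch under some big assumptions: the TOC is an HTML page whose links look like `file.html` or `file.html#anchor`, ids are written as `id="..."` with double quotes, and the book's files are already loaded into a dict of filename to text. Every name here (`TocLinkCollector`, `comment_for`, `insert_toc_comments`) is made up for illustration; none of it comes from an existing plugin or tool.

```python
import re
from html.parser import HTMLParser


class TocLinkCollector(HTMLParser):
    """Collect (href, visible text) pairs from every <a href=...> in a TOC page."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, title) pairs
        self._href = None   # href of the <a> we are currently inside
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so the title fits on one line.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((self._href, title))
            self._href = None


def comment_for(title):
    """Wrap a title in an HTML comment; '--' is not allowed inside comments."""
    while "--" in title:
        title = title.replace("--", "- -")
    return "<!-- TOC: " + title + " -->"


def insert_toc_comments(toc_html, files):
    """Return (updated copy of files, list of hrefs that could not be placed).

    files maps filename -> HTML text (assumption: already read from the book).
    """
    collector = TocLinkCollector()
    collector.feed(toc_html)
    collector.close()

    result = dict(files)
    skipped = []
    for href, title in collector.links:
        name, _, fragment = href.partition("#")
        text = result.get(name)
        if text is None:
            skipped.append(href)
            continue
        pos = -1
        if fragment:
            # Put the comment just before the tag that carries the anchor id.
            at = text.find('id="' + fragment + '"')
            if at != -1:
                pos = text.rfind("<", 0, at)
        if pos == -1:
            # No usable anchor: fall back to just after the opening <body>.
            body = re.search(r"<body\b[^>]*>", text, re.IGNORECASE)
            if body is None:
                skipped.append(href)
                continue
            pos = body.end()
        result[name] = text[:pos] + comment_for(title) + "\n" + text[pos:]
    return result, skipped
```

Since it only ever adds comments, it shouldn't change how anything renders, which is the appeal of the idea. It does nothing about the hard part, though: a TOC that is incomplete, points at the wrong spot, or links to anchors written some other way will simply land in the skipped list or at the top of the file, and you're back to doing those by hand.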

Hitch