Quote:
Originally Posted by Mister L
Yes, depending on where your sources come from, it can be a never-ending source of delight to discover all the ways people can "break" something as simple (in theory) as a chapter heading. Special mention goes to Gutenberg and their "an hr is as good as a new file" structure.
But am I wrong in thinking that you were also using the HTML files as your starting point? If you do that, I think you're completely right: there are too many different possibilities to handle, you'll never manage to make something that can deal with all of them, and it's very likely you'll break something. That is exactly why I am not trying to do this with regex. But, BUT! If there is already a good TOC in the file, and there were a way to do, basically, a "reverse create TOC", then instead of having to solve all those tricky problems you just go around them. I really believe it must be possible to automate that. Everything you need is already in the TOC: the text is there, so all you have to do is copy and paste it; the link is there, so all you have to do is follow it. All the necessary elements are already in the file.
I really do think that finding a way to automate copying the original TOC titles back into the files they link to is as close to a perfect solution as it's possible to get. If you copy each title into an HTML comment, I cannot even see how you could break the file at all, and it would be one single operation, so you don't have to figure out multiple scenarios. Obviously a bit of work would still be required afterwards to put that text into the proper tag, add the attribute, or what have you, but the most tedious and annoying part would already be done: no copying and pasting by hand between two files, and no mucking about with regexes for various wEiRd CaSeS and random spans, one to three br's, a's, sup, or anything else. The whole process would be much smoother because you wouldn't have nearly as many variations to adjust for.
I guess I am going to have to do as you did and use it to "learn a lot about Python" (lesson 1: apparently Python is what I'll have to learn if I want to make my own plugin). I have already learned the painful lesson about keeping backup copies during previous "experiments".
How hard is Python to learn? (Serious question.) I am completely comfortable with HTML and CSS, but I don't know any programming languages.
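For what it's worth, here is a minimal sketch of the "reverse create TOC" idea from the quoted post, written in Python since that is the language the post mentions for plugins. It is not a Sigil plugin, does not use any plugin API, and is not anyone's published method. It uses only the standard library and assumes the book's files are already loaded into a dict mapping file name to contents, and that the TOC hrefs match those names exactly (a TOC in another folder, e.g. `../Text/ch1.xhtml`, would need its paths normalized first). `TocLinkCollector` and `copy_toc_titles` are hypothetical names. It collects each link's href and visible text, then inserts the text as an HTML comment just before the element whose id matches the link's fragment, or right after `<body>` when there is no fragment.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote


class TocLinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href="..."> in a TOC page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.entries = []  # list of (href, title) tuples, in document order
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []
        elif tag == "br" and self._href is not None:
            self._text.append(" ")  # a <br> inside a link still separates words

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text inside <span>, <sup>, etc. lands here too

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace/newlines into single spaces.
            title = " ".join("".join(self._text).split())
            self.entries.append((self._href, title))
            self._href = None


def copy_toc_titles(toc_html, chapters):
    """Return a new dict where each TOC title is inserted into its target file as a comment.

    `chapters` maps file name -> file contents (hypothetical in-memory layout).
    """
    collector = TocLinkCollector()
    collector.feed(toc_html)
    collector.close()

    result = dict(chapters)
    for href, title in collector.entries:
        path, _, fragment = href.partition("#")
        path = unquote(path)
        if path not in result:
            continue  # external link, or a file we were not given
        # "--" is not allowed inside an XML comment, so collapse hyphen runs.
        safe_title = re.sub(r"-{2,}", "-", title)
        comment = "<!-- TOC: %s -->" % safe_title
        text = result[path]

        match = None
        if fragment:
            # Place the comment just before the tag that carries id="fragment".
            pattern = r"<[^>]*\sid\s*=\s*[\"']%s[\"']" % re.escape(unquote(fragment))
            match = re.search(pattern, text)
        if match:
            pos = match.start()
        else:
            # No fragment (or id not found): place it right after the <body> tag.
            body = re.search(r"<body\b[^>]*>", text)
            pos = body.end() if body else 0
        result[path] = text[:pos] + comment + text[pos:]
    return result
```

Usage would look like `copy_toc_titles(toc_text, {"ch1.xhtml": ch1_text, ...})`. Moving each comment's text into a heading tag or attribute is still the manual follow-up step the post describes, and if two TOC entries point at the same file with no fragment, their comments end up in reverse order after `<body>`.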
As far as I know, more than one person, DNSB among them, has tried to automate that, and so far nobody has been successful. You're a formatter; you know full well that the circumstances are always different. If you assume that every chapter title will be a chapter number in an H1 and a subtitle in English (or whatever language) in an H2, then sure, you could probably engineer something, but I think others have tried that and given up.
But, hey, if you find a way, good on ya.
Hitch