I am not sure whether this is possible with regex alone. I've found some regex + JavaScript solutions, but I don't know how to apply them.
I'm hoping to make some kind of system for inserting a pagebreak marker at pre-determined intervals to make a page-list in books which have no paper equivalent.
My idea is, using regex, count X number of characters (including spaces, but text only, no code), and insert a marker after each set, which I can then turn into a properly numbered and formatted epub3 pagebreak.
So for instance, every 1490-1500 characters, insert the marker <span class="pbk" /> (or whatever).
Ideally, I want to avoid inserting it inside words or inside any html code, but it can go between words or between paragraphs.
Ideal results:
(1498 characters)word <span class="pbk" />next word. OK
(1500 characters)</p><span class="pbk" /><p> OK
(1500 characters)<span class="pbk" /></p><p> OK
(1500 characters)</p><p><span class="pbk" /> OK
Undesirable results:
(1499 characters)<<span class="pbk" />/p><p> AVOID
(1498 characters)wo<span class="pbk" />rd AVOID IF POSSIBLE
If there is no way to exclude mid-word insertions, I will figure something out, but I definitely want to avoid insertions inside HTML tags.
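To make the requirement concrete, here is a minimal Python sketch of the behaviour I am after, done outside the editor's search-and-replace. It is only an illustration under some assumptions: the function name `insert_pagebreaks` and the `page_len` parameter are made up for this example, the input is assumed to be well-formed XHTML body content, entities such as `&amp;` are counted as literal characters, and `<script>`/`<style>` content is not treated specially. Instead of aiming for a 1490-1500 window, it breaks at the first whitespace at or after the threshold, so a "page" can run slightly long.

```python
import re

MARKER = '<span class="pbk" />'  # marker from the post; change as needed

# A whole tag, a run of text, or a stray "<" (copied through untouched).
TAG_OR_TEXT = re.compile(r'<[^>]*>|[^<]+|<')
# Inside a text run: alternating whitespace runs and non-whitespace "words".
WORD_OR_SPACE = re.compile(r'\s+|\S+')


def insert_pagebreaks(html, page_len=1500, marker=MARKER):
    """Insert `marker` after the first whitespace run that falls at or
    beyond every `page_len` text characters. Tags are never counted
    and never split. (Hypothetical helper, not an editor feature.)"""
    out = []
    count = 0  # text characters seen since the last marker
    for token in TAG_OR_TEXT.findall(html):
        if token.startswith('<'):
            out.append(token)  # markup: copy as-is, do not count
            continue
        for piece in WORD_OR_SPACE.findall(token):
            out.append(piece)
            count += len(piece)
            # Only break right after whitespace, so words stay intact.
            if piece.isspace() and count >= page_len:
                out.append(marker)
                count = 0
    return ''.join(out)
```

Calling `insert_pagebreaks(text, page_len=20)` on a short sample is an easy way to test it. Because the marker is only ever appended after a whitespace run in the text, it cannot land inside a tag or a word; the trade-off is that paragraphs with no whitespace between `</p>` and `<p>` get their marker after the first space of the next paragraph.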
I tried:
search
((.){20,25})
replace
\1<span class="pbk" />
Search options:
No regex options selected
Text only
I used smaller numbers to make testing easier, but for some reason the match does not exclude the HTML code.
I also tried this, hoping to avoid mid-word insertions:
((\w|\s){20,25}\s|\w+\s|$)
But it seems to select only one word + space, regardless of character count, and it excludes punctuation (including apostrophes).
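Both symptoms can be reproduced outside the editor. The snippet below is a small check using Python's `re` module, on the assumption that it treats these two patterns the same way the editor's regex engine does (the sample strings are made up for the test).

```python
import re

# First attempt: "." matches any character, so tag characters are counted too.
first = re.compile(r'((.){20,25})')
sample_html = '<p>Hello world, this is text.</p>'
first_match = first.search(sample_html).group(0)
# The match begins at the "<" of the opening tag, i.e. inside the markup.

# Second attempt: the long alternative only accepts word characters and
# whitespace, so a comma or apostrophe cuts the run short. When fewer than
# 20 such characters are available, the regex falls back to "\w+\s", which
# is a single word plus one whitespace character.
second = re.compile(r'((\w|\s){20,25}\s|\w+\s|$)')
sample_text = 'one, two three, four five, six'
second_matches = [m.group(0) for m in second.finditer(sample_text)]

print(repr(first_match))
print(second_matches)
```

In the second sample, the words followed directly by a comma are skipped entirely, which matches the missing punctuation I am seeing.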
If this is not possible with regex, is it the kind of thing that could be added as a feature, where the number of characters per "page" could be user-defined?
Alternatively, if there is an easier way to make digital-only pagebreaks that I haven't heard about, I am all ears.
Thanks in advance if anyone has any ideas.