Thread: Regex examples
02-13-2024, 07:21 PM   #762
Mister L
Groupie
Posts: 179
Karma: 91148
Join Date: Jun 2010
Device: Sony 350
I'm not sure whether this is possible with regex alone. I've found some regex + JavaScript solutions, but I don't know how to use them.

I'm hoping to build some kind of system for inserting a page-break marker at predetermined intervals, so I can create a page list for books that have no print equivalent.

My idea is to use regex to count X characters (including spaces, but text only, no code) and insert a marker after each set, which I can then turn into a properly numbered and formatted epub3 pagebreak.

So for instance, every 1490-1500 characters, insert the marker <span class="pbk" /> (or whatever).
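Once the placeholders are in, numbering them is a simple counter-driven replacement. Below is a minimal Python sketch (Python is chosen only for illustration), assuming the markers are spelled `<span class="pbk" />` or `<span class="pbk"/>`, the files are processed in reading order, and each file's `<html>` element already declares `xmlns:epub="http://www.idpf.org/2007/ops"`. The function names and the `pageN` ids are hypothetical. `epub:type="pagebreak"` with `role="doc-pagebreak"` is the usual EPUB 3 / DPUB-ARIA markup, and putting the page number in `aria-label` is a common accessibility convention rather than a requirement.

```python
from __future__ import annotations

import re

# Placeholder marker as written in the post, with or without the space before "/>".
MARKER = re.compile(r'<span class="pbk"\s*/>')

def number_pagebreaks(xhtml: str, start: int = 1) -> tuple[str, int]:
    """Replace each placeholder with a numbered EPUB 3 page break.

    Returns the new text and the next unused page number, so numbering
    can continue in the next file.
    """
    page = start

    def to_pagebreak(_match: re.Match) -> str:
        nonlocal page
        tag = (f'<span epub:type="pagebreak" role="doc-pagebreak" '
               f'id="page{page}" aria-label="{page}"/>')
        page += 1
        return tag

    return MARKER.sub(to_pagebreak, xhtml), page

def page_list_entries(filename: str, first: int, last: int) -> list[str]:
    """Build the <li> entries (pages first..last inclusive) for one file."""
    return [f'<li><a href="{filename}#page{n}">{n}</a></li>'
            for n in range(first, last + 1)]
```

The returned page number can be passed as `start` for the next file so numbering continues across the book, and the `<li>` entries belong inside an `<ol>` within a `<nav epub:type="page-list">` element in the navigation document.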

Ideally, I want to avoid inserting it inside words or inside any html code, but it can go between words or between paragraphs.


Ideal results:

(1498 characters)word <span class="pbk" />next word. OK

(1500 characters)</p><span class="pbk" /><p> OK

(1500 characters)<span class="pbk" /></p><p> OK

(1500 characters)</p><p><span class="pbk" /> OK


Undesirable results:

(1499 characters)<<span class="pbk" />/p><p> AVOID

(1498 characters)wo<span class="pbk" />rd AVOID IF POSSIBLE

If there is no way to exclude mid-word insertions, I will figure something out, but I definitely want to avoid insertions inside html tags.
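Since a single regex match cannot carry a running character count past the tags between paragraphs, one way to get all of the "OK" results and none of the "AVOID" ones is a short script. A minimal Python sketch, assuming well-formed XHTML with no comments or CDATA sections containing `>`; `insert_page_markers` and its parameters are hypothetical names. Entities such as `&amp;` are counted by their raw length, and any `<script>` or `<style>` text inside `<body>` would be counted too.

```python
import re

# Splitting on a capturing group keeps the tags in the result list:
# even indices are text runs, odd indices are complete tags.
TAG_SPLIT = re.compile(r'(<[^>]*>)')
BODY_OPEN = re.compile(r'<body\b')

def insert_page_markers(xhtml: str, page_size: int = 1500,
                        marker: str = '<span class="pbk" />') -> str:
    """Insert `marker` roughly every `page_size` characters of body text.

    Tags are copied through untouched and never counted, so the marker
    can never land inside one. Once the count reaches `page_size`, the
    marker goes right after the next whitespace character, so it never
    splits a word; a page therefore runs over by at most one word.
    """
    out = []
    count = 0          # text characters counted since the last marker
    in_body = False    # only text after <body ...> is counted
    for i, part in enumerate(TAG_SPLIT.split(xhtml)):
        if i % 2 == 1:                     # a tag: copy as-is
            if BODY_OPEN.match(part):
                in_body = True
            out.append(part)
            continue
        if not in_body:                    # e.g. the <title> text in <head>
            out.append(part)
            continue
        # Whitespace-only runs (indentation between block tags) are not
        # counted, but they are still safe places for a pending marker.
        counts = part.strip() != ''
        for ch in part:
            out.append(ch)
            if counts:
                count += 1
            if count >= page_size and ch.isspace():
                out.append(marker)
                count = 0
    return ''.join(out)
```

Passing `page_size=1490` starts looking for a break at 1490 characters and closes the page at the next space, which should usually fall inside the 1490-1500 window (a long word can push it past). The count restarts with every call, so for a multi-file book it would need to be carried over from one file to the next.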



I tried:

search
((.){20,25})
replace
\1<span class="pbk" />

Search options:
No regex options selected
Text only

(I used smaller numbers to make testing easier.) For some reason, though, it does not exclude the code, even with Text only selected.
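A likely reason the markup gets picked up: in most regex engines `.` matches any character except a newline, including `<`, `/` and `>`, so the pattern counts tags as ordinary text. A quick illustration with Python's `re` module (the sample sentence is made up, and the editor's own regex engine may differ in small details) compares it with a character class that excludes angle brackets:

```python
import re

SAMPLE = '<p>It was a dark and stormy night; the rain fell in torrents.</p>'

# '.' matches any character except a newline, markup included, so the
# first 20-25 character chunk swallows the opening <p> tag.
naive = re.findall(r'.{20,25}', SAMPLE)

# [^<>] cannot match '<' or '>', so every chunk stays inside one text run
# and never contains a tag (it can still end mid-word, though).
text_only = re.findall(r'[^<>]{20,25}', SAMPLE)

if __name__ == "__main__":
    print(naive)
    print(text_only)
```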


I also tried this, hoping to avoid mid-word insertions:
((\w|\s){20,25}\s|\w+\s|$)

But it seems to select only one word + space, regardless of character count, and it excludes punctuation (including apostrophes).
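One likely explanation: the first alternative needs 20-25 characters that are all `\w` or `\s`, and an apostrophe or comma is neither, so whenever punctuation falls inside that window it fails and the engine falls back to `\w+\s` (one word plus a space) or to `$`. The sketch below, using Python's `re` and a made-up sample sentence, shows this and tries a hypothetical punctuation-tolerant variant, `([^<>]{19,24}\s)`, which is not from the post. It never matches across a tag and always ends just after a space, but it still restarts at every tag, and where no space falls at the end of the window the engine slides the start forward one character, so page lengths drift.

```python
import re

SAMPLE = "It wasn't late, but the rain fell in torrents."

# The second pattern from the post. Its first alternative needs 20-25
# characters that are all \w or \s, so a comma or apostrophe inside the
# window makes it fail; the engine then falls back to \w+\s (one word
# plus a space) or to $ (an empty match at the very end).
ORIGINAL = r'((\w|\s){20,25}\s|\w+\s|$)'

# Hypothetical variant: 19-24 characters that are not '<' or '>', then
# one whitespace character, so the marker lands after a space and never
# inside a tag or a word.
BOUNDARY = r'([^<>]{19,24}\s)'
MARKER = r'\1<span class="pbk" />'

def show_matches(pattern: str, text: str) -> list:
    """Return every match (group 0) the engine finds, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]

if __name__ == "__main__":
    print(show_matches(ORIGINAL, SAMPLE))
    print(re.sub(BOUNDARY, MARKER, SAMPLE))
```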


If this isn't possible with regex, is it the kind of thing that could be added as a feature, with a user-defined number of characters per "page"?


Alternatively, if there is an easier way to make digital-only page breaks that I haven't heard of, I'm all ears.

Thanks in advance if anyone has any ideas.