MobileRead Forums - View Single Post - Chapter detection when only digits

Perkin · 09-12-2010, 02:03 PM

Hi, I need help.

I'm trying to convert a load of RTF files to EPUBs, and in most of them each chapter heading is just a number on its own line, followed by a title on the next line.

Rather than adding 'Chapter ' to every heading in the RTF files by hand,
can anyone show me a regex that will recognise a chapter heading that consists only of digits?

None of the RTF files have line or page numbers (they have been removed), so the chapter numbers are the only numbers that sit on a line of their own.

I've added this to the chapter detection expression:
re:test(., '\d|\d\d',i)
That does split the chapters correctly, but it also splits on any paragraph that contains a one- or two-digit number.
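
To show the over-matching outside the converter, here is a minimal Python sketch. It rests on two assumptions: that the pattern is applied as an unanchored search against each paragraph's text, and that anchoring it with ^ and $ is the sort of fix needed. The sample paragraphs and the anchored pattern are made up for illustration, not taken from my files or from any documentation.

```python
import re

# The pattern from the post: matches a digit anywhere in the text.
UNANCHORED = re.compile(r"\d|\d\d")

# Hypothetical alternative: the whole paragraph must be one or two digits,
# allowing for surrounding whitespace.
ANCHORED = re.compile(r"^\s*\d{1,2}\s*$")

# Made-up sample paragraphs standing in for the converted RTF text.
paragraphs = ["1", "The Beginning", "He was 42 years old.", " 12 ", "2010"]


def matches(pattern, items):
    """Return the paragraphs in which the pattern is found."""
    return [p for p in items if pattern.search(p)]


unanchored_hits = matches(UNANCHORED, paragraphs)
anchored_hits = matches(ANCHORED, paragraphs)

print("unanchored:", unanchored_hits)
print("anchored:  ", anchored_hits)
```

In this sketch the unanchored pattern picks up every paragraph that has a digit in it, while the anchored one keeps only the paragraphs that are nothing but a one- or two-digit number. Whether the same anchors behave this way inside re:test() is something I have not confirmed.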

It would also be nice if the detected heading could automatically be given an <h#> tag.

Any help is appreciated.
