09-12-2010, 02:03 PM   #1
Perkin
Guru
Perkin calls his or her ebook reader Vera.
Posts: 655
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD, iPad (Marvin)
Chapter detection when only digits - regex needed

Hi, I need help.

I'm trying to convert a load of RTF files to EPUBs, and most of them have chapter headings that are only digits on their own line, followed by the title on the next line.

Rather than adding 'Chapter ' by hand everywhere in the RTF files,
can anyone show me a regex that will recognise a chapter heading that is only numbers?

None of the RTF files have line or page numbers (they have been removed), so the only numbers on a line of their own are chapter numbers.

I've added this to the chapter detection regex:
re:test(., '\d|\d\d',i)
and that does split the chapters correctly, but it also splits on any paragraph that contains a single or double digit.
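
To illustrate why the pattern over-matches, here is a minimal Python sketch. It assumes the converter's regex behaves like Python's `re` module, and the sample paragraphs are made up. An unanchored `\d|\d\d` matches any paragraph with a digit anywhere in it, while a pattern anchored with `^` and `$` should only match a paragraph that is nothing but one or two digits.

```python
import re

# Unanchored: matches any paragraph that contains a digit anywhere.
LOOSE = re.compile(r"\d|\d\d")
# Anchored: matches only when the whole paragraph is one or two digits
# (optional surrounding whitespace is tolerated).
STRICT = re.compile(r"^\s*\d{1,2}\s*$")

# Hypothetical sample paragraphs, not taken from any real book.
paragraphs = ["1", "The Beginning", "He was 42 years old.", "12", "Another Title"]

loose_hits = [p for p in paragraphs if LOOSE.search(p)]
strict_hits = [p for p in paragraphs if STRICT.search(p)]

print(loose_hits)
print(strict_hits)
```

If the same anchoring carries over to the `re:test` expression, swapping in the anchored pattern may be enough, but that is an assumption I have not checked against the converter itself.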

It would also be nice if it could automatically assign the chapter number an <h#> tag.
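
As a rough sketch of the kind of result I mean, something like the following Python would wrap digits-only lines in a heading tag and everything else in a paragraph. The function name `tag_chapters` and the `level` parameter are made up for illustration; this is not how the converter does it, just the output I'm after.

```python
import re
from html import escape

# A line that is nothing but digits (surrounding whitespace allowed).
DIGITS_ONLY = re.compile(r"^\s*(\d+)\s*$")

def tag_chapters(lines, level=2):
    """Wrap digits-only lines in <hN>, all other lines in <p>.

    Hypothetical helper for illustration only.
    """
    out = []
    for line in lines:
        m = DIGITS_ONLY.match(line)
        if m:
            # Chapter number: emit as a heading at the requested level.
            out.append(f"<h{level}>{m.group(1)}</h{level}>")
        else:
            # Ordinary text: escape HTML special characters and wrap in <p>.
            out.append(f"<p>{escape(line)}</p>")
    return out

tagged = tag_chapters(["1", "The Beginning", "He was 42."])
print("\n".join(tagged))
```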

Any help is appreciated.