Old 03-10-2010, 02:46 AM   #4
TheBard
Well, I'm not DarkKipper, but here are a few regular expressions I use. They have worked on my test files, but could probably be improved or modified:

Delete a header/footer that starts with "file:///" and ends with ".txt", ".htm", or ".html"
file:///.+\.(txt|html|htm)

Delete a line that starts with "file:///" and ends with a number
file:///.+\d

Combine the two above
file:///.+(\d|(txt|html|htm))
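
For anyone who wants to try the combined pattern outside an editor, here is a minimal Python sketch. The sample lines and the strip_headers helper are made up for illustration and are not from any real book file. Note that ".+" is greedy, so the match runs to the last digit or extension on the line.

```python
import re

# Combined pattern from above: "file:///", then anything,
# ending in a digit or in txt/html/htm.
HEADER_RE = re.compile(r"file:///.+(\d|(txt|html|htm))")


def strip_headers(line: str) -> str:
    """Remove the matched header/footer text from one line (hypothetical helper)."""
    return HEADER_RE.sub("", line)


# Hypothetical sample lines; the paths are invented for this example.
samples = [
    "file:///C:/books/scarlet.txt",
    "file:///C:/books/scarlet.htm 12",
    "It was a dark night.",
]

for s in samples:
    print(repr(strip_headers(s)))
```

The first two sample lines come back empty and the ordinary text line is left alone.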

Delete a segment of a line, where the segment ends with a specific string
.* - Baroness Orczy
(this assumes " - Baroness Orczy" appears somewhere in the line; everything up to and including it is deleted)
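
A quick way to see what that one removes, sketched in Python with an invented sample line. Everything from the start of the line through " - Baroness Orczy" goes, and whatever follows it stays.

```python
import re

# Pattern from above: anything, then the literal " - Baroness Orczy".
SEGMENT_RE = re.compile(r".* - Baroness Orczy")

# Hypothetical running header with a page number after the author name.
line = "The Scarlet Pimpernel - Baroness Orczy 17"
cleaned = SEGMENT_RE.sub("", line)
print(repr(cleaned))  # prints ' 17'

# A line without the string is left untouched.
untouched = SEGMENT_RE.sub("", "Chapter One")
print(repr(untouched))  # prints 'Chapter One'
```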


Here is one that seems to work, but might need a bit of tweaking. It looks for EITHER text that starts with "file:///" and ends with a number, OR text that starts with a specified string and runs to the end of the line, and deletes whatever it matched. Quite handy when looking for headers / footers that may vary somewhat across a subdirectory.
(file:///.+\d|Baroness Orczy.*)
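
On the "bit of tweaking": the pattern is not anchored, so it will also eat "Baroness Orczy" and the rest of the line when the name shows up in the middle of ordinary text. One possible tweak, sketched in Python with invented sample text, is to anchor it to whole lines with ^ and $ in multiline mode. The anchored version is my own assumption about what you might want, not something I have tested on real files.

```python
import re

# Pattern as posted, plus a hypothetical whole-line variant.
ORIGINAL = re.compile(r"(file:///.+\d|Baroness Orczy.*)")
ANCHORED = re.compile(r"^(file:///.+\d|Baroness Orczy.*)$", re.MULTILINE)

# Invented sample text: two header-like lines and one line of body text.
text = "\n".join([
    "file:///C:/books/orczy/page 42",
    "Baroness Orczy - The Scarlet Pimpernel",
    "A novel by Baroness Orczy, first published in 1905.",
])

original_result = ORIGINAL.sub("", text)
anchored_result = ANCHORED.sub("", text)

print(repr(original_result))
print(repr(anchored_result))
```

The posted pattern trims the body line down to "A novel by ", while the anchored variant leaves it whole and only empties the two header lines.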


Header with "Generated by ABC ... etc .html" (the ABC Amber header)
Generated by.+html
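
A small Python check of that last one, using a made-up header line (the converter name and URL are placeholders, not the real ABC Amber text). It also shows that the match is case-sensitive, so "generated by" in lower case is not touched unless you add something like re.IGNORECASE.

```python
import re

# Pattern from above: "Generated by", anything, then "html".
ABC_RE = re.compile(r"Generated by.+html")

# Hypothetical header line; name and URL are placeholders.
header = "Generated by ABC Example Converter, http://example.com/abc.html"
header_result = ABC_RE.sub("", header)

# Lower-case "generated" does not match the case-sensitive pattern.
other = "generated by hand, see notes.html"
other_result = ABC_RE.sub("", other)

print(repr(header_result))
print(repr(other_result))
```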

Google "The Regex Coach" for a very nice piece of freeware that is extremely helpful for designing regexes.

Hope these help!