Quote:
Originally Posted by droylynn
I am trying to convert a different PDF, and the author's name is in the header of every other page. The first name is on one line followed by a <br>, the surname is on the line below, and mixed in with it are page numbers, or what I think are page numbers.
I've tried following the examples by copying and pasting the whole thing, then using [0-9] three times to cover the variables, but it isn't working. Can anyone help with this sneaky problem?
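Ok, from what you've said, it seems the header looks something like this: "Lewis<br>" at the end of one line, then "Carroll" followed by a page number on the next line.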
Right? Well, in that case, the regular expression I'd use is:
Lewis<br>\nCarroll\s\d{1,3}
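If you want to check the pattern before running it on the whole book, you can try it on a small piece of text first. Here's a minimal sketch in Python, assuming your tool's regex flavor treats `\n`, `\s` and `\d{1,3}` the way Python's `re` module does. The sample text and the names `TWO_LINE`, `ONE_LINE` and `sample` are made up for illustration.

```python
import re

# The pattern from the post, as a raw string so "\n", "\s" and "\d"
# reach the regex engine instead of being treated as string escapes.
TWO_LINE = re.compile(r"Lewis<br>\nCarroll\s\d{1,3}")
# Variant without "\n", for headers where "Carroll" stays on the <br> line.
ONE_LINE = re.compile(r"Lewis<br>Carroll\s\d{1,3}")

# Hypothetical converted text: two running headers with different page numbers.
sample = (
    "the end of one page.\n"
    "Lewis<br>\nCarroll 7\n"
    "The next page goes on, and later\n"
    "Lewis<br>\nCarroll 123\n"
    "another page starts."
)

print(TWO_LINE.findall(sample))  # both headers are found
print(ONE_LINE.findall(sample))  # empty list: these headers span two lines

# Removing the matches leaves the newline that followed each header behind.
cleaned = TWO_LINE.sub("", sample)
print(cleaned)
```

If `findall` returns an empty list on your real text, try the version with or without `\n`.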
Let's go through the regex piece by piece:
"Lewis" - This matches the string "Lewis", nothing weird about this one.
"<br>" - Matches the "<br>" that is used to make a line break.
"\n" - Depending on how your text looks, you may or may not need this. For instance, if "Lewis<br>Carroll" and the page number are all on one line, you don't need it.
But if "Carroll" starts on a new line right after the <br>, you need to include the newline character, otherwise the pattern won't match.
"Carroll\s" - matches the string "Carroll" followed by one whitespace character
"\d{1,3}" - matches a number with 1 to 3 digits, so it matches 11 or 254. It can't cover 1234, though: it would match only "123" and leave the "4" behind. I set it to three because most books have no more than 999 pages. If you have a really long book, add another digit by changing the text inside the braces to "{1,4}". Likewise, for a short book with no more than 99 pages you can use "{1,2}".
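To see why the number of digits matters, here's a small Python sketch (again assuming a Python-like regex flavor) that compares three ways of matching the page number on hypothetical headers numbered 7, 42, 254 and 1234. The `exactly_3` pattern is only a guess at what the "[0-9] three times" attempt in the question looked like, and `leftover` is a helper made up for this example.

```python
import re

def leftover(pattern: str, text: str) -> str:
    """Return what remains of `text` after deleting every match of `pattern`."""
    return re.sub(pattern, "", text)

# Hypothetical headers whose page numbers have 1 to 4 digits.
headers = [f"Lewis<br>\nCarroll {page}" for page in (7, 42, 254, 1234)]

patterns = {
    # Guess at the "[0-9] three times" attempt: demands exactly 3 digits.
    "exactly_3": r"Lewis<br>\nCarroll\s[0-9][0-9][0-9]",
    # The pattern from the post: 1 to 3 digits.
    "up_to_3": r"Lewis<br>\nCarroll\s\d{1,3}",
    # The "really long book" variant: 1 to 4 digits.
    "up_to_4": r"Lewis<br>\nCarroll\s\d{1,4}",
}

for name, pattern in patterns.items():
    # An empty string means the whole header was removed.
    print(name, [leftover(pattern, h) for h in headers])
```

Running it, `exactly_3` leaves pages 7 and 42 untouched (they don't have three digits) and leaves a stray "4" from page 1234; `up_to_3` cleans 7, 42 and 254 but still leaves the "4"; `up_to_4` removes all four headers.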
And that's it, I think. Try that one and let me know how it goes.