MobileRead Forums - View Single Post - Probably a dumb question from a total novice

camilou · 08-16-2011, 02:53 PM

Quote:

Originally Posted by droylynn

I am trying to convert a different PDF and the author's name is on every other page header, but the first name is on one line with a <br>, then the surname is on the one below, and mixed in with it are page numbers, or what I think are page numbers.

I've tried following the examples by copying and pasting the whole thing then using [0-9] three times to cover the variables but it isn't working. Can anyone help with this sneaky problem?

OK, from what you've said, it seems the header looks something like this:

Quote:

Lewis <br>
Carroll 23

Right? Well, in that case the regular expression I'd use would be:

Quote:

Lewis <br>\nCarroll\s\d{1,3}

Let's go through it:
"Lewis" - This matches the string "Lewis"; nothing weird about this one.
"<br>" - Matches the "<br>" tag that is used to make a line break.
"\n" - Depending on how it looks you might have to include this or not. For instance, if your text looks like this:

Quote:

"Lewis <br>Carroll"

Then you don't need it. But if it looks like:

Quote:

"Lewis <br>
Carroll"

As you can see, Carroll is on a new line, so you need to include the newline character; otherwise it won't be matched.
"Carroll\s" - matches the string "Carroll" followed by one whitespace character
"\d{1,3}" - matches a number with 1 to 3 digits. It'd match 11 in full, but with 1234 it would only take the first three digits and leave the 4 behind. I set it to three because most books have fewer than 1000 pages. If you have a really long book you can allow another digit by changing the text inside the braces to "{1,4}". Same idea if you have a shorter book with fewer than 100 pages: use "{1,2}".
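
If you want to try the expression outside the converter first, here is a rough sketch using Python's re module. It assumes the line break shows up in the converted text as a "<br>" tag followed by a real newline. The sample text and the names HEADER, FLEXIBLE and strip_header are made up for the example; your actual header may look different.

```python
import re

# The expression from above, assuming "<br>" plus a newline between the names.
HEADER = re.compile(r"Lewis <br>\nCarroll\s\d{1,3}")

# Hypothetical variant: "\n?" makes the newline optional, so it covers both
# the one-line and the two-line layout.
FLEXIBLE = re.compile(r"Lewis <br>\n?Carroll\s\d{1,3}")

# Made-up sample text, only for illustration.
sample = (
    "end of one page.\n"
    "Lewis <br>\n"
    "Carroll 23\n"
    "Start of the next page.\n"
)


def strip_header(text, pattern=HEADER):
    """Remove every match of the header pattern from text."""
    return pattern.sub("", text)


cleaned = strip_header(sample)
print(cleaned)

# With a four-digit page number only the first three digits are consumed.
leftover = strip_header("Lewis <br>\nCarroll 1234")
print(repr(leftover))  # '4'
```

If a stray digit like that "4" shows up in your output, that's the sign you need "{1,4}" instead of "{1,3}".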

And that's it, I think. Try that one and tell me how it went.