08-15-2011, 04:20 PM | #1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2011
Device: Kindle
|
Probably a dumb question from a total novice
I have read all the tutorials on RegEx but I am still confused.
I want to remove the headers from my PDF files. It looks simple enough for someone who is computer literate, but I will be the first to admit that I know next to nothing about programming or strings. In the tutorial that shows how to remove Title and Author, should I actually type the book's real title and author, i.e. (Standoff|Lauren Dane)? Will this work in the Search and Replace section? |
08-15-2011, 04:27 PM | #2 | |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Are the opening and closing ( ) part of the header text too? If so, each of them needs a \ in front of it. |
|
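A minimal sketch of the difference the backslash makes, written in Python on the assumption that its `re` module behaves like the regex engine behind the Search and Replace box. The sample page text is made up for illustration.

```python
import re

# Hypothetical page text whose header really contains parentheses.
page = "Some story text\n(Standoff)\nMore story text"

# Unescaped: ( ) only group the pattern, so just the word is matched.
grouping = re.compile(r"(Standoff)")
# Escaped: \( and \) match the literal parenthesis characters as well.
literal = re.compile(r"\(Standoff\)")

without_escape = grouping.sub("", page)
with_escape = literal.sub("", page)

print(repr(without_escape))  # the empty "()" is left behind
print(repr(with_escape))     # the whole header, brackets included, is gone
```

If the header in the book has no brackets around it, the unescaped form is the one to use.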
08-15-2011, 04:38 PM | #3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2011
Device: Kindle
|
I was just following the tutorial by Manichean where he shows it as
Code:
(Title)
(Author)
as our two needed expressions. Now we make things simpler by using the vertical bar ("|" is called the vertical bar character): If you use the expression
Code:
(Title|Author)
you'll either get a match for "Title" (on the odd pages) or you'd match "Author" (on the even pages). Well, wasn't that easy?
Is this not correct? |
08-15-2011, 04:43 PM | #4 | |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
I thought that was the text layout you were trying to get rid of. |
|
08-15-2011, 04:52 PM | #5 |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2011
Device: Kindle
|
Sorry I know I'm confusing lol!
All I was trying to determine is whether the details need to be specific to the PDF file I'm converting, or whether I just need to type in (Title|Author) exactly as shown. |
08-15-2011, 05:10 PM | #6 | |
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
(Moby Dick|Herman Melville)
No leading spaces, no trailing spaces, only the spaces between words. The ( ) are the grouping markers in regex.
Now you see why I use Sigil to post-process: I get to set the pattern and see what it matches (and what it matches but shouldn't). The other advantage is that I can make multiple passes through the book to clean it up, rather than relying on a one-shot, one-size-fits-all approach. |
|
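To see what a title-or-author pattern does, here is a rough sketch in Python, assuming its `re` module is a fair stand-in for the converter's regex engine. The sample text is invented; note that the last line mentions the title inside the story itself, and that gets removed too, which is the "matches but shouldn't" case.

```python
import re

# Hypothetical extract: two running headers plus ordinary body text.
text = (
    "Moby Dick\n"
    "Call me Ishmael.\n"
    "Herman Melville\n"
    "Ahab hunted Moby Dick."
)

# Either the title or the author name, typed exactly as it appears.
header = re.compile(r"(Moby Dick|Herman Melville)")

matches = header.findall(text)
cleaned = header.sub("", text)

print(matches)
print(cleaned)
```

Checking the list of matches before replacing is a cheap way to catch places where the pattern is too greedy for a particular book.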
08-15-2011, 07:58 PM | #7 |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2011
Device: Kindle
|
Thank you for clarifying that. I tried it earlier and it worked so I'm a happy bunny
|
08-16-2011, 12:26 PM | #8 |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2011
Device: Kindle
|
Sorry! It's me again.
I am trying to convert a different PDF. The author's name is in the header of every other page, but the first name is on one line followed by a <br>, and the surname is on the line below, mixed in with what I think are page numbers. I've tried following the examples by copying and pasting the whole thing and then using [0-9] three times to cover the digits that change, but it isn't working. Can anyone help with this sneaky problem? |
08-16-2011, 02:53 PM | #9 | |||||
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2011
Device: Nook
|
Quote:
Quote:
Quote:
"Lewis" - matches the string "Lewis"; nothing weird about this one.
"<br>" - matches the "<br>" that is used to make a line break.
"\n" - matches a newline. Depending on how the text looks, you might have to include this or not. For instance, if your text looks like this:
Quote:
Quote:
"Carroll\s" - matches the string "Carroll" followed by one whitespace character.
"\d{1,3}" - matches a run of 1 to 3 digits. It would match 11 in full, but not all of 1234. I set it to three because most books have fewer than 1,000 pages. If you have a really long book, you can allow another digit by changing the text inside the braces to "{1,4}". Likewise, for a shorter book with fewer than 100 pages you can use "{1,2}".
And that's it, I think. Try that one and tell me how it went. |
|||||
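The pieces described above can be strung together and tried out. This is only a sketch: the combined pattern is assembled from the fragments in the post (the full expression as originally posted is not shown here), the sample text is invented, and Python's `re` module is assumed to behave like the converter's engine.

```python
import re

# Hypothetical header: first name, <br>, newline, surname, space, page number.
sample = "end of one page\nLewis<br>\nCarroll 42\nstart of the next page"

# Assumed combination of the fragments: Lewis, <br>, \n, Carroll\s, \d{1,3}
header = re.compile(r"Lewis<br>\nCarroll\s\d{1,3}")

cleaned = header.sub("", sample)
print(repr(cleaned))

# \d{1,3} stops after three digits, so only part of "1234" is consumed.
partial = re.search(r"\d{1,3}", "1234").group()
whole = re.fullmatch(r"\d{1,3}", "1234")
print(partial, whole)
```

If the real file has no newline after the <br>, dropping the `\n` from the pattern (or making it optional with `\n?`) would be the first thing to try.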
08-16-2011, 03:11 PM | #10 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
I tend to use \s* to match a variable number of whitespace characters (space, tab, newline, etc.) and \d* for a variable number of digits.
Note that * allows zero as a valid number of occurrences. If you want to force at least one, then use + as the repetition indicator instead of *. |
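A quick way to see the difference between * and +, sketched in Python under the assumption that its `re` module matches the converter's behaviour. The surname and sample strings are just illustrative.

```python
import re

# * allows zero occurrences; + requires at least one.
loose = re.compile(r"Carroll\s*\d*")
strict = re.compile(r"Carroll\s+\d+")

# Hypothetical header variants: with a space, with a newline, and bare.
samples = ["Carroll 42", "Carroll\n117", "Carroll"]

loose_hits = [bool(loose.fullmatch(s)) for s in samples]
strict_hits = [bool(strict.fullmatch(s)) for s in samples]

print(loose_hits)
print(strict_hits)
```

The bare "Carroll" is the case to watch: with * it still counts as a header, so the surname could be stripped from the story text as well, whereas + insists on a page number being present.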
08-20-2011, 11:22 AM | #11 |
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2011
Device: Kindle
|
Thank you for all your help. You guys are amazing
|