MobileRead Forums > E-Book Software > Calibre > Conversion

08-15-2011, 04:20 PM #1
droylynn
Junior Member
Posts: 7
Karma: 10
Join Date: Aug 2011
Device: Kindle
Probably a dumb question from a total novice

I have read all the tutorials on regex, but I am still confused.

I want to remove headers from PDF files. It looks simple enough for someone who is computer literate, but I will be the first to admit that I know next to nothing about programming or strings.

In the tutorial where you remove the Title and Author, should I actually be stating the real title and author's name?
i.e. (Standoff|Lauren Dane)

Will this work in the Search and replace section?
08-15-2011, 04:27 PM #2
theducks
Well trained by Cats
Posts: 29,799
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
The vertical bar may cause problems. Precede it with a backslash to escape it.
Are the opening and closing parentheses there too? Each of those needs a \ as well.
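For anyone following along: escaping only matters when the header text itself contains ( | or ). A minimal sketch, assuming Python's `re` module is a fair stand-in for calibre's regex engine (calibre uses Python-style regular expressions); the header string is hypothetical:

```python
import re

# Hypothetical header text that literally contains parentheses and a bar.
header = "(Standoff|Lauren Dane)"

# Unescaped: ( ) form a group and | means "or", so this finds "Standoff"
# or "Lauren Dane", not the literal parentheses and bar.
unescaped = r"(Standoff|Lauren Dane)"

# Escaped by hand: \( \| \) match the literal characters.
escaped = r"\(Standoff\|Lauren Dane\)"

# re.escape builds a literal pattern from any text automatically.
auto = re.escape(header)

print(re.search(unescaped, header).group(0))  # Standoff
print(re.search(escaped, header).group(0))    # (Standoff|Lauren Dane)
print(re.search(auto, header).group(0))       # (Standoff|Lauren Dane)
```

`re.escape` is part of Python's standard library; it is handy when you paste literal text into a pattern and don't want to escape each special character yourself.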
08-15-2011, 04:38 PM #3
droylynn
Junior Member
I was just following the tutorial by Manichean, where he shows it as

Code:

(Title)
(Author)

as our two needed expressions. Now we make things simpler by using the vertical bar ("|" is called the vertical bar character): If you use the expression
Code:

(Title|Author)

you'll either get a match for "Title" (on the odd pages) or you'd match "Author" (on the even pages). Well, wasn't that easy?

Is this not correct?
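A quick way to see what the tutorial means, sketched in Python as a stand-in for calibre's regex engine (the sample header lines are hypothetical): the words inside the parentheses are the literal text to match, so "Title" and "Author" only match pages that literally say "Title" or "Author".

```python
import re

# One pattern with alternation: matches either header, whichever page it is on.
pattern = re.compile(r"(Standoff|Lauren Dane)")

# Hypothetical header lines from odd and even pages.
odd_page = "Standoff"
even_page = "Lauren Dane"

print(pattern.search(odd_page).group(0))   # Standoff
print(pattern.search(even_page).group(0))  # Lauren Dane

# The placeholder words only match text that says exactly "Title" or "Author".
placeholder = re.compile(r"(Title|Author)")
print(placeholder.search(even_page))       # None
```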
08-15-2011, 04:43 PM #4
theducks
Well trained by Cats
My, you were showing the pattern you were using.
I thought that was the text layout you were trying to get rid of.
08-15-2011, 04:52 PM #5
droylynn
Junior Member
Sorry, I know I'm confusing, lol!
All I was trying to determine is whether the details need to be specific to the PDF file I'm converting, or whether I just type in (Title|Author).
08-15-2011, 05:10 PM #6
theducks
Well trained by Cats
Those must be the exact values of each string, for example:

(Moby Dick|Herman Melville)

No leading spaces, no trailing spaces; only the spaces that fall between the words.
The ( ) are the grouping markers in regex.
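A small Python sketch of why the exact values matter, assuming Python's `re` behaves like calibre's engine for a pattern this simple (the page text is made up): a stray space inside the group makes the pattern miss.

```python
import re

# Hypothetical page text whose header line is "Herman Melville" (no trailing space).
page = "Herman Melville\nCall me Ishmael."

exact = r"(Moby Dick|Herman Melville)"
sloppy = r"(Moby Dick |Herman Melville )"  # stray trailing spaces inside the group

print(re.sub(exact, "", page))   # header removed, the newline after it remains
print(re.search(sloppy, page))   # None: the pattern wants a space after "Melville"
```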

Now you see why I use Sigil to post-process: I get to set the pattern and see what it matches (including what it matches but shouldn't). The other advantage is that I can make multiple passes through the book to clean up, rather than a one-shot, one-size-fits-all approach.
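Sigil's own search tool isn't shown here; instead, this is a rough Python sketch of the same preview-then-replace, multi-pass idea. The sample text and the list of patterns are hypothetical.

```python
import re

# Hypothetical extracted text with header lines and a bare page number mixed in.
text = (
    "Herman Melville\nCall me Ishmael.\n"
    "Moby Dick\n12\nSome years ago...\n"
)

# One pattern per pass, run one at a time instead of one giant regex.
passes = [
    r"^(Moby Dick|Herman Melville)\n",  # pass 1: header lines
    r"^\d{1,3}\n",                      # pass 2: page numbers on a line of their own
]

for pattern in passes:
    # Preview: list everything this pass would remove before removing it.
    found = [m.group(0) for m in re.finditer(pattern, text, flags=re.MULTILINE)]
    print(pattern, "->", found)
    text = re.sub(pattern, "", text, flags=re.MULTILINE)

print(text)
```

Looking at the preview list before each substitution is what catches the "matches but shouldn't" cases.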
08-15-2011, 07:58 PM #7
droylynn
Junior Member
Thank you for clarifying that. I tried it earlier and it worked, so I'm a happy bunny.
08-16-2011, 12:26 PM #8
droylynn
Junior Member
Sorry! It's me again.

I am trying to convert a different PDF. The author's name is in the header of every other page, but the first name is on one line followed by a <br>, the surname is on the line below, and mixed in with it are page numbers (or what I think are page numbers).

I've tried following the examples by copying and pasting the whole thing and then using [0-9] three times to cover the varying digits, but it isn't working. Can anyone help with this sneaky problem?
08-16-2011, 02:53 PM #9
camilou
Junior Member
Posts: 4
Karma: 10
Join Date: Aug 2011
Device: Nook
OK, from what you've said, it seems it's something like this:

Quote:
Lewis<br>
Carroll 23
Right? Well, in that case, the regular expression I'd use would be:
Quote:
Lewis<br>\nCarroll\s\d{1,3}
Let's go through it:
"Lewis" - This matches the string "Lewis"; there's nothing weird about this one.
"<br>" - Matches the "<br>" that is used to make a line break.
"\n" - Depending on how the text looks, you might or might not have to include this. For instance, if your text looks like this:
Quote:
"Lewis<br>Carroll"
Then you don't need it. But if it looks like:
Quote:
"Lewis<br>
Carroll"
As you can see, Carroll is on a new line, so you need to include the newline character; otherwise it won't be matched.
"Carroll\s" - matches the string "Carroll" followed by one whitespace character.
"\d{1,3}" - matches a number with 1 to 3 digits. It will match 11, for example, but not the whole of 1234 (it stops after 123 and leaves the 4 behind). I set it to three because most books have fewer than 1,000 pages. If you have a really long book, you can allow another digit by changing the text inside the braces to "{1,4}". Likewise, for a short book with fewer than 100 pages you can use "{1,2}".
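To check this walkthrough, here is a minimal Python sketch run against made-up page text, assuming Python's `re` matches the way calibre's engine does here. The `\n?` variant is my own addition, not from the post: it makes the newline optional so one pattern covers both layouts.

```python
import re

# camilou's pattern, written as a Python raw string.
pattern = r"Lewis<br>\nCarroll\s\d{1,3}"

# Two hypothetical layouts of the same header.
split_layout = "Lewis<br>\nCarroll 23\nAlice was beginning..."
same_line = "Lewis<br>Carroll 23\nAlice was beginning..."

print(re.sub(pattern, "", split_layout))  # header removed
print(re.search(pattern, same_line))      # None: no newline after <br> here

# Hypothetical variant: \n? makes the newline optional, covering both layouts.
either = r"Lewis<br>\n?Carroll\s\d{1,3}"
print(re.sub(either, "", same_line))      # header removed

# \d{1,3} stops after three digits, so a four-digit page number leaves one behind.
print(re.sub(pattern, "", "Lewis<br>\nCarroll 1234"))  # 4
```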

And that's it, I think. Try that one and tell me how it goes.
08-16-2011, 03:11 PM #10
itimpi
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
I tend to use \s* to match a variable number of whitespace characters (space, tab, newline, etc.), and \d* for a variable number of numeric characters.

Note that * allows zero as a valid number of occurrences. If you want to require at least one, then use + as the repetition indicator instead of *.
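A quick Python comparison of the two repetition operators, applied to the Lewis Carroll header from earlier in the thread. The sample strings are hypothetical, Python's `re` stands in for calibre's engine, and `re.fullmatch` is used so the whole string has to match.

```python
import re

# * allows zero occurrences; + requires at least one.
loose = r"Lewis<br>\s*Carroll\s*\d*"
strict = r"Lewis<br>\s*Carroll\s+\d+"

samples = [
    "Lewis<br>Carroll 7",          # same line, one-digit page number
    "Lewis<br>\nCarroll  123",     # newline after <br>, two spaces before the number
    "Lewis<br>\n\tCarroll",        # no page number at all
]

for s in samples:
    print(repr(s), bool(re.fullmatch(loose, s)), bool(re.fullmatch(strict, s)))
```

The third sample has no page number, so only the `*` version accepts it; the `+` version is the one to use when a page number must be present.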
08-20-2011, 11:22 AM #11
droylynn
Junior Member
Thank you for all your help. You guys are amazing!
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.