Old 01-12-2010, 07:03 AM   #1
Manichean
Wizard
Posts: 3,130
Karma: 80520
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Problem with regular expressions

I'm having some trouble writing a regular expression to delete page headers in the conversion options. The page header I'm trying to delete basically looks like
Code:
<p class="calibre1">
Title</p><p class="calibre1">
Page 42 of 230</p>
so I figured the regexp needed should look like
Code:
Title</p><p class="calibre1">\nPage [0-9]* of [0-9]*
to match the part from "Title" to the total page number, which is what I want to remove. Now, this works fine if I just use the part up to "\n" or the part after it, which matches the first or the second line I want removed, respectively. But as soon as I try to cobble the two lines together, I don't get any match. I've tried every variation of \n, \s and so forth that I could think of, including slapping some * and ? behind it and fooling around with groups, but nothing seems to work.
Seeing as I've never used regular expressions before and just skimmed over the Calibre user manual to piece it together, I'm sure there's something I'm missing, but I can't figure out what it is. What I can figure out is that I somehow don't get how to match a newline. Could anyone help?

Last edited by Manichean; 01-12-2010 at 07:07 AM.
Old 01-12-2010, 10:31 AM   #2
kovidgoyal
creator of calibre
Posts: 26,131
Karma: 5381911
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Try

Title</p><p class="calibre1">[^<]+</p>
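A minimal sketch of why this suggestion sidesteps the newline question, assuming the conversion search behaves like Python's `re` module (calibre is written in Python, but the exact flags it applies are an assumption here). The negated class `[^<]+` matches any character except `<`, newlines included. The header markup is copied from the first post; the text around it is made up.

```python
import re

# Header markup from the first post, with made-up body text around it.
sample = (
    'end of one page.\n'
    '<p class="calibre1">\n'
    'Title</p><p class="calibre1">\n'
    'Page 42 of 230</p>\n'
    'Start of the next page.'
)

pattern = re.compile(r'Title</p><p class="calibre1">[^<]+</p>')

# [^<]+ happily crosses the line break, so no \n is needed in the pattern.
match = pattern.search(sample)
matched_text = match.group(0) if match else None

# Replacing the match with nothing removes the header.
cleaned = pattern.sub('', sample)
```

Note that this pattern also consumes the final `</p>`, so in this sample the `<p class="calibre1">` that precedes `Title` is left without a closing tag.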
Old 01-26-2010, 08:55 AM   #3
Manichean
Unfortunately, that doesn't work. Same problem, it just gets confused about the linebreak.
I thought about maybe passing a flag that the string it should match is on multiple lines, but I don't know how to do this and currently, I'm too busy to figure it out. I'll post again once I find a solution.
Old 02-02-2011, 10:21 PM   #4
Archon
Zealot
Posts: 110
Karma: 5176
Join Date: Dec 2010
Device: Mac OSX, iPad, iPod, & Nook
I messed with this a little. I don't know exactly what you are looking for but here is what I have. This should only match on a number followed by a </p> followed by an end of the line.
Search:
of \d+</p>$
Replace:
</p>

So this will find the last line of your three lines, the one with the page number followed by the </p> at the end of a line, and then put back only the </p>. It looks like this before:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
sides, and above the jacket collar behind, uncombed. Both beards were short and scant.
<p class="calibre1">
Title</p><p class="calibre1">
Page 42 of 230</p>

The man from the east wore a standard straight sword, the plastic
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

And now after:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
<p class="calibre1">
Title</p><p class="calibre1">
Page 42 </p>
<<<<<<<<<<<<<<<<<<<<<<<<<<<<

This just removes the "of XXX" page numbering part.
Is that what you were after?

Archon
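A sketch of how the `$` anchor in that search behaves, assuming Python's `re` semantics. In Python, `$` matches at the end of every line only when the MULTILINE flag is on; whether calibre's search sets that flag is not stated in this thread, so the inline `(?m)` below is an assumption. The sample text is the "before" excerpt, shortened.

```python
import re

sample = (
    '<p class="calibre1">\n'
    'Title</p><p class="calibre1">\n'
    'Page 42 of 230</p>\n'
    '\n'
    'The man from the east wore a standard straight sword'
)

# Without MULTILINE, `$` anchors only at the end of the whole string,
# so nothing is replaced here.
plain = re.sub(r'of \d+</p>$', '</p>', sample)

# With MULTILINE (inline flag `(?m)`), `$` also anchors before each newline.
multiline = re.sub(r'(?m)of \d+</p>$', '</p>', sample)
```

With the flag on, the header line becomes `Page 42 </p>`, which matches the "after" excerpt in the post.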
Old 02-02-2011, 11:39 PM   #5
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Did you try this:
Code:
Title</p><p class="calibre1">\s*Page\s*[0-9]+\s*of\s*[0-9]+
Is it showing up correctly as matching in the regex wizard, but not actually being removed during conversion? Usually when this happens it's one of two things: there are non-breaking spaces hiding amongst the real spaces, or there is a bug/limitation where Calibre is showing you HTML in the wizard that's not exactly the same as the HTML that is provided to the Search and Replace feature during conversion.

Edit:
Note that if non-breaking spaces are your problem, you can create a character class to include them. Instead of \s*, use this: [\s\u00a0]*
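A sketch of the non-breaking-space case in Python's `re`, with the character class written using the backslash escape, `[\s\u00a0]`. The sample header with a non-breaking space is hypothetical. In Python 3, `\s` on a `str` pattern already covers U+00A0; the `re.ASCII` flag is used here to imitate an engine where `\s` is ASCII-only. Which of the two calibre's search resembles is not stated in the thread.

```python
import re

NBSP = '\u00a0'  # non-breaking space
# Hypothetical header where one of the "spaces" is really a non-breaking space.
sample = 'Title</p><p class="calibre1">\nPage' + NBSP + '42 of 230'

prefix = r'Title</p><p class="calibre1">'

# Under re.ASCII, \s covers only [ \t\n\r\f\v], so the NBSP blocks the match.
ascii_ws = re.compile(prefix + r'\s*Page\s*[0-9]+\s*of\s*[0-9]+', re.ASCII)

# Adding \u00a0 to the class makes the pattern tolerate it either way.
with_nbsp = re.compile(
    prefix + r'[\s\u00a0]*Page[\s\u00a0]*[0-9]+[\s\u00a0]*of[\s\u00a0]*[0-9]+',
    re.ASCII,
)

# Python 3 str patterns are Unicode-aware by default, so plain \s matches NBSP.
unicode_ws = re.compile(prefix + r'\s*Page\s*[0-9]+\s*of\s*[0-9]+')

results = (
    ascii_ws.search(sample) is not None,
    with_nbsp.search(sample) is not None,
    unicode_ws.search(sample) is not None,
)
```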

Last edited by ldolse; 02-02-2011 at 11:43 PM.
Old 02-03-2011, 03:40 AM   #6
Perkin
Guru
Posts: 644
Karma: 64171
Join Date: Sep 2010
Location: Kent, England, Sol 3, ZZ9 plural Z Alpha
Device: Sony PRS-300, Kobo Aura HD
How about a nice simple
Quote:
Title</p>.*(?=</p>)
Which should match 'Title</p>' and everything up to, but not including, the next '</p>'.
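A sketch of how that lookahead pattern behaves under Python's `re`, which is assumed (not confirmed in this thread) to be close to what calibre uses. Two details matter: `.` does not cross a newline unless DOTALL is set, and a greedy `.*` runs to the last `</p>` it can reach, not the next one. The lazy `.*?` variant is an illustration added here, not something the post proposes, and the body paragraph in the sample is made up.

```python
import re

sample = (
    '<p class="calibre1">\n'
    'Title</p><p class="calibre1">\n'
    'Page 42 of 230</p>\n'
    '<p class="calibre1">Body text of the next page.</p>'
)

def first_match(pattern, flags=0):
    """Return the matched text, or None when the pattern does not match."""
    m = re.search(pattern, sample, flags)
    return m.group(0) if m else None

# By default `.` stops at a newline, so the lookahead never sees `</p>`.
no_dotall = first_match(r'Title</p>.*(?=</p>)')

# Greedy `.*` with DOTALL runs to the LAST `</p>` in the text.
greedy = first_match(r'Title</p>.*(?=</p>)', re.DOTALL)

# Lazy `.*?` with DOTALL stops before the NEXT `</p>`.
lazy = first_match(r'Title</p>.*?(?=</p>)', re.DOTALL)
```

On this sample only the lazy form stops at `Page 42 of 230`; the greedy form would also swallow the following paragraph.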
Old 02-03-2011, 03:57 AM   #7
Manichean
You people do realize that this thread is about a year old? I solved that issue quite some time ago. (The solution was me stopping to be stupid, by the way.)
Old 02-03-2011, 05:31 AM   #8
Perkin
I thought it was odd that you, who wrote the regex FAQ/guide, couldn't manage it.
I did look at the date, and thought the original post was from December.
Old 02-03-2011, 05:34 AM   #9
Archon
Quote:
You people do realize that this thread is about a year old? I solved that issue quite some time ago. (The solution was me stopping to be stupid, by the way.)
It's never too late to help a brother out. :-)

BTW what was your solution (besides stopping being stupid as you say)?

Maybe we could all learn from your experience.

Archon
Old 02-03-2011, 05:42 AM   #10
Manichean
The problem was that I didn't use the regex wizard to test it, basically. I tried to use Notepad++, which doesn't allow for multiline regex matching. (I only found that out while writing the guide, actually.) The reason I did that was that I felt Notepad++ would be faster than Calibre, and I didn't fully understand the wizard. Also, had I known about character classes, especially \s, I might have found a solution sooner.
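To illustrate the point about `\s`, here is a sketch in Python's `re` (assumed to be close to what the calibre wizard uses) comparing the literal `\n` pattern from the first post with a `\s`-based one. The second header, with a trailing space and a Windows line ending, is a hypothetical variation, not something reported in this thread.

```python
import re

# Header as shown in the first post (Unix line ending).
unix_header = 'Title</p><p class="calibre1">\nPage 42 of 230</p>'
# Hypothetical variant with a trailing space and a Windows line ending.
windows_header = 'Title</p><p class="calibre1"> \r\nPage 42 of 230</p>'

# The pattern from the first post: expects exactly one "\n" between the tags.
literal_newline = re.compile(r'Title</p><p class="calibre1">\nPage [0-9]* of [0-9]*')
# \s matches spaces, tabs, carriage returns and newlines alike.
any_whitespace = re.compile(r'Title</p><p class="calibre1">\s*Page\s+[0-9]+\s+of\s+[0-9]+')

matches = {
    'literal, unix': bool(literal_newline.search(unix_header)),
    'literal, windows': bool(literal_newline.search(windows_header)),
    'whitespace, unix': bool(any_whitespace.search(unix_header)),
    'whitespace, windows': bool(any_whitespace.search(windows_header)),
}
```

In an engine that supports multiline matching the original `\n` pattern does match the header as posted; the `\s` form is simply more forgiving about what kind of whitespace sits between the tags.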
Old 02-03-2011, 02:27 PM   #11
Archon
Thanks for your wisdom.

I will pass that along to my PeeCee using mates.

Archon