Old 09-26-2010, 11:25 AM   #61
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Manichean View Post
You're right, I must have missed that comment. That means, if I'm not mistaken, that the only place case matters is the search & replace, where it can be controlled with a checkbox, yes?
Regex case sensitivity is on in recipes (but controllable by flags), off in the main searchbar, and controlled by option box in S&R (I'm not completely sure about the conversion situations, as I don't do a lot of those). Case sensitivity is important to understand for regex, so don't remove it, but I'd point out where Calibre turns it off so the new user doesn't get confused.
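For anyone who wants to see what "controllable by flags" looks like in plain Python, here is a minimal sketch. It uses only the standard `re` module; the sample text is made up, and how each part of Calibre passes flags internally is not shown here.

```python
import re

text = "Calibre and CALIBRE and calibre"

# Default behaviour: matching is case sensitive, so only the exact spelling is found.
sensitive = re.findall(r"calibre", text)

# Passing re.IGNORECASE turns case sensitivity off for this call.
insensitive = re.findall(r"calibre", text, flags=re.IGNORECASE)

# The same flag can be written inline at the very start of the pattern.
inline = re.findall(r"(?i)calibre", text)

print(sensitive)
print(insensitive)
print(inline)
```

The inline `(?i)` form is handy where only a pattern string can be supplied and there is no separate place to pass a flag.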
Old 09-26-2010, 11:34 AM   #62
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Starson17 View Post
Regex case sensitivity is on in recipes (but controllable by flags), off in the main searchbar, and controlled by option box in S&R (I'm not completely sure about the conversion situations, as I don't do a lot of those). Case sensitivity is important to understand for regex, so don't remove it, but I'd point out where Calibre turns it off so the new user doesn't get confused.
I just tested, conversion is case sensitive... and I just finished editing that part out. Oh, well.

Edit: Now it should be correct again. Put the ignore-case flag back in and rewrote the beginning paragraph.

Last edited by Manichean; 09-26-2010 at 11:50 AM.
Old 09-26-2010, 03:19 PM   #63
karel.sz
Junior Member
Posts: 1
Karma: 10
Join Date: Sep 2010
Location: Pécs, Hungary
Device: Kindle 3
A simple way to clean up HTML header/footer

Thanks for the nice tutorial Manichean!

I've found the following regex snippet very handy for cleaning the navigation header and footer out of the HTML version of ProGit (http://progit.org/, CC by-nc-sa, saved with wget). You only have to find the start of the header/footer div, then tell the regex how many lines to delete:

Code:
<div id="footer">(\n.+){27}\n.+script>
This will delete the opening footer div plus 27 other lines (plus a closing script tag).
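To try the idea outside Calibre, here is a small sketch in plain Python. The HTML is a made-up stand-in for the saved page, and the line count is 3 instead of 27 to keep it short; everything except the shape of the regex is an assumption.

```python
import re

# Hypothetical page: a footer div, three more lines, then a line ending in a closing script tag.
html = (
    "<p>Chapter text</p>\n"
    '<div id="footer">\n'
    '<a href="/">Home</a>\n'
    '<a href="/about">About</a>\n'
    "</div>\n"
    "<script>track();</script>\n"
    "<p>More text</p>\n"
)

# "." does not match a newline by default, so each "\n.+" consumes exactly one non-empty line.
pattern = re.compile(r'<div id="footer">(\n.+){3}\n.+script>')

# Replace the whole matched block with nothing.
cleaned = pattern.sub("", html)
print(cleaned)
```

Because `.+` needs at least one character, a blank line inside the block would stop the match, so the count has to fit the actual file.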

HTH
Old 09-27-2010, 08:09 AM   #64
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Since I'm happy with the post as it is now, and there seem to be no more suggestions, I've removed the notice that the post is still developing and will notify Kovid that it can be included in the documentation. Of course, that doesn't mean that I won't listen to suggestions anymore, just that edits will happen less frequently.

I'm also going to go ahead and create a wiki article out of this, which I'll link from the main Calibre article.
Old 09-27-2010, 09:28 AM   #65
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Overall looking very good Manichean, sorry about the late review. Not sure that backreferences are really required, but you poked some fun at that, so fine by me.

Shouldn't this be two separate paragraphs?:
Quote:
Originally Posted by Manichean View Post
("Whitespace" is a term for anything that won't be printed. These characters include space, tabulator, line feed, form feed and carriage return.) As a last note on sets, you can also define a set as any character but those in the set. You do that by including the character "^" as the very first character in the set.

This is actually incorrect: (1|2)+ will match all of those strings. A group doesn't get 'locked' to the first character it matches.
Quote:
Originally Posted by Manichean View Post
Consider the group "(1|2)" and the set "[12]"- without quantifiers, each will only match either the character "1" or the character "2". But, if you append them with a quantifier, they behave quite distinct: "(1|2)+" will match e.g. the string "1111" or "222", but not the string "12212"- once the group has selected a character, it cannot select another one.
Old 09-27-2010, 09:42 AM   #66
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by ldolse View Post
Overall looking very good Manichean, sorry about the late review. Not sure that backreferences are really required, but you poked some fun at that, so fine by me.
Weeell, I thought so too, at first. But you can do fancy things with them in the search & replace, so they got in.

Quote:
Originally Posted by ldolse View Post
Shouldn't this be two separate paragraphs?:
Oops...


Quote:
Originally Posted by ldolse View Post
This is actually incorrect, (1|2)+ will match all those strings. A group doesn't get 'locked' based on the first character it matches.
Hey, you're right. I just took it out of this post without checking. Just tested in Python, so, if I'm right, "(1|2)+" would actually be equivalent to "[12]+"?
Old 09-27-2010, 09:59 AM   #67
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by Manichean View Post
Weeell, I thought so too, at first. But you can do fancy things with them in the search & replace, so they got in.
Right - I forgot about the search and replace function, haven't played with that feature. If you really want to get into back-references you should probably use named back-references - but there are different variations on how that works depending on how the Python functions are called. When I get a chance to play with the search and replace feature I can give you some more details.
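As a rough illustration of numbered versus named back-references, here is a sketch using Python's standard `re` module. The sample strings and group names are invented, and whether Calibre's search and replace accepts exactly this replacement syntax is an assumption that would need testing.

```python
import re

text = "Doe, John"

# Numbered groups: \1 and \2 refer to the groups by position.
numbered = re.sub(r"(\w+), (\w+)", r"\2 \1", text)

# Named groups: (?P<name>...) defines a group, \g<name> refers to it in the replacement.
named = re.sub(r"(?P<last>\w+), (?P<first>\w+)", r"\g<first> \g<last>", text)

# Inside the pattern itself, a named group is referenced with (?P=name).
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine")

print(numbered)
print(named)
print(doubled.group("word"))
```

The named form is longer to type, but the replacement keeps working if another group is later added to the front of the pattern.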


Quote:
Originally Posted by Manichean View Post
if I'm right, "(1|2)+" would actually be equivalent to "[12]+"?
Right - they're basically identical.
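A quick way to check this in Python, as a minimal sketch with made-up sample strings:

```python
import re

samples = ["1111", "222", "12212", "", "123"]

group_re = re.compile(r"(1|2)+")
set_re = re.compile(r"[12]+")

# For each sample, record whether the WHOLE string is matched by each pattern.
results = {
    s: (group_re.fullmatch(s) is not None, set_re.fullmatch(s) is not None)
    for s in samples
}

for s, (group_ok, set_ok) in results.items():
    print(repr(s), group_ok, set_ok)

# The one practical difference: the group also captures its last repetition.
last_captured = group_re.fullmatch("12212").group(1)
print(last_captured)
```

Both patterns accept and reject the same strings here; only the group version leaves something behind to use as a back-reference.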
Old 09-27-2010, 10:00 AM   #68
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Manichean View Post
Just tested in Python, so, if I'm right, "(1|2)+" would actually be equivalent to "[12]+"?
Isn't this entire subject a bit esoteric for the beginner? Most beginners just need to know that (John|Bill) can be used, not how the quantifiers apply to OR'd expressions. For single characters, they should just use square brackets and then apply the quantifier. As to the underlying issue, it wouldn't surprise me if different implementations of regex handle this differently.

BTW, I refer to "|" as the "vertical bar," not "pipe," and in the regex context I've seen it formally referred to as the "alternation operator" or, more commonly, just "OR" or the "OR operator". The term "pipe" has a pretty specific meaning in *nixLand (and in WindowsLand) and it has nothing to do with alternation or OR. There could be confusion there.
Old 09-27-2010, 10:21 AM   #69
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by ldolse View Post
Right - I forgot about the search and replace function, haven't played with that feature. If you really want to get into back-references you should probably use named back-references - but there are different variations on how that works depending on how the Python functions are called. When I get a chance to play with the search and replace feature I can give you some more details.
I'm happy with the backreferences as is. It's enough to get the job done, and if anyone needs more, there's always the Python regexp reference (by that point the reader should be able to understand it, I think).
Oh, and by all means play with that feature- chaley did a great job on that, I think.

Quote:
Originally Posted by Starson17 View Post
Isn't this entire subject a bit esoteric for the beginner? Most beginners just need to know that (John|Bill) can be used, not how the quantifiers apply to OR'd expressions. For single characters, they should just use square brackets and then apply the quantifier. As to the underlying issue, it wouldn't surprise me if different implementations of regex handle this differently.
I removed this section entirely, since Python handles quantifiers on groups the same way as on sets, which should be intuitively obvious to anyone following the introduction. Different implementations don't concern us here, as this is only about regexps in Calibre.

Quote:
Originally Posted by Starson17 View Post
BTW, I refer to "|" as the "vertical bar," not "pipe," and in the regex context I've seen it formally referred to as the "alternation operator" or, more commonly, just "OR" or the "OR operator". The term "pipe" has a pretty specific meaning in *nixLand (and in WindowsLand) and it has nothing to do with alternation or OR. There could be confusion there.
I've only heard the character referred to as a pipe, both in its specific CLI meaning on *nix and Windows and as just the character. Though I confess that may be because, to use modern parlance, "I hang with them geeks". Personally, I don't see the possibility for confusion, but I'm going to change it to "vertical bar", just to be safe.
Old 09-27-2010, 10:34 AM   #70
theducks
Grand Sorcerer
Posts: 14,651
Karma: 5629001
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Minor Nit.

Whitespace is any non-visible printing character; it occupies space.

In ASCII text, non-printing characters were things like BEL, SOT, EOT, DC1... They did things, but occupied no space (and caused no carriage or paper motion).
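The distinction shows up in Python's `\s` shorthand as well. The following is a small sketch with an arbitrary selection of characters, not an exhaustive list:

```python
import re

# A few whitespace characters and a few ASCII control characters.
chars = {
    "space": " ",
    "tab": "\t",
    "line feed": "\n",
    "form feed": "\f",
    "carriage return": "\r",
    "BEL": "\x07",
    "EOT": "\x04",
    "DC1": "\x11",
}

# True where the single character is matched by the whitespace shorthand \s.
is_whitespace = {name: re.fullmatch(r"\s", ch) is not None for name, ch in chars.items()}

for name, matched in is_whitespace.items():
    print(name, matched)
```

The first five count as whitespace for `\s`; BEL, EOT and DC1 do not, even though none of the eight puts ink on the page.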
Old 09-27-2010, 10:44 AM   #71
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by theducks View Post
Minor Nit.

Whitespace is any non-visible printing character; it occupies space.

In ASCII text, non-printing characters were things like BEL, SOT, EOT, DC1... They did things, but occupied no space (and caused no carriage or paper motion).
You're correct, and I'm aware of that. I used "won't get printed" as a colloquial way of saying "not using ink, but doing stuff". Keeping in mind that I want to keep this understandable to the more non-technical crowd, do you think that part should be reworded?
Old 09-27-2010, 10:47 AM   #72
chaley
"chaley", not "charley"
Posts: 5,459
Karma: 831552
Join Date: Jan 2010
Location: France
Device: Many android devices
@Manichean: good work. Writing to explain ideas is very difficult for lots of reasons. I think you found a good balance.

One nit: in a couple of places you have lines that start with punctuation. These always happen after a code box. One is ', for lower- and uppercase characters you'd' and another is '. The other shorthands can be complemented by'.
Old 09-27-2010, 11:04 AM   #73
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by chaley View Post
@Manichean: good work. Writing to explain ideas is very difficult for lots of reasons. I think you found a good balance.

One nit: in a couple of places you have lines that start with punctuation. These always happen after a code box. One is ', for lower- and uppercase characters you'd' and another is '. The other shorthands can be complemented by'.
Thanks for catching those. You found the reason, too: it's the same thing when I use LaTeX. I tend to proofread the source rather than the final, rendered text, which leaves stray punctuation after code blocks (in forums) or equation blocks (in LaTeX).

I should probably change that habit and proofread the finished, rendered text instead...
Old 09-27-2010, 11:44 AM   #74
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
I've only heard the character referred to as a pipe, both in its specific CLI meaning on *nix and Windows and as just the character.
That's the point. On the CLI, that character "pipes" the output of one program into the input of another. It makes sense in that context to refer to it as a "pipe," but not in other contexts. If you want the full dope: http://en.wikipedia.org/wiki/Vertical_bar
Old 10-02-2010, 10:24 PM   #75
da_jane
Evangelist
Posts: 405
Karma: 692
Join Date: Sep 2006
Device: Samsung Galaxy Note 3 | Kindle Paperwhite | iPad Mini
If I want to remove multiple lines of text, do I enclose my regular expressions in parentheses and then separate the sets by a ?