Register Guidelines E-Books Search Today's Posts Mark Forums Read

Go Back   MobileRead Forums > E-Book Software > Calibre > Conversion

Notices

Closed Thread
 
Thread Tools Search this Thread
Old 09-21-2010, 05:02 AM   #16
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by chaley View Post
You can use flags, but you must use embedded syntax.
Oh, that works? I feel stupid now- somehow, that never occured to me.

Quote:
Originally Posted by chaley View Post
<professorial_mode>
[...]
</professorial_mode>
That's stretching my knowledge of theoretical computer sciences... I've heard lectures on program verification, algorithms and data structures, but that's about it. I'll have to think about what you wrote there.
Manichean is offline  
Old 09-21-2010, 06:35 AM   #17
ldolse
Wizard
ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by Manichean View Post
Oh, that works? I feel stupid now- somehow, that never occured to me.
You shouldn't feel stupid about not knowing about embedded syntax for flags/qualifiers - the amount of google searching I had to do to find the docs on this was ridiculous. They can be found, but only if you've already figured out the exact descriptive qualifiers used in the python documentation. It's not in the commonly linked python regex doc pages, which talk about using re.compile to pass flags, and re.compile isn't available to a normal Calibre end user.

That, and all the flags are specific to python, and for the most part your tutorial has been about generic regex behavior. Probably best to stay generic, though that may be debatable...
ldolse is offline  
Old 09-21-2010, 10:50 PM   #18
Scott Nielsen
Groupie
Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!Scott Nielsen goes to infinity... and beyond!
 
Posts: 154
Karma: 112134
Join Date: May 2009
Location: Kuala Lumpur
Device: iPad, K3, K4, T1
This thread has been very enlightening. My thanks to OP and everybody who contributed further; I've learned a lot.
Scott Nielsen is offline  
Old 09-21-2010, 10:57 PM   #19
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
TonytheBookworm's Avatar
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Reg expressions have always been a little complex for me to understand. Great tutorial by the way. One thing that I have found that works really well and I use it with a lot of the recipes i write for calibre is this site here.
http://www.txt2re.com/index.php3
you can type the string that you want to search for. then hit the language which in our case would be python and it spits out the code to find the regex.
TonytheBookworm is offline  
Old 09-22-2010, 04:08 AM   #20
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Thanks to both of you.
As for that regexp generator, as far as I see, you'd have to manually paste the expression together from the single elements it generates in code, yes? But still, that might be a handy tool.
Manichean is offline  
Old 09-23-2010, 08:31 AM   #21
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Edited again: for style, I tried to clarify distinction between use of parentheses and square brackets (groups vs. sets), added notes on strings in general, added some examples, included some flags.

Still not sure what to do about that part about special characters and escape sequences. Chaley, I generally agree that it's out of place where it is now, but I couldn't figure out where else to put it. I think I'm gonna go ahead and rewrite it into the respective portions, but that's for another pending edit.
Also, still to come: even more examples.

(Things I learned while writing this edit: Calibre likes file extensions in the test field for importing file expressions, and Notepad++ has an entirely different understanding for what \s means than Python does. Next time, I'm going to use Python to test examples right away.)
Manichean is offline  
Old 09-23-2010, 08:44 AM   #22
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.DoctorOhh ought to be getting tired of karma fortunes by now.
 
DoctorOhh's Avatar
 
Posts: 8,738
Karma: 12488297
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by Manichean View Post
Notepad++ has an entirely different understanding for what \s means than Python does. Next time, I'm going to use Python to test examples right away.)
I'm curious, did you set the language in Notepad++ to Python?
DoctorOhh is offline  
Old 09-23-2010, 09:13 AM   #23
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by dwanthny View Post
I'm curious, did you set the language in Notepad++ to Python?
I'm talking about using the search feature with regular expressions, which the programs help notes to use the Scintilla regexp engine and being fixed to a per-line match. Does changing the language change this behaviour?
Short answer: No, I didn't.
Manichean is offline  
Old 09-23-2010, 09:24 AM   #24
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by chaley View Post
<professorial_mode>
The general case cannot be solved with regular expressions because REs don't have the notion of 'stack'. Said another way, and getting a bit formal, all REs by definition can be translated into a deterministic finite state machine. The important part here is that the number of states is known from the RE, and is fixed for all utterances (text to be matched). Parsing utterances in a palindromic language requires a state for each letter up to the center point so the machine can match the right letter after the center point. Such a machine requires len(utterance)/2 states. Thus the number of states is unbounded, meaning that the grammar for the language cannot be described using an RE.
So, if I understand you correctly, the regular expression would be something like the starting state, each matching string would be it's own (end?) state, and the matching operation would provide transitions between the end states, yes? And because the number of states a regular expression can describe is finite, and, in the general case, len(utterance) is unknown, presumed to be in the range [0,\infty], a palindrome could not be described using a regular expression? If that's correct, I don't think I understand how the * and + quantifiers work, because they, as far as I understand, describe an utterance with len(utterance) \in [0,\infty] and len(utterance) \in [1,\infty], respectively.
(Sorry, I lapsed into some LaTeX there, but, seeing your background, I assume you understand.)
Manichean is offline  
Old 09-23-2010, 10:42 AM   #25
megachirops
Enthusiast
megachirops began at the beginning.
 
Posts: 31
Karma: 12
Join Date: Mar 2010
Device: Kindle 2, Kindle 3
Quote:
Originally Posted by Manichean View Post
Notepad++ has an entirely different understanding for what \s means than Python does. Next time, I'm going to use Python to test examples right away.)
An online regex tester that I've had good luck with:
http://www.pythonregex.com/
megachirops is offline  
Old 09-23-2010, 11:10 AM   #26
chaley
"chaley", not "charley"
chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.
 
Posts: 4,926
Karma: 802238
Join Date: Jan 2010
Location: France
Device: Many android devices
Quote:
Originally Posted by Manichean View Post
So, if I understand you correctly, the regular expression would be something like the starting state, each matching string would be it's own (end?) state, and the matching operation would provide transitions between the end states, yes?
I don't think this is quite right.

First, some words on finite state machines (FSMs). An FSM consists of a set of states, connected by arcs that stand for events. Lets say that these events 'label' the arc. In the kind of machine we are talking about (deterministic), an event can appear on at most one arc out of a given state. Traversing an arc consumes the event. Events that are labels for an arc leaving a particular state cannot be accepted by that state, and their occurrence is an error. The machine is constructed in advance from the pattern, and cannot change depending upon input.

Thus the regular expression 'abc' would look something like

start __a__ state1 __b__ state2 __c__ accept

The regular expression for 'ab*c' would look something like

Code:

start __a__ state1 __b__ state2 __c__ accept
                         |_b_|
Quote:
And because the number of states a regular expression can describe is finite,
The number of states an FSM can *contain* is finite. The number of states an FSM can *describe* is infinite.
Quote:
and, in the general case, len(utterance) is unknown, presumed to be in the range [0,\infty], a palindrome could not be described using a regular expression?
Yes. But as we will see below, it is not the length that is the problem, it is the matching of the characters.
Quote:
If that's correct, I don't think I understand how the * and + quantifiers work, because they, as far as I understand, describe an utterance with len(utterance) \in [0,\infty] and len(utterance) \in [1,\infty], respectively.
Yes, they do. But they have no memory. There is no way to match in a general fashion (using one machine) the strings abba, abcba, and abbbba, all of which are palindromes. The problem is that if we don't know what to do until we get to the end and know what 'len' is. Once we know 'len', we can construct the right machine for that utterance. Alternatively, we can a stack, but that would most definitely not be an FSM, because the label on the arc would only be known during execution. In the first case, we can accept only 'a's after the second 'b', but in the second case we can accept a 'c' and in the third case another 'b'.

Of course, you can write a regexp if you know *all* the utterances that you must match. For example, the regexp 'abba|abcba|abbbba' works fine.

You might be interested in the paper at this link. It gets mathematical at times (lots of times), but it discusses how one translates REs used in searches to FSMs to really do the work.

Last edited by chaley; 09-23-2010 at 11:29 AM.
chaley is offline  
Old 09-23-2010, 11:12 AM   #27
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by megachirops View Post
An online regex tester that I've had good luck with:
http://www.pythonregex.com/
I do have Python installed, it's just that, since I was editing the post in Notepad++, I was too lazy to open another windows. I learned from that.
Manichean is offline  
Old 09-23-2010, 11:27 AM   #28
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by chaley View Post
Of course, you can write a regexp if you know *all* the utterances that you must match. For example, the regexp 'abba|abcba|abbbba' works fine.
I see what you did there
After reading your post, I believe my difficulties lie in understanding FSMs. Reading the introduction to the relevant Wikipedia entry obviously doesn't give an understanding of the subject... I'm going to read up on this, as this is relevant to my (private) interests, but it's going to have to be on the backburner for some while due to some Real Life interference.

Quote:
Originally Posted by chaley View Post
You might be interested in the paper at this linkthis link.It gets mathematical at times (lots of times), but it discusses how one translates REs used in searches to FSMs to really do the work.
Thanks for the link. I think, once I refresh my memory on some of the things I learned in computer sciences, that I should be able to understand that.
Manichean is offline  
Old 09-23-2010, 11:54 AM   #29
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Manichean View Post
Thanks for the link. I think, once I refresh my memory on some of the things I learned in computer sciences, that I should be able to understand that.
I've never had a CS course (or even a programming course), but I'd like to say thanks for the link, too. It was an interesting read. As far as I'm concerned, Charles can set <professorial mode = "ON"> at any time. I always learn something new.

As to the problem, I'm always up for a challenge, particularly one that tells me I "will fail." I'll quote the problem statement from the professor:
Quote:
For fun, try to write a regular expression that matches any palindrome. (http://en.wikipedia.org/wiki/Palindrome. Examples: abcdedcba or 'madam im adam' with spaces ignored.) You will fail.
The first rule of taking a test, even one for fun: read the problem statement very carefully. How about this regular expression for an answer:
Code:
.*
I believe it succeeds at matching "any palindrome."
Starson17 is offline  
Old 09-23-2010, 11:55 AM   #30
Manichean
Wizard
Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!Manichean My eyes! My eyes! The light is just too bright!
 
Manichean's Avatar
 
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Starson17 View Post
The first rule of taking a test, even one for fun: read the problem statement very carefully. How about this regular expression for an answer:
Code:
.*
I believe it succeeds at matching "any palindrome."
Yes, you're right. Hey, that means that all those theories chaley posted about are wrong! Quick, let's publish a paper!
Manichean is offline  
Closed Thread

Tags
regexp calibre tutorial

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Problem with regular expressions Manichean Conversion 10 02-03-2011 02:27 PM
Custom Regular Expressions for adding book information bigbot3 Calibre 1 12-25-2010 06:28 PM
Help with Regular Expressions ghostyjack Workshop 2 01-08-2010 11:04 AM
Regular Expressions help needed Phil_C Workshop 20 10-03-2009 12:14 AM
BookDesigner v5 and regular expressions ShineOn Sony Reader 11 08-25-2008 04:06 PM


All times are GMT -4. The time now is 10:02 AM.


MobileRead.com is a privately owned, operated and funded community.