04-30-2010, 12:51 AM | #16
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
All of the people who posted on this topic have "some know-how". However, not everyone is willing or able to volunteer a couple of hours of his/her time looking at a problem that is of little personal interest.
04-30-2010, 01:21 AM | #17
Connoisseur
Posts: 55
Karma: 10
Join Date: Jan 2010
Device: Nexus One
OK, so I suppose I didn't word that very tactfully. I was just referring to the fact that until you came along with an explanation of the technical cause of the issue, we were all just applying our knowledge of regex, but none of us could get it to work, and none of us knew why. And it has come up in multiple threads and a couple of bug tickets, so it's not an isolated issue in this thread.
In any case, thanks for spending your time on it.
04-30-2010, 05:18 PM | #19
Member
Posts: 15
Karma: 10
Join Date: Apr 2010
Device: PRS-300
I see that Kovid accepted the fix and will put it in the next release, which I guess will be pretty soon, judging by the release history (I'm only one day into Calibre and eBooks in general; this is an impressive project, and the frequent releases and fast development amaze me).
I may just wait until then, but in the meantime: I just installed the latest version and am running Ubuntu 10.04. Here's a clip of my PDF from the regex test page:
Here is what I tried. I've been converting to TXT for quick viewing, though EPUB gives the same result:
(?ism)\d+</p><p>.*?</p><p>$
(?m)(\d+</p><p>.*?</p><p>)
(?mi)(\d+</p><p>.*?</p><p>$)
(?mi)(\d+</p><p>$^.*?</p><p>)
...and many other variants. I based these on the regex given on page 1 that was said to work for multi-line, but I can't figure it out. I'm sure it's something obvious that I'm doing wrong. Can anyone help?
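The clip itself didn't survive, so here is a made-up line of HTML to show one likely reason the `$^` variants can't work: `^` and `$` are zero-width, so `$^` asserts a line end and a line start at the same position (which only happens on an empty line) and never consumes the newline between them. This is a sketch using Python's `re` module, whose `^`/`$` behaviour comes up later in the thread; the sample text is hypothetical.

```python
import re

# Hypothetical conversion HTML: a page number, then a header on the next line.
text = "12</p><p>\nChapter Title</p><p>"

# "$^" needs end-of-line and start-of-line at the same spot; neither anchor
# consumes the "\n", so this cannot match across the line break.
anchors_only = re.search(r"(?m)\d+</p><p>$^Chapter", text)

# Consuming the line break (here with \s*) between the anchors does match.
anchors_and_break = re.search(r"(?m)\d+</p><p>$\s*^Chapter", text)

print(anchors_only)                   # None
print(anchors_and_break is not None)  # True
```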
04-30-2010, 10:03 PM | #20
Member
Posts: 15
Karma: 10
Join Date: Apr 2010
Device: PRS-300
OK, so my regex looks right: in the regex tester of the new release, .51, it highlights EXACTLY what I want to remove:
(?ism)(\d+</p><p>.*?</p><p>)
However, it doesn't actually work when I do the conversion, and yes, I did check the box.
Edit: OK, now I can't seem to get any regex to work. Do I need to install something in particular for it to work?
Last edited by adolson; 05-01-2010 at 12:23 AM.
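Run as a plain search-and-replace (replacing with nothing), that pattern removes every `digits</p><p> ... </p><p>` run it finds. The sketch below applies it with Python's `re.sub` to a hypothetical snippet; besides the page number and running header, it also eats a body paragraph that merely ends in a number.

```python
import re

# adolson's pattern from the post above, unchanged.
PATTERN = r"(?ism)(\d+</p><p>.*?</p><p>)"

# Hypothetical HTML: a page number + running header, then a body paragraph
# that happens to end in a number.
html = ("<p>Some text</p><p>42</p><p>Running Header</p>"
        "<p>Body continues in 1984</p><p>Next paragraph</p><p>More</p>")

cleaned = re.sub(PATTERN, "", html)
print(cleaned)  # <p>Some text</p><p>Body continues in More</p>
# The header is gone, but so are "1984" and "Next paragraph".
```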
05-01-2010, 12:26 AM | #21
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
1) If it did work, it would be quite dangerous: it could easily remove text you don't want removed.
2) I still don't understand why most of you keep using the "m" flag, which is NOT suitable for the use cases shown in this thread. For example, adolson's regexp should use only ?is, not ?ism.
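Point 2 can be checked directly: the `m` flag only changes what `^` and `$` match, so in a pattern with neither anchor, such as adolson's, it has no effect; it is `s` that lets `.*?` cross a line break. A small Python sketch with a made-up input string:

```python
import re

# Hypothetical HTML with a line break inside the block to be removed.
html = "<p>a</p><p>7</p><p>Header\ntext</p><p>b</p>"

with_m = re.sub(r"(?ism)\d+</p><p>.*?</p><p>", "", html)
without_m = re.sub(r"(?is)\d+</p><p>.*?</p><p>", "", html)
# Dropping s instead: . can no longer step over the "\n", so nothing matches.
without_s = re.sub(r"(?im)\d+</p><p>.*?</p><p>", "", html)

print(with_m == without_m)  # True
print(without_m)            # <p>a</p><p>b</p>
print(without_s == html)    # True
```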
05-01-2010, 01:40 AM | #22
Member
Posts: 15
Karma: 10
Join Date: Apr 2010
Device: PRS-300
2) I put the m in because chaley's post gave a regex that was supposed to work, and I based mine on that. I tried without the m as well, and it doesn't work either.
This one appears to have worked in vim, using the HTML generated by the debug output:
:%s/[0-9]\{1,}<\/p><p>\n\(\s*\S\)\{5,}<\/p><p>//g
Last edited by adolson; 05-01-2010 at 03:24 AM.
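For anyone who would rather test outside vim, here is a rough Python `re` translation of that substitution (our own translation, not something from the thread). Vim's `\{1,}` becomes `+` and `\(...\)\{5,}` becomes `(?:...){5,}`; because Python's `\s` also matches newlines while vim's matches only space and tab, `[ \t]*` is used to keep the same meaning. The sample string is hypothetical.

```python
import re

# Rough Python equivalent of the vim pattern
#   [0-9]\{1,}<\/p><p>\n\(\s*\S\)\{5,}<\/p><p>
# [ \t]* stands in for vim's \s, which does not match newlines.
# Like vim's \{5,}, {5,} is greedy: the match runs to the last "</p><p>"
# reachable on that line.
VIM_EQUIVALENT = re.compile(r"[0-9]+</p><p>\n(?:[ \t]*\S){5,}</p><p>")

# Hypothetical debug-output HTML.
html = "<p>Intro</p><p>17</p><p>\nThe Chapter Title</p><p>Body</p>"

print(VIM_EQUIVALENT.sub("", html))  # <p>Intro</p><p>Body</p>
```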
05-01-2010, 10:24 AM | #23
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Note that the HTML displayed in the regex builder is not identical to the HTML used in the conversion process, especially with regard to whitespace. So your regex has to tolerate differences in whitespace.
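One way to follow that advice is to allow optional whitespace (`\s*`) wherever the preview shows tags butted together. The Python sketch below compares a strict and a tolerant version of the pattern from this thread against two hypothetical renderings of the same HTML; the exact whitespace produced during conversion is an assumption here.

```python
import re

# Two hypothetical renderings of the same page break: the compact form a
# preview might show, and a form with line breaks between the tags.
preview_html = "<p>end of page</p><p>12</p><p>Header</p><p>next page</p>"
convert_html = "<p>end of page</p>\n<p>12</p>\n<p>Header</p>\n<p>next page</p>"

strict = re.compile(r"(?is)\d+</p><p>.*?</p><p>")
# \s* between the tags absorbs spaces, tabs and newlines if there are any.
tolerant = re.compile(r"(?is)\d+\s*</p>\s*<p>.*?</p>\s*<p>")

for name, html in [("preview", preview_html), ("conversion", convert_html)]:
    print(name, bool(strict.search(html)), bool(tolerant.search(html)))
# preview True True
# conversion False True

print(repr(tolerant.sub("", convert_html)))
```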
05-01-2010, 11:57 AM | #24
Grand Sorcerer
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
05-01-2010, 09:54 PM | #25
Member
Posts: 11
Karma: 10
Join Date: Nov 2009
Device: IPhone 3GS
Cheers for that.
Following your suggestions, I looked up the behaviour of ^ and $ in Python when using multi-line regex, and found that while they still match at the start/end of the string, they also match at the start/end of each line. Combined with the s flag (which lets . match newlines, so .* can run across multiple lines) and the suggestion of .*? (for a minimal match), I had a regex that worked in the Python tester. The bit I was missing was that I wasn't testing each regex with an actual conversion; I was relying on the highlighting in the regex builder to see whether it matched. Thanks for your assistance.
prk.
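The three behaviours described above can be seen with Python's `re` module on a two-line string (the string is just an example):

```python
import re

text = "first line\nsecond line"

# ^ without re.M: start of the string only; with re.M: start of every line.
starts_plain = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"(?m)^\w+", text)

# $ behaves the same way at the other end of each line.
ends_plain = re.findall(r"\w+$", text)
ends_multiline = re.findall(r"(?m)\w+$", text)

# Without re.S, . cannot cross the newline; with it, greedy .* runs to the
# LAST "line", while the lazy .*? stops at the first one.
no_dotall = re.search(r"first.*line", text).group()
dotall_greedy = re.search(r"(?s)first.*line", text).group()
dotall_lazy = re.search(r"(?s)first.*?line", text).group()

print(starts_plain, starts_multiline)  # ['first'] ['first', 'second']
print(ends_plain, ends_multiline)      # ['line'] ['line', 'line']
print(repr(no_dotall), repr(dotall_greedy), repr(dotall_lazy))
```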
05-01-2010, 09:56 PM | #26
Member
Posts: 11
Karma: 10
Join Date: Nov 2009
Device: IPhone 3GS
Thank you so much for diagnosing that and working out that the display in the tester was the issue (once I'd eventually got a working regex). Much appreciated.
prk.
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
HTML Conversion - Multiline Headers | prky | Calibre | 1 | 07-03-2010 09:24 AM
What a regex is | Worldwalker | Calibre | 20 | 05-10-2010 05:51 AM
Help with a regex | A.T.E. | Calibre | 1 | 04-05-2010 07:50 AM
Multiline Regex Footer | hover | Calibre | 10 | 02-03-2010 04:23 AM
Regex help... | Bobthebass | Workshop | 6 | 04-26-2009 03:54 PM