Old 04-30-2010, 12:51 AM   #16
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by tonyx3 View Post
Ahh, I'm glad someone with some know-how identified the issue.
All of the people who posted on this topic have "some know-how". However, not everyone is willing or able to volunteer a couple of hours of his/her time looking at a problem that is of little personal interest.
Quote:
So are you saying that if the regex is valid across lines, it will work, even if the regex tester in calibre doesn't highlight across lines?
Yes. For example, the regexp I posted removes the multiline footers on OP's PDF.
Old 04-30-2010, 01:21 AM   #17
tonyx3
Connoisseur
tonyx3 began at the beginning.
 
Posts: 55
Karma: 10
Join Date: Jan 2010
Device: Nexus One
Ok, so I suppose I didn't word that very tactfully. I just meant that until you came along with an explanation of the technical cause of the issue, we were all just applying our knowledge of regex, but none of us could get it to work, and none of us knew why. It's been an issue in multiple threads and a couple of bug tickets, so it's not isolated to this thread.

In any case, thanks for spending your time on it.
Old 04-30-2010, 10:47 AM   #18
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
I gave a fix for the regexp tester to Kovid (ticket #5414). I imagine that the fix (or something like it) will be released without too much delay.
Old 04-30-2010, 05:18 PM   #19
adolson
Member
adolson began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Apr 2010
Device: PRS-300
I see that Kovid accepted the fix and will put it in the next release, which I guess will be pretty soon, looking at the release history (I am 1 day old to Calibre and eBooks in general - this is an impressive project, and the frequent releases and fast development are amazing to me).

I may just wait until then, but in the meantime, I just installed the latest version and am running Ubuntu 10.04. Here's a clip of my PDF from the regex test page:

Quote:
anyone else to know I was using it. Ball Tongue and I would go score some on the sly, after band practice or some other time when the rest of the guys weren’t around. It went on that way for a few 50</p><p>
the final piece</p><p>

months, until I found out something interesting: Munky was doing speed on the days <i>he </i> was off, too. </p><p>
And guess what? So was Jonathan. </p><p>
The red is the part I want to get rid of: the 50</p><p> and the "the final piece</p><p>" line after it. That's the page number (footer of the book) and the chapter or title (header of each page). If I split them into two regexes, I can't come up with a header regex that removes only the header line, because it also matches ordinary content (some characters and then a </p><p>).

Here is what I tried. I have been converting to TXT format for quick viewing, though EPUB gives the same result:

(?ism)\d+</p><p>.*?</p><p>$

(?m)(\d+</p><p>.*?</p><p>)

(?mi)(\d+</p><p>.*?</p><p>$)

(?mi)(\d+</p><p>$^.*?</p><p>)

...and many other variants...

I based these ideas on the regex given on page 1 that was said to work for multi-line, but I can't figure it out. I'm sure it's something obvious that I'm doing wrong, too. Can anyone help?
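For anyone following along, here is a minimal sketch of how the four patterns above behave in plain Python re, outside calibre. SAMPLE is an assumption: it is retyped from the clip above, and the HTML calibre actually sees may differ in whitespace. first_match is a hypothetical helper, named here only for illustration.

```python
import re

# Assumed sample, retyped from the clip above; real whitespace may differ.
SAMPLE = (
    "It went on that way for a few 50</p><p>\n"
    "the final piece</p><p>\n"
    "\n"
    "months, until I found out something interesting</p><p>\n"
)

PATTERNS = [
    r"(?ism)\d+</p><p>.*?</p><p>$",
    r"(?m)(\d+</p><p>.*?</p><p>)",
    r"(?mi)(\d+</p><p>.*?</p><p>$)",
    r"(?mi)(\d+</p><p>$^.*?</p><p>)",
]


def first_match(pattern, text):
    """Return the text of the first match, or None if nothing matches."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


RESULTS = {pattern: first_match(pattern, SAMPLE) for pattern in PATTERNS}

for pattern, matched in RESULTS.items():
    print(f"{pattern!r:40} -> {matched!r}")
```

On this sample, only the first pattern finds anything. It is the only one with the s flag, so it is the only one whose "." can step over the newline between the footer and the header.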
Old 04-30-2010, 10:03 PM   #20
adolson
Member
adolson began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Apr 2010
Device: PRS-300
OK, so, my regex looks right... In the regex tester of the new release, .51, it highlights EXACTLY what I want to remove.

(?ism)(\d+</p><p>.*?</p><p>)

However, it doesn't actually work when I do the conversion, and yes, I did check off the box...

Edit: OK, I can't actually seem to get any regex to work, now. Do I need to install something in particular for it to work?

Last edited by adolson; 05-01-2010 at 12:23 AM.
Old 05-01-2010, 12:26 AM   #21
pepak
Guru
pepak has a spectacular aura about
 
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
1) If it did work, it would be quite dangerous - it could easily remove text you don't want removed.

2) I still don't understand why most of you people keep using the "m" flag, which is NOT suitable for the usage cases displayed in this thread. For example, Adolson's regexp should only use ?is, not ?ism.
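A minimal sketch of point 1, using plain Python re rather than calibre itself. BODY is made-up text with no header or footer in it, and the pattern is adolson's with only the is flags, as suggested in point 2.

```python
import re

# adolson's pattern with only the "is" flags; "m" would change nothing here
# because the pattern contains no ^ or $.
PATTERN = re.compile(r"(?is)\d+</p><p>.*?</p><p>")

# Made-up body text: no header or footer, just a paragraph ending in a year.
BODY = (
    "The band formed in 1993</p><p>\n"
    "This paragraph is real content.</p><p>\n"
    "So is this one.</p><p>\n"
)

CLEANED = PATTERN.sub("", BODY)
print(repr(CLEANED))
```

Because the s flag lets .*? run across the newline, any paragraph that happens to follow a trailing number is removed along with it.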
Old 05-01-2010, 01:40 AM   #22
adolson
Member
adolson began at the beginning.
 
Posts: 15
Karma: 10
Join Date: Apr 2010
Device: PRS-300
Quote:
Originally Posted by pepak View Post
1) If it did work, it would be quite dangerous - it could easily remove text you don't want removed.

2) I still don't understand why most of you people keep using the "m" flag, which is NOT suitable for the usage cases displayed in this thread. For example, Adolson's regexp should only use ?is, not ?ism.
1) When I put in my query, the Regex Builder highlights only stuff that I want removed. So therefore, it should work, right?

2) I put the m because chaley's post indicated a regex that was supposed to work, and I based mine on that. I tried without the m as well, it doesn't work either.

This one appears to have worked in vim, using the html generated by the debug output.
:%s/[0-9]\{1,}<\/p><p>\n\(\s*\S\)\{5,}<\/p><p>//g
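For comparison, a rough Python translation of the vim substitution above. This is a sketch, not a drop-in for calibre: VIM_EQUIVALENT and SAMPLE are names made up here, and [ \t] stands in for vim's \s, which matches only space and tab, whereas Python's \s also matches a newline.

```python
import re

# vim:    [0-9]\{1,}<\/p><p>\n\(\s*\S\)\{5,}<\/p><p>
# Python: \{1,} becomes +, \( \) becomes a (non-capturing) group,
#         \{5,} becomes {5,}, and the escaped slashes are not needed.
VIM_EQUIVALENT = re.compile(r"[0-9]+</p><p>\n(?:[ \t]*\S){5,}</p><p>")

# Assumed sample, retyped from the clip earlier in the thread.
SAMPLE = "for a few 50</p><p>\nthe final piece</p><p>\n\nmonths, until I found out"

CLEANED = VIM_EQUIVALENT.sub("", SAMPLE)
print(repr(CLEANED))
```

Note that this version expects exactly one newline after the page number, so it is just as sensitive to whitespace changes as the vim original.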

Last edited by adolson; 05-01-2010 at 03:24 AM.
Old 05-01-2010, 10:24 AM   #23
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Note that the HTML displayed in the regex builder is not absolutely identical to the HTML that is used in the conversion process, especially with regard to whitespace. So you have to make your regex tolerate differences in whitespace.
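One way to read that advice, sketched in plain Python re. Everything named here is an assumption for illustration: TOLERANT is one possible pattern, strip_header_footer is a hypothetical helper, and CONVERSION_HTML is a guess at how the whitespace might differ, since the thread does not show the conversion-time HTML.

```python
import re

# \s* between every token lets the match survive added or removed whitespace.
# [^<]*? keeps the header text from running across another tag.
TOLERANT = re.compile(r"\d+\s*</p>\s*<p>\s*[^<]*?\s*</p>\s*<p>")

# Two renderings of the same fragment. Both are assumptions for illustration.
BUILDER_HTML = "for a few 50</p><p>\nthe final piece</p><p>\n\nmonths"
CONVERSION_HTML = "for a few 50 </p>\n<p> the final piece </p>\n<p>\n\nmonths"


def strip_header_footer(html):
    """Remove a page-number footer followed by a one-paragraph header."""
    return TOLERANT.sub("", html)


print(repr(strip_header_footer(BUILDER_HTML)))
print(repr(strip_header_footer(CONVERSION_HTML)))
```

Both renderings come out the same. The pattern can still match a genuine paragraph that follows a trailing number, so it needs checking against the real book.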
Old 05-01-2010, 11:57 AM   #24
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by pepak View Post
2) I still don't understand why most of you people keep using the "m" flag, which is NOT suitable for the usage cases displayed in this thread. For example, Adolson's regexp should only use ?is, not ?ism.
Why is it not suitable? Multiline makes '^' and '$' match around internal newlines (see here for details), something useful and in fact suitable for the regexp I used to solve the OP's problem. What behavior are you objecting to?
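A small sketch of what the multiline flag does in Python re, using a made-up three-line TEXT. It changes only where ^ and $ may match and has no effect on ".".

```python
import re

TEXT = "first line\nsecond line\nthird line"

# Without re.MULTILINE, ^ matches only at the very start of the string.
WITHOUT_M = re.findall(r"^\w+", TEXT)

# With re.MULTILINE, ^ also matches right after each newline.
WITH_M = re.findall(r"^\w+", TEXT, flags=re.MULTILINE)

# $ behaves the same way at line ends.
ENDS_WITHOUT_M = re.findall(r"\w+$", TEXT)
ENDS_WITH_M = re.findall(r"\w+$", TEXT, flags=re.MULTILINE)

print(WITHOUT_M, WITH_M, ENDS_WITHOUT_M, ENDS_WITH_M)
```

A pattern with no ^ or $ in it behaves identically with or without the flag.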
Old 05-01-2010, 09:54 PM   #25
prky
Member
prky began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Nov 2009
Device: IPhone 3GS
Quote:
Originally Posted by pepak View Post
I am not surprised. I told you not to use ^ and $.
Cheers for that.

Following your suggestions, I looked up the behaviour of ^ and $ in Python when using multi-line regex, and found that while they still apply to the start and end of the string, they also apply to the start and end of each line.

In combination with the s flag (which lets . match newlines, so .* can span multiple lines) and the suggestion of .*? (for a minimal match), I had a regex which worked in the Python tester.
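A quick sketch of those two pieces in plain Python re, on a made-up TEXT: what the s flag changes, and how greedy .* differs from minimal .*? once it is set.

```python
import re

TEXT = "<p>one</p>\n<p>two</p>\n<p>three</p>"

# Without the s flag (re.DOTALL), "." stops at a newline, so this finds nothing.
NO_DOTALL = re.findall(r"<p>one.*three</p>", TEXT)

# Greedy .* with s: runs to the LAST </p> in the text.
GREEDY = re.search(r"(?s)<p>.*</p>", TEXT).group(0)

# Minimal .*? with s: stops at the FIRST </p> it can.
LAZY = re.search(r"(?s)<p>.*?</p>", TEXT).group(0)

print(NO_DOTALL)
print(repr(GREEDY))
print(repr(LAZY))
```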

The bit I was missing was that I wasn't testing each regex on the conversion each time, as I was using the highlighting in the regex builder to see if it was matching.

Thanks for your assistance.

prk.
Old 05-01-2010, 09:56 PM   #26
prky
Member
prky began at the beginning.
 
Posts: 11
Karma: 10
Join Date: Nov 2009
Device: IPhone 3GS
Quote:
Originally Posted by chaley View Post
There is in fact a problem here, but not the one the OP was suggesting.

Second (the problem): the regexp tester does not show multi-line matches. The problem is that it uses the QSyntaxHighlighter, which my experimentation shows to be a line-oriented interface, making highlighting multiline matches impossible. I think that the regex tester should match directly against the text in the QTextEdit box, using something like setTextBackgroundColor to indicate matches. I admit that I haven't hacked the code to try this idea, but it seems plausible. I will file a ticket on this so that someone more acquainted than I am with the widgets can think about a solution.
Sweet.

Thank you so much for diagnosing that, and working out it's the display in the tester which was the issue (once I'd eventually got a working regex).
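To make the diagnosis concrete, a sketch in plain Python (not Qt code, and not calibre's actual implementation) of why a line-at-a-time matcher cannot show a multi-line match while a whole-text matcher can. The function names, TEXT and PATTERN are all made up for illustration.

```python
import re


def line_oriented_spans(pattern, text):
    """Match one line at a time, the way a per-line highlighter sees the text."""
    spans = []
    offset = 0
    for line in text.splitlines(keepends=True):
        for found in re.finditer(pattern, line):
            spans.append((offset + found.start(), offset + found.end()))
        offset += len(line)
    return spans


def whole_text_spans(pattern, text):
    """Match against the full text, so a match may cross line boundaries."""
    return [(found.start(), found.end()) for found in re.finditer(pattern, text)]


# Assumed sample and pattern, for illustration only.
TEXT = "a few 50</p><p>\nthe final piece</p><p>\nmonths"
PATTERN = r"(?s)\d+</p><p>.*?</p><p>"

PER_LINE = line_oriented_spans(PATTERN, TEXT)
WHOLE = whole_text_spans(PATTERN, TEXT)

print(PER_LINE)
print([TEXT[start:end] for start, end in WHOLE])
```

The per-line version finds nothing to highlight, even though the same regex matches once it is given the whole text.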

Much appreciated.

prk.