#31
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
It's a very understandable error for anyone familiar with wildcards, but not quantifiers. (Perhaps it's worth a brief comment in your excellent beginner's tutorial about the difference between wildcards and quantifiers.)
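To make the wildcard/quantifier distinction concrete, here is a small Python sketch. Python's `fnmatch` module stands in for shell-style wildcards and `re` for the regex box; the sample string is made up. In a wildcard pattern `*` means "any run of characters", while in a regex `*` only repeats the item right before it, so the regex equivalent of the wildcard `*` is `.*`.

```python
import fnmatch
import re

line = "Page xi"

# Shell-style wildcard: '*' stands for any run of characters.
wildcard_hit = fnmatch.fnmatchcase(line, "Page*")

# Regex: '*' is a quantifier meaning "zero or more of the previous item",
# so 'Page*' means 'Pag' followed by any number of 'e' characters.
regex_star = re.fullmatch(r"Page*", line)

# The regex equivalent of the wildcard '*' is '.*' (any character, repeated).
regex_dot_star = re.fullmatch(r"Page.*", line)

print(wildcard_hit)             # True
print(regex_star is None)       # True: 'Page*' cannot match ' xi'
print(regex_dot_star.group(0))  # Page xi
```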
#32
Connoisseur
Posts: 56
Karma: 484
Join Date: Sep 2010
Device: Kindle 3 & Sony PRS-950
I have a lot of lines looking like this:
0465002214_Cochran 11/20/08 2:41 PM Page xi
What should the regex look like to remove those? And where do I put it? In Structure Detection I have ticked 'Remove Header' and 'Remove Footer'. I also wonder what the chapter mark options ('pagebreak', 'rule', 'both', 'none') do.
Last edited by varmemester; 09-28-2010 at 07:24 AM.
#33
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
You could try the regular expression
Code:
\d+_Cochran\s+\d+/\d+/\d+\s+\d+:\d+\s+PM\s+Page\s+\w+
You only need to tick one of the removal options and then customize the regular expression to fit. The chapter mark option selects how detected chapter breaks are marked: with a page break, a horizontal rule, both, or none of these. You should also see a helpful tooltip explaining what an option does when you hover your mouse cursor over it.
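A quick way to check the suggested expression before pasting it into calibre is to run it through Python's `re` module. This is only an approximation: it assumes calibre hands the pattern to a Python-compatible regex engine and removes whatever matches. Note that the pattern hardcodes `PM`, so a header stamped `AM` would slip through.

```python
import re

# The expression suggested above, unchanged.
HEADER = r"\d+_Cochran\s+\d+/\d+/\d+\s+\d+:\d+\s+PM\s+Page\s+\w+"

sample = "0465002214_Cochran 11/20/08 2:41 PM Page xi"

match = re.search(HEADER, sample)
print(match.group(0) == sample)  # True: the whole header line is matched

# Removing the match leaves nothing behind on that line.
print(repr(re.sub(HEADER, "", sample)))  # ''

# The literal 'PM' means a morning timestamp is not matched.
print(re.search(HEADER, "0465002214_Cochran 11/20/08 9:05 AM Page 12"))  # None
```

If morning timestamps do occur, replacing `PM` with `[AP]M` would cover both.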
#34
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2011
Location: New Zealand
Device: Kobo
Hi,
I'm also having trouble with this implementation of regex. I've looked at the tutorial links, and I've used regex elsewhere before. I'm used to using ^ to mean the start of a line, and this is not working for me. I'm converting from PDF; the book in question is a free Doctor Who short story on the BBC website. It has the BBC logo, the web URL and another icon as a header or footer on every page. It also has the page number, which I want to get rid of. I'm trying to use (^\d+<br>) to match 2<br> only at the start of a line, but it isn't working. If I remove the ^, it finds it, but it also finds the page numbers on the contents page, which are at the end of the line. Is there some other indicator of "start of line" that I should use?
Cheers, Damian
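The behaviour Damian describes matches how `^` works in Python's `re` module (calibre is written in Python): unless the MULTILINE flag is set, `^` matches only at the very start of the whole string, not after each newline. Whether calibre sets that flag for the header/footer regex is an assumption here, not something checked against its source. The sample text below is invented to mimic converted PDF output.

```python
import re

# Invented sample: a contents entry ending in a page number, then a bare
# page number on a line of its own.
html = "Contents<br>\nChapter One 3<br>\n2<br>\nThe story begins.<br>\n"

# No anchor: both the contents number and the page number match.
unanchored = re.findall(r"\d+<br>", html)

# Default flags: '^' only matches at the start of the whole string.
anchored_default = re.findall(r"^\d+<br>", html)

# MULTILINE: '^' also matches right after every newline.
anchored_multiline = re.findall(r"^\d+<br>", html, flags=re.MULTILINE)

# The same flag can be switched on inside the pattern with '(?m)',
# which helps when a tool only offers a text box for the pattern.
inline_flag = re.findall(r"(?m)^\d+<br>", html)

print(unanchored)          # ['3<br>', '2<br>']
print(anchored_default)    # []
print(anchored_multiline)  # ['2<br>']
print(inline_flag)         # ['2<br>']
```

If calibre passes the pattern to Python's `re` unchanged (again an assumption), `(?m)^\d+<br>` may work in the Remove Header box as written.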
#35
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Try looking for a newline character before the text you want to remove:
Code:
\n\d+<br>
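Here is a Python sketch of why the newline version works, on invented sample text in which a contents entry ends with "3" and a bare page number "2" sits on its own line. The newline becomes part of the match, so, assuming calibre simply deletes whatever matches, the page number goes together with the line break in front of it. A lookbehind, `(?<=\n)`, requires the newline without including it in the match; here that leaves an empty line behind.

```python
import re

html = "Contents<br>\nChapter One 3<br>\n2<br>\nThe story begins.<br>\n"

# Requiring a newline right before the digits skips numbers that end a line.
print(re.findall(r"\n\d+<br>", html))  # ['\n2<br>']

# Deleting the match also deletes the newline in front of the page number.
deleted = re.sub(r"\n\d+<br>", "", html)

# A lookbehind only checks that a newline precedes the digits, so it survives.
kept_newline = re.sub(r"(?<=\n)\d+<br>", "", html)

print(repr(deleted))       # 'Contents<br>\nChapter One 3<br>\nThe story begins.<br>\n'
print(repr(kept_newline))  # 'Contents<br>\nChapter One 3<br>\n\nThe story begins.<br>\n'
```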
#36
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Hey everyone, I'm pretty new to regex as well, but I've been reading about it and trying to figure it out. There are two things I'm trying to get rid of: the ABC Amber LIT Converter lines and the "aa bb pdf transform" text.
For the ABC Amber LIT Converter lines, I've tried every piece of code in this thread and in others I've found, and all the logic I can think of, but it still just doesn't work. My other issue: people talk about text turning yellow when it is going to be removed, but nothing ever gets highlighted in my calibre 0.7.40, even when I use an example from Kovid. Is that my problem, or is it just a setting? (I've ticked the Remove Header box, obviously.) Can someone please just give me a copy-and-paste piece of code so I can sort this out? It's KILLING ME!
Last edited by Confuzzled; 01-19-2011 at 03:44 AM.
#37
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
#38
Wizard
Posts: 4,553
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
This refers to when you are using the Wizard (the button to the right of the box holding the regular expression) and have pressed the 'Test' button in the wizard. The text that matches the regex (if any) is then highlighted in yellow in the main window of the wizard.
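If the wizard's highlighting is hard to spot, the same check can be approximated outside calibre. The sketch below is not calibre code: `highlight` is a hypothetical helper that wraps each match in `[[...]]` (standing in for the yellow highlight) and reports how many matches there were, so a count of 0 means the pattern found nothing. It assumes calibre's engine behaves like Python's `re`.

```python
import re


def highlight(pattern: str, text: str) -> tuple[str, int]:
    """Hypothetical helper: mark every match with [[...]] and count them."""
    return re.subn(pattern, lambda m: "[[" + m.group(0) + "]]", text)


marked, count = highlight(r"Page\s+\d+", "Header<br>\nPage 12<br>\nSome text<br>\n")
print(count)   # 1
print(marked)  # the same text, with [[Page 12]] marked
```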
|
#39
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Quote:
The code I played with is a modification of this, which as far as I'm aware should match:
Code:
<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>
I came up with something to this effect using <p> (paragraph) instead of <b> (bold), but I wasn't sure of my defining structure, so I took Kovid's structure instead and, when that didn't work, tried to edit it until it did. I also tried removing the <a> (the HTML link), but that didn't work either. Delphi was always my preference over Python. My problem is this repeating code:
Code:
<p class="calibre3"><b class="calibre1">Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html" class="calibre2">erter, http://www.processtext.com/abclit.html</a></b></p>
Thanks so much!
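Running the quoted pattern against the quoted markup in Python's `re` module (standing in for calibre's engine, which is an assumption) suggests the pattern itself is fine. Because it stops at `</b>`, removing its matches leaves an empty `<p class="calibre3"></p>` behind. The second pattern below is a hypothetical variant that also takes the enclosing paragraph, so the whole line goes.

```python
import re

line = ('<p class="calibre3"><b class="calibre1">Generated by ABC Amber LIT Conv'
        '<a href="http://www.processtext.com/abclit.html" class="calibre2">erter, '
        'http://www.processtext.com/abclit.html</a></b></p>')

# The pattern quoted above, unchanged.
BOLD_ONLY = r"<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>"

# Hypothetical variant that also removes the enclosing <p>...</p>.
WHOLE_PARAGRAPH = r"<p[^>]*>\s*<b[^>]*>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</p>"

print(re.search(BOLD_ONLY, line) is not None)   # True
print(re.sub(BOLD_ONLY, "", line))              # <p class="calibre3"></p>
print(repr(re.sub(WHOLE_PARAGRAPH, "", line)))  # ''
```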
#40
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
That's weird. I just tested the regex you gave with the string you gave, and as I expected, it matches with no problems. Are there any linebreaks in the XHTML that you edited out?
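The line-break question matters because in Python's `re` (again assumed to behave like calibre's engine) `\s+` matches newlines but `.` does not, unless the DOTALL flag is on. The sketch below puts a hypothetical newline into the same markup, right after `Conv`: the pattern then fails, and prefixing it with the inline flag `(?s)` lets `.` cross the line break.

```python
import re

PATTERN = r"<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>"

# Same markup as in the post, but with a newline after "Conv", as a raw
# XHTML file might contain.
wrapped = ('<p class="calibre3"><b class="calibre1">Generated by ABC Amber LIT Conv\n'
           '<a href="http://www.processtext.com/abclit.html" class="calibre2">erter, '
           'http://www.processtext.com/abclit.html</a></b></p>')

print(re.search(PATTERN, wrapped))                       # None: '.' stops at the newline
print(re.search("(?s)" + PATTERN, wrapped) is not None)  # True: (?s) lets '.' match '\n'

# A newline between the literal words is no problem, since \s+ covers it.
split_words = wrapped.replace("Conv\n", "Conv").replace("Amber LIT", "Amber\nLIT")
print(re.search(PATTERN, split_words) is not None)       # True
```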
|
#41
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Another example of my issue: if I'm trying to remove a standard page number (not bold), shouldn't (Page [0-9]+) work? It doesn't.
#42
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
I know! I'm not crazy; there is something weird, hey? Should I reinstall calibre, do you think?
|
#43
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
|
#44
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Again, that depends a lot on the markup the page number actually has. The regex should work for any page number preceded by the word "Page ", which may actually be a bit too indiscriminate, as there might be references to pages in the text. Anyway, are you absolutely sure that you're typing the regex correctly into the text box of the wizard, actually pressing the Test button, and scrolling down to see whether anything gets highlighted?
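To illustrate the point about page references: in the invented sample below, `Page [0-9]+` hits both the running header and a reference in the body text. The tighter pattern is hypothetical and only matches when "Page N" is the entire content of a paragraph; the `<p ...>` markup is an assumption, so check what the page numbers actually look like in your own file. The parentheses in `(Page [0-9]+)` only create a capture group and do not change what is matched.

```python
import re

text = (
    '<p class="calibre1">Page 12</p>\n'
    '<p class="calibre2">As explained on Page 7, the map was lost.</p>\n'
)

LOOSE = r"Page [0-9]+"
# Hypothetical: the page number must be the whole paragraph.
TIGHT = r"<p[^>]*>\s*Page\s+[0-9]+\s*</p>"

print(re.findall(LOOSE, text))  # ['Page 12', 'Page 7']
print(re.findall(TIGHT, text))  # ['<p class="calibre1">Page 12</p>']
```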
|
#45
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
A hundred percent sure. I copied the code from the bar into my reply. Could I possibly attach the source document so you can try it in your calibre?
|
Thread | Thread Starter | Forum | Replies | Last Post |
Regex help to remove HTML footer | neonbible | Calibre | 4 | 09-09-2010 09:42 AM |
Regex to remove header from PDF | neonbible | Calibre | 4 | 09-07-2010 10:08 AM |
Removing header and footer | radicalnomad | Calibre | 2 | 08-26-2010 10:34 AM |
Header/Footer removal | Solicitous | Calibre | 2 | 03-30-2010 05:53 AM |
Multiline Regex Footer | hover | Calibre | 10 | 02-03-2010 04:23 AM |