Old 09-27-2010, 11:56 AM   #31
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Manichean View Post
Without intending any offense, at least in one point, it's the latter: You should have a look at where you put your quantifiers (they repeat the preceding characters).
This is actually a very common mistake. It stems from familiarity with wildcards, where the "*" itself stands for any run of characters, whereas in regex it is a quantifier applied to whatever precedes it. On a first reading of the explanation of regex "*" and "+", users sometimes take them for wildcards meaning "zero or more characters" and "one or more characters" rather than quantifiers meaning "zero or more of the preceding character(s)" and "one or more of the preceding character(s)."

It's a very understandable error for anyone familiar with wildcards, but not quantifiers. (Perhaps it's worth a brief comment in your excellent beginner's tutorial about the difference between wildcards and quantifiers.)
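
To make the difference concrete, here is a minimal sketch in Python, whose `re` module is what calibre uses for its regular expressions. The file name and variable names are made up for illustration.

```python
import fnmatch
import re

name = "chapter12.txt"  # made-up example string

# Wildcard: "*" on its own stands for any run of characters.
wildcard_hit = fnmatch.fnmatchcase(name, "chapter*")

# Regex: "*" repeats the preceding item, so "chapter*" means
# "chapte" followed by zero or more "r" characters.
regex_hit = re.fullmatch(r"chapter*", name) is not None
regex_rrr = re.fullmatch(r"chapter*", "chapterrr") is not None

# The regex counterpart of the wildcard "*" is ".*":
# "." matches any character except a newline, and "*" repeats it.
dot_star_hit = re.fullmatch(r"chapter.*", name) is not None

print(wildcard_hit, regex_hit, regex_rrr, dot_star_hit)
```

The same pattern text behaves differently in the two systems; only `chapter.*` reproduces what the wildcard `chapter*` does.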
Old 09-28-2010, 07:21 AM   #32
varmemester
Connoisseur
Posts: 56
Karma: 484
Join Date: Sep 2010
Device: Kindle 3 & Sony PRS-950
I have a lot of lines looking like this:
0465002214_Cochran 11/20/08 2:41 PM Page xi

What should the regex look like to remove those? And where do I put it?

In Structure Detection I have ticked 'Remove Header' and 'Remove Footer'. I wonder what the chapter mark options do ('pagebreak', 'rule', 'both', 'none')?

Last edited by varmemester; 09-28-2010 at 07:24 AM.
Old 09-28-2010, 07:46 AM   #33
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by varmemester View Post
I have a lot of lines looking like this:
0465002214_Cochran 11/20/08 2:41 PM Page xi

What should the regex look like to remove those? And where do I put it?
You put it into either the header or the footer regular expression field. Also, if you only want to remove one line, only use one of the removal options.
You could try the regular expression
Code:
\d+_Cochran\s+\d+/\d+/\d+\s+\d+:\d+\s+PM\s+Page\s+\w+
For further information see the tutorial.
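
As a rough check outside calibre, the expression above can be tried against the sample line in plain Python. This is only a sketch: the surrounding text and the `[AP]M` variant are assumptions added for illustration, not part of the suggestion above.

```python
import re

# The expression from the post, unchanged.
header_re = re.compile(r"\d+_Cochran\s+\d+/\d+/\d+\s+\d+:\d+\s+PM\s+Page\s+\w+")

# Made-up surrounding text around the sample line from the question.
sample = (
    "Some text before.\n"
    "0465002214_Cochran 11/20/08 2:41 PM Page xi\n"
    "Some text after."
)
cleaned = header_re.sub("", sample)
print(repr(cleaned))

# Assumption: if some pages were stamped in the morning, "PM" would
# miss them; "[AP]M" accepts either.
loose_re = re.compile(r"\d+_Cochran\s+\d+/\d+/\d+\s+\d+:\d+\s+[AP]M\s+Page\s+\w+")
morning = "0465002214_Cochran 11/20/08 9:05 AM Page 12"
print(header_re.search(morning) is None, loose_re.search(morning) is not None)
```

Note that `\w+` covers both roman page numbers such as "xi" and ordinary digits.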

Quote:
Originally Posted by varmemester View Post
In Structure Detection I have ticked 'Remove Header' and 'Remove Footer'. I wonder what the chapter mark options do ('pagebreak', 'rule', 'both', 'none')?
You only need to tick one of the removal options and then customize the regular expression to fit. The chapter mark option selects how detected chapter breaks are marked: either with a new page, a horizontal line, both, or none of the above. You should also get a helpful text explaining what a certain option does when you hover your mouse cursor above said option.
Old 01-06-2011, 02:56 AM   #34
Techracer
Junior Member
Posts: 2
Karma: 10
Join Date: Jan 2011
Location: New Zealand
Device: Kobo
Hi,
I'm also having trouble with this implementation of regex. I've looked at the tutorial links, and I've used regex elsewhere before. I'm used to using ^ as meaning the start of the line and this is not working for me.

I'm converting from PDF; the book in question is a free Doctor Who short story on the BBC web site. It has the BBC logo, the web URL and another icon as header or footer on every page. It also has the page number, which I want to get rid of.
Trying to use (^\d+<br>) to match
2<br>
at the start of a line only but it isn't working. If I remove the ^ it finds it but also page numbers from the contents page which are at the end of the line.

Is there some other indicator of "start of line" that I should use?

Cheers,
Damian
Old 01-06-2011, 05:47 AM   #35
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Try looking for a newline character before the text you want to remove:

Code:
\n\d+<br>
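
A short Python sketch of why this helps: by default "^" only matches at the very start of the whole text, not at the start of each line. The sample text is made up, and whether calibre's header/footer fields accept the inline "(?m)" flag is an assumption that has not been checked here.

```python
import re

# Made-up fragment: a contents entry ending in a page number,
# then a bare page number on a line of its own.
text = "Chapter One 5<br>\n2<br>\nIt was a dark night.<br>\n"

# Without flags, "^" anchors to the start of the whole string only.
anchored = re.findall(r"^\d+<br>", text)

# Dropping "^" also catches the number at the end of the contents line.
unanchored = re.findall(r"\d+<br>", text)

# Requiring a preceding newline restricts the match to line starts
# (the newline itself becomes part of what is removed).
after_newline = re.findall(r"\n\d+<br>", text)

# Alternative: multiline mode makes "^" match after every newline.
multiline = re.findall(r"(?m)^\d+<br>", text)

print(anchored, unanchored, after_newline, multiline)
```

The `\n` form will not match a page number on the very first line of the text, since nothing precedes it there.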
Old 01-19-2011, 03:38 AM   #36
Confuzzled
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Hey everyone, I'm pretty new to regex coding as well, but I've been reading about it and trying to figure it out. I have two things I'm trying to get rid of: the ABC Amber LIT Converter lines and that aa bb pdf transform.

For the ABC Amber LIT Converter lines I've tried every piece of code in this thread and others I've found, and all the logic I can think of, and it still just doesn't work.

My other issue: people talk about things going yellow when they are to be removed, but nothing ever gets highlighted in my calibre 7.40, even when I use an example from Kovid. Is that my issue, or is that just a setting? (I've ticked the Remove Header box, obviously.)

Can someone please just give me a copy-paste piece of code so I can sort this out? It's KILLING ME!

Last edited by Confuzzled; 01-19-2011 at 03:44 AM.
Old 01-19-2011, 03:45 AM   #37
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Confuzzled View Post
Hey everyone, I'm pretty new to regex coding as well, but I've been reading about it and trying to figure it out. I have two things I'm trying to get rid of: the ABC Amber LIT Converter lines and that aa bb pdf transform. For the ABC Amber LIT Converter lines I've tried every piece of code in this thread and others I've found, and all the logic I can think of, and it still just doesn't work. Also, people talk about things going yellow when they are to be removed, but nothing ever gets highlighted in my calibre 7.40, even when I use an example from Kovid. Is that my issue, or is that just a setting? (I've ticked the Remove Header box, obviously.) Can someone please just give me a copy-paste piece of code so I can sort this out? It's KILLING ME!
You did read the tutorial, didn't you? Also, the Amber LIT converter headers are notorious for changing their markup (what's written in the XHTML) from document to document, sometimes even inside one document. So in order to help you, we'd at least need an example of the XHTML you want removed. A more precise description than "things don't turn yellow" would help too: in addition to the regexes you found and tested, what "additional logic" did you try?
Old 01-19-2011, 04:13 AM   #38
itimpi
Wizard
Posts: 4,058
Karma: 777825
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
Quote:
Originally Posted by Confuzzled View Post
My other issue: people talk about things going yellow when they are to be removed, but nothing ever gets highlighted in my calibre 7.40, even when I use an example from Kovid. Is that my issue, or is that just a setting? (I've ticked the Remove Header box, obviously.)
This refers to when you are using the Wizard (the button to the right of the box holding the regex) and have pressed the 'Test' button in the wizard. The text that matches the regex (if any) is then highlighted in yellow in the main window of the wizard.
Old 01-19-2011, 05:45 AM   #39
Confuzzled
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Quote:
Originally Posted by itimpi View Post
This refers to when you are using the Wizard (the button to the right of the box holding the regex) and have pressed the 'Test' button in the wizard. The text that matches the regex (if any) is then highlighted in yellow in the main window of the wizard.
Yeah, I did use the test wizard, sorry if I wasn't clear. That's the oddity: no yellow, even when I just put in a simple string which, according to the user manual, should come up.

The code I played with is a modification of this code:
<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>
which, as far as I'm aware, should match. I came up with something to this effect myself, but using <p> (i.e. page break) instead of <b> (bold). I wasn't sure of my defining structure, though, so I took this structure from Kovid and, when that didn't work, tried to edit it until it did.

I also tried removing <a>, i.e. the HTML link, but that didn't work either. Delphi was always my preference over Python.

My problem is this repeating code:
<p class="calibre3"><b class="calibre1">Generated by ABC Amber LIT Conv<a href="http://www.processtext.com/abclit.html" class="calibre2">erter, http://www.processtext.com/abclit.html</a></b></p>

Thanks so much.
Old 01-19-2011, 05:57 AM   #40
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
That's weird. I just tested the regex you gave with the string you gave, and as I expected, it matches with no problems. Are there any linebreaks in the XHTML that you edited out?
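
For reference, a minimal Python sketch of that test, and of why linebreaks matter: "." does not match a newline by default, so a break inside the header would stop ".*?" from reaching "</b>". The wrapped variant is an invented example, not taken from the actual book.

```python
import re

pattern = r"<b.*?>\s*Generated\s+by\s+ABC\s+Amber\s+LIT.*?</b>"

# The markup exactly as posted, on a single line.
one_line = (
    '<p class="calibre3"><b class="calibre1">Generated by ABC Amber LIT '
    'Conv<a href="http://www.processtext.com/abclit.html" class="calibre2">'
    'erter, http://www.processtext.com/abclit.html</a></b></p>'
)
m = re.search(pattern, one_line)
print(m is not None)

# Hypothetical variant with a line break inside the header.
wrapped = one_line.replace("Conv<a", "Conv\n<a")
plain = re.search(pattern, wrapped)
dotall = re.search(pattern, wrapped, flags=re.DOTALL)
print(plain is None, dotall is not None)
```

The match starts at `<b class="calibre1">` and ends at `</b>`, so the enclosing `<p class="calibre3">...</p>` would be left behind as an empty paragraph.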
Old 01-19-2011, 06:00 AM   #41
Confuzzled
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Another example of my issue: if I'm trying to remove a standard page number (not bold), shouldn't
(Page [0-9]+)
work? It doesn't.
Old 01-19-2011, 06:02 AM   #42
Confuzzled
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
I know! I'm not crazy, there is something weird, hey? Should I reinstall calibre, do you think?
Old 01-19-2011, 06:03 AM   #43
Confuzzled
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Quote:
Originally Posted by Manichean View Post
That's weird. I just tested the regex you gave with the string you gave, and as I expected, it matches with no problems. Are there any linebreaks in the XHTML that you edited out?
No, that's exactly as it is in the code!
Old 01-19-2011, 06:08 AM   #44
Manichean
Wizard
Posts: 3,130
Karma: 80446
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by Confuzzled View Post
Another example of my issue: if I'm trying to remove a standard page number (not bold), shouldn't
(Page [0-9]+)
work? It doesn't.
Again, that quite depends on the markup the page number actually has. The regex should work for any page number that is preceded by the word "Page ", which may actually be a bit too indiscriminate, as there might be references to pages in the text. Anyway, are you absolutely sure that you're typing the regex correctly in the text box of the wizard, actually pressing the Test button, and scrolling down to see if anything gets highlighted?
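
A small Python sketch of the "too indiscriminate" point. The sample markup and the stricter pattern are assumptions for illustration only; the real markup of the book in question is not known.

```python
import re

# Made-up markup: one stand-alone page number, one in-text reference,
# and one lowercase variant.
text = (
    "<p>Page 12</p>\n"
    "<p>As shown on Page 40, the ship landed.</p>\n"
    "<p>page 13</p>"
)

# The expression from the post: matches the in-text reference as well,
# and misses the lowercase "page".
loose = re.findall(r"(Page [0-9]+)", text)

# Hypothetical stricter form: only a paragraph consisting of nothing
# but the page number.
strict = re.findall(r"<p>\s*Page [0-9]+\s*</p>", text)

print(loose, strict)
```

Tying the expression to the markup around the page number is one way to avoid deleting ordinary references to pages in the running text.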
Old 01-19-2011, 06:37 AM   #45
Confuzzled
Member
Posts: 13
Karma: 12
Join Date: Jan 2011
Device: Samsung Galaxy Tab
Hundred percent sure. I copied the code from the bar into my reply. Could I possibly attach the source document so you can try it in your calibre?