Old 03-15-2013, 06:57 AM   #1
Dullahir
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
Easier way to remove repeating data in books?

Good day, question:

Is there an easier way to remove authors, page numbers, and titles from books during conversion? The Search and Replace feature doesn't seem to find all the instances. In fact, it doesn't seem to be working as well as it did in previous builds.

Is there a simpler way to do this?
Old 03-15-2013, 07:45 AM   #2
Dullahir
Update: I tried a lot of things but did not succeed. I've tried <b>author</b>, <i>author</i>, and <a><p>author</a></p>, and only succeeded with one document.

Fixed it. LOL. I am a dunce.

I think it may have been because I didn't include a space. This time I entered <i>Title </i> and <i>author </i> and it worked quite well! (Take note, for others who may wonder how to do this. xD)

New question, however: How would I use the Search and Replace features in the Metadata field? It doesn't seem to be working the same way? Or maybe it does? (I haven't used the <i> and </i> tags. lol. I forgot.)


Further experimentation resulted in:

It would seem that the RegEx MUST match the text exactly as the wizard shows it, including case and spacing. If you input <i>title</i> and the code is listed as <i>title </i> in the wizard's results, it won't work.
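For anyone who wants to see the spacing issue in isolation, here is a minimal sketch using Python's `re` module, whose syntax calibre's search and replace follows. The HTML string is made up for illustration, and the tolerant pattern is just one possible way to write it, not something from the wizard.

```python
import re

# Hypothetical snippet of converted HTML; note the space before </i>.
html = "<p>Some text</p><i>Title </i><p>More text</p>"

# Exact pattern without the trailing space: nothing matches, nothing is removed.
strict = re.sub(r"<i>Title</i>", "", html)

# Optional whitespace inside the tags, and case ignored.
tolerant = re.sub(r"<i>\s*title\s*</i>", "", html, flags=re.IGNORECASE)

print(strict == html)  # the strict pattern left the text untouched
print(tolerant)
```

Adding `\s*` where stray spaces might appear means the expression does not have to be retyped for every book.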

Also, this may just be a fluke, but when I entered <i>title</i>,<i>author</i> instead of <i>author</i>,<i>title</i> it didn't work. But if I reverse the order, it does. Really strange, but really enjoying the experimentation!! Quite fun. Now all I need is a RegEx to remove page numbers.

*Goes off to donate again rofl.*


Edit with a question:

Okay, this is one thing I am stumped on. How would I provide a RegEx for the following code:

"<A name=2></a>Title<br>
<i>by Author</i>

I have successfully removed the Author, so now it just has the title, but I have no idea how to remove the title. Using "</a>Title<br> doesn't seem accurate, <i>Title </i> didn't work, neither did <b>Title </b>.


Holy crap, I figured it out LOL.

I was unaware that when it's in this syntax, the proper RegEx for the title would be

TITLE

Without html added. Interesting!!
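A rough sketch of the same idea in Python's `re` syntax, assuming the markup looks exactly like the snippet quoted above. "Title" and "by Author" are stand-ins for the real text, and the one-shot pattern is an illustration rather than the expression actually used in this thread.

```python
import re

# Hypothetical input modelled on the snippet in this post.
html = '<A name=2></a>Title<br>\n<i>by Author</i>\n<p>Chapter text.</p>'

# The bare title text is not wrapped in its own tag, so a plain pattern finds it.
title_only = re.sub(r"Title", "", html)

# Alternatively, remove anchor, title and author line in a single expression.
block = re.compile(
    r"<a\s+name=\d+>\s*</a>"          # empty anchor such as <A name=2></a>
    r"\s*Title\s*<br>"                # bare title followed by <br>
    r"\s*<i>\s*by Author\s*</i>\s*",  # italic author line
    flags=re.IGNORECASE,
)
cleaned = block.sub("", html)

print(title_only)
print(cleaned)
```

Note that the second pattern also deletes the `<A name=...>` anchor, which may not be wanted if anything in the book links to it.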

Last edited by Dullahir; 03-15-2013 at 09:22 AM.
Old 03-15-2013, 09:28 AM   #3
Dullahir
By the way, there's a reason I'm reporting my experiments and findings here: for the benefit of others who may be wondering how to do something like this. I dunno about you guys, but I HATE when people say things like, 'Oh, I got it!! Never mind' and don't even say HOW they did it when SO many people ask. LOL.
Old 03-15-2013, 10:32 AM   #4
Dullahir
What I have learned:

Most of the time, if there are no tags around author names or titles (nothing like <i> or <b>, for example), it's safe to just leave tags out of the RegEx.

To remove numbers from books, the expression '\d' can substitute for [0-9]. On its own, it would delete every digit from the book. I'm looking for another way, however. The tutorial mentions 'Page of Number'. My books rarely have 'Page of Number' instead of just a regular number, so I have had trouble making the expression Page [0-9] of 65 work.

Also, I don't think I am going to use the \d expression. While handy, what about when you see things like "11:30. He was late. Again."? Without the expression that reads fine, but with it you'd see ":. He was late. Again."
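The side effect is easy to reproduce. Below is a small sketch in Python's `re` syntax; the second pattern, which only removes lines consisting of nothing but a number, is one possible narrower alternative and assumes page numbers sit on their own line, which may not hold for every book.

```python
import re

text = "11:30. He was late. Again."

# A bare \d removes every digit, including the ones in the time.
stripped = re.sub(r"\d", "", text)

# Hypothetical page text with a page number on its own line.
page = "He was late.\n42\nAgain at 11:30."

# Only delete lines that contain nothing but digits (plus spaces or tabs).
targeted = re.sub(r"^[ \t]*\d+[ \t]*$\n?", "", page, flags=re.MULTILINE)

print(stripped)
print(targeted)
```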

Any ideas? Because

Page [0-9][0-9] of 65 won't work,
Page [0-9][0-9]+ of 65 won't work. (Double-expressions because of the double integers, I'm assuming.)

I haven't tried [0-9]+ of 65, but I'm not really too hopeful on that, but it won't hurt to try, I guess!
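For reference, a sketch of how these patterns behave in Python's `re` syntax on made-up sample text. `[0-9]` matches exactly one digit, so it misses two-digit page numbers, while `[0-9]+` matches one or more. The use of `\s+` between the words is an assumption about the source, in case the gaps are not single plain spaces.

```python
import re

# Hypothetical text with footers on their own lines.
text = (
    "end of a paragraph.\n"
    "Page 7 of 65\n"
    "Start of the next page.\n"
    "Page 12 of 65\n"
)

# One [0-9] matches a single digit only: "Page 12 of 65" survives.
one_digit = re.sub(r"Page [0-9] of 65", "", text)

# One or more digits, flexible whitespace, and the trailing newline.
any_digits = re.sub(r"Page\s+[0-9]+\s+of\s+[0-9]+\n?", "", text)

print(one_digit)
print(any_digits)
```

If a pattern like `Page [0-9][0-9] of 65` still fails on text that looks right, the cause is likely something invisible in the source, such as tags or unusual whitespace between the words.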

Last edited by Dullahir; 03-15-2013 at 10:35 AM.
Old 03-15-2013, 11:21 AM   #5
itimpi
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
It is worth remembering the sequences \d* and \s* for matching runs of digits or whitespace. This can be particularly useful for removing something like headers and/or footers that contain page numbering, and possibly varying amounts of whitespace.
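A small sketch of that idea in Python's `re` syntax, assuming a running header of the form "My Book Title" followed by a page number (both the header text and the sample lines are hypothetical):

```python
import re

# Hypothetical running headers with varying spacing and page numbers.
lines = ["My Book Title 12", "My Book Title    13", "My Book Title"]

# Literal header text, then any amount of whitespace, then any run of digits.
header = re.compile(r"My Book Title\s*\d*")

cleaned = [header.sub("", line) for line in lines]
print(cleaned)
```

Because `\d*` and `\s*` can also match nothing at all, they are only useful when attached to some literal text like the header above; on their own they match everywhere.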
Old 03-15-2013, 08:43 PM   #6
theducks
Well trained by Cats
Posts: 29,802
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Dullahir
Any ideas? Because

Page [0-9][0-9] of 65 won't work,
Page [0-9][0-9]+ of 65 won't work. (Double-expressions because of the double integers, I'm assuming.)

I haven't tried [0-9]+ of 65, but I'm not really too hopeful on that, but it won't hurt to try, I guess!
In your case:
11:30 won't be captured as a whole by \d+ alone. You would need \d+:\d+
Likewise, \d+ will only capture the 12 in 12.5; \d+\.\d+ is needed.
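To see what each of these expressions actually picks up, here is a quick check in Python's `re` syntax on a made-up sentence:

```python
import re

text = "11:30. He was late. The bill was 12.5 euros."

digit_runs = re.findall(r"\d+", text)       # separate runs of digits
times = re.findall(r"\d+:\d+", text)        # digits, a colon, digits
decimals = re.findall(r"\d+\.\d+", text)    # digits, a literal dot, digits

print(digit_runs)
print(times)
print(decimals)
```

The dot has to be escaped as `\.` because an unescaped `.` matches any character.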

Page [0-9][0-9] of 65

For \d+ to capture, ALL of the literal text (shown in green in the original post) has to be present, including spaces.

That may be your problem. You have a normal space (%20) in your pattern.
What if it is an NBSP (what I would use so things don't spread out)?
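A sketch of the NBSP case in Python's `re` syntax. The footer strings are invented, and the `GAP` helper is a hypothetical name for a pattern that accepts a plain space, a non-breaking space character (U+00A0), or the `&nbsp;` entity.

```python
import re

NBSP = "\u00a0"

# Hypothetical footers: one using NBSP characters, one using the HTML entity.
footer_nbsp = f"Page{NBSP}12{NBSP}of{NBSP}65"
footer_entity = "Page&nbsp;12&nbsp;of&nbsp;65"

# A pattern with ordinary spaces does not match the NBSP version.
plain = re.search(r"Page [0-9]+ of 65", footer_nbsp)

# Accept any mix of plain spaces, NBSP characters, or &nbsp; entities.
GAP = f"(?:[ {NBSP}]|&nbsp;)+"
flexible = re.compile("Page" + GAP + r"\d+" + GAP + "of" + GAP + r"\d+")

print(plain)
print(bool(flexible.search(footer_nbsp)), bool(flexible.search(footer_entity)))
```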