03-15-2013, 06:57 AM | #1 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
Easier way to remove repeating data in books?
Good day, question:
Is there an easier way to remove authors, page numbers, and titles from books during conversion? The Search and Replace feature doesn't seem to find all the instances. In fact, it doesn't seem to be working as well as it did in previous builds. Is there a simpler way to do this? |
03-15-2013, 07:45 AM | #2 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
Update: I tried a lot of things but did not succeed at first. I tried <b>author</b>, <i>author</i>, and <a><p>author</a></p>, and only succeeded with one document.

Fixed it. LOL. I am a dunce. I think it may have been because I didn't include a space. This time I input <i>Title </i> and <i>author </i> and it worked quite well! (Take note, for others who may wonder how to do this. xD)

New question, however: how would I use the Search and Replace feature in the Metadata field? It doesn't seem to work the same way. Or maybe it does? (I haven't used the <i> and </i> tags there. lol. I forgot.)

Further experimentation: it would seem that the RegEx MUST match what the wizard shows exactly, case and spacing included. If you input <i>title</i> and the code is listed as <i>title </i> in the wizard's results, it won't work. Also, it may just be a fluke, but when I entered <i>title</i>,<i>author</i> instead of <i>author</i>,<i>title</i> it didn't work. If I reverse the order, it does. Really strange, but I'm really enjoying the experimentation!! Quite fun. Now all I need is a RegEx to remove page numbers. *Goes off to donate again rofl.*

Edit with a question: okay, this is one thing I am stumped on. How would I write a RegEx for the following code: <A name=2></a>Title<br> <i>by Author</i>
I have successfully removed the author, so now it just has the title, but I have no idea how to remove the title. Using </a>Title<br> doesn't seem accurate, <i>Title </i> didn't work, and neither did <b>Title </b>.

Holy crap, I figured it out LOL. I was unaware that when it's in this syntax, the proper RegEx for the title is just the title text, without any HTML added. Interesting!! Last edited by Dullahir; 03-15-2013 at 09:22 AM. |
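For anyone who wants to test this kind of pattern outside the conversion dialog, here is a minimal sketch in plain Python, assuming the book's HTML looks like the snippet quoted above. The function name `strip_header` and the sample strings are made up for illustration; nothing here is calibre's own code. It shows how `\s*` tolerates the stray spaces that caused the mismatches, and how a case-insensitive flag copes with `<A` versus `<a`.

```python
import re

def strip_header(html, title, author):
    """Remove headers shaped like '<A name=N></a>Title<br> <i>by Author</i>'."""
    pattern = re.compile(
        r'<a name=\d+></a>\s*' + re.escape(title) +
        r'\s*<br>\s*<i>\s*by\s+' + re.escape(author) + r'\s*</i>',
        re.IGNORECASE,  # Python patterns are case-sensitive unless told otherwise
    )
    return pattern.sub('', html)

# Hypothetical sample with a trailing space inside the <i> tag.
sample = '<A name=2></a>Title<br> <i>by Author </i><p>Chapter text.</p>'
result = strip_header(sample, 'Title', 'Author')
print(result)
```

`re.escape` is there so that a title containing characters such as `.` or `?` is treated as literal text rather than as regex syntax.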
03-15-2013, 09:28 AM | #3 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
By the way, there's a reason I'm reporting my experiments and findings here: for the benefit of others who may be wondering how to do something like this. I dunno about you guys, but I HATE when people say things like 'Oh, I got it!! Nevermind' and don't even say HOW they did it when SO many people ask. LOL.
|
03-15-2013, 10:32 AM | #4 |
Zealot
Posts: 146
Karma: 13316
Join Date: Nov 2010
Location: Deva, Romania
Device: Android
|
What I have learned:
Most of the time, if there are no tags around author names or titles (nothing like <i> or <b>), it's safe to leave the tags out of the RegEx.

To remove numbers from books, the expression \d can substitute for [0-9]. Used on its own, it would delete every digit from the book. I'm looking for another way, however. The tutorial mentions 'Page of Number'. Rare are the occasions when the books in my library have 'Page of Number' instead of just a plain number, so I have had trouble making the expression Page [0-9] of 65 work.

Also, I don't think I am going to use the \d expression. While handy, what about text like "11:30. He was late. Again."? With the expression applied, you'd see ":. He was late. Again."

Any ideas? Page [0-9][0-9] of 65 won't work, and Page [0-9][0-9]+ of 65 won't work either. (Doubled expressions because of the two-digit numbers, I'm assuming.) I haven't tried [0-9]+ of 65. I'm not too hopeful about it, but it won't hurt to try, I guess! Last edited by Dullahir; 03-15-2013 at 10:35 AM. |
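A small sketch of the trade-off described above, in plain Python with made-up sample text (an assumption, not taken from any real book). It compares the blanket `\d` removal with a pattern that only targets whole "Page N of M" lines. It also shows one way a fixed-width pattern can miss: `[0-9][0-9]` needs exactly two digits, so a single-digit page number slips through.

```python
import re

# Hypothetical text: two page markers around a sentence containing a time.
text = "Page 12 of 65\n11:30. He was late. Again.\nPage 7 of 65\n"

# Blanket removal of every digit also wrecks the time.
blanket = re.sub(r'\d', '', text)

# Exactly two digits: finds page 12 but not page 7.
two_digit = re.findall(r'Page [0-9][0-9] of 65', text)

# One or more digits, anchored to a whole line, leaves the time alone.
page_line = re.compile(r'^Page \d+ of \d+$\n?', re.MULTILINE)
kept = page_line.sub('', text)

print(repr(blanket))
print(two_digit)
print(repr(kept))
```

The `^` and `$` anchors (with `re.MULTILINE`) keep the pattern from touching a sentence that merely mentions a page number in running text.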
03-15-2013, 11:21 AM | #5 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
It is worth remembering the sequences \d* and \s* for removing runs of digits or whitespace. These can be particularly useful for removing something like headers and/or footers that contain page numbering, and possibly varying amounts of whitespace.
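To make that concrete, here is a rough sketch in plain Python, assuming the running header sits in its own paragraph as the title followed by an optional page number. The sample HTML and the name `remove_running_header` are invented for the example.

```python
import re

def remove_running_header(html, label):
    """Drop paragraphs holding only `label`, optional whitespace, and an optional number."""
    pattern = re.compile(r'<p>\s*' + re.escape(label) + r'\s*\d*\s*</p>')
    return pattern.sub('', html)

# Hypothetical input: headers with differing spacing and page numbers.
sample = ('<p>The Title  12</p><p>He left at 11:30.</p>'
          '<p>The Title 13</p><p>The end.</p>')
cleaned = remove_running_header(sample, 'The Title')
print(cleaned)
```

Because `\d*` and `\s*` also match nothing at all, a header paragraph with no page number is removed too, while the digits inside ordinary sentences stay put.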
|
03-15-2013, 08:43 PM | #6 | |
Well trained by Cats
Posts: 29,802
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
11:30 won't be captured as a whole by \d+. You would need \d+\:\d+
12.5 will only capture the 12; \d+\.\d+ is needed.
Page [0-9][0-9] of 65: for \d+ to capture, ALL of the green has to be present, including spaces, which may be your problem. You have a normal space (%20) in your pattern. What if it is an NBSP (what I would use so things don't spread out)? |
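A quick way to check both points in plain Python, with invented sample strings. The first part shows how `\d+` splits a time or a decimal into separate matches. The second part assumes the page marker might use a real non-breaking space (U+00A0) or the `&nbsp;` entity instead of a normal space, and uses a more tolerant separator for that case.

```python
import re

sentence = 'At 11:30 he paid 12.5'
runs = re.findall(r'\d+', sentence)         # each run of digits on its own
times = re.findall(r'\d+:\d+', sentence)    # the whole time
decimals = re.findall(r'\d+\.\d+', sentence)

nbsp = '\u00a0'
line_plain = 'Page 12 of 65'
line_nbsp = f'Page{nbsp}12{nbsp}of{nbsp}65'
line_entity = 'Page&nbsp;12&nbsp;of&nbsp;65'

# A literal space in the pattern only matches an ordinary space.
literal = re.compile(r'Page \d+ of \d+')

# \s covers Unicode whitespace (including U+00A0) in Python 3 str patterns;
# the alternation additionally accepts the &nbsp; entity.
gap = r'(?:\s|&nbsp;)+'
tolerant = re.compile('Page' + gap + r'\d+' + gap + 'of' + gap + r'\d+')

print(runs, times, decimals)
for line in (line_plain, line_nbsp, line_entity):
    print(bool(literal.search(line)), bool(tolerant.search(line)))
```

The colon does not need a backslash in Python's flavor, although `\:` is harmless there.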
|
|