Easier way to remove repeating data in books?

Dullahir · 03-15-2013, 06:57 AM

Good day, question:

Is there an easier way to remove authors, page numbers, and titles from books during conversion? The Search and Replace feature doesn't seem to find all the instances. In fact, it doesn't seem to be working as well as it did in previous builds.

Is there a simpler way to do this?

Dullahir · 03-15-2013, 07:45 AM

Update: I tried a lot of things but did not succeed. I've tried <b>author</b>, <i>author</i>, and <a><p>author</a></p>, and only succeeded with one document.

Fixed it. LOL. I am a dunce.

I think it may have been because I didn't include a space. This time I input <i>Title </i> and <i>author </i> and it worked quite well! (Take note for others who may wonder how to do this. xD)

New question, however: How would I use the Search and Replace features in the Metadata field? It doesn't seem to be working the same way? Or maybe it does? (I haven't used the <i> and </i> tags. lol. I forgot.)

Further experimentation resulted in:

It would seem that the RegEx MUST match exactly, including case, according to the wizard. If you input <i>title</i> and the code is listed as <i>title </i> in the wizard's results, it won't work.

Also, it may just be a fluke, but when I did <i>title</i>,<i>author</i> instead of <i>author</i>,<i>title</i> it wouldn't work. But if I swap them around, it does. Really strange, but really enjoying the experimentation!! Quite fun. Now all I need is a RegEx to remove page numbers.
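
A minimal sketch of the exact-match behaviour described above, using Python's re module (calibre's search and replace is documented as using Python-style regular expressions). The sample HTML string and the variable names are made up for illustration. It shows that both letter case and the trailing space inside the tag have to match, and that an inline (?i) flag plus \s* relaxes both.

```python
import re

# Made-up fragment resembling a repeated header in converted HTML.
html = "<i>Title </i><br><i>author </i>"

# Wrong case: 'title' does not match 'Title'.
wrong_case = re.search(r"<i>title </i>", html)

# Missing trailing space: '<i>Title</i>' does not match '<i>Title </i>'.
missing_space = re.search(r"<i>Title</i>", html)

# (?i) makes the match case-insensitive; \s* tolerates the optional space.
cleaned = re.sub(r"(?i)<i>title\s*</i>", "", html)

print(wrong_case, missing_space, cleaned)
```

Whether case-insensitive matching is appropriate depends on the book: it could also hit the same words in body text if they are wrapped in the same tags.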

*Goes off to donate again rofl.*

Edit with a question:

Okay, this is one thing I am stumped on. How would I provide a RegEx for the following code:

"<A name=2></a>Title<br>
<i>by Author</i>

I have successfully removed the author, so now it just has the title, but I have no idea how to remove the title. Using "</a>Title<br> doesn't seem accurate, <i>Title </i> didn't work, and neither did <b>Title </b>.

Holy crap, I figured it out LOL.

I was unaware that when it's in this syntax, the proper RegEx for the title would be

TITLE

without any HTML added. Interesting!!
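
For cases where the surrounding tags should go as well, here is a rough sketch in Python's re syntax. The sample is the snippet quoted above followed by a made-up first line of text; the pattern and variable names are illustrative assumptions, not something from calibre itself. It keeps the anchor tag through a capture group (on the assumption that something may link to it) and removes the title, the line break, and the byline in one pass.

```python
import re

# Sample taken from the post above, followed by a made-up first line of text.
page = '<A name=2></a>Title<br>\n<i>by Author</i>\nFirst line of the chapter.'

# Keep the anchor (group 1) and drop the title, the <br>, and the byline.
header = re.compile(r'(<A name=\d+></a>)Title(?:<br>)?\s*<i>by Author</i>\s*')

cleaned = header.sub(r'\1', page)
print(cleaned)
```

Because name=\d+ accepts any page anchor number, the same expression would apply to every repeated header of this shape, not just the one on page 2.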

Dullahir · 03-15-2013, 09:28 AM

By the way, there's a reason I'm reporting my experiments and findings here: for the benefit of others who may be wondering how to do something like this. I dunno about you guys but I HATE when people say things like, 'Oh, I got it!! Never mind' and don't even say HOW they did it when SO many people ask. LOL.

Dullahir · 03-15-2013, 10:32 AM

What I have learned:

Most of the time, if there are no tags in author names or titles, for example nothing like <i> or <b>, it's safe to just leave them out of the RegEx.

To remove numbers from books, the expression '\d' can stand in for [0-9]. On its own, this would delete every digit from the book. I'm looking for another way, however. In the tutorial, it mentioned 'Page of Number'. Rare are the occasions when the books in my library have 'Page of Number' instead of just a regular number, so I have had trouble making the expression Page [0-9] of 65 work.

Also, I don't think I am going to use the \d expression. While handy, it would turn something like "11:30. He was late. Again." into ":. He was late. Again."
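
The difference can be sketched in Python's re syntax. The sample text and the assumption that a bare page number sits alone inside its own <p> element are made up for illustration; real books may wrap page numbers differently, so the wizard output should be checked first.

```python
import re

# Made-up sample: prose containing a time, then a bare page number paragraph.
text = "<p>11:30. He was late. Again.</p>\n<p>42</p>\n<p>The next page begins.</p>"

# Deleting every digit damages ordinary prose.
all_digits_removed = re.sub(r"\d", "", text)

# Deleting only paragraphs that contain nothing but digits leaves prose alone.
page_numbers_removed = re.sub(r"<p>\s*\d+\s*</p>\s*", "", text)

print(all_digits_removed)
print(page_numbers_removed)
```

Tying the digits to their surrounding markup is what keeps "11:30" intact in the second result.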

Any ideas? Because

Page [0-9][0-9] of 65 won't work,
Page [0-9][0-9]+ of 65 won't work. (Double-expressions because of the double integers, I'm assuming.)

I haven't tried [0-9]+ of 65, but I'm not really too hopeful on that, but it won't hurt to try, I guess!

itimpi · 03-15-2013, 11:21 AM

It is worth remembering the sequences \d* and \s* for removing sequences of digits or whitespace. This can be particularly useful for removing something like headers and/or footers that contain page numbering, and possibly varying amounts of whitespace.
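
As a hedged illustration in Python's re syntax, the digit and whitespace shorthands can be combined into one pattern for a "Page X of Y" footer. The sample strings and names are made up. The sketch uses \d+ rather than \d* so that at least one digit is required, which is a choice made here rather than part of the advice above.

```python
import re

# \s* allows any amount of whitespace (including none);
# \d+ requires at least one digit, so page numbers of any length match.
page_footer = re.compile(r"Page\s*\d+\s*of\s*\d+")

samples = ["Page 7 of 65", "Page 12 of 65", "Page  123   of  65"]
matches = [bool(page_footer.fullmatch(s)) for s in samples]

# Removing the footer from running text leaves the prose around it untouched.
body = "He was late. Page 12 of 65 Again."
stripped = page_footer.sub("", body)

print(matches, repr(stripped))
```

Using \d+ for the total as well means the pattern does not need editing for a book with a different page count.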

theducks · 03-15-2013, 08:43 PM

Quote:

Originally Posted by Dullahir

What I have learned:

Most of the time, if there are no tags in author names or titles, for example nothing like <i> or <b>, it's safe to just leave them out of the RegEx.

To remove numbers from books, the expression '\d' can stand in for [0-9]. On its own, this would delete every digit from the book. I'm looking for another way, however. In the tutorial, it mentioned 'Page of Number'. Rare are the occasions when the books in my library have 'Page of Number' instead of just a regular number, so I have had trouble making the expression Page [0-9] of 65 work.

Also, I don't think I am going to use the \d expression. While handy, it would turn something like "11:30. He was late. Again." into ":. He was late. Again."

Any ideas? Because

Page [0-9][0-9] of 65 won't work,
Page [0-9][0-9]+ of 65 won't work. (Double-expressions because of the double integers, I'm assuming.)

I haven't tried [0-9]+ of 65, but I'm not really too hopeful on that, but it won't hurt to try, I guess!

In your case:
11:30 won't be captured as a whole by \d+. You would need \d+:\d+
12.5: \d+ will only capture the 12, so \d+\.\d+ is needed

Page [0-9][0-9] of 65

For \d+ to capture here, ALL of the literal text around the digits has to be present, including the spaces.

Which may be your problem. You have a normal space (%20) in your pattern.
What if the book uses an NBSP (non-breaking space) instead? That is what I would use so things don't spread out.
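
One way to cover both kinds of space, sketched in Python's re syntax. The names GAP, footer_any_space, and plain_only and the sample strings are hypothetical; whether a given book stores the no-break space as the literal character U+00A0 or as the &nbsp; entity is an assumption to verify against the source.

```python
import re

# A gap may be a normal space, a literal no-break space (U+00A0),
# or the &nbsp; entity as it appears in HTML source.
GAP = r"(?:[ \xa0]|&nbsp;)+"
footer_any_space = re.compile("Page" + GAP + r"\d+" + GAP + "of" + GAP + r"\d+")

candidates = [
    "Page 12 of 65",
    "Page\u00a012\u00a0of\u00a065",
    "Page&nbsp;12&nbsp;of&nbsp;65",
]
results = [bool(footer_any_space.fullmatch(c)) for c in candidates]

# A pattern with plain spaces only misses the no-break variants.
plain_only = re.compile(r"Page \d+ of \d+")
plain_results = [bool(plain_only.fullmatch(c)) for c in candidates]

print(results, plain_results)
```

If the plain-space pattern fails on a footer that looks right on screen, comparing it against a pattern like this is a quick way to tell whether the spacing is the cause.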
