Old 06-11-2011, 03:51 PM   #1
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
Importing Version Info...Possible?

I created a custom column called version, and I would like the version info that usually appears in the book filenames to be imported into it:

Example: George RR Martin - Ice & Fire 1 - Game of Thrones (v5.0).epub

Is it possible to import the version info into the custom column, and if so, how?

The regex tester for import doesn't seem to offer a test field for a custom column, as far as I know, so I can't check whether my expression is correct.

This is what I tried, along with a few variations:

^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^-]+)(\s*-\s*(\[?(?P<version>[^-]+)

with and without a # sign in front of version.

So I don't know whether the regex is incorrect or whether importing into custom columns is simply not possible.

I think it would be very useful when importing large collections that contain the same titles in different versions: instead of stripping the version info, it would go into its own column so you could output it with or without it. It's also great in combination with Find Duplicates and fuzzy matching, since you can then get rid of old versions.

I have currently been importing them with:

^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^-]+)

so that it will import the version info as part of the title like so:

Author: George RR Martin
Title: Game of Thrones (v5.0)
Series: Ice & Fire
Series Index: 1
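
For anyone who wants to check the expression outside calibre, here is a minimal sketch in plain Python. The pattern is the one quoted above; `parse_filename` is a hypothetical helper, and it assumes the expression is applied to the filename without its extension and that surrounding whitespace is trimmed afterwards.

```python
import re

# Pattern copied from the post. Assumption: it is matched against the
# filename with the extension already removed.
PATTERN = re.compile(
    r"^(?P<author>[^-]+)"
    r"(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?"
    r".*?-\s*(?P<title>[^-]+)"
)


def parse_filename(stem):
    """Return the named groups that matched, with whitespace trimmed."""
    m = PATTERN.match(stem)
    if m is None:
        return None
    return {k: v.strip() for k, v in m.groupdict().items() if v is not None}


fields = parse_filename("George RR Martin - Ice & Fire 1 - Game of Thrones (v5.0)")
print(fields)
```

With this pattern the version stays inside the title group, which matches the import result listed above.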

I stopped using this:

^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^\]{[()]+\w)

which strips all brackets including version info...

I pre-strip all brackets other than the version bracket and clean up the files with Flash Renamer before importing into calibre.

The issue with having version info in the title is that when you go to get metadata, it can throw the lookup for a loop so that the title is not recognized.

Would anyone else find this useful, or also like the ability to do this?

I do a lot of filename cleanup in bulk with regex, wildcards and commands that can be stored and run in batch with Opus and Flash Renamer.

For example, a file that looks like this before running my batch command:

MCMARTIN, GeoRge R. R. - {Songs OF Fire & ICE 01] - a_game_of_THRONes [unabridged] (V5.1) {epub}.epub

will get fixed to this:

George RR McMartin - Songs of Fire & Ice 01 - A Game of Thrones (v5.1).epub

And I can run it against thousands of files at once.

Another useful tool is ExtractNow, which will bulk-extract archives, including those in subfolders, to a folder of your choice and can delete the archives afterwards, without you having to go into each folder or subfolder manually. Pretty useful for some downloaded collections.

If anyone is interested in any of the other batch commands I have set up for these programs, just PM me and I'll be happy to help.

One other problem: I need a modification of what I'm using in a find and replace.

This is what I'm using:

INFO-----: Swap Lastname, Firstname if in front of a title or series with a dash(-)
EXAMPLE-: Carlin, George P - Stupid Jokes 1 - Your Mama!.epub
RESULT--: George P Carlin - Stupid Jokes 1 - Your Mama!.epub
FIND-----: ^(\w+), *([\w \.]+)[ ]+-[ ]*(.*)
REPLACE-: \2 \1 - \3
OR-------: (Depending on what program you're using)
REPLACE-: $2 $1 - $3

But it won't work on file names that are like this:

Carlin, George P & Swift, Taylor L. L. - Stupid Jokes 1 - Your Mama!.epub

The initials may have periods. The file should end up looking like this:

George P Carlin & Taylor L. L. Swift - Stupid Jokes 1 - Your Mama!.epub

I need either one expression that can handle both single and multiple names, or a separate one that handles multiple names, whichever is easier.

Any ideas? Thanks for your time, whoever solves the problem.
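
One possible approach, sketched in plain Python rather than as a single find-and-replace expression: split the author part on "&" and swap each "Lastname, Firstname" piece separately. `swap_author_names` is a hypothetical helper, not part of Flash Renamer, Opus or calibre, and it assumes the author part ends at the first " - " and that multiple authors are separated by "&".

```python
import re

# One "Lastname, Firstname" piece; initials with periods are allowed.
NAME = re.compile(r"^\s*([^,&]+?),\s*([^,&]+?)\s*$")


def swap_author_names(filename):
    """Turn 'Last, First [& Last, First ...] - rest' into 'First Last [& ...] - rest'."""
    # Assumption: everything before the first " - " is the author part.
    authors, sep, rest = filename.partition(" - ")
    if not sep:
        return filename
    swapped = []
    for name in authors.split("&"):
        m = NAME.match(name)
        # Names without a comma are left as they are (only trimmed).
        swapped.append(f"{m.group(2)} {m.group(1)}" if m else name.strip())
    return " & ".join(swapped) + sep + rest


print(swap_author_names(
    "Carlin, George P & Swift, Taylor L. L. - Stupid Jokes 1 - Your Mama!.epub"
))
```

Because each name is handled on its own, the same function covers both the single-author and the multi-author case, and names that are already in "Firstname Lastname" order pass through unchanged.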

Last edited by penguinaka; 06-11-2011 at 04:05 PM.
Old 06-12-2011, 07:03 PM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by penguinaka View Post
I created a custom column called version, and I would like the version info that usually appears in the book filenames to be imported into it:

Example: George RR Martin - Ice & Fire 1 - Game of Thrones (v5.0).epub

Is it possible to import the version info into the custom column, and if so, how?
It's not currently possible in one step. The importer code has no access to custom columns.

Quote:
The regex tester for import doesn't seem to offer a test field for a custom column, as far as I know, so I can't check whether my expression is correct.
Correct. AFAIK, all fields you can import into are listed in the tester.
Quote:
This is what I tried, along with a few variations:

^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^-]+)(\s*-\s*(\[?(?P<version>[^-]+)

with and without a # sign in front of version.

So I don't know whether the regex is incorrect or whether importing into custom columns is simply not possible.

I think it would be very useful when importing large collections that contain the same titles in different versions: instead of stripping the version info, it would go into its own column so you could output it with or without it. It's also great in combination with Find Duplicates and fuzzy matching, since you can then get rid of old versions.

I have currently been importing them with:

^(?P<author>[^-]+)(\s*-\s*(\[?(?P<series>[^-0-9]+)\s*(?P<series_index>[0-9.]+)?]?)?)?.*?-\s*(?P<title>[^-]+)

so that it will import the version info as part of the title like so:

Author: George RR Martin
Title: Game of Thrones (v5.0)
Series: Ice & Fire
Series Index: 1
Run the title through Search and Replace and you can grab version info and send it to your custom column and strip it from the title.
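
To illustrate the kind of expression that search-and-replace step needs, here is a minimal sketch in plain Python. It is not calibre's own code: `split_version` is a hypothetical helper, and the pattern assumes the version always appears at the end of the title in the form "(v5.0)", as in the example filename.

```python
import re

# Assumption: the version is a trailing "(v<number>[.<number>...])".
VERSION = re.compile(r"\s*\(v(?P<version>\d+(?:\.\d+)*)\)\s*$", re.IGNORECASE)


def split_version(title):
    """Return (title_without_version, version or None)."""
    m = VERSION.search(title)
    if m is None:
        return title, None
    return title[:m.start()], m.group("version")


print(split_version("Game of Thrones (v5.0)"))
```

The same pattern would be used twice in practice: once to copy the captured version into the custom column, and once to replace the match in the title with nothing.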
Old 06-12-2011, 09:02 PM   #3
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
Another question: what about importing the version into the publisher field? Could the data then be transferred into the custom column in bulk by some command?

And if the info is imported into calibre as publisher info, will it try to merge a duplicate book when it is set to merge but the publisher/version info is different?

For example, these three files:

george rr martin - game of thrones (v1.5).epub
george rr martin - game of thrones (v5.0).epub
george rr martin - game of thrones (v4.0).mobi

These are obviously different versions, with 5.0 being an improvement in quality. If I have it set so that the books import with the version info going into the publisher field, will it attempt to merge them?

Last edited by penguinaka; 06-12-2011 at 09:05 PM.
Old 06-12-2011, 09:44 PM   #4
theducks
Well trained by Cats
Posts: 29,778
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by penguinaka View Post
Another question: what about importing the version into the publisher field? Could the data then be transferred into the custom column in bulk by some command?

And if the info is imported into calibre as publisher info, will it try to merge a duplicate book when it is set to merge but the publisher/version info is different?

For example, these three files:

george rr martin - game of thrones (v1.5).epub
george rr martin - game of thrones (v5.0).epub
george rr martin - game of thrones (v4.0).mobi

These are obviously different versions, with 5.0 being an improvement in quality. If I have it set so that the books import with the version info going into the publisher field, will it attempt to merge them?
Yes, they will merge without consideration of the version.
Only including the version in the title will make separate entries.
Old 06-12-2011, 09:45 PM   #5
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
Quote:
Originally Posted by theducks View Post
Yes they will merge without consideration of version level.
Only including Version in the Title will make separate entries.
Thanks, Ducks... I guess I need to leave it in the title for now.
Old 06-13-2011, 05:32 AM   #6
kiwidude
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Surely if you turn off auto merge you will just get prompted about the duplicate, in which case you can tell the prompt not to merge?
Old 06-13-2011, 05:33 AM   #7
Manichean
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by penguinaka View Post
Another question: what about importing the version into the publisher field? Could the data then be transferred into the custom column in bulk by some command?
Sure, you can do that in bulk metadata search & replace. Use regex mode. As for the other comments, see theducks' earlier post.
Old 06-13-2011, 08:21 AM   #8
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
Quote:
Originally Posted by kiwidude View Post
Surely if you turn off auto merge you will just get prompted about the duplicate, in which case you can tell the prompt not to merge?
I was hoping not to have to do that. Basically, what I was wishing for was to keep merge on so that it would merge duplicates unless they had a different version number, without the version having to be in the title, so it wouldn't screw up metadata retrieval.

I guess I'll have to do it in a few steps, as suggested earlier in the thread. I appreciate all the feedback from everyone, thank you.

If the series is different, then a duplicate is not merged, correct? Or, for example, if one has series info and the other doesn't?

Quote:
Originally Posted by Manichean View Post
Sure, you can do that in bulk metadata search & replace. Use regex mode. As for the other comments, see theducks' earlier post.
Thanks Manichean!
Old 06-13-2011, 09:18 AM   #9
kiwidude
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Quote:
Originally Posted by penguinaka View Post
I was hoping not to have to do that. Basically, what I was wishing for was to keep merge on so that it would merge duplicates unless they had a different version number, without the version having to be in the title, so it wouldn't screw up metadata retrieval.

I guess I'll have to do it in a few steps, as suggested earlier in the thread. I appreciate all the feedback from everyone, thank you.

If the series is different, then a duplicate is not merged, correct? Or, for example, if one has series info and the other doesn't?
Be very, very careful about using automerge. Certain things in the title are ignored/stripped, so you might find that even putting extra stuff in the title in an attempt to differentiate them does not achieve what you want.

To illustrate that point, series is certainly not considered, only the title and author.

Remember that whether it is automerge, calibre's duplicate detection or the Find Duplicates plugin, a duplicate is determined by a portion of its title. The granularity of that comparison can only be controlled by the Find Duplicates plugin.

What are you trying to achieve? Just keep the highest version? Keep all versions with the version in a column?

I think you are in for a world of pain whichever route you take. I don't think you should consider turning automerge on unless you are very specific about what you are adding.
Old 06-13-2011, 10:39 AM   #10
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
What is stripped in the title? I thought that depended on which import script you use. The one in my signature strips everything in brackets from the title, but the one I'm using now doesn't; it just removes the brackets from around the series and leaves the others alone.

I was under the impression that automerge grouped together books with the same name but different file types, and that if both the name and the file type were the same, it kept the first one it came to and discarded the later ones. I took "same" to mean that the author, series, title and extension all matched. Yes? And you're saying series isn't considered... I thought it was.

As for what I was trying to achieve with the version: I was going to use Find Duplicates with fuzzy matching on both settings and then pick and choose which versions to keep. That would be the highest version, with some consideration for cases like a v5 PDF (they convert poorly), where it is better to keep the original format.

Everything is getting converted to MOBI, but I'm keeping copies of the EPUBs and v5 PDFs; everything else gets deleted (of course I keep a backup of the originals from before the deletions).
Old 06-13-2011, 10:52 AM   #11
kiwidude
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Don't confuse the regex you use with the filename with the logic Calibre has internally to decide whether to treat two books as duplicates for automerge purposes. Automerge has some "fuzziness" in its comparison of book titles, stripping a whole bunch of characters like brackets, punctuation, title sort characters like "the, a, an" etc. It is certainly not an "exact match" experience.

It will not throw away numeric values (though it will throw away the periods separating them). You might "get lucky": provided every book you import has enough differing characters, you may get it to do what you want. However, you are absolutely playing with fire with this, and as the saying goes, you may get burned.

It has its purposes - the best usage of it imho is when you have another format of an existing book in your library that you want to add. Say you have a book record in MOBI format, and now you get an EPUB version from somewhere. For that purpose AutoMerge is brilliant.

However trying to use Automerge in combination with bringing in multiple versions of epubs sounds "wrong" to me. You might get lucky, you might end up with a mish-mash mess.
Old 06-13-2011, 12:55 PM   #12
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
Thanks for clarifying and taking the time to respond. cheers!
Old 06-13-2011, 02:42 PM   #13
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
kiwidude's comments are all right on point.

Automerge off
With Automerge off (and accepting the duplicates), you get each format as a separate record. The author/title will be whatever you specified (obtained from metadata or calculated by regex from the filename). You then use Merge to put matching records together. This is the best for those OCD users who want to deal manually with every record merger. If the title of the HTML version is "The Oasis: A Novel", and the title of the EPUB is "The Oasis - A Novel", you get two records with those different titles. kiwidude's Find Duplicates plugin will let you find them and my Merge will let you select the better title and merge them.

Automerge: Fuzzy Title Matching
Automerge was written before Find Duplicates, for those who didn't want to have to do it all manually. It will see the above two titles as the same (punctuation is ignored and multiple spaces are collapsed to single spaces). The first title on the first format entered will be the title used for the book, and later titles will be discarded if they are a close enough match. Authors must match exactly.

Automerge fuzzy matching won't ignore any differences in the author, and won't ignore any character order differences, except when the title starts with an article ("The", "A", etc.) that you've set to be ignored for your language in the applicable Tweak. There will still be lots of undetected duplicate books as a result of these non-matches. Find them with Find Duplicates and use Merge to pick the better Author and Title.
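
The matching rules described in this section can be sketched roughly as follows. This is an illustration only, not calibre's actual code: `fuzzy_key`, `would_automerge` and the `ARTICLES` list are hypothetical, and lowercasing the title is an added assumption that the posts do not state.

```python
import re

# Assumption: the ignored leading articles; in calibre these come from a tweak.
ARTICLES = ("the", "a", "an")


def fuzzy_key(title):
    """Reduce a title to the form used for comparison in this sketch."""
    key = title.lower()                      # assumption: case is ignored
    key = re.sub(r"[^\w\s]", " ", key)       # punctuation is ignored
    key = re.sub(r"\s+", " ", key).strip()   # multiple spaces collapse to one
    first, _, rest = key.partition(" ")
    if first in ARTICLES and rest:           # a leading article is dropped
        key = rest
    return key


def would_automerge(book_a, book_b):
    """Each book is an (author, title) pair; authors must match exactly."""
    return book_a[0] == book_b[0] and fuzzy_key(book_a[1]) == fuzzy_key(book_b[1])


print(would_automerge(("Jane Doe", "The Oasis: A Novel"),
                      ("Jane Doe", "The Oasis - A Novel")))
print(would_automerge(("George RR Martin", "Game of Thrones (v1.5)"),
                      ("George RR Martin", "Game of Thrones (v5.0)")))
```

In this sketch the two "Oasis" titles reduce to the same key, while the two versioned titles do not, because the digits survive even though the brackets and periods are dropped. That is consistent with the earlier point that only a version in the title keeps the records apart.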

Automerge: Duplicate Formats
The other question is what to do with incoming formats when Author/Title match according to AutoMerge fuzzy matching rules and the incoming format already exists in the matching record. You have three choices:

For OCD users, tell Automerge to create a new record, use Find Duplicates to locate the dupes and manually Merge them.

For less compulsive users, tell it to ignore the incoming as a duplicate. I have added thousands of books in testing, and have yet to find an inadvertent AutoMerge match. That said, if your books come from different sources there will likely be many non-matches. If so, Find Duplicates still has to be used.

For those who like to work on a copy of a book, then replace the old with the new, or those who assume that newer copies are better - set AutoMerge to overwrite.
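
The three choices can be modelled as a small sketch. Everything here is hypothetical (`OnDuplicateFormat`, `add_format` and the dictionary used as a stand-in library are not calibre names); it only shows how the same incoming file is treated under each setting once the author/title key has already matched.

```python
from enum import Enum


class OnDuplicateFormat(Enum):
    NEW_RECORD = "new_record"   # keep both, sort it out later with Find Duplicates
    IGNORE = "ignore"           # drop the incoming file
    OVERWRITE = "overwrite"     # replace the existing file with the incoming one


def add_format(library, key, fmt, path, policy):
    """library maps a match key to a list of records; a record maps format -> path."""
    records = library.setdefault(key, [])
    if not records:
        records.append({fmt: path})
        return "created"
    record = records[0]
    if fmt not in record:
        # New format for an existing book: the case automerge is meant for.
        record[fmt] = path
        return "merged"
    if policy is OnDuplicateFormat.IGNORE:
        return "ignored"
    if policy is OnDuplicateFormat.OVERWRITE:
        record[fmt] = path
        return "overwritten"
    records.append({fmt: path})
    return "new_record"


library = {}
print(add_format(library, "oasis a novel", "MOBI", "old.mobi", OnDuplicateFormat.IGNORE))
print(add_format(library, "oasis a novel", "EPUB", "new.epub", OnDuplicateFormat.IGNORE))
print(add_format(library, "oasis a novel", "EPUB", "newer.epub", OnDuplicateFormat.IGNORE))
```

The policy only comes into play when the incoming format is already present in the matching record; a format the record does not yet have is merged under every setting.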
Old 06-13-2011, 03:00 PM   #14
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
Thank you for going over that; it's all much clearer now. Cheers!