Old 12-27-2011, 08:22 PM   #1
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
Can't seem to automerge .docx files

I'm having a little problem with getting my .docx files into Calibre. I'll give you the backstory to explain what I'm trying to do.

I have my ebook files in mobi & epub inside of Calibre. I needed them in .docx also, so I converted to RTF inside of Calibre, then exported the RTF's and used LibreOffice to convert the rtf's to .docx.

What I want to do now is to import those .docx files into calibre in such a way that they will automerge with their epub/mobi/rtf counterpart. That's where I'm running into a problem! I can't seem to get the program to automerge.

Here's what I've tried so far: I've used the Adding books Preferences to check Automerge. I've also downloaded the "Find Duplicates" plugin and tried to configure it, but still with no success.

Anyone have any ideas on how to make this work???

I know I could manually merge each book, but I have a large library and that would be very time consuming.

Last edited by mshnryman; 12-27-2011 at 08:31 PM. Reason: additional information
Old 12-27-2011, 08:46 PM   #2
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
Posts: 14,428
Karma: 5560777
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by mshnryman View Post
I'm having a little problem with getting my .docx files into Calibre. I'll give you the backstory to explain what I'm trying to do.

I have my ebook files in mobi & epub inside of Calibre. I needed them in .docx also, so I converted to RTF inside of Calibre, then exported the RTF's and used LibreOffice to convert the rtf's to .docx.

What I want to do now is to import those .docx files into calibre in such a way that they will automerge with their epub/mobi/rtf counterpart. That's where I'm running into a problem! I can't seem to get the program to automerge.

Here's what I've tried so far: I've used the Adding books Preferences to check Automerge. I've also downloaded the "Find Duplicates" plugin and tried to configure it, but still with no success.

Anyone have any ideas on how to make this work???

I know I could manually merge each book, but I have a large library and that would be very time consuming.
AFAIK you can only do it the hard way.
Open the metadata editor on the book and drop the unsupported format there.
I believe getting them back out is almost as painful.
In the main Library, select the book: Tap O and carefully copy. Disturb NOTHING in that folder upon pain of repair
Old 12-27-2011, 08:47 PM   #3
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
Posts: 14,428
Karma: 5560777
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Moderator Notice
Please do not cross post the same topic. Ask a Mod to move it if it was placed in the wrong section.
Old 12-27-2011, 08:55 PM   #4
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
Quote:
Originally Posted by theducks View Post
AFAIK you can only do it the hard way.
Open the metadata editor on the book and drop the unsupported format there.
I believe getting them back out is almost as painful.
In the main Library, select the book: Tap O and carefully copy. Disturb NOTHING in that folder upon pain of repair
Thanks for your quick response!
That's pretty messed up that there's not a better way...why can't a duplicate finder notice that the .docx has the EXACT same name as the .epub/.mobi/.rtf record?

As far as getting them back out, the only thing I would be looking at doing is using the "Save to Disk" function and then selecting "Save single format to disk>DOCX". So that part should be easy when I need to export them, but the importing looks to be a real pain.
Old 12-27-2011, 09:03 PM   #5
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
Quote:
Originally Posted by theducks View Post
Please do not cross post the same topic. Ask a Mod to move it if it was placed in the wrong section.
I understand why that is a problem in some situations, but in this situation doesn't it warrant being posted in both forum sections, since it does apply to Library Management in general and also to the functionality of this tool? Different people may view/search in different forum sections depending on what they are trying to accomplish in Calibre and therefore may find the same information useful in different ways.
Old 12-27-2011, 09:20 PM   #6
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
Posts: 14,428
Karma: 5560777
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Had not noticed that DOCX was now a supported export format (I don't use it, so that is my excuse).
Old 12-27-2011, 09:28 PM   #7
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
Quote:
Originally Posted by theducks View Post
Had not noticed that DOCX was now a supported export format (I don't use it, so that is my excuse).
Haha, yeah - I'm glad they started supporting it b/c I use Logos Bible software and unfortunately .docx is what Logos uses for personal books. I would rather not have to use the format at all.

So turns out that "Find Duplicates" does work somewhat...its problem is in slightly varying book names because of how Calibre exported them.

The book name + author in the title apparently confuses "Find Duplicates" enough that it doesn't consider it a duplicate file. Is there any way to make the search more broad (or another plugin) so that you can get a matching file list and then Alt+M them all? I tried "Identical, Similar, Soundex and Fuzzy" with little success. Also, I have no idea what the "Length" parameter is for...
Old 12-28-2011, 01:12 AM   #8
Loeffel
Connoisseur
Loeffel began at the beginning.
 
Posts: 58
Karma: 10
Join Date: Mar 2011
Device: Kindle 3 3G
@mshnryman
He is right: this is the thread for the "Find Duplicates" plugin, which does something quite different from the problem you have.

Anyway, first, I do not understand why you're converting the RTF file with LibreOffice; just open it with Office 2003/2010, as both can read it. There you can work with the files and then save them as RTF (to bring back into Calibre) and DOCX.

There can be various reasons why Calibre doesn't add the file to the existing book.
  • The file name and the import settings may be such that Calibre doesn't see them as one book. Rename the file, change the import settings, or merge them with Calibre's merge function after deleting the old file versions.
  • Your import settings could be set to automerge on import and drop existing file formats; that's the setting I use. Delete the old format in Calibre and reimport afterwards.
  • Your import settings say don't automerge, but then you should be asked whether the book should be imported as a new version. Just delete the old file versions and merge them with Calibre's merge function.
Old 12-28-2011, 03:37 AM   #9
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
Posts: 8,799
Karma: 12528001
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Moderator Notice
Moved various posts out of Find Duplicates plugin thread as there was no question pertaining to that plugin. If you have specific questions about the Find Duplicates plugins feel free to post those questions in that thread.
Quote:
Originally Posted by mshnryman View Post
So turns out that "Find Duplicates" does work somewhat...its problem is in slightly varying book names because of how Calibre exported them.
The problem isn't with how calibre exported the books; the problem lies with how you added them back to calibre.

Quote:
Originally Posted by mshnryman View Post
The book name + author in the title apparently confuses "Find Duplicates" enough that it doesn't consider it a duplicate file.
The Book Title and Author should not end up in the title field. Under Preferences - Adding books, when you want the metadata (i.e., Title and Author) to be filled in based on the file name, you first have to uncheck the box in front of "Read metadata from file contents rather than file name". Unchecking the box will cause the Title and Author fields to be filled in using the file name. The default regex of
Code:
(?P<title>.+) - (?P<author>[^_]+)
should, during the Adding books phase, interpret the files correctly and parse the Title and Author into their respective fields.
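
As a rough illustration of what that expression does, here is a minimal Python sketch that applies it to a file name with the extension already stripped. The helper name parse_filename is hypothetical, and calibre's own handling may differ in detail (for instance in what it does with names that don't match); this only shows the regex itself.

```python
import re

# The default pattern quoted above: title, then " - ", then author.
PATTERN = re.compile(r"(?P<title>.+) - (?P<author>[^_]+)")

def parse_filename(stem):
    """Return (title, author) for a file name stem, or None without a ' - ' separator.

    Hypothetical helper for illustration only; this is not calibre's code.
    """
    match = PATTERN.search(stem)
    if match is None:
        return None
    return match.group("title"), match.group("author")

print(parse_filename("test title - test author"))
print(parse_filename("test title test author"))  # no " - ", so nothing to split on
```

The second call has no " - " in the name, so the pattern does not match at all and there is nothing to tell the title from the author.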

Getting this step right should allow the auto-merge feature to function as expected, though since docx is unsupported I'm not sure whether this is the case.

I use the Quick Preferences Plugin (see attached) to easily set the appropriate adding books settings and the appropriate regex for the books I'm adding.

Quote:
Originally Posted by mshnryman View Post
Is there any way to make the search more broad (or another plugin) so that you can get a matching file list and then Alt+M them all? I tried "Identical, Similar, Soundex and Fuzzy" with little success. Also, I have no idea what the "Length" parameter is for...
You can make anything as complicated as need be, but in this case the user has to get the basic adding books steps correct to ensure the title and author are both not in the title field.
Attached thumbnail: quick_preferences.jpg (186.8 KB)

Last edited by DoctorOhh; 12-28-2011 at 04:15 AM. Reason: grammar/clarification
Old 12-28-2011, 03:57 AM   #10
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
Quote:
Originally Posted by Loeffel View Post
Anyway, first, I do not understand why you're converting the RTF file with LibreOffice; just open it with Office 2003/2010, as both can read it. There you can work with the files and then save them as RTF (to bring back into Calibre) and DOCX.
The reason I am using LibreOffice is that I don't have Office 2003/10. I am doing the same steps that you mention, only in LibreOffice instead of Word. Sorry that I wasn't clear on that one.
Old 12-28-2011, 04:12 AM   #11
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
Thank you for your suggestions and the attached picture!
BTW, what are you using to get that coverflow-esque look?
Quote:
Originally Posted by dwanthny View Post
The problem isn't with how calibre exported the books; the problem lies with how you added them back to calibre.
I added them back exactly the way you mentioned. You may be right that it isn't how calibre exported them, but the reason I assumed that is that the export filename is set to {title} - {authors} as the filename for the book, and then when importing the book, the Title comes in exactly as the filename.

Quote:
Originally Posted by dwanthny View Post
The Book Title and Author should not end up in the title field. Under Preferences - Adding books, when you want the metadata (i.e., Title and Author) to be filled in based on the file name, you first have to uncheck the box in front of "Read metadata from file contents rather than file name". Unchecking the box will cause the Title and Author fields to be filled in using the file name. The default regex of
Code:
(?P<title>.+) - (?P<author>[^_]+)
should, during the Adding books phase, interpret the files correctly and parse the Title and Author into their respective fields.
My settings were already configured this way when I was re-importing the books back into calibre.


Quote:
Originally Posted by dwanthny View Post
You can make anything as complicated as need be, but in this case the user has to do the basic adding books steps to ensure the title and author are both not in the title field.
I'm open to any ideas that I may have missed, but I already followed your "basic steps" to add books...maybe I missed the step of changing the behavior of "Saving books to disk" to "{title}" instead of "{title} - {authors}" before exporting them all.
Old 12-28-2011, 04:27 AM   #12
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
I see where the problem lies - looks like I may be doomed to use Alt+M on all the books. Turns out the one thing missing from the filenames that would make calibre recognize Title/Author is a simple dash (eg Title-Author). I used a renamer in the middle to get rid of abnormal characters since the site I use to backup my books doesn't accept abnormal characters. Thus, renamer got rid of the dashes that calibre recognizes as separating Title from Author.

I guess the only question I have left would then be: Is there a way to change the regular expression from (?P<title>.+) - (?P<author>[^_]+) to something that would recognize the Title+Author without the dash?

I tried (?P<title>.+) (?P<author>[^_]+) expression (same as earlier but without the dash) and it ended up only getting the author's last name in the Author field, while the first name remained in the Title field.
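
That result follows from how the pattern is matched: .+ is greedy, so it takes as much of the name as it can and leaves only the text after the last space for the author group. A small Python sketch shows the effect; the file name is made up for illustration and is not from the library discussed here.

```python
import re

# Same expression as above, with the " - " separator reduced to a single space.
no_dash = re.compile(r"(?P<title>.+) (?P<author>[^_]+)")

m = no_dash.search("Some Book Title Jane Doe")
print(m.group("title"))   # everything up to the last space
print(m.group("author"))  # only the final word
```

Here the title group ends up holding the author's first name as well, which matches the behavior described above.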

Thanks to everyone who's been helpful on this one.
Old 12-28-2011, 04:29 AM   #13
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
Posts: 8,799
Karma: 12528001
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by mshnryman View Post
BTW, what are you using to get that coverflow-esque look?
That is the standard calibre cover flow, find the three icons on the bottom right next to Jobs and click the center icon to toggle the cover flow on or off. (see attached for my configuration)

Quote:
Originally Posted by mshnryman View Post
I added them back exactly the way you mentioned. You may be right that it isn't how calibre exported them, but the reason I assumed that is that the export filename is set to {title} - {authors} as the filename for the book, and then when importing the book, the Title comes in exactly as the filename.
If, as you say, your file names are in the format title - author.docx and you have things set correctly, the title will not be exactly the same as the filename.

Quote:
Originally Posted by mshnryman View Post
My settings were already configured this way when I was re-importing the books back into calibre.
I doubt that they were set as you have stated, see attached. The check box in the blue area has to be unchecked. If you look in the red area you will see that I tested a book called test title - test author.docx and the proper title and author are parsed as expected.

Quote:
Originally Posted by mshnryman View Post
I'm open to any ideas that I may have missed, but I already followed your "basic steps" to add books...maybe I missed the step of changing the behavior of "Saving books to disk" to "{title}" instead of "{title} - {authors}" before exporting them all.
I don't know what step you missed (or I forgot to document) but it should import the Title to the title field and the author to the author field.

Do you have the brackets {} around the actual names in the file name? Give me an exact file name as an example.
Attached thumbnails: calibre (2).jpg (373.5 KB), addingbooks.jpg (168.8 KB)

Last edited by DoctorOhh; 12-28-2011 at 04:31 AM.
Old 12-28-2011, 04:43 AM   #14
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
Posts: 8,799
Karma: 12528001
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by mshnryman View Post
I see where the problem lies - looks like I may be doomed to use Alt+M on all the books. Turns out the one thing missing from the filenames that would make calibre recognize Title/Author is a simple dash (eg Title-Author).
Sounds like you're on to something.

Quote:
Originally Posted by mshnryman View Post
I guess the only question I have left would then be: Is there a way to change the regular expression from (?P<title>.+) - (?P<author>[^_]+) to something that would recognize the Title+Author without the dash?
There has to be something that separates the title from the author. A simple space won't work because titles frequently have multiple words separated by a space and obviously virtually all authors have more than one name.

Maybe someone can write an expression that breaks on the second space from the end. This way most authors would split correctly because they have two-word names.
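
One way such an expression might look, as a sketch in Python's regex syntax: anchor the author group to the end of the name and require exactly two space-separated words. This assumes every author has a two-word name, so it will mis-split single-name or three-word authors and multi-author files. The helper name and the sample names are made up, and I have not checked it against calibre's Adding books settings.

```python
import re

# Title is everything before the last two words; author is the last two words.
two_word_author = re.compile(r"^(?P<title>.+) (?P<author>\S+ \S+)$")

def split_title_author(stem):
    """Split a file name stem on the second space from the end (hypothetical helper)."""
    m = two_word_author.match(stem)
    return (m.group("title"), m.group("author")) if m else None

print(split_title_author("Some Book Title Jane Doe"))
# A three-word author is split in the wrong place under this assumption:
print(split_title_author("Another Book John Ronald Smith"))
```

Any results would still need a quick look afterwards, since the pattern cannot tell a long title from a long author name.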

Good Luck.
Old 12-28-2011, 04:53 AM   #15
mshnryman
Member
mshnryman began at the beginning.
 
Posts: 16
Karma: 36
Join Date: Mar 2009
Device: Kindle Touch, Nook Simple Touch, Kindle 2, Nook 1st Gen
Quote:
Originally Posted by dwanthny View Post
Maybe someone can write an expression that breaks on the second space from the end. This way most authors would split correctly because they have two-word names.

Good Luck.
I like your idea on an expression that breaks on the second space from the end. That would be a real good idea.

I guess it will take me a few hours to Alt+M my collection (I'm using AutoTyper to make the shortcuts quicker) since I only have about 1150 .docx books to import right now.

I'll just remember in the future to import the .docx files back into calibre BEFORE renaming them for archival. :P
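
For a backlog like this, one possible shortcut is to match the renamed files against the library by comparing names with punctuation stripped, then attach each file to the record it matches. The Python sketch below only does the matching step. The library entries, ids, and file names are made-up sample data, and the normalize and match_files helpers are hypothetical. It assumes the renamer only removed or replaced punctuation, which may not hold for every file. The resulting (id, file) pairs could then be passed to something like calibre's calibredb add_format command, if your version provides it.

```python
import re
from pathlib import PurePath

def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def match_files(library, filenames):
    """Pair each file with the book id whose 'title - authors' name normalizes the same way.

    library: dict of book id -> (title, authors); sample data standing in for a real library.
    Returns (matched, unmatched): a list of (book_id, filename) and a list of filenames.
    """
    index = {normalize(f"{title} - {authors}"): book_id
             for book_id, (title, authors) in library.items()}
    matched, unmatched = [], []
    for name in filenames:
        key = normalize(PurePath(name).stem)
        if key in index:
            matched.append((index[key], name))
        else:
            unmatched.append(name)
    return matched, unmatched

# Made-up sample data for illustration.
library = {1: ("Test Title", "Test Author"), 2: ("Other Book", "Jane Doe")}
files = ["Test Title Test Author.docx", "Unknown Book.docx"]
matched, unmatched = match_files(library, files)
print(matched)
print(unmatched)
```

Files that end up in the unmatched list would still need Alt+M or a manual drop into the metadata editor.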
All times are GMT -4.