RTF documents wrongly catalogued with a DOC extension
I have catalogued several hundred 'word' documents with a .DOC extension into my Calibre Library.
I assumed that they were in Word binary format, but I've recently discovered that a large percentage of the .doc files are actually RTF internally.
This matters because Calibre can convert RTF to other formats, but the last time I checked, .doc could not be converted. So the files that are actually RTF would be best recorded in the library as .rtf, but I want to avoid manually processing hundreds of such books if at all possible.
OK, so I have a utility that can scan .doc files, identify the RTFs, and rename each file with the correct .rtf extension. But that will break the library entry in Calibre: the database will still record the file as .doc, which will no longer exist, and there will be a 'foreign' .rtf file in the Calibre folder structure.
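To illustrate the kind of check involved, here is a rough Python sketch (an illustration only, not the actual utility mentioned above). It relies on two well-known file signatures: RTF files begin with the ASCII text `{\rtf`, and Word binary files are OLE2 compound files beginning with the bytes D0 CF 11 E0 A1 B1 1A E1. The function names `sniff_format` and `find_mislabelled_rtf` are made up for this example, and it assumes the signature sits at the very start of the file with no leading whitespace or byte-order mark.

```python
from pathlib import Path

# Signatures found at the start of each file type.
RTF_MAGIC = b"{\\rtf"                               # the ASCII text {\rtf
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")      # OLE2 container (Word binary .doc)


def sniff_format(path):
    """Return 'rtf', 'doc', or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(OLE2_MAGIC):
        return "doc"
    return "unknown"


def find_mislabelled_rtf(root):
    """Yield .doc files under root whose content is really RTF.

    Hypothetical helper: it only reports files and never renames anything.
    """
    for path in sorted(Path(root).rglob("*")):
        if (
            path.is_file()
            and path.suffix.lower() == ".doc"
            and sniff_format(path) == "rtf"
        ):
            yield path
```

Because this sketch only lists the mislabelled files and does not rename them, running it over the library folder leaves the Calibre database and folder structure untouched.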
So my question is: is there some way that Calibre can be persuaded to recognize that my .doc entry is actually RTF and allow me to convert or update the database for the RTFs in a reasonably automated way? Or is the Calibre converter clever enough to recognize, based on content rather than file extension, that a .doc is actually RTF, and hence allow me to convert those particular .docs to, say, RTF while leaving the real .docs alone? Or can I configure Calibre to convert .doc (of either internal format, RTF or binary DOC) to other formats?
Bottom line: I want to get Calibre updated, automatically if possible, with the correct file format that the documents are actually using. Any suggestions gratefully accepted.