04-07-2014, 09:50 AM | #1 |
Enthusiast
Posts: 33
Karma: 14
Join Date: Jul 2010
Device: Windows Mobile and Android
|
RTF documents wrongly catalogued with a DOC extension
I have catalogued several hundred 'word' documents with a .DOC extension into my Calibre Library.
I assumed that they were in Word binary format, but I've recently discovered that a large percentage of the .docs are actually RTF internally. This matters because Calibre can convert RTF to other formats, but last time I checked, .doc could not be converted. So the docs that are actually RTF would be best recorded in the library as .rtf, but I want to avoid having to manually process the hundreds of such books if at all possible.
OK, so I have a utility that can scan .doc files, identify the RTFs and rename each file with the correct .rtf extension. But that will break the library entry in Calibre, since the db will record the file as .doc, which will no longer exist, and there will be a 'foreign' .rtf file in the Calibre folder structure.
So my question is: is there some way that Calibre can be persuaded to recognize that my .doc entry is actually an RTF and allow me to (in a reasonably automated way) convert or update the database for the RTFs?
Or is the Calibre converter clever enough to recognize a .doc that is actually an .rtf based on the content rather than the file extension, and hence allow me to convert those particular .docs to, say, RTF while leaving the real .docs alone?
Or can I configure Calibre to convert .doc (of either internal format, RTF or binary doc) to other formats?
Bottom line: I want to get Calibre updated automatically (if possible) with the correct file format the documents are using, so any suggestions are gratefully accepted. |
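For anyone wanting to gauge how many .doc files are affected before touching the library, here is a minimal Python sketch of content-based detection. It relies only on well-known leading byte signatures: RTF files start with `{\rtf`, legacy binary Word files are OLE2 containers, and .docx files are ZIP archives. The helper names `sniff_format` and `count_formats` are made up for illustration, and nothing here is calibre's own detection logic.

```python
from pathlib import Path

# Leading byte signatures ("magic numbers") of the formats of interest.
RTF_MAGIC = b"{\\rtf"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # container used by binary .doc
ZIP_MAGIC = b"PK\x03\x04"                        # .docx files are ZIP archives


def sniff_format(path):
    """Guess a file's real format from its first bytes (hypothetical helper)."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(OLE2_MAGIC):
        return "doc"
    if head.startswith(ZIP_MAGIC):
        return "zip"  # possibly docx, but any ZIP archive matches
    return "unknown"


def count_formats(folder):
    """Tally the sniffed format of every *.doc file under folder."""
    counts = {}
    for path in Path(folder).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".doc":
            kind = sniff_format(path)
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Calling `count_formats` on a copy of the library folder returns a dictionary such as a count of "rtf" versus "doc" entries, without modifying any file.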
04-07-2014, 04:46 PM | #2 |
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Westlyn I don't think there's a OneClick solution - but maybe something like the following will do it - but make sure you test it out on a few books first, because I haven't
Last edited by BetterRed; 04-08-2014 at 12:14 AM. |
04-08-2014, 08:23 AM | #3 |
Enthusiast
Posts: 33
Karma: 14
Join Date: Jul 2010
Device: Windows Mobile and Android
|
I can see that would do what I need without handling each file one by one.
Typically the .doc is the only format in the 'book' entry, so I shouldn't need to merge the book entries if I delete the .doc entries after export, but I will need to ensure that the metadata for the .doc is exported and then reimported correctly. Either that, or I merge and then find some way to remove the .doc format from the record. That could be a bit slow/manual, I guess. Thanks |
04-08-2014, 08:47 AM | #4 |
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
To remove the superfluous DOC formats, do this search:
Code:
formats:"=DOC" and formats:"=RTF"
Then use the Remove->Remove files of a specific format from selected books feature to remove the DOC format files.
All gone.
BR |
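The same clean-up could probably be scripted with calibre's `calibredb` command-line tool. The Python sketch below is untested against a real library. It assumes each affected book holds both a DOC and an RTF file, that `calibredb search` prints matching book ids as a comma-separated list, and that the `--with-library` option selects the library folder. The function names are invented for illustration. Try it on a copy of the library first.

```python
import subprocess


def calibredb(library, command, *args):
    """Build a calibredb command line for the given library folder."""
    return ["calibredb", command, "--with-library", library, *args]


def parse_ids(search_output):
    """Turn the comma-separated output of 'calibredb search' into ints."""
    return [int(tok) for tok in search_output.strip().split(",") if tok.strip()]


def remove_doc_commands(library, book_ids):
    """One 'remove_format' command per book that should lose its DOC file."""
    return [calibredb(library, "remove_format", str(book_id), "DOC")
            for book_id in book_ids]


def remove_superfluous_docs(library, dry_run=True):
    """Find books holding both DOC and RTF, then drop the DOC copy."""
    search = calibredb(library, "search", 'formats:"=DOC" and formats:"=RTF"')
    result = subprocess.run(search, capture_output=True, text=True)
    if result.returncode != 0:
        return []  # assumption: a failed or empty search exits non-zero
    commands = remove_doc_commands(library, parse_ids(result.stdout))
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```

With `dry_run=True` (the default) the function only returns the commands it would run, so the list can be inspected before anything is deleted.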
05-20-2015, 08:39 AM | #5 | |
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Quote:
As an update to your initial post, there IS a plugin available in calibre that is able to automatically convert DOC to DOCX, using wordconv.exe (installed with the "Microsoft Office Compatibility Pack" available from Microsoft). But that plugin fails when trying to convert an RTF with a .doc extension because, of course, it "thinks" the RTF is a DOC file. I was wondering what tool / utility you used to "scan .doc files, identify the rtfs and rename the file with the correct rtf extension". Could you please let me (us) know? I couldn't find it using Google. Thank you. |
|
05-20-2015, 10:09 AM | #6 |
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Moderator Notice
Necroposting in Calibre is highly discouraged. Things change rapidly and many posters with very low counts rarely return after a solution (or 'not possible') has been posted. |
05-21-2015, 04:54 AM | #7 |
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Sorry, I didn't know that. I tried searching for "rules" to avoid breaking any others in the future, but there is no thread titled "Rules".
However, my question is not about a feature of calibre. I do know that features, problems and solutions may change with every update. My question is about the tool Westlyn mentioned, so I was hoping that he or somebody else might know the tool. I will open a new thread, if that is all right. |
05-21-2015, 09:45 AM | #8 | |
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Calibre's FAQ and manual are at the download location www.calibre-ebook.com/help |
|
05-25-2015, 04:53 AM | #9 | |
Enthusiast
Posts: 33
Karma: 14
Join Date: Jul 2010
Device: Windows Mobile and Android
|
Quote:
Code:
Function IsRTF(ByVal RTFFile As String) As Boolean
    ' An RTF file always starts with the five characters {\rtf
    Dim firstchars As String
    Dim fileNum As Integer
    fileNum = FreeFile
    Open RTFFile For Binary Access Read As #fileNum
    ' guard against files shorter than five bytes
    If LOF(fileNum) >= 5 Then firstchars = Input(5, #fileNum)
    'Debug.Print firstchars
    Close #fileNum
    IsRTF = (firstchars = "{\rtf")
End Function
Code:
' Needs a reference to "Microsoft Scripting Runtime" for FileSystemObject
Sub ConvertDocsToRTF(Optional ByVal startfolder As String = "")
    Dim fnam As Object, fld As Object
    Dim fso As FileSystemObject
    Dim ext As String, newname As String
    Dim topLevel As Boolean

    Set fso = New FileSystemObject
    ' ask for the folder only on the initial (non-recursive) call
    topLevel = (startfolder = "")
    If topLevel Then
        startfolder = InputBox("Enter start Folder", "Starting Folder")
    End If
    Debug.Print "Processing "; startfolder
    If fso.FolderExists(startfolder) Then
        ' process folder contents
        For Each fnam In fso.GetFolder(startfolder).Files
            DoEvents
            ext = fso.GetExtensionName(fnam.Path)
            Debug.Print ext; " :>"; fnam.Path
            If IsRTF(fnam.Path) Then
                ' check file extension and rename to .rtf if required
                If LCase(ext) <> "rtf" Then
                    ' file extension is wrong: build the .rtf name
                    newname = fso.BuildPath(fso.GetParentFolderName(fnam.Path), _
                                            fso.GetBaseName(fnam.Path) & ".rtf")
                    If Not fso.FileExists(newname) Then
                        ' do the rename
                        Debug.Print "Renaming:"
                        Debug.Print fnam.Path
                        Debug.Print newname
                        fso.MoveFile fnam.Path, newname
                        DoEvents
                    End If
                End If
            End If
        Next
        ' recurse sub folders, passing the full path of each one
        For Each fld In fso.GetFolder(startfolder).SubFolders
            ConvertDocsToRTF fld.Path
        Next
        If topLevel Then MsgBox "Finished", vbOKOnly
    End If
End Sub
Hope this helps. |
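For readers without Office or VBA to hand, the same idea can be sketched in Python. This is an illustrative equivalent and not the macro's author's code: the names `is_rtf` and `rename_rtf_files` and the `dry_run` switch are assumptions added here. As noted earlier in the thread, renaming files inside a calibre library folder breaks the database entries, so this is meant for exported copies only.

```python
import os


def is_rtf(path):
    """True if the file begins with the RTF signature {\\rtf."""
    with open(path, "rb") as fh:
        return fh.read(5) == b"{\\rtf"


def rename_rtf_files(start_folder, dry_run=True):
    """Give every RTF-content file under start_folder an .rtf extension.

    Returns a list of (old_path, new_path) pairs. Nothing is renamed
    while dry_run is True, and existing files are never overwritten.
    """
    renamed = []
    for folder, _subfolders, files in os.walk(start_folder):
        for name in files:
            old = os.path.join(folder, name)
            base, ext = os.path.splitext(old)
            if ext.lower() == ".rtf" or not is_rtf(old):
                continue  # extension already right, or not RTF content
            new = base + ".rtf"
            if os.path.exists(new):
                continue  # skip rather than clobber an existing file
            if not dry_run:
                os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Running `rename_rtf_files(folder)` first lists what would change, and `rename_rtf_files(folder, dry_run=False)` then performs the renames.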
|
05-26-2015, 05:14 AM | #10 |
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Thank you, Westlyn. I did some VBA coding myself as a hobby in the past, but it didn't cross my mind to try it for this task. Your macro is quite nicely and clearly written, as I see it.
For the moment I solved the problem with a batch file kindly suggested by a poster on a different thread, using FINDSTR, but if I need more I'll try your macro. Regards, rebl |
|