#1 |
|
Enthusiast
Posts: 33
Karma: 14
Join Date: Jul 2010
Device: Windows Mobile and Android
|
RTF documents wrongly catalogued with a DOC extension
I have catalogued several hundred 'word' documents with a .DOC extension into my Calibre Library.
I assumed that they were in Word binary format, but I've recently discovered that a large percentage of the .doc files are actually RTF internally. This matters because Calibre can convert RTF to other formats but, last time I checked, cannot convert .doc. So the docs that are actually RTF would be best recorded in the library as .rtf, but I want to avoid having to manually process hundreds of such books if at all possible.

I have a utility that can scan .doc files, identify the RTFs and rename each file with the correct .rtf extension. But that will break the library entry in Calibre: the database records the file as .doc, which will no longer exist, and there will be a 'foreign' .rtf file in the Calibre folder structure.

So my questions are:
- Is there some way Calibre can be persuaded to recognize that my .doc entry is actually an RTF, and allow me to convert or update the database for the RTFs in a reasonably automated way?
- Or is the Calibre converter clever enough to recognize a .doc that is actually an .rtf from its content rather than its file extension, and so let me convert those particular .docs to, say, RTF while leaving the real .docs alone?
- Or can I configure Calibre to convert .doc (of either internal format, RTF or binary DOC) to other formats?

Bottom line: I want Calibre to be updated, automatically if possible, with the correct file format the documents are using, so any suggestions gratefully accepted. |
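For readers wondering what recognizing a file "from its content rather than its file extension" could look like, here is a minimal Python sketch. It is not how Calibre itself works as far as this thread shows. The function name `sniff_format` and the labels it returns are made up for illustration. It compares the first bytes of a file with three well-known signatures: `{\rtf` for RTF, the OLE2 compound-file header used by binary Word documents (and by other legacy Office files), and the ZIP header used by DOCX (and by any other ZIP archive).

```python
# Hypothetical helper: guess a container type from the leading bytes of a file.
# The labels are illustrative; "ole2" and "zip" only identify the container,
# not specifically a Word document.
SIGNATURES = (
    (b"{\\rtf", "rtf"),                              # RTF starts with {\rtf
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),   # legacy binary .doc/.xls/.ppt
    (b"PK\x03\x04", "zip"),                          # .docx and other ZIP-based files
)


def sniff_format(path):
    """Return 'rtf', 'ole2', 'zip', or None based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(8)  # the longest signature above is 8 bytes
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None
```

Calling `sniff_format` on each .doc file in a folder would separate the disguised RTFs (`"rtf"`) from the real binary documents (`"ole2"`) without trusting the extension.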
#2 |
|
null operator (he/him)
Posts: 22,033
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Westlyn I don't think there's a one-click solution, but maybe something like the following will do it. Make sure you test it out on a few books first, because I haven't.
Last edited by BetterRed; 04-08-2014 at 01:14 AM. |
#3 |
|
Enthusiast
Posts: 33
Karma: 14
Join Date: Jul 2010
Device: Windows Mobile and Android
|
I can see that would do what I need without handling each file one by one.
Typically the .doc is the only format in the 'book' entry, so I shouldn't need to merge the book entries if I delete the .doc entries after export, but I will need to ensure that the metadata for the .doc is exported and then reimported correctly. Either that, or I merge and then find some way to remove the .doc format from the record, which could be a bit slow and manual I guess. Thanks |
#4 |
|
null operator (he/him)
Posts: 22,033
Karma: 30277294
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
To remove the superfluous DOC formats, do this search:
Code:
formats:"=DOC" and formats:"=RTF"
Then use the Remove->Remove files of a specific format from selected books feature to remove the DOC format files. All gone. BR |
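The same add-the-RTF, drop-the-DOC step could probably also be scripted with calibre's `calibredb` command-line tool rather than the GUI. The Python sketch below is only an illustration under assumptions: the function name `plan_fix`, the working folder, and the idea of feeding it a book id plus the path of that book's .doc file are not from this thread. It uses the `calibredb add_format` and `calibredb remove_format` commands and the `--with-library` option, and as far as I know `add_format` takes the format from the extension of the file it is given, which is why the file is first copied to an `.rtf` name. The function only builds the command lists so they can be inspected before anything touches the library.

```python
import shutil
from pathlib import Path

RTF_MAGIC = b"{\\rtf"  # every RTF file starts with these bytes


def plan_fix(book_id, doc_path, workdir, library=None):
    """Build calibredb commands for one book (hypothetical helper).

    Returns [] when doc_path is not really RTF. Otherwise copies the file
    into workdir under an .rtf name and returns two argument lists:
    first add the RTF format, then remove the DOC format.
    """
    doc_path = Path(doc_path)
    with open(doc_path, "rb") as fh:
        if fh.read(len(RTF_MAGIC)) != RTF_MAGIC:
            return []

    # Work on a copy so the file inside the calibre library is never renamed.
    rtf_copy = Path(workdir) / (doc_path.stem + ".rtf")
    shutil.copyfile(doc_path, rtf_copy)

    # calibredb expects options after the command name.
    lib_opts = ["--with-library", str(library)] if library else []
    return [
        ["calibredb", "add_format", *lib_opts, str(book_id), str(rtf_copy)],
        ["calibredb", "remove_format", *lib_opts, str(book_id), "DOC"],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` once the output looks right on a test library. Collecting the book ids and file paths is left out here; try it on a copy of the library first.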
#5 | |
|
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Quote:
As an update to your initial post, there IS a plugin available in calibre that can automatically convert DOC to DOCX, using wordconv.exe (installed with the "Microsoft Office Compatibility Pack" available from Microsoft). But that plugin fails when trying to convert an RTF with a .doc extension because, of course, it "thinks" the RTF is a DOC file. I was wondering what tool or utility you used to "scan .doc files, identify the rtfs and rename the file with the correct rtf extension". Could you please let me (us) know? I couldn't find it using Google. Thank you. |
#6 |
|
Well trained by Cats
Posts: 31,272
Karma: 61916422
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Moderator Notice
Necroposting in Calibre is highly discouraged. Things change rapidly and many posters with very low counts rarely return after a solution (or 'not possible') has been posted. |
#7 |
|
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Sorry, I didn't know that. I tried searching for "rules" to avoid breaking any others in the future, but there is no thread titled "Rules".
However, my question is not about a feature of calibre. I do know that features, problems and solutions may change with every update. My question is about the tool Westlyn mentioned, so I was hoping that he or somebody else might know the tool. I will open a new thread, if that is all right. |
#8 | |
|
Well trained by Cats
Posts: 31,272
Karma: 61916422
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
Calibre's FAQ and manual are at the download location www.calibre-ebook.com/help |
#9 | |
|
Enthusiast
Posts: 33
Karma: 14
Join Date: Jul 2010
Device: Windows Mobile and Android
|
Quote:
Code:
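Function IsRTF(ByVal RTFFile As String) As Boolean
    ' Returns True when the file starts with the RTF signature "{\rtf"
    Dim firstchars As String
    Dim fileNum As Integer
    fileNum = FreeFile ' next free file number instead of a hard-coded #1
    Open RTFFile For Binary Access Read As #fileNum
    ' a file shorter than 5 bytes cannot be RTF, and reading past its end raises an error
    If LOF(fileNum) >= 5 Then firstchars = Input(5, #fileNum)
    'Debug.Print firstchars
    Close #fileNum
    IsRTF = (firstchars = "{\rtf")
End Function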
Code:
Sub ConvertDocsToRTF(Optional ByVal startfolder As String = "")
    ' Requires a reference to "Microsoft Scripting Runtime" for FileSystemObject
    Dim fnam As Object, fso As FileSystemObject, ext As String, fld As Object
    Dim newname As String, topLevel As Boolean
    Set fso = New FileSystemObject
    ' ask for the folder only on the initial call, not when recursing
    topLevel = (startfolder = "")
    If topLevel Then
        startfolder = InputBox("Enter start Folder", "Starting Folder")
    End If
    Debug.Print "Processing "; startfolder
    If fso.FolderExists(startfolder) Then
        ' process folder contents
        For Each fnam In fso.GetFolder(startfolder).Files
            DoEvents
            ext = fso.GetExtensionName(fnam.Path)
            Debug.Print ext; " :>"; fnam.Path
            If IsRTF(fnam.Path) Then
                ' check file extension and rename to .rtf if required
                If LCase(ext) <> "rtf" Then 'file extension is wrong
                    'same folder and base name, with an .rtf extension
                    newname = fso.BuildPath(fnam.ParentFolder.Path, fso.GetBaseName(fnam.Path) & ".rtf")
                    If fso.FileExists(newname) = False Then
                        'do the rename
                        Debug.Print "Renaming:"
                        Debug.Print fnam.Path
                        Debug.Print newname
                        fso.MoveFile fnam.Path, newname
                        DoEvents
                    End If
                End If
            End If
        Next
        ' recurse sub folders, passing the full path rather than just the folder name
        For Each fld In fso.GetFolder(startfolder).SubFolders
            ConvertDocsToRTF fld.Path
        Next
        ' report completion once, from the initial call only
        If topLevel Then MsgBox "Finished", vbOKOnly
    End If
End Sub
Hope this helps. |
#10 |
|
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Thank you Westlyn. I did some VBA coding as a hobby in the past, but it didn't cross my mind to try it for this task. Your macro is nicely and clearly written as I see it.
For the moment I have solved the problem with a batch file using FINDSTR, kindly suggested by a poster on a different thread, but if I need more I'll try your macro. Regards, rebl |