Old 04-07-2014, 09:50 AM   #1
Westlyn
Enthusiast
Posts: 33
Karma: 14
Join Date: Jul 2010
Device: Windows Mobile and Android
RTF documents wrongly catalogued with a DOC extension

I have catalogued several hundred 'word' documents with a .DOC extension into my Calibre Library.

I assumed that they were in Word binary format, but I've recently discovered that a large percentage of the .doc files are actually RTF internally.

This matters because Calibre can convert rtf to other formats but, last time I checked, .doc cannot be converted. So the docs that are actually rtf would be best recorded in the library as .rtf, but I want to avoid having to manually process the hundreds of such books if at all possible.

OK, so I have a utility that can scan .doc files, identify the rtfs and rename each file with the correct rtf extension. But that will break the library entry in Calibre, since the db will record the file as .doc, which will no longer exist, and there will be a 'foreign' rtf file in the calibre folder structure.

So my question is: is there some way that Calibre can be persuaded to recognize that my .doc entry is actually an RTF and allow me to (in a reasonably automated way) convert or update the database for the rtfs? Or is the calibre converter clever enough to recognize a .doc that is actually RTF from its content rather than its file extension, and hence allow me to convert those particular .docs to, say, rtf while leaving the real .docs alone? Or can I configure Calibre to convert .doc (of either internal format [rtf or binary doc]) to other formats?

Bottom line: I want to get calibre updated, automatically if possible, with the correct file format the documents are actually using, so any suggestions gratefully accepted.
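
Editor's sketch: the content check described above can be done by reading the first few bytes of each file. This is a minimal Python illustration, not the utility the poster used. It assumes RTF files start with `{\rtf` and that binary Word files start with the OLE2 container signature; the name `sniff_format` is made up for this example.

```python
# RTF files begin with this signature.
RTF_MAGIC = b"{\\rtf"
# OLE2 compound-file signature. Binary Word .doc files use this container,
# but so do other legacy Office files, so 'doc' here really means "OLE2".
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_format(path):
    """Return 'rtf', 'doc' or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(OLE2_MAGIC):
        return "doc"
    return "unknown"
```

A file that reports 'rtf' while carrying a .doc extension is one of the mislabelled documents; anything reported as 'unknown' is probably best left alone.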
Old 04-07-2014, 04:46 PM   #2
BetterRed
null operator (he/him)
Posts: 20,568
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
@Westlyn I don't think there's a OneClick solution - but maybe something like the following will do it - but make sure you test it out on a few books first, because I haven't
  1. save the DOCs in library to 'someplace' using the Save to Disk function
  2. run the doc-to-rtf rename utility on .doc files in the 'someplace' folder
  3. set Add Books preferences to merge new formats into existing books
  4. add the .rtf files in the 'someplace' folder to the library (they should merge if you have the file names and Add Books preferences 'aligned')
  5. search for books with DOC and RTF formats
  6. remove the DOC formats from those books
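
Editor's sketch of step 2 only: a small Python walk over the 'someplace' folder that renames .doc files whose content is really RTF. It is an illustration under assumptions, not the poster's utility: the function names and the `dry_run` flag are invented here, and it skips any file whose .rtf name is already taken. Run it on the exported copies, never inside the calibre library folder.

```python
import os


def is_rtf(path):
    """True if the file starts with the RTF signature."""
    with open(path, "rb") as handle:
        return handle.read(5) == b"{\\rtf"


def rename_rtf_docs(folder, dry_run=True):
    """Find .doc files under folder that are really RTF.

    Returns a list of (old_path, new_path) pairs. Files are only
    renamed on disk when dry_run is False.
    """
    renamed = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            base, ext = os.path.splitext(name)
            if ext.lower() != ".doc":
                continue
            old = os.path.join(root, name)
            new = os.path.join(root, base + ".rtf")
            # Skip real Word files and never overwrite an existing .rtf
            if not is_rtf(old) or os.path.exists(new):
                continue
            if not dry_run:
                os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Calling `rename_rtf_docs(folder)` first lists what would change; calling it again with `dry_run=False` performs the renames, after which steps 3 to 6 proceed as described.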
BR

Last edited by BetterRed; 04-08-2014 at 12:14 AM.
Old 04-08-2014, 08:23 AM   #3
Westlyn
I can see that would do what I need without handling each file one by one.

Typically the doc is the only format in the 'book' entry, so I shouldn't need to merge the book entries if I delete the doc entries after export, but I will need to ensure that the metadata for the .doc is exported and then reimported correctly. Either that, or I merge and then find some way to remove the .doc format from the record - that could be a bit slow/manual I guess.

Thanks
Old 04-08-2014, 08:47 AM   #4
BetterRed
To remove the superfluous DOC formats do this search

Code:
formats:"=DOC" and formats:"=RTF"
Select all the Books that the search lists

Then use the Remove->Remove files of a specific format from selected books feature to remove the DOC format files

All gone.
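
Editor's sketch: the same search-and-remove could probably be scripted with the `calibredb` command line tool instead of the GUI. The Python below only builds the command lines and parses ids; it does not run anything. Assumptions to check against your calibre version: that `calibredb search` prints matching book ids as a comma-separated list, that `calibredb remove_format` takes a book id followed by a format, and that the library is selected with `--with-library` (older releases used `--library-path`). The helper names are made up for this example, and the search assumes the intent is books holding both a DOC and an RTF format.

```python
# Search expression for books that have both a DOC and an RTF format.
SEARCH = 'formats:"=DOC" and formats:"=RTF"'


def search_command(library):
    """Command line that should list the ids of books with DOC and RTF."""
    return ["calibredb", "search", "--with-library", library, SEARCH]


def parse_ids(search_output):
    """Turn the comma-separated id list from the search into integers."""
    return [int(tok) for tok in search_output.strip().split(",") if tok.strip()]


def remove_format_commands(library, ids, fmt="DOC"):
    """One remove_format command line per book id."""
    return [
        ["calibredb", "remove_format", "--with-library", library, str(book_id), fmt]
        for book_id in ids
    ]
```

Each list can be passed to `subprocess.run` once you have confirmed the commands against a test library; as with the GUI route, try it on a few books first.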

BR
Old 05-20-2015, 08:39 AM   #5
rebl
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
Quote:
Originally Posted by Westlyn View Post
OK so I have a utility that can scan .doc files, identify the rtfs and rename the file with the correct rtf extension. But that will break the library entry in Calibre since the db will record the file as .doc and that will no longer exist and there will be a 'foreign' rtf file in the calibre folder structure.
Hi Westlyn, I hope you are still reading this. I have a similar problem: I have some doc and docx files, but some of the doc files are actually RTF files with an incorrect extension.
As an update to your initial post, there IS a plugin available in calibre that can automatically convert DOC to DOCX, using wordconv.exe (installed with the "Microsoft Office Compatibility Pack" available from Microsoft).
But that plugin fails when trying to convert RTF with doc extension because, of course, it "thinks" the RTF is a DOC file.

I was wondering what tool / utility you have used to "scan .doc files, identify the rtfs and rename the file with the correct rtf extension" - could you please let me (us) know? I couldn't find it using google.
Thank you.
Old 05-20-2015, 10:09 AM   #6
theducks
Well trained by Cats
Posts: 29,800
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Moderator Notice
Necroposting in Calibre is highly discouraged. Things change rapidly and many posters with very low counts rarely return after a solution (or 'not possible') has been posted.
Old 05-21-2015, 04:54 AM   #7
rebl
Sorry, I didn't know that. I tried searching for "rules" to avoid breaking any others in the future, but there is no thread titled "Rules".
However, my question is not about a feature of calibre - I do know features, problems and solutions may change with every update. My question is about the tool Westlyn mentioned, so I was hoping maybe he or somebody else might know the tool.
I will open a new thread, if that is all right.
Old 05-21-2015, 09:45 AM   #8
theducks
Quote:
Originally Posted by rebl View Post
Sorry, I didn't know that. I tried searching for "rules" to avoid breaking any others in the future, but there is no thread titled "Rules".
However, my question is not about a feature of calibre - I do know features, problems and solutions may change with every update. My question is about the tool Westlyn mentioned, so I was hoping maybe he or somebody else might know the tool.
I will open a new thread, if that is all right.
MobileRead's rules (guidelines) and FAQ are at the bottom of each page (in the blue bar)
Calibre's FAQ and manual are at the download location
www.calibre-ebook.com/help
Old 05-25-2015, 04:53 AM   #9
Westlyn
Quote:
Originally Posted by rebl View Post
I was wondering what tool / utility you have used to "scan .doc files, identify the rtfs and rename the file with the correct rtf extension" - could you please let me (us) know? I couldn't find it using google.
Thank you.
Well, sorry to say I wrote myself a Word macro, so there is no standalone tool I'm afraid.
Code:
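Function IsRTF(RTFFile) As Boolean
' True if the file starts with the RTF signature "{\rtf"
Dim firstchars As String, fnum As Integer
fnum = FreeFile ' use a free file number instead of hard-coding #1
Open RTFFile For Binary Access Read As #fnum
' files shorter than 5 bytes cannot be RTF; reading past the end would raise an error
If LOF(fnum) >= 5 Then
    firstchars = Input(5, #fnum)
End If
'Debug.Print firstchars
Close #fnum
IsRTF = (firstchars = "{\rtf")
End Function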
and a macro to scan through folders and subfolders from a starting point.

Code:
Sub ConvertDocsToRTF()
' Entry point: ask for a starting folder, then process it and its subfolders
Dim fso As Object, startfolder As String
' late binding, so no reference to Microsoft Scripting Runtime is needed
Set fso = CreateObject("Scripting.FileSystemObject")
startfolder = InputBox("Enter start Folder", "Starting Folder")
If fso.FolderExists(startfolder) Then
    ProcessFolder fso, startfolder
    MsgBox "Finished", vbOKOnly
Else
    MsgBox "Folder not found: " & startfolder, vbOKOnly
End If
End Sub

Private Sub ProcessFolder(fso As Object, ByVal folderpath As String)
Dim fnam As Object, fld As Object, ext As String, newname As String
Debug.Print "Processing "; folderpath
' process folder contents
For Each fnam In fso.GetFolder(folderpath).Files
    DoEvents
    ext = fso.GetExtensionName(fnam.Path)
    Debug.Print ext; " :>"; fnam.Path
    If IsRTF(fnam.Path) Then
        ' check file extension and rename to .rtf if required
        If LCase(ext) <> "rtf" Then 'file extension is wrong
            'build the new name with an .rtf extension
            newname = fso.BuildPath(folderpath, fso.GetBaseName(fnam.Path) & ".rtf")
            If fso.FileExists(newname) = False Then
                'do the rename
                Debug.Print "Renaming:"
                Debug.Print fnam.Path
                Debug.Print newname
                fso.MoveFile fnam.Path, newname
                DoEvents
            End If
        End If
    End If
Next
' recurse sub folders, passing the full path rather than just the folder name
For Each fld In fso.GetFolder(folderpath).SubFolders
    ProcessFolder fso, fld.Path
Next
End Sub
Please forgive the crudeness of the code but I no longer program for a living and it was only for a one off requirement.

Hope this helps.
Old 05-26-2015, 05:14 AM   #10
rebl
Thank you Westlyn. I did some VBA coding as a hobby in the past, but it didn't cross my mind to try it for this task. Your macro is quite nicely and clearly written as I see it.
For the moment I solved the problem with a batch file kindly suggested by a poster on a different thread, using FINDSTR, but if I need more I'll try your macro.
Regards,
rebl