05-21-2015, 08:03 AM | #1 |
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
RTF files with wrong DOC extension - batch identify and rename?
Hello community,
I have a number of *.doc files, but some of them are not really Word documents; they are actually RTF files with a wrong .doc extension. If I open this kind of file in a plain text editor such as Notepad++ I can see the RTF syntax, so they really are RTF files. Normally, if DOC files are associated in Windows with a viewer that also supports RTF (most of them do), one wouldn't even notice that the DOC file is not a DOC file but an RTF.
The problem is that in some circumstances, such as with programs that use Microsoft's wordconv.exe utility (included with the "Office Compatibility Pack") to batch convert doc files to docx, the RTF files won't be converted and lead to errors or software freezing, depending on the software. The same applies to the doc-to-docx plugin in calibre.
In an older post (here) someone mentioned that there is a tool that can automatically scan for RTF files with a wrong DOC extension and rename them. Does anyone know about such a tool? Thank you. |
05-21-2015, 10:39 PM | #2 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
The Linux `file` command identifies files by their content and does not look at the file extension. It will correctly identify an RTF misnamed as a DOC.
You could use that command in Cygwin on Windows, or ... quickly googles ... awesome, gnuwin32 has a native Windows binary here: http://gnuwin32.sourceforge.net/packages/file.htm
It should be simple to write a batch script that tests the output (e.g. "sample.doc: Rich Text Format data, version 1, ANSI") and, if it matches RTF, does a `ren sample.doc sample.rtf`. |
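For readers without the `file` utility at hand, the same content-based idea can be sketched in a few lines of Python. This is only a minimal illustration, not how `file` itself is implemented: it checks two well-known signatures, the `{\rtf` text that opens an RTF file and the 8-byte OLE2 compound-file header used by binary Word documents. The function name `sniff_type` and its return labels are made up for this example.

```python
# Signatures checked at the very start of the file.
RTF_MAGIC = b"{\\rtf"                            # RTF files open with {\rtf
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")   # OLE2 container (binary .doc)


def sniff_type(path):
    """Guess a file's real type from its first bytes, ignoring the extension.

    Hypothetical helper for illustration; returns "rtf", "ole2" or "unknown".
    """
    with open(path, "rb") as fh:
        head = fh.read(8)  # 8 bytes cover both signatures
    if head.startswith(RTF_MAGIC):
        return "rtf"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"
```

Calling `sniff_type("sample.doc")` on a misnamed RTF file would return `"rtf"`, which a script could then use to decide on the rename.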
05-23-2015, 04:45 PM | #3 |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
The following simple batch file should do the job, if the .rtf files were originally created with Word or Wordpad. (Even though it's highly unlikely that it'll damage any of your files, you may want to back up your Word documents before running it.)
Code:
FOR %%f IN ("*.doc") DO (
findstr /m /c:"rtf1" %%f && REN "%%f" "%%~nf.rtf"
)
@eschwartz: findstr is the Windows equivalent of grep. |
05-25-2015, 03:05 AM | #4 |
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Thanks! I had to add an extra pair of double quotes because the file names contain spaces, but after that it worked fine:
Code:
FOR %%f IN ("*.doc") DO (
findstr /m /c:"rtf1" "%%f" && REN "%%f" "%%~nf.rtf"
)
I think a bulk rename utility with file search capabilities would be good for this task too, especially when working in multiple folders (but Flexible Renamer doesn't filter files by file content). Or I suppose the FOR loop could be modified to search recursively with a modified syntax (the /R parameter?). I know some MS-DOS, but not well enough for this. I tried to modify the batch file to work recursively for all doc files:
Code:
D:
CD "D:\testRTF"
FOR /R "D:\testRTF" %%f IN ("*.doc") DO (
findstr /m /c:"rtf1" "%%f" && REN "%%f" "%%~nf.rtf"
)
PAUSE
REM REN "%%f" "%%~df%%~pf%%~nf.rtf"
Last edited by rebl; 05-25-2015 at 07:49 AM. |
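As an alternative to the recursive FOR loop, the same walk-and-rename idea can be sketched in Python. This is a rough equivalent under stated assumptions, not a drop-in replacement for the batch file: the function name `rename_misnamed_rtf`, the `dry_run` switch and the rule of skipping a file when the `.rtf` target already exists are all choices made for this example.

```python
from pathlib import Path


def rename_misnamed_rtf(root, dry_run=True):
    """Rename every *.doc under root that starts with the RTF header to *.rtf.

    Hypothetical helper. Returns the (old_path, new_path) pairs that were
    renamed, or that would be renamed when dry_run is True.
    """
    renamed = []
    # sorted() materialises the listing first, so renaming files does not
    # disturb the directory walk.
    for doc in sorted(Path(root).rglob("*.doc")):
        if not doc.is_file():
            continue
        with open(doc, "rb") as fh:
            if fh.read(5) != b"{\\rtf":
                continue  # not an RTF file, leave it alone
        target = doc.with_suffix(".rtf")
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            doc.rename(target)
        renamed.append((doc, target))
    return renamed
```

Running it once with the default `dry_run=True` lists what would change without touching anything; `rename_misnamed_rtf(r"D:\testRTF", dry_run=False)` then performs the renames. Backing up the folder first is still sensible.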
05-25-2015, 08:12 AM | #5 |
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
I am trying to optimize the above batch file for working on a large number of files.
I was wondering whether findstr in the syntax above looks for the string "rtf1" throughout the whole file; that could take quite a long time. If the string is not found in the first line, the file should be skipped. I found that the /B option matches the string only at the beginning of a line, but I couldn't find any option for matching at the beginning of the file. Also, I'm wondering if I could use "{\rtf1" for the string (should I escape the "\"?). It seems "{\rtf" works. I'm not sure whether it is really faster with /B, but here is what I'm going to test:
Code:
PAUSE
D:
CD "D:\FolderName"
FOR /R "D:\FolderName" %%f IN ("*.doc") DO (
findstr /B /M /C:"{\rtf" "%%f" && REN "%%f" "%%~nf.rtf"
)
PAUSE
Last edited by rebl; 05-25-2015 at 08:21 AM. |
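The difference between "beginning of a line" and "beginning of the file" can be made concrete with a small Python sketch. It is only an analogy and does not emulate findstr exactly: `matches_at_line_start` mimics a line-anchored search, `matches_at_file_start` checks the very first bytes, and `is_rtf_file` reads just those few bytes from disk, so the rest of a large file is never scanned. All three names are invented for this example.

```python
import re

RTF_HEADER = b"{\\rtf"  # the 5 bytes that open an RTF file


def matches_at_line_start(data):
    """True if ANY line begins with the RTF header (line-anchored search)."""
    return re.search(rb"^\{\\rtf", data, flags=re.MULTILINE) is not None


def matches_at_file_start(data):
    """True only if the very first bytes are the RTF header."""
    return data.startswith(RTF_HEADER)


def is_rtf_file(path):
    """Read only the first few bytes of the file and test them."""
    with open(path, "rb") as fh:
        return matches_at_file_start(fh.read(len(RTF_HEADER)))
```

For data such as `b"some text\n{\\rtf1 embedded"`, the line-anchored check succeeds while the file-start check does not, which is the case a line-based option like /B cannot rule out.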
05-25-2015, 10:09 AM | #6 | |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
Quote:
Code:
findstr /R /M /c:"^\{\\rtf1" "%%f" && REN "%%f" "%%~nf.rtf" |
|
05-26-2015, 04:10 AM | #7 |
r.eads e.njoys b.ooks lol
Posts: 76
Karma: 580748
Join Date: Mar 2010
Location: It's time to get this Book a Rest
Device: Kindle 4 NT
|
Thank you, I've read about that too, but the regex only offers a beginning-of-line anchor, so it's the same as /B.
Nevertheless, the batch script ran quite fast even without the /B option. I still prefer to use it, though; maybe it makes a difference. I was able to rename all (I suppose) the RTF files. Thank you again for the help! |