Using Finereader to batch convert PDF files to RTF
I'm using FineReader 8.0 Professional to batch convert PDF or multi-page TIF files to RTF files for reading on the Sony Reader, and the result is basically satisfactory. It will automatically convert up to 10,000 pages at a time, unattended. I find FineReader 8.0 has much better OCR accuracy than OmniPage 15, Readiris 11, PaperPort 11, or Adobe Acrobat 7.0. The page size I use for RTF files is 5.24'x6.69'. Since the Sony Reader does not support pictures within RTF, I set the conversion format to "not keeping pictures", which reduces the file size dramatically.
One problem that bugs me is that although the converted RTF file has the same FILENAME as the original PDF file, its DOCUMENT TITLE shown on the Sony Reader is not the same. I know I can always open each RTF file and change the title data in the file properties one by one, but that would be too much work when you have hundreds of files to modify. Do you have a better way to batch set each RTF title to its file name? Thanks in advance for your suggestions.
Bob Russell 10-23-2006, 02:48 PM Sorry, I have no answers about the file names. But I have another question about what you said... this is probably a silly question, but why do RTF files need a page size?
Thanks, Bob. Your question more or less solved my puzzle, because it made me realize that RTF probably does not need a page size at all. I'm very new to this page-size business and have experimented with other page sizes, such as A4, Letter, and the Sony Reader screen size, but didn't find much difference in the converted files. So I guess you are right that RTF files do not need a page size to be read on the Reader.
I'm not entirely sure about this, but document title is probably one of the PDF document properties, and so something that FR doesn't look at -- as far as I know, it 'prints' PDF pages, OCRs the result, and converts that to whatever output format you have. And document properties don't get printed.
I can imagine two solutions:
A. A program that extracts the document title (and other metadata) from the PDF file, and inserts it in the proper place in the RTF file.
B. Suggest to ABBYY that PDF (and perhaps other) metadata should, if possible, survive this conversion, and hope they think it's a good idea.
Of course, I'm assuming that there *is* a proper place for document title to be placed in the RTF file, and not just something that is Sony Reader specific.
NatCh 10-24-2006, 10:31 AM There is, indeed: http://www.mobileread.com/forums/showthread.php?p=41650#post41650
Since FineReader does keep the original PDF file name for the converted RTF file, another solution I'm thinking of is to find software that can make the RTF "document title" the same as its "file name," i.e., synchronize the two. I know PDF Explorer can batch synchronize the PDF document title with the PDF file name, doing hundreds of files within seconds. I'm wondering whether there is an equivalent program that can do the same for RTF files. Any suggestions?
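For anyone comfortable with a script, here is a minimal Python sketch of such a "synchronize title with file name" tool. It assumes the title lives in the RTF `{\info{\title ...}}` group and that the Reader displays that title; both are assumptions, not something confirmed here. The helper names `rtf_escape`, `set_rtf_title` and `sync_titles` are hypothetical, and the files are edited in place, so try it on copies first.

```python
import re
from pathlib import Path

# An existing {\title ...} group; escaped characters such as \} or \'e9 are allowed inside.
TITLE_GROUP = re.compile(r"\{\\title(?![a-zA-Z])(?:[^{}\\]|\\.)*\}", re.DOTALL)
# The opening {\rtf1 plus the header control words right after it (\ansi, \deff0, ...).
RTF_HEADER = re.compile(r"\{\\rtf1(?:\\[a-zA-Z]+-?\d* ?)*")


def rtf_escape(text: str) -> str:
    """Escape text for RTF; non-ASCII becomes \\uN? using signed UTF-16 code units."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:  # \uN takes a signed 16-bit value
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def set_rtf_title(rtf: str, title: str) -> str:
    """Return the RTF text with its \\info title set to `title`."""
    group = "{\\title " + rtf_escape(title) + "}"
    if TITLE_GROUP.search(rtf):
        # Replace the old title; a lambda stops re from interpreting backslashes.
        return TITLE_GROUP.sub(lambda m: group, rtf, count=1)
    if "{\\info" in rtf:
        # An \info group exists but has no title: add the title at its start.
        return rtf.replace("{\\info", "{\\info" + group, 1)
    header = RTF_HEADER.match(rtf)
    if header is None:
        raise ValueError("not an RTF document")
    # Assumption: readers accept a new \info group right after the header control words.
    return rtf[:header.end()] + "{\\info" + group + "}" + rtf[header.end():]


def sync_titles(folder: str) -> int:
    """Set each .rtf file's title in `folder` to its file name without extension."""
    count = 0
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != ".rtf":
            continue
        # latin-1 maps every byte to one character, so untouched bytes round-trip exactly.
        text = path.read_bytes().decode("latin-1")
        path.write_bytes(set_rtf_title(text, path.stem).encode("latin-1"))
        count += 1
    return count
```

Calling `sync_titles("C:/Books")` (a made-up folder) would retitle every RTF file there, so `Moby Dick.rtf` gets the title "Moby Dick". It is plain text surgery on the RTF rather than a full parser, which keeps it short but means unusual files may need a manual check.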
I'm starting to find that the fonts in FineReader-converted RTF files look too small on the Sony Reader, even when I use the "L" size on the Reader. Does anybody know how to batch increase the font size, either in FineReader or in MS Word?
After two days studying how Word macros work, I more or less solved the problem by writing a macro for Word to run. It completely fulfills my goal: in batch, it makes each RTF document title the same as its file name, increases the font size to 16 points, and sets the line spacing to 1.5.
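The Word macro itself isn't shown at this point, so as a rough, hypothetical alternative for the formatting half, here is a Python sketch that edits the RTF control words directly instead of going through Word. It relies on the standard RTF meanings: `\fsN` is the font size in half-points (16 points is `\fs32`), and `\slN\slmult1` sets line spacing as a multiple of 240 twips (1.5 lines is `\sl360\slmult1`). `restyle_rtf` is a made-up name, not part of any tool mentioned in this thread.

```python
import re

FONT_SIZE = re.compile(r"\\fs\d+")                  # \fsN: font size in half-points
LINE_SPACE = re.compile(r"\\sl-?\d+|\\slmult\d+")   # any existing line-spacing words
PARD = re.compile(r"\\pard(?![a-zA-Z])")            # paragraph reset, not \pardeftab etc.


def restyle_rtf(rtf: str, points: float = 16, spacing: float = 1.5) -> str:
    """Force every explicit font size to `points` and every paragraph to `spacing` lines."""
    half_points = round(points * 2)
    twips = round(spacing * 240)  # with \slmult1, 240 twips means single spacing
    # Rewrite explicit sizes only; \fswiss and similar words are not followed by digits.
    rtf = FONT_SIZE.sub(lambda m: f"\\fs{half_points}", rtf)
    # Drop old spacing, then restate it after each \pard (which resets paragraph formatting).
    rtf = LINE_SPACE.sub("", rtf)
    return PARD.sub(lambda m: f"\\pard\\sl{twips}\\slmult1", rtf)
```

Because this is plain text substitution, it only changes sizes that are written out explicitly: text with no `\fs` word keeps the RTF default of 12 points, and a literal "\fs24" typed in the body text would be altered too. To run it over a folder, read each file as bytes decoded with latin-1, pass the text through `restyle_rtf`, and write it back with the same encoding.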
slayda 10-28-2006, 11:38 AM Hi gdfx, would you mind sharing how you created that macro?
Thanks,
No problem, slayda. If anyone else is interested, please suggest improvements for converting RTF files to be read on the Reader. Here is what I did: in MS Word 2003, find "Macro" in the Tools menu and create a new macro from there. Create a button for it on the toolbar (mine is called "SET16"), then paste the code from the attached file into the macro editing window. Once you click the button, it does the job automatically and tirelessly. Of course, you can choose your own source folder, or set the font size to something other than 16, by slightly modifying the macro.