Thread: Extracting text
Old 09-11-2009, 06:14 AM   #3
Ea
At the moment I prefer to read PDFs on my iREx digital reader, so I have some of the same issues as you do.

How you extract the text depends on the format of your source files. First, you will need to remove any DRM. AFAIK this is not possible with .LRF (BBeB) files, but it is with many others, such as epub, prc and lit.

With a DRM-free file, you can do a number of things. The easiest way I've found is to open the file in Stanza (a reader application), copy everything, and paste it into OpenOffice. There are other ways to get at the text, but most often the source is a collection of html files, and with Stanza you get it all in one go. I haven't tried this with DRM'd files, but I doubt it will work.
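If you'd rather script it than copy and paste, here is a rough Python sketch of the "collection of html files" idea. It is not what Stanza does internally. It assumes a DRM-free epub, which is a zip archive with the html files inside, and it simply reads them in file-name order, which may not match the real reading order of every book. The names `TextCollector` and `extract_text` are made up for this example.

```python
import zipfile
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text, skipping <head>, <script> and <style> content."""

    SKIPPED = {"head", "script", "style"}
    BLOCKS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCKS:
            # End of a block element: start a new line in the output.
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(epub_path):
    """Return the text of every html file in the archive, in file-name order."""
    chunks = []
    with zipfile.ZipFile(epub_path) as book:
        names = sorted(
            n for n in book.namelist()
            if n.lower().endswith((".html", ".xhtml", ".htm"))
        )
        for name in names:
            parser = TextCollector()
            # Assumption: the files are UTF-8; bad bytes are replaced, not fatal.
            parser.feed(book.read(name).decode("utf-8", errors="replace"))
            parser.close()
            chunks.append("".join(parser.parts).strip())
    return "\n\n".join(c for c in chunks if c)
```

Calling `extract_text("mybook.epub")` (a placeholder file name) gives you one long string that you can save as a .txt file and open in OpenOffice.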

calibre can also convert to a number of formats, though not as many as Stanza. As far as I remember, you can also convert directly to RTF in calibre, but the quality wasn't usable for me. Perhaps it is for you.
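For the calibre route, a minimal sketch of driving the conversion from Python, assuming calibre's `ebook-convert` command line tool is installed and on your PATH. It picks the output format from the extension of the target file. The helper names and the file names are only placeholders for this example.

```python
import shutil
import subprocess


def build_convert_command(source, target):
    """Return the argument list for calibre's ebook-convert tool."""
    return ["ebook-convert", source, target]


def convert(source, target):
    """Run the conversion; raises if calibre's CLI is missing or fails."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is calibre installed?")
    subprocess.run(build_convert_command(source, target), check=True)
```

Something like `convert("mybook.epub", "mybook.rtf")` would then attempt the RTF conversion mentioned above; whether the result is usable is another matter.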