Thread: PRS-500 Conversion software
10-25-2007, 04:38 AM   #12
kacir
Quote:
Originally Posted by lilpretender
Are there any programs that will convert html documents to text? I
have a lot of files to convert to text and don't want to do them one
by one. And will it also convert some of them into one text?

Thank you.
For converting HTML manuals to text I use a browser called lynx.
lynx is a text-only browser. There are versions for Linux, Mac OS X and Windows.

Just run it from the command line like this:
lynx -dump -width=10000 myHTMLfile.html > myHTMLfile.txt
That is it.
-dump means that the browser does not start "browsing" but converts the HTML page to text and dumps it to the standard output
-width=10000 tells lynx to format the text for very wide lines instead of wrapping them at the default 80 character position
> myHTMLfile.txt means that the text lynx has dumped to the standard output will be saved as myHTMLfile.txt
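
If you would rather call lynx from a script than type the command by hand, the same call can be sketched in Python. This is only a minimal sketch: the function names lynx_command and html_to_text are made up for illustration, and it assumes lynx is installed and can be found on your PATH.

```python
import subprocess


def lynx_command(html_file, width=10000, lynx="lynx"):
    """Build the argument list for: lynx -dump -width=N html_file."""
    return [lynx, "-dump", f"-width={width}", html_file]


def html_to_text(html_file, text_file, width=10000, lynx="lynx"):
    """Run lynx and save what it prints on standard output to text_file."""
    with open(text_file, "wb") as out:
        # stdout=out plays the role of the "> myHTMLfile.txt" redirection
        subprocess.run(lynx_command(html_file, width, lynx), stdout=out, check=True)
```

Calling html_to_text("myHTMLfile.html", "myHTMLfile.txt") should then do the same job as the command line above. check=True makes Python raise an error if lynx exits with a failure code, so a broken conversion does not pass silently.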

you can get lynx (it is Free Software) here:
http://www.subir.com/lynx/binaries.html

you can see other options here:
http://linux.die.net/man/1/lynx

I suggest that you
- save the binary as c:\bin\lynx.exe
- start the console (command line)
- change to the directory with your books
using cd "C:\my books to convert"
- issue the command dir *.htm* /b > convert.bat
- that command will create a text file with a list of all your books
- edit convert.bat with your favourite text editor.
you get a file like this
.........
mybook1.htm
mybook2.htm
........
after editing it should look like:
..........
c:\bin\lynx.exe -dump -width=10000 mybook1.htm > mybook1.htm.txt
c:\bin\lynx.exe -dump -width=10000 mybook2.htm > mybook2.htm.txt
..........
- run convert.bat, put your feet on the table and smugly watch as hundreds of files get converted in a couple of minutes.
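
If editing a long list by hand sounds tedious, the edited convert.bat can also be generated. The sketch below is one possible way to do it in Python; the names find_html, batch_lines and write_batch are hypothetical, and it assumes lynx was saved as c:\bin\lynx.exe as suggested above. Unlike the hand-edited example, it puts the file names in double quotes so that names containing spaces still work, and it only picks up files ending in .htm or .html, so .htm.txt files left over from an earlier run are not listed again.

```python
import os


def find_html(directory="."):
    """List the .htm/.html files in directory, sorted by name."""
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith((".htm", ".html"))
    )


def batch_lines(html_files, lynx=r"c:\bin\lynx.exe", width=10000):
    """Return one lynx command line per input file."""
    lines = []
    for name in html_files:
        # Quote the names so that files with spaces in them still work
        lines.append(f'{lynx} -dump -width={width} "{name}" > "{name}.txt"')
    return lines


def write_batch(directory=".", batch_name="convert.bat"):
    """Write the generated command lines to batch_name inside directory."""
    path = os.path.join(directory, batch_name)
    with open(path, "w") as bat:
        bat.write("\n".join(batch_lines(find_html(directory))) + "\n")
    return path
```

Running write_batch(r"C:\my books to convert") would write a convert.bat into that directory, which you can then look over and run from the console as in the last step.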