Quote:
Originally Posted by lilpretender
Are there any programs that will convert html documents to text? I
have a lot of files to convert to text and don't want to do them one
by one. And will it also convert some of them into one text?
Thank you.
For converting HTML manuals to text I use a browser called lynx.
lynx is a text-only browser. There are versions for Linux, Mac OS X and Windows.
Just run it from the command line like this:
lynx -dump -width=10000 myHTMLfile.html > myHTMLfile.txt
That is it.
-dump means that the browser does not start "browsing" but converts the HTML page to text and dumps it to standard output.
-width=10000 sets the output width to 10000 columns, so lynx does not wrap lines at the default 80 characters. Some lynx builds cap the width at a built-in maximum (typically 1024 columns), so very long lines can still wrap.
> myHTMLfile.txt means that the text lynx dumps to standard output is saved as myHTMLfile.txt.
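If you would rather script the call than type it, the same command can be sketched in Python. This is only a sketch under assumptions: the function names lynx_dump_command and convert are made up for illustration, and it assumes lynx can be found on your PATH (otherwise pass the full path to the binary).

```python
import subprocess
from pathlib import Path


def lynx_dump_command(html_file, lynx="lynx", width=10000):
    """Build the argument list for: lynx -dump -width=N file.html"""
    return [lynx, "-dump", f"-width={width}", str(html_file)]


def convert(html_file, lynx="lynx"):
    """Run lynx and save what it writes to standard output as a .txt file."""
    html_file = Path(html_file)
    txt_file = html_file.with_suffix(".txt")
    with open(txt_file, "wb") as out:
        # stdout=out plays the role of "> myHTMLfile.txt" on the command line
        subprocess.run(lynx_dump_command(html_file, lynx), stdout=out, check=True)
    return txt_file
```

Calling convert("myHTMLfile.html") should leave myHTMLfile.txt next to the HTML file, provided lynx is installed.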
You can get lynx (it is Free Software) here:
http://www.subir.com/lynx/binaries.html
You can see the other options here:
http://linux.die.net/man/1/lynx
I suggest that you
- save the binary as c:\bin\lynx.exe
- start a console (command line)
- change to the directory with the books
using cd "C:\my books to convert"
- issue the command dir *.htm* /b > convert.bat
- that command creates a text file listing all your books
- edit convert.bat with your favourite text editor.
You get a file like this:
.........
mybook1.htm
mybook2.htm
........
After editing it should look like:
..........
c:\bin\lynx.exe -dump -width=10000 mybook1.htm > mybook1.htm.txt
c:\bin\lynx.exe -dump -width=10000 mybook2.htm > mybook2.htm.txt
..........
- run convert.bat, put your feet on the table and smugly watch as hundreds of files get converted in a couple of minutes.
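If there are too many files to edit by hand, the editing step itself can be generated. Here is a minimal sketch in Python that writes the same kind of lines for every *.htm* file in a directory. The names batch_lines and write_batch are made up for illustration, the lynx path is the c:\bin\lynx.exe location suggested above, and the quotes around the file names are my addition so that names with spaces survive.

```python
from pathlib import Path

LYNX = r"c:\bin\lynx.exe"  # assumed location of the binary, as in the steps above


def batch_lines(names, lynx=LYNX, width=10000):
    """Return one lynx command line per HTML file name."""
    return [f'{lynx} -dump -width={width} "{name}" > "{name}.txt"' for name in names]


def write_batch(directory=".", batch_name="convert.bat"):
    """Write convert.bat with one line per *.htm* file found in directory."""
    folder = Path(directory)
    # same selection as: dir *.htm* /b
    books = sorted(p.name for p in folder.glob("*.htm*") if p.is_file())
    (folder / batch_name).write_text("\n".join(batch_lines(books)) + "\n")
    return len(books)
```

Run write_batch() from the directory with the books, have a look at the resulting convert.bat, and then run it as in the last step.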