Originally Posted by lilpretender
Are there any programs that will convert html documents to text? I
have a lot of files to convert to text and don't want to do them one
by one. And will it also convert some of them into one text?
For converting HTML manuals to text I use a browser called lynx.
lynx is a text-only browser. There are versions for Linux, Mac OS X and Windows.
Just run it from the command line like this:
lynx -dump -width=10000 myHTMLfile.html > myHTMLfile.txt
That is it.
The -dump option means that the browser does not start "browsing" but converts the HTML page to text and dumps it to standard output.
-width=10000 tells lynx not to wrap lines at the 80-character position.
> myHTMLfile.txt means that the text lynx has dumped to standard output will be saved as myHTMLfile.txt.
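If you would rather script this than type it, here is a minimal Python sketch of the same command. The function names lynx_dump_args and convert are made up for illustration, and it assumes lynx can be found on your PATH (otherwise pass the full path to the binary).

```python
import subprocess


def lynx_dump_args(html_path, width=10000, lynx="lynx"):
    """Build the argument list for: lynx -dump -width=10000 <file>."""
    return [lynx, "-dump", f"-width={width}", html_path]


def convert(html_path, txt_path, lynx="lynx"):
    """Run lynx and save whatever it writes to standard output as txt_path.

    Assumes lynx is installed; check=True raises an error if lynx fails.
    """
    with open(txt_path, "wb") as out:
        subprocess.run(lynx_dump_args(html_path, lynx=lynx), stdout=out, check=True)
```

Calling convert("myHTMLfile.html", "myHTMLfile.txt") should then do roughly what the command line above does, with the stdout=out argument playing the role of the > redirection.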
you can get lynx (it is Free Software) here:
you can see other options here:
I suggest that you
- save the binary as c:\bin\lynx.exe
- start a console (command line)
- change to the directory with the books
using cd "C:/my books to convert"
- issue the command dir *.htm* /b > convert.bat
- that command will create a text file with a list of all your books, one file name per line
- edit convert.bat with your favourite text editor.
After editing, each line should look like this:
c:\bin\lynx.exe -dump -width=10000 mybook1.htm > mybook1.htm.txt
c:\bin\lynx.exe -dump -width=10000 mybook2.htm > mybook2.htm.txt
- run convert.bat, put your feet on the table and smugly watch as hundreds of files get converted in a couple of minutes.
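If hand-editing convert.bat for hundreds of files sounds tedious, the file can also be generated. This is only a rough Python sketch, not part of lynx: the helper names is_html, batch_lines and write_batch are invented here, the lynx path is the c:\bin\lynx.exe suggested above, and I have added quotes around the file names (an assumption on my part) so that names containing spaces still work.

```python
import os


def is_html(name):
    """True for names ending in .htm or .html (skips already converted .txt files)."""
    return name.lower().endswith((".htm", ".html"))


def batch_lines(filenames, lynx=r"c:\bin\lynx.exe", width=10000):
    """Return one lynx command line per HTML file, sorted by name."""
    return [
        f'{lynx} -dump -width={width} "{name}" > "{name}.txt"'
        for name in sorted(filenames)
        if is_html(name)
    ]


def write_batch(directory=".", batch_name="convert.bat"):
    """Write the command lines for every HTML file in directory to batch_name."""
    lines = batch_lines(os.listdir(directory))
    # newline="\r\n" gives the batch file Windows-style line endings.
    with open(os.path.join(directory, batch_name), "w", newline="\r\n") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```

Running write_batch("C:/my books to convert") should leave a convert.bat in that directory with one line per book, ready to run as in the last step.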