10-24-2007, 02:41 PM | #1 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Oct 2007
Device: PRS-500
|
Conversion software
Are there any programs that will convert HTML documents to text? I have a lot of files to convert and don't want to do them one by one. Also, can it combine several of them into one text file? Thank you. |
10-24-2007, 02:46 PM | #2 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
For that task your best bet would be html2lrf in the free libprs500 package. Click on the libprs500 link on the Conversion page in the MobileRead Wiki.
|
10-24-2007, 04:49 PM | #3 |
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I think the thread for html2lrf is a sticky in the Reader content section.
|
10-24-2007, 09:16 PM | #4 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Oct 2007
Device: PRS-500
|
Okay, I tried it. It did convert the HTML to another format. I haven't put the files on the memory stick or read them on the reader yet, but I have saved them to disk. The only problem is that I converted more than one HTML document, about 10 pages so far, and each one gets saved in a separate folder. So if I want to put them on disk or in the library, I have to open each folder and drag and drop the file from there.
|
10-24-2007, 10:01 PM | #5 | |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
Quote:
|
|
10-24-2007, 10:03 PM | #6 |
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Maybe lilpretender is using the GUI (which I do not use) and it acts differently than the command line version, which I do use.
|
10-24-2007, 10:10 PM | #7 |
creator of calibre
Posts: 43,844
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No, both the GUI and the CLI follow all links. I think he means he's got 10 different documents.
|
10-24-2007, 10:28 PM | #8 |
Member
Posts: 19
Karma: 10
Join Date: May 2005
Location: Indianapolis, IN
Device: Palm TX, Sony PRS505, Sony 650, Sony PRS T1
|
There is also HTMLAsText by NirSoft. It is freeware.
Fred |
10-24-2007, 10:29 PM | #9 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
If he is using 10 different documents, as Kovid suggests, then it would be a simple matter to write an integration HTML file that links to all of the files.
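For what it's worth, here is a minimal Python sketch of how such an integration file could be generated. The function name build_index_html and the default title are made up for illustration, and it only produces the HTML text; whether the converter then follows the links into one output file is as described earlier in the thread, not something this sketch checks.

```python
from html import escape
from pathlib import Path
from urllib.parse import quote


def build_index_html(html_files, title="Combined documents"):
    """Return the text of an HTML page that links to every file in html_files.

    build_index_html and its default title are hypothetical names used
    only for this illustration.
    """
    links = []
    for name in html_files:
        # Use forward slashes and percent-encode spaces so the link is a valid URL.
        href = quote(Path(name).as_posix())
        # Show the file name without its extension as the link text.
        label = escape(Path(name).stem)
        links.append(f'<li><a href="{href}">{label}</a></li>')
    return (
        "<html><head><title>" + escape(title) + "</title></head>\n"
        "<body>\n<ul>\n" + "\n".join(links) + "\n</ul>\n</body></html>\n"
    )
```

Save the returned text as something like index.html next to the documents and point the converter at that one file.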
|
10-24-2007, 10:43 PM | #10 |
Enthusiast
Posts: 44
Karma: 30
Join Date: Sep 2007
Device: Sony
|
Or he could just do a find file and they all show up in one 'virtual' folder. That flattens them out so you can drag and drop easily.
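If you would rather have a real folder than a virtual one, something along these lines would do the same flattening. It is only a sketch: collect_files is a made-up name, and the "*.lrf" pattern is an assumption about what the converted files are called.

```python
import shutil
from pathlib import Path


def collect_files(src_root, dest_dir, pattern="*.lrf"):
    """Copy every file matching pattern under src_root into dest_dir.

    collect_files is a hypothetical helper; the "*.lrf" default is an
    assumption about the converted files' extension.
    """
    src_root = Path(src_root).resolve()
    dest_dir = Path(dest_dir).resolve()
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() reads the whole listing before anything is copied.
    for path in sorted(src_root.rglob(pattern)):
        if dest_dir in path.parents:
            continue  # already in the destination folder
        target = dest_dir / path.name
        if target.exists():
            # Simple collision handling: prefix the parent folder's name.
            target = dest_dir / f"{path.parent.name}_{path.name}"
        shutil.copy2(path, target)
        copied.append(target)
    return copied
```

The originals are copied, not moved, so nothing is lost if the result is not what you wanted.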
|
10-24-2007, 11:18 PM | #11 |
Resident Curmudgeon
Posts: 73,897
Karma: 128597114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I think I get it now: 10 different HTML files in 10 different directories.
|
10-25-2007, 04:38 AM | #12 | |
Wizard
Posts: 3,450
Karma: 10484861
Join Date: May 2006
Device: PocketBook 360, before it was Sony Reader, cassiopeia A-20
|
Quote:
lynx is a text-only browser. There are versions for Linux, Mac OS X and Windows. Just run it from the command line like this:

lynx -dump -width=10000 myHTMLfile.html > myHTMLfile.txt

That is it.
-dump means that the browser does not start "browsing" but converts the HTML page to text and dumps it to standard output.
-width=10000 tells lynx not to wrap lines at the 80-character position.
> myHTMLfile.txt means that the text lynx has dumped to standard output will be saved as myHTMLfile.txt.

You can get lynx (it is Free Software) here: http://www.subir.com/lynx/binaries.html
You can see other options here: http://linux.die.net/man/1/lynx

I suggest that you:
- save the binary as c:\bin\lynx.exe
- start a console (command line)
- change to the directory with the books using cd "C:/my books to convert"
- issue the command dir *.htm* /b > convert.bat, which creates a text file with a list of all your books
- edit convert.bat with your favourite text editor. You get a file like this:
.........
mybook1.htm
mybook2.htm
........
After editing it should look like:
..........
c:\bin\lynx.exe -dump -width=10000 mybook1.htm > mybook1.htm.txt
c:\bin\lynx.exe -dump -width=10000 mybook2.htm > mybook2.htm.txt
..........
- run convert.bat, put your feet on the table and smugly watch as hundreds of files get converted in a couple of minutes. |
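If editing convert.bat by hand sounds tedious, the same file could be generated with a short script. This is only a sketch in Python, assuming lynx sits at c:\bin\lynx.exe as suggested above; build_convert_lines and write_convert_bat are made-up names. Unlike the hand-edited version, it puts double quotes around file names so that names with spaces survive, and it only picks up .htm and .html files.

```python
from pathlib import Path


def build_convert_lines(filenames, lynx=r"c:\bin\lynx.exe", width=10000):
    """Return one lynx command line per HTML file name.

    The lynx path is the location suggested in the post; the function
    name is hypothetical.
    """
    lines = []
    for name in filenames:
        # Quoting the names is an addition so that spaces do not break the command.
        lines.append(f'{lynx} -dump -width={width} "{name}" > "{name}.txt"')
    return lines


def write_convert_bat(folder, bat_name="convert.bat"):
    """Write convert.bat into folder, covering every .htm/.html file there."""
    folder = Path(folder)
    books = sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in (".htm", ".html")
    )
    bat_path = folder / bat_name
    # newline="\r\n" gives the batch file Windows line endings.
    with open(bat_path, "w", newline="\r\n") as bat:
        bat.write("\n".join(build_convert_lines(books)) + "\n")
    return bat_path
```

Calling write_convert_bat(r"C:\my books to convert") would leave a ready-to-run convert.bat in that folder; running it is still the same last step as before.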
|
10-27-2007, 11:26 PM | #13 | |
Enthusiast
Posts: 45
Karma: 10
Join Date: Oct 2007
Device: PRS-500
|
Quote:
Also, can I use this program for the PRS-505 too? I'm thinking of getting one. |
|
10-28-2007, 12:11 AM | #14 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
The LRF files will work on both the 500 and the 505. He is working to add 505 features to the current functionality.
For batch conversion I use the command line version and make a DOS batch file. I started on PCs years ago and have many years of DOS batch file processing behind me (and many years of IBM JCL scripts before that). |
|