Old 06-12-2009, 03:53 PM   #1
remjax
Junior Member
remjax began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Jan 2006
Device: toshiba E755,Zauru c1000
xml or xhtml to basic html converter

I am looking for a simple (and free) program that will take a downloaded web page and REMOVE all the special Java, script, XML, etc. coding.

I want HTML that is as simple as possible. The reason is that I then zip the page and pictures to read later or store for reference.

Any such programs available?

Thanks
Old 06-12-2009, 09:38 PM   #2
jgray
Fanatic
jgray ought to be getting tired of karma fortunes by now.
 
Posts: 512
Karma: 1018067
Join Date: Mar 2008
Device: Galaxy Tab 10.1 & Note II
HTML Tidy will do what you want. The free version has been around for years and is used by many other HTML editing programs. Notepad++ is one such program.
Old 06-13-2009, 12:41 PM   #3
remjax
Junior Member
remjax began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Jan 2006
Device: toshiba E755,Zauru c1000
Thanks

Thanks, but unless I missed an option, it won't do what I want.

As an example, I want to remove all "span", "div", etc. codes and replace them with simple HTML codes:

<span italic> to <I>, for example. I want to shrink the page as much as possible before zipping it.
Old 06-13-2009, 03:41 PM   #4
susan_cassidy
Wizard
susan_cassidy ought to be getting tired of karma fortunes by now.
 
Posts: 1,735
Karma: 1643110
Join Date: Jan 2009
Device: Kindle, iPad (not used much for reading)
I don't think <span italic> is valid html, so maybe that is why the program couldn't fix it. It would normally be <span style="font-style:italic"> or similar. Or, <span><i>xxx</i></span>.

If other HTML tags are fixable with the program, you could just manually edit the weird ones with find and replace.

There are regular expressions that you can put into programs that will fix some HTML, but they will usually miss a few cases, since HTML can be a little complex, especially if it came from something like the awful Word HTML.
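
To illustrate the regular-expression approach and why it tends to miss cases, here is a minimal Python sketch. The pattern and the function name `italic_spans_to_i` are assumptions made for illustration, not something posted in the thread. It handles a simple italic span, but on nested spans it pairs the opening tag with the wrong closing tag.

```python
import re

# Matches one <span> whose style attribute sets font-style:italic and
# captures everything up to the NEAREST closing </span>.
ITALIC_SPAN = re.compile(
    r'<span\s+style="[^"]*font-style\s*:\s*italic[^"]*"\s*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)


def italic_spans_to_i(html: str) -> str:
    """Replace simple italic spans with <i>...</i>."""
    return ITALIC_SPAN.sub(r"<i>\1</i>", html)


simple = '<p>A <span style="font-style:italic">fine</span> day</p>'
nested = '<span style="font-style:italic">a <span class="x">b</span> c</span>'

print(italic_spans_to_i(simple))  # <p>A <i>fine</i> day</p>
print(italic_spans_to_i(nested))  # <i>a <span class="x">b</i> c</span>  (mis-nested)
```

The second result shows the kind of case a regex misses: the inner </span> is taken as the end of the italic span, so the output tags no longer nest properly.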
Old 06-20-2009, 04:45 PM   #5
remjax
Junior Member
remjax began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Jan 2006
Device: toshiba E755,Zauru c1000
Thanks but that was just an example

Thanks for your info, Susan. I was only using a shorthand form of the command.

I save a LOT of web pages for future reference, and they contain a tremendous number of scripts and other commands that aren't needed for basic viewing.

I still prefer <B> to <strong>, for example, and <I> instead of <span style="font-style:italic">. It's simpler and makes for a smaller file, since my PDA has limited file space even in today's world of huge flash cards.

I am looking for a program or script file for Windows that will convert everything to its simplest form.

Thanks again
Old 06-20-2009, 05:04 PM   #6
susan_cassidy
Wizard
susan_cassidy ought to be getting tired of karma fortunes by now.
 
Posts: 1,735
Karma: 1643110
Join Date: Jan 2009
Device: Kindle, iPad (not used much for reading)
That would require an awfully smart program, for use by not very many people. Only if someone ambitious, with good programming skills, ran into the same problem, and felt like providing the program, are you likely to be able to find a program like that.

Web pages designed for mobile use, or for older browsers, usually have simpler HTML. The simpler version is usually selected with JavaScript, or on the server end when it detects the client browser. Maybe you can find an older browser to fetch the page with, or use the mobile version of the site, if available.
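
When a site picks its simpler version on the server by looking at the client browser, one way to ask for it without installing an old browser is to send an old browser's User-Agent header. The Python sketch below only builds the request. The URL is a placeholder and the user-agent string is just an illustrative Pocket PC style value, so both are assumptions. Many sites will ignore the header entirely.

```python
from urllib.request import Request

# Illustrative user-agent string for an old handheld browser (an assumption;
# a given site may look for different strings or not look at all).
OLD_MOBILE_UA = "Mozilla/4.0 (compatible; MSIE 4.01; Windows CE; PPC; 240x320)"


def build_request(url: str, user_agent: str = OLD_MOBILE_UA) -> Request:
    """Build (but do not send) a request that identifies as an older browser."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://example.com/article.html")  # placeholder URL
print(req.get_header("User-agent"))

# To actually download the page:
#   from urllib.request import urlopen
#   html_bytes = urlopen(req).read()
```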
Old 06-25-2009, 01:48 PM   #7
frabjous
Wizard
frabjous can solve quadratic equations while standing on his or her head reciting poetry in iambic pentameter
 
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
As far as I can tell, tags with inline CSS like <span style="font-style:italic"> *are* standard HTML as of 4.0 and, in fact, are preferable to <i> for the sake of future compatibility. <i> isn't deprecated yet, but I have a feeling that's coming.
Old 07-09-2009, 07:22 AM   #8
remjax
Junior Member
remjax began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Jan 2006
Device: toshiba E755,Zauru c1000
frabjous:

Again, standards don't interest me. HTML/XHTML/flavor of the month are getting way too complex. JavaScript, CSS, and any other embedded "non-HTML" programs just get in the way of reading on my PDA and also, in my opinion, make developing a universal reader way more difficult than needed.

By manually editing out the "garbage" from a saved web site I can both speed up file access and decrease the size of the file. Both are positives from my view.

Actually, even on my desktop with a dual-core AMD on fast DSL, I still find it a pain to wait for all the complex "stuff" that slows page loading down.

It seems programmers use all the tools to make an overly complex and fancy web site "just because they can"! Just because something is a "standard" doesn't mean it is the best. I would guess that "basic" HTML will be readable for a long time to come by my readers (uBook, for one), even though they don't support the "latest and greatest" web thingy!

Anyway, thanks for the info. I didn't expect to find such a tool, but it never hurts to ask.
Old 07-09-2009, 08:43 AM   #9
Jack Tingle
Punctuation Fetishist
Jack Tingle ought to be getting tired of karma fortunes by now.
 
Posts: 551
Karma: 1030732
Join Date: Nov 2008
Location: The Bluest Commonwealth In East America
Device: Kindle PW, Nexus 7 (2013), Galaxy Player 5 (YP-G70C)
You might try MS's html filter for Word. It greatly simplifies the html that Word writes. Just read your file into Word, save it as a .doc, reload, and export it using the filter.

Regards,
Jack Tingle
Old 07-10-2009, 01:48 PM   #10
susan_cassidy
Wizard
susan_cassidy ought to be getting tired of karma fortunes by now.
 
Posts: 1,735
Karma: 1643110
Join Date: Jan 2009
Device: Kindle, iPad (not used much for reading)
How about the web site Readability: http://lab.arc90.com/experiments/readability/. It removes clutter from web pages for conversion to ebooks. Someone posted about it not too long ago, and I saved the link.
Old 07-14-2009, 07:21 AM   #11
Sweetpea
Grand Sorcerer
Sweetpea ought to be getting tired of karma fortunes by now.
 
Posts: 7,914
Karma: 22621808
Join Date: Dec 2008
Location: Krewerd
Device: HTC Flyer; BBMini; Sony PRS650
Quote:
Originally Posted by remjax
Thanks, but unless I missed an option, it won't do what I want.

As an example, I want to remove all "span", "div", etc. codes and replace them with simple HTML codes:

<span italic> to <I>, for example. I want to shrink the page as much as possible before zipping it.

What would you replace <div> with? <div> is a block element, just like <p>. If you want to strip the divs, you should also strip the p's. They are both basic HTML elements. The only difference is that a div is a plain block that starts on a new line, while a p is a paragraph (including the empty line at the end).

Quote:
Originally Posted by remjax
frabjous:

Again, standards don't interest me. HTML/XHTML/flavor of the month are getting way too complex. JavaScript, CSS, and any other embedded "non-HTML" programs just get in the way of reading on my PDA and also, in my opinion, make developing a universal reader way more difficult than needed.

By manually editing out the "garbage" from a saved web site I can both speed up file access and decrease the size of the file. Both are positives from my view.

Actually, even on my desktop with a dual-core AMD on fast DSL, I still find it a pain to wait for all the complex "stuff" that slows page loading down.

It seems programmers use all the tools to make an overly complex and fancy web site "just because they can"! Just because something is a "standard" doesn't mean it is the best. I would guess that "basic" HTML will be readable for a long time to come by my readers (uBook, for one), even though they don't support the "latest and greatest" web thingy!

Anyway, thanks for the info. I didn't expect to find such a tool, but it never hurts to ask.

I think I know what you want. I often strip down pages myself. But you'll need to be able to use regular expressions. And be warned that if you remove the styles, you can also lose layout, especially if classes are used. And CSS has been incorporated for so long now that if your PDA can't handle it, it's really time to buy a new one... I can understand the removal of the JavaScript, but not the style information...

(And often it's not the developer who wants the complex stuff; it's the one who gives the order to build the web application/website.)
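
As a rough sketch of the regular-expression stripping described above, the Python below removes script and style blocks and then drops style and class attributes. The patterns and the name `strip_scripts_and_styles` are illustrative assumptions. The attribute pattern is deliberately naive: it only handles quoted values, and it would also hit matching text outside of tags.

```python
import re

# Whole <script>...</script> and <style>...</style> blocks, contents included.
BLOCKS = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)

# style="..." and class="..." attributes (quoted values only).
ATTRS = re.compile(r"""\s+(?:style|class)\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_scripts_and_styles(html: str) -> str:
    """Drop script/style blocks first, then inline style and class attributes."""
    html = BLOCKS.sub("", html)
    return ATTRS.sub("", html)


page = (
    '<html><head><style>p.note {color: red}</style>'
    '<script type="text/javascript">alert("hi");</script></head>'
    '<body><p class="note" style="margin:0">Text</p></body></html>'
)

print(strip_scripts_and_styles(page))
# <html><head></head><body><p>Text</p></body></html>
```

The red color and the margin are gone along with the class, which is the loss of layout the warning above refers to.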
Old 07-14-2009, 09:04 AM   #12
remjax
Junior Member
remjax began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Jan 2006
Device: toshiba E755,Zauru c1000
Sweetpea:

I usually use Editpad and change <DIV> to <P> and <strong> to <B>, delete all CSS, delete all scripts, and delete any <span ....> or change it to its basic equivalent, such as <I>.

Then I remove most of the tables, along with any "extra stuff" involving pictures, usually ending up with just <img src="xyz.jpg">.

Any classes are removed, etc. This gives a very basic, simple, and fast HTML file along with the pictures, which I then ZIP into one file. uBook, FBReader, and AlReader will all then display the file without needing it unzipped.
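
The manual steps above could be automated roughly as follows. This is only a sketch built on Python's standard html.parser and zipfile modules, not a tool anyone in the thread named. The names `Simplifier`, `simplify`, and `zip_page` are hypothetical, and several choices are assumptions marked in the comments: mapping <em> to <i>, turning bold spans into <b>, and keeping href on links.

```python
import os
import zipfile
from html import escape
from html.parser import HTMLParser

RENAME = {"div": "p", "strong": "b", "em": "i"}  # em -> i is an assumption
SKIP_CONTENT = {"script", "style"}               # removed along with their contents
UNWRAP = {"table", "thead", "tbody", "tfoot", "tr", "td", "th"}  # tag dropped, text kept
VOID = {"img", "br", "hr"}                       # no closing tag is written
KEEP_ATTRS = {"img": ("src",), "a": ("href",)}   # keeping href is an assumption


class Simplifier(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)  # pass entities through untouched
        self.out = []
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.span_stack = []  # replacement tag (or None) for each open <span>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth or tag in UNWRAP:
            pass
        elif tag == "span":
            style = (dict(attrs).get("style") or "").replace(" ", "").lower()
            if "font-style:italic" in style:
                new = "i"
            elif "font-weight:bold" in style:  # assumption: bold spans become <b>
                new = "b"
            else:
                new = None  # any other span is dropped, its text kept
            self.span_stack.append(new)
            if new:
                self.out.append(f"<{new}>")
        else:
            keep = KEEP_ATTRS.get(tag, ())
            kept = "".join(
                f' {k}="{escape(v)}"' for k, v in attrs if k in keep and v is not None
            )
            self.out.append(f"<{RENAME.get(tag, tag)}{kept}>")

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth or tag in UNWRAP or tag in VOID:
            pass
        elif tag == "span":
            new = self.span_stack.pop() if self.span_stack else None
            if new:
                self.out.append(f"</{new}>")
        else:
            self.out.append(f"</{RENAME.get(tag, tag)}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def simplify(html_text):
    """Return html_text reduced to basic tags, without scripts, CSS or classes."""
    parser = Simplifier()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


def zip_page(zip_target, html_name, html_text, picture_paths=()):
    """Write the simplified page plus its pictures into one ZIP archive."""
    with zipfile.ZipFile(zip_target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(html_name, html_text)
        for path in picture_paths:
            zf.write(path, arcname=os.path.basename(path))


sample = (
    '<div class="post"><script>track();</script>'
    '<strong>Bold</strong> and <span style="font-style: italic">slanted</span> text'
    '<table><tr><td><img class="pic" src="xyz.jpg" width="300" /></td></tr></table></div>'
)
print(simplify(sample))
# <p><b>Bold</b> and <i>slanted</i> text<img src="xyz.jpg"></p>
```

A call such as zip_page("page.zip", "page.html", simplify(sample), ["xyz.jpg"]) would then bundle the page with its picture, assuming xyz.jpg exists on disk. Comments and the doctype are dropped as a side effect, and nested divs come out as nested <P> tags, so pages with heavy div layouts may still need a manual pass.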

Doing it manually is slow and a pain, but it does give me what I want for storage and use. I thought there might be a script file somewhere that would do this, or that I could modify to perform at least some of the grunt work.

Guess I will keep doing it the good old fashioned way! Thanks anyway all.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.