Old 02-01-2010, 11:58 AM   #1
johndoesecond
Connoisseur
johndoesecond knows what time it is
Posts: 55
Karma: 2000
Join Date: Jan 2010
Device: Kindle DX, Kindle 4, Kindle PW2
How to grab plain (ScienceDirect) HTML?

Hi all,

I'd like to save the plain HTML version of ScienceDirect articles, like this one:
http://dx.doi.org/10.1016/j.compenvurbsys.2009.06.001

I need only the article's HTML itself (no menus, and the full-width version rather than the boxed one) for subsequent conversion to Mobi.

Is there any tool, browser add-on, or anything similar to get just that part of the HTML? Maybe something that would allow me to select the desired part of the Web page and save it as HTML?

Thank you in advance.

Regards.

Last edited by johndoesecond; 02-01-2010 at 12:00 PM.
Old 02-01-2010, 03:12 PM   #2
Jonas777
Groupie
Jonas777 has a complete set of Star Wars action figures.
Posts: 153
Karma: 364
Join Date: Oct 2009
Location: Sweden
Device: Amazon Kindle 2 Intl
There is an option to purchase the full article in pdf or html. Does it look the same as the free sample?
Old 02-01-2010, 04:47 PM   #3
johndoesecond
Quote:
Originally Posted by Jonas777 View Post
There is an option to purchase the full article in pdf or html. Does it look the same as the free sample?
The HTML formatting is pretty much the same.

Here's a link that should show you a full article:

http://www.sciencedirect.com/science...b&artImgPref=F

As I said, PDF pages are just too big to fit comfortably even on my 9.7" Kindle DX display.

Any ideas on how to strip out just that HTML box (so I can convert it to Mobi afterwards)?
Old 02-01-2010, 06:49 PM   #4
frabjous
Wizard
frabjous can solve quadratic equations while standing on his or her head reciting poetry in iambic pentameter
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Here's one thought. With Firefox (I'm using 3.5), go to the site, and then highlight the part you want. You might be able to copy and paste that into a word processor, but there's a good chance that won't work out too well.

So try this instead: after selecting the part you want, right-click and choose "View Selection Source". Copy the HTML code it shows into your favorite text editor. Precede it with:

Code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
follow with:

Code:
</body></html>
Save it as an HTML file, and open it in a browser to see how it looks.
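
If you end up doing this for more than one article, the wrapping step can be scripted. Here is a rough sketch in Python, not a tested tool: the names wrap_fragment and save_wrapped are made up for illustration, and the header and footer are simply the two snippets above.

```python
from pathlib import Path

# The header and footer are the two snippets given above.
HEADER = (
    "<html>\n"
    "<head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
    "</head>\n"
    "<body>\n"
)
FOOTER = "\n</body></html>\n"


def wrap_fragment(fragment: str) -> str:
    """Surround a copied HTML fragment with the header and footer."""
    return HEADER + fragment.strip() + FOOTER


def save_wrapped(fragment: str, out_path: str) -> None:
    """Write the wrapped fragment to out_path as UTF-8 (hypothetical helper)."""
    Path(out_path).write_text(wrap_fragment(fragment), encoding="utf-8")
```

You would paste the text from "View Selection Source" into a string (or read it from a file) and pass it to save_wrapped along with the name of the .html file you want.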

You might lose some formatting. If it's important, go back to the page, and look at the full web page source (Ctrl-U in Firefox), find the parts that look like:

<link rel="stylesheet" ... type="text/css">

and copy them into your new .html file between the <head> and </head> tags. Make sure the href="..." attribute holds the full (absolute) URL of the CSS file, not a relative path. Save again.
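
If picking the stylesheet lines out by hand is tedious, something like the following Python sketch could list them with the href already expanded to a full URL. It assumes you have the page source in a string and know the page's address; the names StylesheetFinder and stylesheet_links are invented for this example, and only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class StylesheetFinder(HTMLParser):
    """Collect <link rel="stylesheet"> tags, rewriting href to an absolute URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "stylesheet" in rel and href:
            # urljoin leaves absolute URLs alone and resolves relative ones.
            full_url = urljoin(self.base_url, href)
            self.links.append(
                f'<link rel="stylesheet" href="{full_url}" type="text/css">'
            )


def stylesheet_links(page_source, base_url):
    """Return ready-to-paste stylesheet <link> lines (hypothetical helper)."""
    finder = StylesheetFinder(base_url)
    finder.feed(page_source)
    return finder.links
```

The returned lines can be pasted between <head> and </head> in the new file. Whether the converter actually fetches remote CSS is a separate question that this does not answer.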

Worth a shot.
Old 02-02-2010, 11:49 AM   #5
johndoesecond
Quote:
Originally Posted by frabjous View Post
So try this instead: after selecting the part you want, right-click and choose "View Selection Source". Copy the HTML code it shows into your favorite text editor. Precede it with:
Hi frabjous,

I'm not using Firefox, so I wasn't aware of this feature. (I will from now on!)

However, I'm not sure this will do the trick. Your approach will get only the HTML, but I'd also need to download all the images (JPGs etc.) linked in the document.

Any further hints?

Thanks again.

Last edited by johndoesecond; 02-02-2010 at 11:53 AM.
Old 02-02-2010, 03:28 PM   #6
DDHarriman
Guru
DDHarriman has a spectacular aura about
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Hi

I advise you to try the PDF version of the article on your DX. You can get it for free here if you want: http://198.81.200.2/science/journal/01989715 (3rd text).

Before using it, process it with “soPDF” (you can get the files in the forum for free) and choose “Fit 2x Width” with “White Space Cropping” for your DX. It will give you a PDF file without all the white margins and with each page cut in two, so you can read it in landscape.
That is probably enough for you to read the small text, and it will retain all the images and tables.

Here is an example of that; I can read it even on my 6” eBook reader(s).

Let me know if this was of some help.

Best regards,
Attached Files
File Type: pdf x.pdf (898.4 KB, 258 views)

Last edited by DDHarriman; 02-02-2010 at 05:16 PM.
Old 02-02-2010, 03:33 PM   #7
frabjous
Quote:
Originally Posted by johndoesecond View Post
However, I'm not sure this will do the trick. The way you're suggesting will get only the HTML, but I'd also need to download all the images (JPGs etc.) linked in the document.

Any further hint?
I don't have a lot of time to think about this today, but here's one thought, which adds one level of complexity.

Navigate to the page on that website, and then go to "Save Page As...". In the "Save As" dialog box, be sure to choose "Web Page, Complete" as the format. That will save the page as an .html file (say, science.html) and will create a folder (science_files) where it puts all the images. The only problem is that the page you just saved still has all the menus and other nonsense.

So NOW open the science.html file you just saved in Firefox, highlight the part you want, view its source, and follow the procedure I outlined above. The image links will point to the copies on your hard drive rather than to the remote site. So long as you save the new .html file in the same folder as the original, I think calibre (or whatever) should be able to find them when you convert to .mobi.
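
Before converting, it might be worth checking that every image the trimmed file refers to is really sitting on disk next to it. A minimal Python sketch of such a check follows, assuming the trimmed file is saved beside the science_files folder as described; the names ImageSrcCollector and unresolved_images are invented for this example, and it only looks at <img src="..."> attributes.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class ImageSrcCollector(HTMLParser):
    """Gather the src attribute of every <img> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def unresolved_images(html_path):
    """Return the <img> sources that are remote or missing on disk.

    Relative paths are resolved against the folder holding the HTML file.
    """
    html_file = Path(html_path)
    collector = ImageSrcCollector()
    collector.feed(html_file.read_text(encoding="utf-8", errors="replace"))

    problems = []
    for src in collector.sources:
        parsed = urlparse(src)
        if parsed.scheme or parsed.netloc:
            problems.append(src)  # still points at a remote site
            continue
        local_file = html_file.parent / unquote(parsed.path)
        if not local_file.exists():
            problems.append(src)  # local link, but the file is not there
    return problems
```

An empty list back means every image link resolves to a local file; anything listed is either still remote or missing from the saved folder.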

I'll have to test that later, however.

P.S. Didn't see DDHarriman's post. I'm a big fan of soPDF, but I'm not sure that's the way to go here. Try both methods and see what you prefer.

Last edited by frabjous; 02-02-2010 at 03:38 PM.
Old 02-02-2010, 04:17 PM   #8
johndoesecond
Quote:
Originally Posted by DDHarriman View Post
Hi

Here is an example of that; I can read it even on my 6” eBook reader(s).

Let me know if this was of some help.

Best regards,
Quote:
Originally Posted by frabjous View Post
I don't have a lot of time to think about this today, but here's one thought -- adding one level of complexity.

Thanks DDHarriman and frabjous!

Both hints were useful, and will do the job, depending on the article/PDF's formatting.