#1 |
Connoisseur
Posts: 55
Karma: 2000
Join Date: Jan 2010
Device: Kindle DX, Kindle 4, Kindle PW2
How to grab plain (Sciencedirect) HTML?
Hi all,
I'd like to save the plain HTML version of ScienceDirect articles, like this one: http://dx.doi.org/10.1016/j.compenvurbsys.2009.06.001
I need only the article's HTML (no menus, and the full-width version rather than the boxed one) for subsequent conversion to Mobi. Is there any tool, browser add-on, or anything similar that extracts just that part of the HTML? Maybe something that would let me select the desired part of the web page and save it as HTML?
Thank you in advance.
Regards.
Last edited by johndoesecond; 02-01-2010 at 12:00 PM.
#2 |
Groupie
Posts: 153
Karma: 364
Join Date: Oct 2009
Location: Sweden
Device: Amazon Kindle 2 Intl
There is an option to purchase the full article in PDF or HTML. Does it look the same as the free sample?
#3 | |
Connoisseur
Posts: 55
Karma: 2000
Join Date: Jan 2010
Device: Kindle DX, Kindle 4, Kindle PW2
Here's a link that should show you a full article: http://www.sciencedirect.com/science...b&artImgPref=F
As I said, PDF pages are just too big to fit comfortably even on my 9.7" Kindle DX display. Any ideas how to effectively strip that HTML box (to convert it to Mobi afterwards)?
#4 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Here's one thought. With Firefox (I'm using 3.5), go to the site and highlight the part you want. You might be able to copy and paste that into a word processor, but there's a good chance that won't work out too well.
So try this instead: after selecting the part you want, right-click and choose "View Selection Source". Copy the HTML code it shows into your favorite text editor. Precede it with:
Code:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
and follow it with:
Code:
</body></html>
Save the result as an .html file. You might lose some formatting. If it's important, go back to the page, look at the full web page source (Ctrl-U in Firefox), find the parts that look like <link rel="stylesheet" ... type="text/css">, and copy them into your new .html file between the <head> and </head> tags. Make sure the href="..." part holds the full URL of the CSS file. Save again. Worth a shot.
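For anyone who would rather script the wrapping step than paste by hand, here is a minimal Python sketch of the same idea. It is only an illustration under assumptions: `wrap_fragment`, `page_url`, and `stylesheet_hrefs` are hypothetical names, the example.com address is a placeholder, and the fragment is assumed to be whatever "View Selection Source" gave you.

```python
from html import escape
from urllib.parse import urljoin


def wrap_fragment(fragment, page_url="", stylesheet_hrefs=()):
    """Wrap a copied HTML fragment in a minimal UTF-8 document.

    page_url (hypothetical input): address of the page the fragment came from.
    stylesheet_hrefs (hypothetical input): href values copied from the page's
    <link rel="stylesheet"> tags; relative ones are resolved against page_url.
    """
    lines = [
        "<html>",
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">',
    ]
    for href in stylesheet_hrefs:
        # Make sure the full URL of the CSS file ends up in href="...".
        full_url = urljoin(page_url, href)
        lines.append(
            '<link rel="stylesheet" href="%s" type="text/css">'
            % escape(full_url, quote=True)
        )
    lines += ["</head>", "<body>", fragment, "</body></html>"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder fragment and URLs, for illustration only.
    print(
        wrap_fragment(
            "<p>Article text copied from View Selection Source</p>",
            page_url="http://www.example.com/science/article.html",
            stylesheet_hrefs=["/css/article.css"],
        )
    )
```

Write the returned string to a file with a .html extension (UTF-8 encoded) and open it in the browser to check the formatting before converting.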
#5 | |
Connoisseur
Posts: 55
Karma: 2000
Join Date: Jan 2010
Device: Kindle DX, Kindle 4, Kindle PW2
I'm not using Firefox, so I wasn't aware of this feature. (I will from now on!) However, I'm not sure this will do the trick. Your suggestion gets only the HTML, but I'd also need to download all the images (JPGs etc.) linked in the document. Any further hints? Thanks again.
Last edited by johndoesecond; 02-02-2010 at 11:53 AM.
#6 |
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
Hi
I advise you to try the PDF version of the article on your DX. You can get it free from here if you want: http://198.81.200.2/science/journal/01989715, 3rd text.
Before using it, process it with “SoPDF” (you can get the files in the forum for free) and choose “Fit 2x Width” with “White Space Cropping” for your DX. That gives you a PDF file without the white margins and with each page cut in two, so you can read it in landscape. That is probably enough to make the small text readable, and it retains all the images and tables. Here you have an example of that, and even on my 6” eBook reader(s) I can read it.
Let me know if this was of some help.
Best regards,
Last edited by DDHarriman; 02-02-2010 at 05:16 PM.
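To picture what "crop the white margins, then cut the page in two" amounts to, here is a small Python sketch of the arithmetic only. It is not SoPDF's actual algorithm: the margins are assumed to be known rather than detected, `Box`, `crop_margins`, and `split_in_two` are hypothetical names, and coordinates follow the usual PDF convention of points with the origin at the bottom-left corner.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """A rectangle in PDF points, origin at the bottom-left of the page."""
    left: float
    bottom: float
    right: float
    top: float


def crop_margins(page: Box, margins: Tuple[float, float, float, float]) -> Box:
    """Shrink the page box by (left, bottom, right, top) white margins.

    The margins are assumed inputs; a real tool would measure them.
    """
    m_left, m_bottom, m_right, m_top = margins
    return Box(
        page.left + m_left,
        page.bottom + m_bottom,
        page.right - m_right,
        page.top - m_top,
    )


def split_in_two(content: Box, overlap: float = 0.0) -> Tuple[Box, Box]:
    """Cut the cropped box into an upper and a lower half.

    overlap (in points) repeats a strip around the cut in both halves, so a
    text line falling on the cut stays readable in at least one of them.
    """
    middle = (content.bottom + content.top) / 2
    upper = Box(content.left, middle - overlap / 2, content.right, content.top)
    lower = Box(content.left, content.bottom, content.right, middle + overlap / 2)
    return upper, lower


if __name__ == "__main__":
    # A US Letter page (612 x 792 pt) with one-inch (72 pt) margins all round.
    content = crop_margins(Box(0, 0, 612, 792), (72, 72, 72, 72))
    for half in split_in_two(content):
        print(half, "width:", half.right - half.left, "height:", half.top - half.bottom)
```

Each half comes out wider than it is tall, which is why the result reads well with the device held in landscape.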
#7 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
Quote:
However, I'm not sure this will do the trick. The way you're suggesting will get only the HTML, but I'd also need to download all the images (JPGs etc.) linked in the document. Any further hints?
I don't have a lot of time to think about this today, but here's one thought, adding one level of complexity.
Navigate to the page on that website, and then go to "Save Page As...". In the "Save As" dialog box, be sure to choose "Web Page, Complete" as the format. That will save the page as an .html file (say, science.html) and create a folder (science_files) where it puts all the images.
The only problem is that the page you just saved has all the menus and other nonsense. So now open the science.html file you just saved in Firefox, highlight the part you want, view its source, and follow the procedure I outlined above. The image links will point to the copies on your hard drive rather than the remote site. So long as you save the new .html file in the same folder as the original, I think calibre (or whatever) should be able to find them when you convert to .mobi. I'll have to test that later, however.
P.S. Didn't see DDHarriman's post. I'm a big fan of soPDF, but I'm not sure that's the way to go here. Try both methods and see what you prefer.
Last edited by frabjous; 02-02-2010 at 03:38 PM.
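Before converting, it may be worth checking that every image link in the trimmed file really points at something on disk. The Python sketch below is one way to do that with the standard library only; `ImageCollector` and `split_image_links` are hypothetical names, and it assumes the image links are plain relative paths like the ones "Web Page, Complete" writes.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> tags.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def split_image_links(html_text, base_dir):
    """Sort image links into (local, missing, remote) lists.

    base_dir is the folder holding the .html file, so a link such as
    science_files/fig1.jpg is looked up relative to it.
    """
    collector = ImageCollector()
    collector.feed(html_text)
    collector.close()
    local, missing, remote = [], [], []
    for src in collector.sources:
        parsed = urlparse(src)
        if parsed.scheme or parsed.netloc:
            # Still points at a web server (or another scheme), not the disk.
            remote.append(src)
            continue
        # Saved pages may percent-encode spaces and other characters.
        path = os.path.join(base_dir, unquote(parsed.path))
        if os.path.isfile(path):
            local.append(src)
        else:
            missing.append(src)
    return local, missing, remote
```

Read the trimmed .html file into a string, pass it along with its folder, and look at the `missing` and `remote` lists: anything listed there will probably not make it into the converted book.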
#8 | |
Connoisseur
Posts: 55
Karma: 2000
Join Date: Jan 2010
Device: Kindle DX, Kindle 4, Kindle PW2
Quote:
I don't have a lot of time to think about this today, but here's one thought, adding one level of complexity.
Thanks DDHarriman and frabjous! Both hints were useful and will do the job, depending on the article's/PDF's formatting.