Old 09-25-2010, 07:20 PM   #1
JPD
Member
JPD began at the beginning.
 
Posts: 12
Karma: 10
Join Date: Sep 2010
Device: Kindle
recipe to pull web page similar to 'print/save as pdf'

Let me apologize right up front for my lack of savvy in html, calibre, or programming of any sort. I've been creating eBooks from instructional web sites for my Kindle DX. The web sites are typically set up like book chapters; from the TOC you select a 'chapter', and you click 'next/back' to navigate within each chapter. I go page by page, and do a file/print/save as pdf. Then I open them with Acrobat Pro to customize the metadata. Then I send them to Amazon for conversion, but as they often have scientific notation, figures, etc., they don't convert well, so I end up USB synching and dragging the pdf file from my Mac to the Kindle. Then repeat for every page...

I just discovered Calibre, and thought I'd found salvation. While all the articles are focused on news feeds, I thought it should be simple enough to create a custom 'news source', and use each url of the web page I want as the 'feed'. Wrong. All I end up with are strings of html code. Am I trying to do something that can't be done, or is it not as simple as just entering a web page's URL into the 'feed' field? Any help would be appreciated.
Old 09-25-2010, 08:10 PM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by JPD View Post
Am I trying to do something that can't be done, or is it not as simple as just entering a web page's URL into the 'feed' field?
It's not as simple as just entering a web page's URL into the 'feed' field. A "feed" has a special format that gets parsed by the recipe. Your web page doesn't have that format. You wrote that you click on 'next/back' to get what you want. However, there are probably lots of links on each page. Links for ads, links to register, links to navigate home, etc. The recipe would need to follow only the links you want, and none of the others.

Can it be done? Probably, but you'd need to custom write it to do what you need. You could look at some of the multipage recipes. You might also consider web scrapers like wget, web2disk, WinHTTrack, etc.
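
As a rough illustration of "follow only the links you want", here is a minimal Python sketch that is independent of calibre's recipe API. It assumes each chapter page marks its forward link with the visible text "next"; the names `NextLinkFinder` and `find_next_url`, the sample HTML, and the example.com URL are all hypothetical and would need adapting to the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Collect the href of every <a> whose visible text is exactly 'next'."""

    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>
        self.matches = []   # hrefs whose link text matched

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Ignore ads, 'home', 'register' and so on: keep only 'next'.
            if "".join(self._text).strip().lower() == "next":
                self.matches.append(self._href)
            self._href = None


def find_next_url(page_url, html):
    """Return the absolute URL of the first 'next' link, or None."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.matches[0]) if finder.matches else None


# Hypothetical page fragment with three links, only one of which is wanted.
sample = (
    '<a href="/ads/buy.html">Buy now</a>'
    '<a href="index.html">Home</a>'
    '<a href="page2.html">Next</a>'
)
next_url = find_next_url("http://example.com/book/page1.html", sample)
```

Calling `find_next_url` repeatedly on each fetched page would walk a chapter from start to finish; a real recipe would need its own rule for recognising the wanted links.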
Old 09-25-2010, 10:32 PM   #3
JPD
Thanks for the tips Starson17, and for verifying that there's more to this than I suspected. I suppose I'll keep plugging along with pdfs as before while trying to 'go to school' on this subject. Although even calibre's basic getting-started tutorials are over my head right now.
Old 09-26-2010, 02:00 AM   #4
JPD
no luck bookit or instapaper; html source worked, but no images

I tried using Bookit to convert web pages to mobi, but ran into the same brick wall of an error message others noted. Then I tried Instapaper, as Calibre has a recipe for 'read later' web pages, but it didn't preserve any of the web page formatting or images. So then I tried just viewing the page source of the web page I wanted to convert, saved it as a file, added it as a book to my Calibre library, and did a mobi convert. It worked almost perfectly, preserved all the formatting, but the fatal flaw was it just had boxes with '?' icons where the images should be - it was not pulling the embedded images, e.g. '<img src="redliq.gif" width="251" height="110" /></p><pre>', where if you click on the gif link in the source it brings up the image, but it's not making it into the eBook.

If anyone has any suggestions it would be most welcome.
Old 09-26-2010, 08:42 AM   #5
Starson17
Same suggestion as before. Tools like wget can extract a web site, following links you define and returning html. Alternatively, you can try using the recipe system. You may need to turn on recursion. It's hard to make any suggestions to deal with your issues, since you haven't provided any links or copies of sites that you're having problems with.
Old 09-26-2010, 12:16 PM   #6
JPD
almost with iMacros, but unwanted image on top of converted eBook page

duh, I was saving the html file to my computer, and using that file to convert to mobi, but of course all the paths to the images now pointed to where the file was on my computer, while the actual images were on the web site's server.
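
To make the path problem concrete, here is a small Python sketch (standard library only) of how a relative `src` such as `redliq.gif` resolves against the address of the page it came from. The page URL is a made-up placeholder, and `ImageCollector` and `absolute_image_urls` are hypothetical names for illustration, not anything calibre provides.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Record the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are also routed here by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def absolute_image_urls(page_url, html):
    """Resolve each image path relative to the page it was found on."""
    collector = ImageCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


# Placeholder address standing in for the page the HTML was saved from.
page_url = "http://example.com/analysis/page.html"
html = '<p><img src="redliq.gif" width="251" height="110" /></p>'
urls = absolute_image_urls(page_url, html)
```

Saved on its own, the HTML file has no such base address, so `redliq.gif` is looked up next to the file on disk, which is why the images went missing until they were downloaded into the same folder.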

So I spent the night crawling through the website, going to 'page info/media' for every page, selecting every img, and saving the 2,000 collected .gifs to the same folder the html files were in. Now Calibre gave me a complete mobi with all the images, with one flaw - it plops one of the images at the top of every page. But I was more concerned with not having to repeat what I'd just done for every site, so after much searching found a wonderful FF web-scraper plug-in, iMacros ( https://addons.mozilla.org/en-US/firefox/addon/3863/ ), that will save all the web files, html and imgs. This is an enormous time saver, but I still get the unwanted image at the top of every converted mobi. Any ideas, short of learning to use Sigil and editing them as ePubs (which ain't gonna happen)?

Here's an example of one of the web pages I'm trying to convert:
http://www.chemguide.co.uk/analysis/...ation.html#top

In any zip file I convert to mobi in Calibre, there will be an image at position 1.0 of the eBook.
Old 09-26-2010, 01:23 PM   #7
Starson17
Quote:
Originally Posted by JPD View Post
duh, I was saving the html file to my computer, and using that file to convert to mobi, but of course all the paths to the images now pointed to where the file was on my computer, while the actual images were on the web site's server.
Here's an example of one of the web pages I'm trying to convert:
http://www.chemguide.co.uk/analysis/...ation.html#top
I opened that page in Firefox, told FF to save it to my desktop, dragged the saved html into Calibre (the images were in the matching folder that FF made when it saved) and opened it to see all images correct. The save from FF saved images locally, and they were correctly picked up by Calibre when it made the book. Where did you have trouble?
Old 09-26-2010, 03:05 PM   #8
JPD
Thanks for the assistance. I tried it following your protocol, but still get one of the page images at position 1.0, above the 'Electromagnetic Radiation' page heading where the book should actually start. The image still shows correctly where it's supposed to as well. It's almost as if the image is being inserted at the beginning as a book cover, although to the right it shows the generic book image. I don't know what I'm doing differently than you. I'm using a PPC Mac w/ OS 10.5.8, FF 3.6.10, and Calibre 0.7.20.
Old 09-26-2010, 03:11 PM   #9
JPD
I don't know if this is related, but when I quit calibre I get this error message:

ERROR: ERROR: Unhandled exception: <b>IOError</b>:[Errno 2] No such file or directory: '/var/folders/3g/3g++kTeeHJmwGtYBJz9CQk+++TI/-Tmp-/calibre_0.7.20_tmp_gFSqaR/ipc_result_1_7_q_9c8r.pickle'

Traceback (most recent call last):
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 147, in main
    return run_entry_point()
  File "/Applications/calibre.app/Contents/Resources/Python/lib/python2.6/site.py", line 116, in run_entry_point
    return getattr(pmod, func)()
  File "site-packages/calibre/utils/ipc/worker.py", line 101, in main
IOError: [Errno 2] No such file or directory: '/var/folders/3g/3g++kTeeHJmwGtYBJz9CQk+++TI/-Tmp-/calibre_0.7.20_tmp_gFSqaR/ipc_result_1_7_q_9c8r.pickle'
Old 09-26-2010, 03:14 PM   #10
Starson17
Are you looking at the saved html, an epub or some other converted format? Perhaps you want to ask up in the main forum, as this isn't really a recipe issue and you may find more focused help there.
Old 09-26-2010, 03:36 PM   #11
JPD
I save the FF page and drag the html file to calibre; at this point it's zip, and I haven't opened anything yet. I then convert to mobi, and it's then that I view the converted file and there's an image at position 1.0 where the content should actually begin. I'm happy to take this to another forum, but before that I'd like to try and understand why your conversion of the same web page is rendering correctly, without this stray image, and mine is not.
Old 09-27-2010, 08:15 AM   #12
Starson17
I don't convert to mobi, as I have nothing that uses mobi. Hold on... I just converted to mobi, and I get an image at the top. It's a conversion issue, not a recipe issue. I wish I could help, but...
Old 09-27-2010, 01:30 PM   #13
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
MOBI doesn't support floating images, so calibre puts em where they appear in the source document markup
Old 09-27-2010, 01:43 PM   #14
Starson17
And what that means is you could grab the image tag in the recipe and put it where you want. That may be a lot of effort for a single page. I'm not sure if that work would carry over to other pages you are interested in.
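
To give a feel for what "grab the image tag and put it where you want" could look like, here is a minimal Python sketch working on the raw HTML text. It rests on an untested assumption: that the stray image is the first `<img>` appearing before the first closing `</h1>` in the page source. The function name `move_leading_image_after_heading` and the sample markup are hypothetical, and regex editing of HTML is fragile; inside a recipe the same idea would more likely be applied to the parsed page.

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
H1_CLOSE = re.compile(r"</h1\s*>", re.IGNORECASE)


def move_leading_image_after_heading(html):
    """Move the first <img> found before the first </h1> to just after it.

    Returns the input unchanged if there is no heading or no such image.
    """
    heading = H1_CLOSE.search(html)
    if heading is None:
        return html
    # Only look for an image in the text that precedes the heading's end tag.
    img = IMG_TAG.search(html, 0, heading.start())
    if img is None:
        return html
    tag = img.group(0)
    without_img = html[:img.start()] + html[img.end():]
    # Removing the tag shifted everything after it left by len(tag).
    insert_at = heading.end() - len(tag)
    return without_img[:insert_at] + tag + without_img[insert_at:]


# Hypothetical markup: a floated image placed ahead of the page heading.
sample = '<body><img src="pic.gif" align="right"><h1>Title</h1><p>Text</p></body>'
fixed = move_leading_image_after_heading(sample)
```

Whether this helps on other pages depends on their markup following the same pattern, which is the carry-over concern raised above.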

Perhaps you should just edit the epub before conversion?
Old 09-28-2010, 09:21 PM   #15
JPD
edit epub before conversion

I think editing the epub before conversion sounds like the best approach for this. Does that require learning Sigil, or is there a simpler, more basic editor for such minor edits that would be approachable to a newbie? And do I convert from zip to epub first, edit, then convert to mobi? If you can advise the tools and basic approach I need, I can take further questions to another forum. I appreciate all your help. Thanks.