|  10-21-2010, 08:44 AM | #1 | 
| Zealot  Posts: 115 Karma: 20 Join Date: Jul 2010 Device: Kindle3 3G, Kindle Paperwhite 2 | 
				
				Downloading Webpages without RSS
			 
			
			Hi, I would like to download several hierarchically ordered web pages, but the site does not offer RSS. I would just like to follow the links from the top page and create articles for the sub-pages. Is there an easy way to do this with recipes? They seem to be pretty powerful. Is there a tutorial for creating ebooks with recipes from web sites without RSS? Thanks, Jens | 
|   |   | 
|  10-21-2010, 09:23 AM | #2 | |
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | Quote: 
 Here are some links: https://www.mobileread.com/forums/sho...postcount=1878 These two are the best for parse_index: http://calibre-ebook.com/user_manual...ownloaded-html And: http://bugs.calibre-ebook.com/wiki/recipeGuide_advanced | |
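For readers following along: as far as I know, a recipe's parse_index is expected to return a list of (section title, list of article dicts) instead of reading a feed. The sketch below builds that structure from the links on a top page using only the Python standard library, so it runs outside calibre. The names LinkCollector and build_index and the example URL are made up for illustration, and the same-site filter is my own assumption about what "follow the links from the top page" should mean.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_index(top_page_html, base_url, section_title="Pages"):
    """Hypothetical helper: turn the links of a top page into the
    [(section title, [article dict, ...])] shape used by parse_index."""
    parser = LinkCollector()
    parser.feed(top_page_html)
    articles = []
    seen = set()
    for href, text in parser.links:
        url = urljoin(base_url, href)
        # Assumption: skip duplicates and anything outside the site.
        if url in seen or not url.startswith(base_url):
            continue
        seen.add(url)
        articles.append(
            {"title": text or url, "url": url, "description": "", "date": ""}
        )
    return [(section_title, articles)]
```

Inside a real recipe, the top page would be fetched first and the returned list handed back from parse_index; a hierarchy can be expressed by returning one (section title, articles) pair per sub-section.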
|   |   | 
|  10-21-2010, 09:42 AM | #3 | 
| Zealot  Posts: 115 Karma: 20 Join Date: Jul 2010 Device: Kindle3 3G, Kindle Paperwhite 2 | 
			
			Thanks for the answer. I have a question regarding the recursive HTML download. I am familiar with wget. Once I have the site downloaded, do I convert it with calibre or Mobipocket Creator? Will it keep the links between the files and put everything into a single document? The calibre recipe would of course let me cut undesired parts from the web pages and fetch print versions ... Thanks, Jens | 
|   |   | 
|  10-21-2010, 10:06 AM | #4 | |||
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
 | |||
|   |   | 
|  10-21-2010, 10:27 AM | #5 | 
| Zealot  Posts: 115 Karma: 20 Join Date: Jul 2010 Device: Kindle3 3G, Kindle Paperwhite 2 | |
|   |   | 
|  10-21-2010, 10:44 AM | #6 | 
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | |
|   |   | 
|  10-22-2010, 07:44 AM | #7 | 
| Zealot  Posts: 115 Karma: 20 Join Date: Jul 2010 Device: Kindle3 3G, Kindle Paperwhite 2 | 
			
			I found that Mobipocket Creator does a much better job than calibre at converting complete websites to the MOBI format.
		 | 
|   |   | 
|  10-22-2010, 10:58 AM | #8 | 
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
			
			What does it do better? More specifically, what do you start with: the locally downloaded HTML from a scraper (wget, HTTrack), a recipe, a link? What settings did you use for converting your starting format?
		 | 
|   |   | 
|  10-22-2010, 04:27 PM | #9 | 
| Zealot  Posts: 115 Karma: 20 Join Date: Jul 2010 Device: Kindle3 3G, Kindle Paperwhite 2 | 
			
			Sorry, I was a bit too brief: I downloaded the site using HTTrack. Then I used calibre to read the index.html file. Afterwards I downloaded the index.html to the Kindle. The result did not look nice, and the Kindle somehow got stuck when reading the generated book, i.e. it went back to the main menu. So something went wrong. However, I can't see where in calibre I have any control over the conversion. In Mobipocket Creator I can add all HTML files by dragging them into the tool. I can set HTML tags for creating a table of contents, and the conversion to PRC format is pretty quick. The results looked much better than in calibre and did not crash. | 
|   |   | 
|  10-22-2010, 04:46 PM | #10 | |||
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
 I'm glad it worked for you. | |||
|   |   | 
|  10-23-2010, 04:26 AM | #11 | 
| Zealot  Posts: 115 Karma: 20 Join Date: Jul 2010 Device: Kindle3 3G, Kindle Paperwhite 2 | 
			
			Apparently, I did not grasp the power of calibre. Is there a good tutorial for:
			- converting HTML to MOBI
			- converting PDF to MOBI
			with emphasis on how to create a decent TOC and how to influence the ordering of HTML subpages within a site?
			Currently, I download a site using wget or HTTrack, or convert PDF to HTML. Then I look into the HTML code with Firebug, try to identify chapter and section borders, and insert tags into the HTML code (by hand, applied with an automatic recursive replace). Then I generate a PRC with Mobipocket Creator, using the inserted tags to create a TOC. Then I go to calibre and convert the PRC into MOBI to be able to navigate through chapters using the 4-way button. Here, again, I have to enter the HTML chapter tags so I do not lose the TOC hierarchy. Quite an elaborate process. I am happy about any suggestions for speeding it up. I suspect calibre could handle this efficiently if I were just able to use it properly. Thanks, Jens | 
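One way to control the ordering of subpages, sketched here under the assumption that the converter follows links from a root HTML file in the order they appear, is to generate a small master index yourself and convert that file instead of the scraper's index.html. The function name build_master_index and the file names in the usage note are hypothetical.

```python
from html import escape


def build_master_index(book_title, pages):
    """Return the HTML for a root page that links to each subpage.

    `pages` is a list of (title, href) tuples, already in the order
    the chapters should have in the finished book.
    """
    items = "\n".join(
        '    <li><a href="{}">{}</a></li>'.format(
            escape(href, quote=True), escape(title)
        )
        for title, href in pages
    )
    return (
        "<html>\n<head><title>{t}</title></head>\n<body>\n"
        "  <h1>{t}</h1>\n  <ol>\n{items}\n  </ol>\n</body>\n</html>\n"
    ).format(t=escape(book_title), items=items)
```

For example, build_master_index("My Site", [("Introduction", "intro.html"), ("Setup", "setup.html")]) gives a page whose links list intro.html before setup.html; write the result next to the downloaded files and use it as the conversion input.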
|   |   | 
|  10-23-2010, 09:52 AM | #12 | |||||
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | 
 | |||||
|   |   | 
|  10-23-2010, 01:03 PM | #13 | 
| creator of calibre            Posts: 45,598 Karma: 28548962 Join Date: Oct 2006 Location: Mumbai, India Device: Various | |
|   |   | 