MobileRead Forums - View Single Post

naravive · 01-10-2011, 02:15 PM

Sumeet, if I may, I think you may be asking the wrong question here: downloading as PDF and then converting is the long way around. What we need is a script that goes to the Tehelka main website, http://www.tehelka.com/, scrapes the stories and compiles the ebook. As a non-Python user, I tried to do it by going to the RSS feed page, http://www.tehelka.com/feeds/tehfeed.xml, and adding that to the Basic Recipe. But this has several problems: the images are dropped for some reason, and the feed items are not grouped by issue and come without an index to tell you which section each story belongs to.

Now, the help we need from the wizards on this forum: a recipe that pulls the stories section by section from the site (the section headers on the main page: Current Affairs, Business, Culture and Society, etc.), following each link in each section, and then assembles the week's issue (this is a weekly) with a section-based index.
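
To make the request concrete, here is a rough sketch of the section-by-section index described above, for whoever picks this up. As far as I understand it, a calibre recipe can override `parse_index` and return a list of (section title, list of articles) pairs, where each article is a dict with at least `title` and `url`. The sketch below only builds that structure from section pages that have already been downloaded; `LinkCollector` and `build_index` are made-up names, the story URLs in the usage note are invented, and it collects every link on a page, so a real recipe would still need to filter out navigation links based on Tehelka's actual markup, which I have not inspected.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect (url, text) for every <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # absolute URL of the link currently open
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = ' '.join(''.join(self._text).split())
            if title:
                self.links.append((self._href, title))
            self._href = None


def build_index(section_pages, base_url='http://www.tehelka.com/'):
    """Turn [(section title, section page HTML), ...] into the
    [(section title, [article dict, ...]), ...] shape described above.
    """
    index = []
    seen = set()  # skip stories already listed in an earlier section
    for section_title, html in section_pages:
        collector = LinkCollector(base_url)
        collector.feed(html)
        articles = []
        for url, title in collector.links:
            if url in seen:
                continue
            seen.add(url)
            articles.append({'title': title, 'url': url})
        if articles:
            index.append((section_title, articles))
    return index
```

Usage would look something like `build_index([('Current Affairs', html_of_that_page), ('Business', html_of_that_page)])`, with the result returned from the recipe's `parse_index` so that the ebook's table of contents follows the site's sections. Downloading the section pages is left out because I don't know the real section URLs.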

Can anyone help us write this recipe, or point to a built-in recipe for a site that's organised the same way, which I could try to modify to fit Tehelka? Tehelka, by the way, is one of the best Indian weeklies, with a major commitment to investigative journalism, so I think it's important that there's a recipe for it.
