Thread: Ebook from pdf
01-10-2011, 01:15 PM   #5
naravive
Enthusiast
Posts: 28
Karma: 6534
Join Date: Jan 2011
Device: Kindle, Boox M92
Sumeet, if I may, I think you may be asking the wrong question here: downloading as PDF and then converting would be the long way around. What we need is a script that goes straight to the Tehelka main website, http://www.tehelka.com/, scrapes the stories and compiles the ebook. As a non-Python user, I tried going to the RSS feed page, http://www.tehelka.com/feeds/tehfeed.xml, and adding that to the Basic Recipe. But this has several problems: the images are dropped for some reason, and the feed items are not grouped by issue and carry no index telling you which section each one comes from.

So, the help we need from the wizards on this forum: a recipe that pulls stories section by section from the site (following the section headers on the main page: Current Affairs, Business, Culture and Society, etc.), clicks into each link in each section, then assembles the week's issue (this is a weekly) with a section-based index.
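
To make the request concrete, here is a rough sketch of the section-by-section index I have in mind. It is not a working Tehelka recipe: the class and function names are made up for illustration, it takes every link on a section page as an article (a real recipe would need to filter by the site's actual markup, which I have not checked), and it works on HTML that has already been downloaded. The list it returns, one (section title, list of article dicts) pair per section, is the shape that calibre's BasicNewsRecipe.parse_index is documented to return.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects a title and absolute URL for each <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append({"title": title, "url": self._href})
            self._href = None


def build_index(section_pages, base_url="http://www.tehelka.com/"):
    """Turn {section name: section page HTML} into a section-based index.

    Assumption: every link on a section page is an article. Sections
    with no links are left out, and repeated URLs are kept only once.
    """
    index = []
    for section, html in section_pages.items():
        parser = LinkCollector(base_url)
        parser.feed(html)
        parser.close()
        seen, articles = set(), []
        for link in parser.links:
            if link["url"] not in seen:
                seen.add(link["url"])
                articles.append(link)
        if articles:
            index.append((section, articles))
    return index
```

Calling build_index with a dict such as {"Business": "...page HTML...", "Culture and Society": "...page HTML..."} gives one entry per section, in the order the sections were passed in, which is what the section index in the finished ebook would be built from.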

Can anyone help us write this recipe, or point to a preloaded recipe for a site organised in the same way that I could try to modify to fit Tehelka? Tehelka, by the way, is one of the best Indian weeklies, with a major commitment to investigative journalism, so I think it's important that there's a recipe for it.