01-03-2012, 02:36 PM | #1 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Recipe for Microwave Journal?
Hi there,
I know this is my first post, but believe me, I did my homework: I searched and read as many posts and pages as I could and still could not solve this myself. Asking for help is my last resort.
I am trying to write a recipe that downloads articles from the Microwave Journal website and converts them to an ebook. Like NYT, MWJ requires a username and password (registration is free, BTW), and it also has an RSS feed. To log in, the site sends you to another site. I think (not sure) that once you are logged in, that other site sets cookies and sends the browser back to mwjournal.com. The login page has a "Remember me" checkbox. With that foreword, I wrote the following recipe: Spoiler:
I got "nr = 0" by inspecting the HTML of the login page (the first FORM is the one for username/password). I also checked the "Remember me" box (and tested with it unchecked too). Still, when the EPUB is made, the site does not treat the user as logged in (yes, I checked that the username and password are correct). I added two attachments: an EPUB showing the final result (not logged in) and a TXT file showing the ebook-convert output (I manually deleted the username/password; otherwise they were there correctly). Any help would be highly appreciated.
PS. omeda.com hosts other magazines as well. I searched the online recipe repository to see whether any of those magazines already has a recipe whose code I could reuse, but I found none. Last edited by kiavash; 01-06-2012 at 01:01 PM. Reason: Adding Attachments |
01-05-2012, 12:49 PM | #2 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Any clues? Please!
|
01-06-2012, 01:05 PM | #3 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
I modified get_browser and tried setting a cookie policy, but I still get the same result: it doesn't log in. What is happening?
Spoiler:
Is there a way to dump all the HTML traffic to a file or folder, to see whether the login succeeded and whether it went on to fetch the articles after logging in? I know about --debug, but that only dumps the HTML of the RSS articles. How can I dump the output of def get_browser(self): to a file? |
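One way to see what the site returns at each login step is to write every response body to its own file and open it in a browser afterwards. The following is only a sketch using the Python standard library; `dump_html` and the `mwj_debug_` file prefix are hypothetical names, not part of calibre. Inside get_browser() it would be fed the bytes returned by reading a response.

```python
import os
import tempfile


def dump_html(html, label, directory=None):
    """Write one response body to a uniquely named file and return its path.

    html      -- raw bytes of the response (for example from response.read())
    label     -- short tag such as 'login' that ends up in the file name
    directory -- where to write; None means the system temp directory
    """
    # mkstemp creates a new file and never reuses an existing name,
    # so repeated runs do not overwrite earlier dumps.
    fd, path = tempfile.mkstemp(prefix='mwj_debug_%s_' % label,
                                suffix='.html', dir=directory)
    with os.fdopen(fd, 'wb') as f:
        f.write(html)
    return path
```

Calling it once per step (say with the labels 'login' and 'redirect') leaves one file per response, so it is easy to tell at which step the login goes wrong.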
01-06-2012, 02:55 PM | #4 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Another (unsuccessful) try:
This time I checked whether some JavaScript is interfering with the login process, so I fired up Firebug and inspected the login page. I saw this JavaScript. Spoiler:
Now I am out of my comfort zone, as I don't know this language, but I can see a couple of functions with "cookie" in their names: readCookie(name) and createCookie(name,value,days). Does get_browser() remove the JavaScript? I tried removing remove_javascript = True from the recipe and also changing it to False, but it still didn't log in. I tried to follow this post, Spoiler:
but I am completely lost.
Anybody? Please! |
01-06-2012, 03:31 PM | #5 |
doofus
Posts: 2,507
Karma: 12615905
Join Date: Sep 2010
Device: Kobo Libra 2, Kindle Voyage
|
hi, what's happening is that when you submit the login, it returns this:
Code:
<html>
<head>
<title>Redirect to BVD</title>
</head>
<body onLoad="document.forms[0].submit();">
<form method="post" action="http://www.mwjournal.com/default.asp">
<input type="hidden" name="cust_id" value="xxxxxxxxxx">
<input type="hidden" name="status" value="xxxxxxxxxx">
<input type="hidden" name="reqURL" value="xxxxxxxxxx">
<input type="hidden" name="email" value="xxxxxxxxxx">
<input type="hidden" name="password" value="xxxxxxxxxx">
<input type="hidden" name="fname" value="xxxxxxxxxx">
<input type="hidden" name="lname" value="xxxxxxxxxx">
<input type="hidden" name="company" value="xxxxxxxxxx">
<input type="hidden" name="country" value="xxxxxxxxxx">
<input type="hidden" name="job_title" value="xxxxxxxxxx">
<input type="hidden" name="newsletter" value="xxxxxxxxxx">
<input type="hidden" name="microwave_advisor" value="xxxxxxxxxx">
<input type="hidden" name="microview" value="xxxxxxxxxx">
<input type="hidden" name="remember_me" value="xxxxxxxxxx">
<input type="hidden" name="state" value="xxxxxxxxxx">
<!--<include>redirect-fields.htm</include>-->
</form>
</body>
</html>
Code:
def get_browser(self):
    br = BasicNewsRecipe.get_browser()
    if self.username is not None and self.password is not None:
        # Step 1: fill in and submit the real login form on omeda.com.
        br.open('http://www.omeda.com/cgi-win/mwjreg.cgi?m=login')
        br.select_form('login')
        br['EMAIL_ADDRESS'] = self.username
        br['PASSWORD'] = self.password
        html = br.submit().read()
        # Step 2: the response is the redirect page shown above. It relies on
        # onLoad JavaScript to submit itself, which the recipe's browser does
        # not run, so save the page, load it back and submit its first form.
        open('/jwtmp.html','wb').write(html)
        br.open('file:///jwtmp.html')
        br.select_form(nr=0)
        br.submit()
    return br
Note: you will want to clean that up for production code. You probably don't want to write to the root like that (permission problem), and you'll want to delete the temp file afterward. |
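As an alternative to saving the redirect page to disk, the form's target URL and hidden fields could be pulled out of the response text directly. This is a different technique from the temp-file one in the post above, sketched with the standard library only; `FirstFormParser` and `redirect_post` are hypothetical names, and the field values in `SAMPLE` are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FirstFormParser(HTMLParser):
    """Collect the action URL and named inputs of the first <form> in a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []          # list of (name, value) pairs, in page order
        self._in_first_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form' and not self._done:
            self._in_first_form = True
            self.action = attrs.get('action')
        elif tag == 'input' and self._in_first_form and attrs.get('name'):
            self.fields.append((attrs['name'], attrs.get('value') or ''))

    def handle_endtag(self, tag):
        if tag == 'form' and self._in_first_form:
            # Stop collecting once the first form is closed.
            self._in_first_form = False
            self._done = True


def redirect_post(html_text):
    """Return (action_url, urlencoded_body) for the page's first form."""
    parser = FirstFormParser()
    parser.feed(html_text)
    return parser.action, urlencode(parser.fields)


# Shortened stand-in for the redirect page; the values are placeholders.
SAMPLE = '''<html><body onLoad="document.forms[0].submit();">
<form method="post" action="http://www.mwjournal.com/default.asp">
<input type="hidden" name="cust_id" value="123">
<input type="hidden" name="remember_me" value="Y">
</form></body></html>'''

action, body = redirect_post(SAMPLE)
print(action)
print(body)
```

The resulting pair would then have to be sent as a POST from the same browser object that performed the login, so that its cookies are kept. Whether the site accepts that exactly like the JavaScript-driven submit is an assumption that needs testing.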
01-06-2012, 08:43 PM | #6 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Hi Barty,
Thanks a lot. This explains everything. A big step forward, thanks to your help. I am going to study the ESPN recipe closely. Kovid used "TemporaryFile" there to avoid writing to the root (or to any folder that may lack write permission). Hopefully "TemporaryFile" or "PersistentTemporaryFile" (example) will be the magic bullet. PHP Code:
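The idea of replacing the hard-coded /jwtmp.html with a temporary file can be sketched with Python's standard `tempfile` module, used here as a stand-in for calibre's own temp-file helpers. `write_temp_html` is a hypothetical helper name, and this is not the code from the ESPN recipe.

```python
import os
import tempfile
from pathlib import Path


def write_temp_html(html):
    """Save bytes to a temporary .html file and return (path, file URL)."""
    # delete=False keeps the file on disk after it is closed, so that
    # something else (the recipe's browser) can open it by name.
    with tempfile.NamedTemporaryFile(suffix='.html', delete=False) as f:
        f.write(html)
        path = f.name
    return path, Path(path).resolve().as_uri()


path, url = write_temp_html(b'<form action="x"></form>')
try:
    print(url)  # a file:// URL inside the system temp directory
    # Here the recipe would open the URL, select the first form and submit it.
finally:
    os.remove(path)  # delete the temp file afterward
```

Building the URL with `as_uri()` avoids hand-assembling the file:// prefix, which differs between Windows and Linux paths.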
|
01-08-2012, 12:11 AM | #7 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
The only thing left is the cover. That part is even less documented on the website. More to read...
So far, the script looks like this, with plenty of comments documenting what is happening. Spoiler:
It uses the ESPN recipe's technique to dump the first login page into the temp folder. I am ready to write a couple of paragraphs and add them here to teach others how to solve the problem of a login that goes through two HTML pages. |
01-08-2012, 01:53 AM | #8 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
There it is. It fetches the latest cover and adds it to the ebook.
PHP Code:
|
01-08-2012, 01:57 AM | #9 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
By the way, the full script is attached. If it is clean enough, I would recommend adding it to the next calibre release so others can use it as well.
I also added it to my ReadBeam.com account; once it is approved by their admin (hopefully soon), I will get my e-magazine automatically every month. Thank you all for making it happen. Edit: Get the latest version from the few posts below. Last edited by kiavash; 01-14-2012 at 04:29 PM. |
01-10-2012, 02:05 AM | #10 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
A few more updates... I am documenting all of these so somebody else can use them to write a new recipe more easily:
This code removes the hyperlinks as well as the line breaks. You cannot find hyperlinks in a real magazine. PHP Code:
PHP Code:
Spoiler:
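Since the code for this post did not survive, here is a minimal sketch of one way to strip hyperlinks and line breaks with regular expressions. It is not the recipe's actual code. The rules are written as (compiled pattern, function) pairs, the shape calibre recipes use for preprocess_regexps; the patterns themselves and the names `CLEANUP_REGEXPS` and `clean_html` are assumptions.

```python
import re

# (compiled pattern, replacement function) pairs. The patterns are
# illustrative assumptions, not taken from the Microwave Journal recipe.
CLEANUP_REGEXPS = [
    # Unwrap links: keep the text between <a ...> and </a>, drop the tags.
    (re.compile(r'<a\b[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL),
     lambda m: m.group(1)),
    # Replace <br>, <br/> and <br /> with a single space.
    (re.compile(r'<br\s*/?>', re.IGNORECASE), lambda m: ' '),
]


def clean_html(html):
    """Apply every cleanup rule to the HTML string, in order."""
    for pattern, func in CLEANUP_REGEXPS:
        html = pattern.sub(func, html)
    return html


sample = ('<p>See <a href="http://example.com">this amplifier</a>'
          '<br/>for details.</p>')
print(clean_html(sample))
```

Regular expressions are fragile on nested or malformed markup, so for anything beyond simple tags a parser-based cleanup is usually the safer choice.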
|
01-12-2012, 06:08 PM | #11 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Maybe I am spending too much time on this recipe, but I have been reading MW Journal for a long time, and I want to be able to keep reading it on my Nook (thanks to its bigger fonts) as my eyesight is getting worse.
Here are a few more tweaks. I posted the latest version here and on my Read Beam account. This time all the tabs are changed to spaces, to be consistent with the rest of calibre's code. How can I check this into calibre's build without recompiling the whole thing? Spoiler:
|
01-14-2012, 04:28 PM | #12 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Looks like I need to make a zip file so it can be included in calibre. So here it is, attached. Latest and most up to date.
Last edited by kiavash; 02-02-2012 at 12:45 AM. Reason: It doesn't work with the new site |
01-14-2012, 09:52 PM | #13 |
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Your recipe was included in 0.8.35 already. http://bazaar.launchpad.net/~kovid/c...journal.recipe
|
01-14-2012, 11:03 PM | #14 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Cool. Thanks.
I am going to check the recipe every month and update the script here if needed. |
02-02-2012, 12:43 AM | #15 |
Old Linux User
Posts: 36
Karma: 12
Join Date: Jan 2012
Device: NST
|
Needs update!
When Read Beam sent me the e-magazine this month, I noticed that it doesn't look right. So I checked the site, and apparently Microwave Journal has changed almost everything (removing the RSS feed is one of the changes). Stay tuned, as I will update the recipe in the next few days to adapt to the latest site changes!
|