10-28-2010, 02:50 PM | #61 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
i can parse page 1 no problem. it is the page that comes up automatically.
i can't get to page 2, and this is as far as my code gets: Spoiler:
|
10-30-2010, 05:15 PM | #62 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
ok i'm back
cleared my head a bit and i want to dive back in.
do you have any ideas about why i can't get to page 2? |
11-01-2010, 11:54 AM | #63 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Switching back to this thread.
You asked if I had any thoughts - I did - I wondered what the maya site was all about! That's what I shared. OK, yes, I thought about it a bit. I still think you need to simplify - get your recipe to pull one page - page 2. If you can't do that, you'll get nowhere. We can talk about the simplified recipe issue, but it's less interesting.
Next, let's talk about get_browser. When you initialize with br = BasicNewsRecipe.get_browser(), you start a browser session that's used from then on. Normally, you go to one or two pages to set up the login, store cookies, get header info for authentication, etc. From then on that browser session is used. If you retrieved the right cookies, set up login, etc., it all works.
You want to do something a bit different. You don't want the same thing every time (an authentication header sent with each request, or cookies from login stored for each request). You want to do a POST that differs for each page. I think you're creating the POST data, but I'm not convinced you've looked at the site closely enough to be sure of how it works for each step. I know I haven't (and don't plan to - sorry - but this site is not of general enough interest for its complexity).
Basically, I'd be looking more closely at the first interaction inside Firefox. Suppose you clear the cookies and cache, turn on TamperData and request page 2 before you request page 1. Can you get it? If not, can you get it after getting page 1? Is there any requirement for getting any other page first? Any referer requirement? It's very easy to get confused when using Firefox if it collects cookies that you aren't thinking about, or sends referer info, etc.
The bottom line is I always make sure I know the whole detailed interaction in Firefox, then reproduce what it did inside the recipe, or reproduce the recipe function inside Firefox, until they match and are doing the same thing.
I've never had that fail, but I've often been confused and thought I was seeing the same thing in each, but was wrong. Eventually the difference gets tracked down and the recipe begins doing what I see happening in Firefox. |
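To make the "POST that differs for each page" idea concrete, here is a minimal standalone sketch. It uses only the Python 3 standard library, not the mechanize browser that get_browser returns, and it builds the request without sending it. The form field name `page` and the example.com URL are assumptions for illustration; the real field names, values and any referer requirement have to be copied from what TamperData shows Firefox sending.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_page_request(url, page, referer=None):
    """Build (but do not send) a POST request asking for one page."""
    # 'page' is a hypothetical form field; use the names and values
    # that TamperData shows Firefox sending for the real site.
    body = urlencode({"page": page}).encode("ascii")
    request = Request(url, data=body)  # supplying data makes this a POST
    if referer is not None:
        # Only needed if the site checks where the request came from.
        request.add_header("Referer", referer)
    return request


req = build_page_request("http://example.com/article", 2,
                         referer="http://example.com/article")
print(req.get_method(), req.data)
```

Comparing the method, body and headers of a request like this against the TamperData capture is one way to check that the recipe and Firefox really are doing the same thing.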
11-01-2010, 05:10 PM | #64 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
good answer!
i'll get on it. this next question is a bit off the recipe topic. how do i create a POST request from thin air in Tamper Data (if i clear the cookies and cache, then turn on Tamper Data, where will the POST come from)? also, page one comes up just by entering the site, so how do i skip to page two right away? and a final question: Tamper Data has a friend, http something. should i try to use that? Last edited by marbs; 11-01-2010 at 06:29 PM. |
11-02-2010, 05:40 AM | #65 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
I now have page two and three and so on.
i can't believe i got there. thank you very much Starson!
now i am not sure what to do with all the new pages i can get. how do i finish append_page? i don't see that it returns anything in this example or any of the others. some more help? edit: it seems like br.follow_link does not actually open a page in the browser. it gets the response, but i don't know how to make br hold the new page. is there a way to open the link or read the response somehow as a web page in the browser? Spoiler:
Last edited by marbs; 11-03-2010 at 02:22 AM. |
11-03-2010, 03:41 PM | #66 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Congratulations!
Quote:
self.append_page(soup, soup.body, 3)
It's recursive, and grabs the current page in soup form from the "soup" parameter of the article being processed in preprocess_html. That page will have a "Next Page" button or equivalent, and when append_page is correctly written, it creates a new url from the url in the "Next Page" button, grabs the content of that new page, tacks it on to the bottom of the content in the current page, then recursively does it again, finding the "Next Page" button on page 2 to go to page 3, etc. Quote:
|
||
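The recursion described for append_page can be sketched outside calibre like this. It is a simplified stand-in, not the recipe code itself: it works on plain HTML strings instead of soup objects, the `class="next"` markup for the "Next Page" button is hypothetical, and `fetch` is an assumed callable that returns the HTML for a url. Unlike this sketch, the recipe versions I have seen insert the new content into the tag they are given rather than returning anything.

```python
import re

# Hypothetical markup for the "Next Page" button; a real site will differ.
NEXT_LINK = re.compile(r'<a class="next" href="([^"]+)">')


def append_pages(html, fetch, max_pages=10):
    """Return html plus the content of every page reachable via next links."""
    match = NEXT_LINK.search(html)
    if match is None or max_pages <= 1:
        return html  # no "Next Page" button (or limit reached): stop
    next_html = fetch(match.group(1))  # grab the content of the next page
    # Tack the following pages onto the bottom of the current one.
    return html + "\n" + append_pages(next_html, fetch, max_pages - 1)


# A fake three-page site standing in for real downloads.
SITE = {
    "/p1": '<p>one</p><a class="next" href="/p2">Next Page</a>',
    "/p2": '<p>two</p><a class="next" href="/p3">Next Page</a>',
    "/p3": "<p>three</p>",
}

combined = append_pages(SITE["/p1"], SITE.get)
print(combined)
```

The `max_pages` limit is only a safety net against a site whose last page links back to the first.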
11-03-2010, 04:30 PM | #67 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
are you sure it needs to be used in preprocess_html and exactly as self.append_page(soup, soup.body, 3)?
|
11-03-2010, 04:56 PM | #68 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I don't think I said it did. You can use it anywhere you have a soup.
Quote:
|
|
11-06-2010, 04:10 PM | #69 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
do you know of any way to run JavaScript in python?
do you think Kovid would be willing to build in a tool like that? edit: ran a short search, maybe a tool like this? Last edited by marbs; 11-06-2010 at 04:17 PM. |
11-06-2010, 07:47 PM | #70 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I haven't done it. Mechanize is used by the recipe code, and I know it won't do it. I was trying to do it at one time, and looked at python-spidermonkey a bit, but decided I could just emulate what the js was doing.
https://github.com/davisp/python-spidermonkey Quote:
|
|
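As an illustration of "emulating what the js was doing" instead of running it, here is a minimal sketch. Everything about the page is made up for the example: the `articleId` variable, the `nextPage` function and the `/view?id=...&page=...` url pattern are assumptions, not taken from any real site. The idea is only that if the script builds a url from values visible in the page source, the same url can be rebuilt in Python.

```python
import re

# Hypothetical inline script: the page's JavaScript builds the next url
# from a variable instead of using a plain link.
PAGE = '''
<script>
var articleId = "12345";
function nextPage(n) { location.href = "/view?id=" + articleId + "&page=" + n; }
</script>
'''


def next_page_url(html, page_number):
    """Rebuild in Python the url the page's script would navigate to."""
    match = re.search(r'var articleId = "(\d+)";', html)
    if match is None:
        raise ValueError("articleId not found; the page layout has changed")
    return "/view?id=%s&page=%d" % (match.group(1), page_number)


url = next_page_url(PAGE, 2)
print(url)
```

This only works when the script's logic is simple enough to read and copy by hand; it is not a substitute for a JavaScript interpreter.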
11-09-2010, 04:09 PM | #71 |
Zealot
Posts: 122
Karma: 10
Join Date: Jul 2010
Device: nook
|
i am not sure where to post a question to Kovid, so i hope you see this.
i wanted to know what the chances are of getting JS support for recipes? maybe this? i haven't really read it, but i am sure python can support JS. and what are the chances of getting support for pdf articles? i know you said that it is a printed book and not a book, but without understanding anything about it, i feel that it might be possible for pdf articles going to pdf output to skip the conversion in the middle and just be included in the final news feed somehow? |
11-09-2010, 05:42 PM | #72 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Adding a javascript interpreter to python is harder than it sounds, or I would have done it already. And using QtWebKit is out of the question as it requires an X server to run, which means that the news download system would no longer work on headless servers.
As for PDF, no that's not possible. |
11-10-2010, 02:23 PM | #73 | |
Update
Posts: 100
Karma: 212
Join Date: Nov 2010
Device: kindle DX graphite, we need update/firmware with optios PDF Kindle 3
|
hi, I have one general question about PDF, a little off-topic here, but I don't see where else to post this:
Would it be possible in the future to implement this option? When "calibre" is running a recipe, if a link is a PDF file link and meets the size specifications and conditions listed in the recipe code, the file is downloaded "as a separate book" outside the recipe's ebook, and the recipe output shows the text "PDF name: "abcd.pdf" file sent to the reader" ... Code:
rustic algorithm example: ;-)
--
read "html rss" (Text in RSS: "bla bla bla "...)
find if pdf file
if pdf file is <500 kb?, sending / download to the calibre library, and preparing to send to the reader (in my case Kindle DX).
end if
end if
--
----
(The other option, opening the pdf and inserting its contents into the recipe's output file, may be impractical: it would be a lot of data to process and the recipe would run very slowly. Doing that would in effect build as many books as there are pdfs found. The only practical way I see to do something like it would be to open a pdf (linked from the rss) and extract only the first section to incorporate into the recipe, but that may also be slow and make the recipe output big. Calibre could "convert" the pdfs to ebooks afterwards, but in a separate process, I fear.)
Keep in mind that some ereaders have native pdf support (albeit a bit rustic and uncouth, it at least serves to read some info). In the example given first, the result may be one recipe ebook of rss news plus, externally, several pdfs that are sent to the reader at the same time, provided they meet some requirements: size, name mask, etc. (since there is no need to open them, just download them in certain cases). Quote:
(sorry for the English and the disordered wording, it has been a considerable effort trying to explain this; I had to separate the phrases into loose paragraphs so you can see what I want to express. by the way, congratulations again on such a program) best regards from Spain. Last edited by KRorschachZ; 11-10-2010 at 02:41 PM. |
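The selection step of the "rustic algorithm" can be written out as a small standalone Python sketch. This is not an existing calibre feature or API; it only shows the "is it a pdf and is it under 500 kb" test. The function name, the list of (url, size) pairs and the example.com links are assumptions for illustration, and how the size is obtained (for example from a Content-Length header) is left out.

```python
MAX_PDF_BYTES = 500 * 1024  # the "<500 kb" limit from the post


def select_pdfs(links, max_bytes=MAX_PDF_BYTES):
    """Return urls of pdf links small enough to download as separate files.

    links is a list of (url, size_in_bytes) pairs.
    """
    selected = []
    for url, size in links:
        # Ignore any query string when checking the file extension.
        is_pdf = url.lower().split("?")[0].endswith(".pdf")
        if is_pdf and size < max_bytes:
            selected.append(url)
    return selected


feed_links = [
    ("http://example.com/abcd.pdf", 300 * 1024),
    ("http://example.com/big.pdf", 2 * 1024 * 1024),
    ("http://example.com/story.html", 10 * 1024),
]
chosen = select_pdfs(feed_links)
print(chosen)
```

The downloading and the hand-off to the library and the reader are the parts that would need support inside calibre itself.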
|
11-10-2010, 02:59 PM | #74 |
creator of calibre
Posts: 43,866
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I personally am never going to implement that feature (it's way too much work as it involves monkeying with lots of the internals of how conversion works in calibre), but if you are sufficiently motivated, I'll be happy to get you started implementing it.
|
11-10-2010, 03:22 PM | #75 | |
Update
Posts: 100
Karma: 212
Join Date: Nov 2010
Device: kindle DX graphite, we need update/firmware with optios PDF Kindle 3
|
Quote:
I only want to point out that "in the first case" no conversion of the pdf files found (or other interesting extensions, such as mobi, epub, etc) is necessary, "only" downloading them according to the recipe. In any case, depending on the configuration of "Calibre", this may be complicated ... I was intrigued about whether a recipe, which should be a separate process, is able to communicate with the main "Calibre" program while it runs, and whether there are commands in the recipe language to implement something like this ("Download file", "size analysis", "save to send to library", etc).
(Obviously the second option, converting the PDFs on the fly to integrate part of them into the recipe output, is quite expensive computationally (cpu time) and slow ... it would be like merging several small books ...)
(Even so, a maximum number of output characters could be set, limiting the size of the converted files, as a way to get "part" of the information from the files ... This would be like an RSS of a PDF, mobi, epub, XD ... in a recipe. Still, I think this part would add considerable time to the recipe ...)
(As more and more pages carry reviews of books in electronic format with links, those books could be downloaded automatically by calibre recipes ...)
So I thought that maybe just "giving" the possibility of downloading, without having to analyze the files internally, would be ok. (We will see what the community thinks; maybe I should create a thread for this issue?) best regards, from Spain... Last edited by KRorschachZ; 11-10-2010 at 03:33 PM. |
|
|