Old 11-01-2018, 03:21 PM   #16
nelson1379
Enthusiast
nelson1379 began at the beginning.
 
Posts: 31
Karma: 32
Join Date: Jan 2012
Device: Kindle Paperwhite
Thanks from me as well!
Old 11-02-2018, 09:42 AM   #17
EMSBoys
Junior Member
EMSBoys began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Mar 2012
Device: Kobo Aura H2O2, Kobo Aura
Thanks, Kovid! Back up and running, just in time for the Friday reviews.
Old 11-04-2018, 06:01 AM   #18
BillD
Member
BillD began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Sep 2010
Device: Kindle
I copied the GitHub text, loaded it into calibre and customised the non-web edition. I seem to be getting only 3 articles for many of the sections, which is unusual for a Sunday edition. Is there any parameter I should be setting to ensure I get all articles per section? Thanks
Old 11-04-2018, 08:27 AM   #19
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The new Today's Paper page of the NYT has only three articles in most sections in the HTML; the rest are loaded by JavaScript, so the recipe does not pick them up.
Old 11-05-2018, 02:19 AM   #20
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
And I just committed some code to duplicate what the JavaScript is doing, so there should be more articles now. https://github.com/kovidgoyal/calibr...ef713c9070937c
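As a rough illustration of the general situation (not the NYT page's actual markup, and not the code in the commit above), here is a sketch of why a parser that only reads the HTML sees fewer articles than the browser does. The page snippet, the `window.__data` variable and the JSON layout are all invented for the example.

```python
import json
import re
from html.parser import HTMLParser

# Hypothetical page: three links in the markup, five entries in the
# JSON that a script would use to render the rest in a browser.
PAGE = """
<section id="business">
  <a href="/a1">A1</a>
  <a href="/a2">A2</a>
  <a href="/a3">A3</a>
</section>
<script>window.__data = {"business": ["/a1", "/a2", "/a3", "/a4", "/a5"]};</script>
"""


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self.links.append(href)


collector = LinkCollector()
collector.feed(PAGE)
static_links = collector.links

# Reading the embedded JSON directly recovers the full list without
# running any JavaScript (the variable name is an assumption).
match = re.search(r'window\.__data\s*=\s*(\{.*?\});', PAGE, re.DOTALL)
embedded_links = json.loads(match.group(1))['business']

print(len(static_links), len(embedded_links))
```

The real page may expose its data quite differently, so the extraction step would need to be adapted to whatever the markup actually contains.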
Old 11-06-2018, 03:55 AM   #21
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Thanks Kovid. I'm getting a lot of garbage now in the form of newsletter signups, related items, etc. Their new tag system seems to use class names in a format like css-<7 chars> <8 chars>
Is there a way to add to remove_tags a match where class matches re.compile(/css-.{7}\w.{8}/) or such?

Also, remove_tags_after = [dict(name=['articleBody'])], which should remove all the article signups, seems to be failing for me. Is something wrong with that syntax?

Last edited by bobbysteel; 11-06-2018 at 03:57 AM.
Old 11-06-2018, 04:15 AM   #22
BillD
Member
BillD began at the beginning.
 
Posts: 16
Karma: 10
Join Date: Sep 2010
Device: Kindle
Great work, getting lots more articles now. Thanks!
Old 11-06-2018, 11:59 PM   #23
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
BeautifulSoup supports arbitrary Python functions for matching, or even regexps. Something like:

Code:
remove_tags=[dict(attrs={'class':re.compile(r'pattern')})]
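A minimal sketch of how that might look for the class names described earlier in the thread. The pattern and the sample class strings below are assumptions made up for illustration, not taken from the NYT markup or from the recipe, so check them against the real page before relying on them.

```python
import re

# Assumed pattern: "css-" followed by exactly seven word characters.
# This is a guess at the naming scheme; tighten or loosen it as needed.
generated_class = re.compile(r'\bcss-\w{7}\b')

# Tag spec in the same form as the line above: any tag whose class
# attribute matches the regular expression.
remove_tags = [dict(attrs={'class': generated_class})]

# Hypothetical class strings, used only to sanity-check the pattern.
samples = {
    'css-1abcd23 e1xyzabc': True,   # generated-looking class
    'story-body': False,            # ordinary class name
    'css-toolong123': False,        # more than seven characters after "css-"
}

for value, expected in samples.items():
    assert bool(generated_class.search(value)) == expected
```

A pattern this broad could also match elements that hold the article text itself, so it is worth trying it on a single article before using it for a full download.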
Old 11-07-2018, 07:03 AM   #24
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Quote:
Originally Posted by kovidgoyal
BeautifulSoup supports arbitrary Python functions for matching, or even regexps. Something like:

Code:
remove_tags=[dict(attrs={'class':re.compile(r'pattern')})]
Thanks, that works. But I'm still having a problem with remove_tags_after:
Code:
remove_tags_after = [dict(name=['articleBody'])]
Is something wrong with that, such that it wouldn't drop the sections after <section name='articleBody'>?
Old 11-07-2018, 11:05 PM   #25
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
IIRC remove_tags_after needs to be a single dictionary, not a list of dictionaries.
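Separately from the list-versus-dictionary question, it may help to recall that in calibre's tag spec format `name` is the tag name and `attrs` is a dictionary of attributes, so `dict(name=['articleBody'])` looks for a tag called articleBody rather than for `<section name='articleBody'>`. The sketch below shows the difference with a toy matcher written for this example; it is not calibre's or BeautifulSoup's implementation, and the HTML snippet is made up.

```python
from html.parser import HTMLParser


def spec_matches(spec, tag, attrs):
    """Toy matcher: `name` is compared to the tag name, `attrs` to attributes."""
    names = spec.get('name')
    if names is not None:
        if isinstance(names, str):
            names = [names]
        if tag not in names:
            return False
    for key, wanted in spec.get('attrs', {}).items():
        if attrs.get(key) != wanted:
            return False
    return True


class FirstMatch(HTMLParser):
    """Remembers the first start tag that satisfies the spec."""

    def __init__(self, spec):
        super().__init__()
        self.spec = spec
        self.found = None

    def handle_starttag(self, tag, attrs):
        if self.found is None and spec_matches(self.spec, tag, dict(attrs)):
            self.found = tag


def find_first(spec, html):
    parser = FirstMatch(spec)
    parser.feed(html)
    return parser.found


HTML = ("<div><section name='articleBody'><p>text</p></section>"
        "<aside>signup</aside></div>")

# The spec from the thread looks for a tag *named* articleBody.
by_tag_name = dict(name=['articleBody'])
# A single dictionary that selects <section> by its name attribute instead.
by_attribute = dict(name='section', attrs={'name': 'articleBody'})

print(find_first(by_tag_name, HTML))
print(find_first(by_attribute, HTML))
```

If the second form matches the real article markup, `remove_tags_after = dict(name='section', attrs={'name': 'articleBody'})` would be the single-dictionary version to try.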
Old 11-08-2018, 03:43 AM   #26
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Is it just me, or do all the headers now randomly mismatch? On each run I get a different selection of articles under each header, seemingly at random.
Old 11-08-2018, 03:56 AM   #27
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Yes, retesting with a fresh install on a clean VM, it's definitely:
1) totally random in the order of article placement
2) the headings don't match up with the articles whatsoever

Each subsequent run produces a totally different order of articles. From what I can tell, the articles are all being downloaded, but the logic that assigns the heading to the id from the JSON is off somehow. I can't easily work out why from looking at the code, however, or else I'd submit a PR.
Old 11-08-2018, 05:23 AM   #28
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.

Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This should take care of it; I don't have the time to actually run it and test, however.

https://github.com/kovidgoyal/calibr...cbfcdfe23707e2
Old 11-08-2018, 03:06 PM   #29
bobbysteel
Big Poppa
bobbysteel began at the beginning.
 
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
Passes the bobbysteel regressions with flying colours. Thanks for this, Kovid!