Old 11-22-2014, 08:43 PM   #3520
JimmXinu
Plugin Developer
Development Notes

This post is about development issues. Users can safely ignore.

@cryzed - NP. I wasn't in any rush to worry about it then, and I'm not really now, but I had more spare time this past week. I expect to have less time next week, so I wanted to save the changes where you could see them and perhaps offer 'pythonic' help and advice.

(I did eventually get past the attribute issue, btw. Spinning on a generator instead of an iterator lets me process until it fails instead of failing before it processes. Not perfect, but better.)

I've checked in a new branch, 'bs4', that includes the six, html5lib, and bs4 libraries, changes to allow importing them in all the different run environments, and some not-ready-for-prime-time changes in a small handful of adapters to test out bs4. I don't intend to use those adapters as is. The packages are all at the top level because that makes things much easier in the web engine and the plugin.

My to-do list for this is:
Spoiler:
  • Is it worthwhile trying to make adapters that can use either bs3 or bs4? Or should each only use one? Leaning towards only one.
  • If 'either', should it be configurable per adapter?
  • If not 'either', should bs3 adapters and bs4 adapters have different parent base classes?
  • How should common code to avoid having BeautifulSoup(___,'html5lib') everywhere be implemented? Base adapter method?
  • How should stripHTML be changed to accommodate? Move to a method on base adapter? Needed at all with bs4?
  • What about non-adapters that use bs? html.py, geturls.py, etc.
  • chardet - should that library be updated and pulled up to the same level as bs4, etc?
  • AO3 adapter--<b> tags added directly as text are being treated as text.
  • bs4/html5lib do some things differently. For example, &nbsp; becomes \u00a0, a literal non-breaking space character, rather than an ordinary space. See the TtH adapter's date string.