Old 02-27-2011, 06:13 AM   #1
kiwidude
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Handling of region-specific web scraping

In my Goodreads metadata plugin, a user reported an issue which I traced down to scraping a numeric value. On his machine, which has the OS set to English but with German number settings, the value returned via lxml and Calibre's 'browser' object has no period in it. In particular, a rating value of, say, "3.42" comes back from the html.tostring(node, 'text', encoding=unicode) call as "342".

Interestingly, when he views the web page in his internet browser and sends me the HTML from that, it displays the number as "3.42". So I suspect that either the html.tostring() call or the browser/lxml libraries are responsible for the different result, not some sort of regionalisation on the Goodreads website.

I have a crude workaround, but I'm sure there must be a "proper" way of handling this that would also cater for other regional number settings, such as commas instead of periods?
Old 02-27-2011, 06:47 AM   #2
chaley
Grand Sorcerer
Posts: 11,740
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
None that I know of. I have been frustrated by this before. For example, in France, the number "1,234.56" is written as "1 234,56".

Is the number perhaps being stored as an integer * 100? That would let Goodreads avoid all the problems of converting back and forth. They get the number, divide by 100, then hand it to localization.
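
For what it's worth, the two formats mentioned above ("1,234.56" and "1 234,56") can both be read with a simple heuristic: treat the right-most period or comma as the decimal mark and discard every other separator. The sketch below is only that, a heuristic and not a proper locale-aware parser; the function name parse_localized_number is made up for illustration, and it assumes a fractional part is present whenever a separator appears (so a bare "1,234" would be misread as 1.234).

```python
import re
from decimal import Decimal


def parse_localized_number(text):
    """Heuristic parse: the right-most '.' or ',' is taken as the decimal mark.

    Assumption: if any separator is present, the number has a fractional
    part. A thousands-only value such as "1,234" is therefore misread.
    """
    # Keep only ASCII digits, periods and commas. This also drops ordinary
    # and non-breaking spaces used as thousands separators.
    cleaned = re.sub(r"[^0-9.,]", "", text)

    # Index of the right-most separator, or -1 if there is none.
    last = max(cleaned.rfind("."), cleaned.rfind(","))
    if last == -1:
        return Decimal(cleaned)

    # Everything before the last separator is the whole part (with any
    # remaining separators removed); everything after is the fraction.
    whole = re.sub(r"[.,]", "", cleaned[:last])
    frac = cleaned[last + 1:]
    return Decimal(f"{whole}.{frac}")


print(parse_localized_number("1,234.56"))
print(parse_localized_number("1 234,56"))
```

Note that this cannot recover a value whose separator has already been lost: "342" is simply parsed as 342.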
Old 02-27-2011, 07:06 AM   #3
kiwidude
Calibre Plugins Developer
Yeah, that France example is another case where it could go wrong.

I don't know where in the chain it is breaking down. What I don't understand is why both of our web browsers render exactly the same result of "3.42", yet scraping the HTML via the Python libraries ends up with "342" on his machine but "3.42" on mine.

If there isn't an obviously clever way of handling this I'll just stick with a crude approach such as stripping out all non-numeric characters and then dividing by 100.
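
A minimal sketch of that crude approach, assuming the rating is always shown with exactly two decimal places (the function name crude_rating is hypothetical and not taken from the plugin):

```python
import re


def crude_rating(text):
    """Strip every non-digit character, then divide by 100.

    Assumption: the source always shows two decimal places, e.g. "3.42".
    A value such as "4.5" would come out wrong (0.45).
    """
    digits = re.sub(r"[^0-9]", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits) / 100


# The same result is produced whether or not the separator survived scraping.
print(crude_rating("3.42"))
print(crude_rating("342"))
print(crude_rating("3,42"))
```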
Old 02-27-2011, 09:45 AM   #4
kovidgoyal
creator of calibre
Posts: 43,853
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Try sending an Accept-Language header. In this case you can also work around the problem by checking whether the number is between 10 and 100 and then dividing by 10, and if it is between 100 and 1000, dividing by 100.
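
Both suggestions can be sketched roughly as follows. This uses the standard library's urllib.request rather than Calibre's browser object, so how the header is set inside the plugin may differ; the function names, the example URL, and the "en-US,en;q=0.9" value are assumptions for illustration. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_request(url, language="en-US,en;q=0.9"):
    """Build a request that asks the server for English-formatted content."""
    return Request(url, headers={"Accept-Language": language})


def rescale_rating(value):
    """Range-based workaround for a rating that lost its decimal separator.

    Assumption: a real rating is always below 10, so anything larger must
    have lost its separator.
    """
    if 10 <= value < 100:
        return value / 10
    if 100 <= value < 1000:
        return value / 100
    return value


req = build_request("https://example.com/book/show/1")  # placeholder URL
# urllib stores header names capitalized as "Accept-language".
print(req.get_header("Accept-language"))
print(rescale_rating(342))
print(rescale_rating(45))
```

The range check only works because ratings stay below 10; a value such as "0.50" scraped as "050" would still be rescaled incorrectly.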