Using kittens to unlock photo-sharing website datasets
Mining photo-sharing websites is a promising way to complement in situ and satellite observations of the environment. A challenge, however, is dealing with the large amount of noise inherent in online social datasets. Using the Flickr application programming interface, I queried the metadata of all public images tagged with at least one of the following words: “snow”, “neige”, “nieve”, “neu” (snow in English, French, Spanish and Catalan). The search was limited to geotagged pictures in the Pyrenees area. However, the number of public pictures available for a given time interval depends on several factors, including the popularity of the Flickr website and the development of digital photography. Thus, I also searched for all images tagged with “chat”, “gat” or “gato” (cat in French, Catalan and Spanish). The tag “cat” was not considered, in order to exclude results from North America, where Flickr became popular earlier than in Europe. The number of “cat” images per month was used to fit a model of the number of images uploaded to Flickr over time. This model was then used to remove the trend from the number of snow-tagged photographs. The resulting time series was similar to a time series of the snow cover area derived from MODIS satellite data over the same region.
Attachment: snow_cycle_flickr_MODgapfill.png (42 KB)
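The detrending step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the post does not say which model was fitted to the monthly "cat" counts, so a simple exponential (log-linear) growth model is assumed here, and the function names `fit_log_linear` and `detrend` are hypothetical. Both inputs are taken to be monthly counts of equal length, with strictly positive "cat" counts.

```python
import math


def fit_log_linear(counts):
    """Least-squares fit of log(count) = a + b * t, where t is the month index.

    Assumption: upload activity grows (or decays) exponentially with time.
    The model actually used in the post is not specified.
    """
    if len(counts) < 2:
        raise ValueError("need at least two monthly counts")
    t = list(range(len(counts)))
    y = [math.log(c) for c in counts]  # requires strictly positive counts
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    cov = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    var = sum((ti - t_mean) ** 2 for ti in t)
    b = cov / var
    a = y_mean - b * t_mean
    return a, b


def detrend(snow_counts, cat_counts):
    """Divide monthly snow-tag counts by the trend fitted to the cat-tag counts."""
    if len(snow_counts) != len(cat_counts):
        raise ValueError("both series must cover the same months")
    a, b = fit_log_linear(cat_counts)
    trend = [math.exp(a + b * t) for t in range(len(cat_counts))]
    return [s / m for s, m in zip(snow_counts, trend)]
```

Dividing by the fitted curve rather than by the raw "cat" counts keeps month-to-month noise in the reference series out of the result. Whether a ratio or a subtraction is the better way to remove the trend is a design choice the post leaves open.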
Of course, the underlying assumption is that the probability for a social-network participant to talk about cats is invariant with time. Depending on the evolution of the demographics of the social networks this may be untrue. But it does sound reasonable that cat pictures, at least, don't depend on season.
Apart from that, I would like to ask: are you limiting the cat searches with the same geotagging criterion? I would expect so (to get a real "in situ" measurement), but this doesn't seem to be the case, judging from your remark on North America and Flickr.
Andrea Giammanco · 20 Jan, 2016
Thank you for your comments! I decided to remove the geotagging criterion from the "cat query" because most of the geotagged pictures before 2012 were outdoor photographs (typically taken with SLR cameras with built-in GPS). As a result, I found that a seasonal component remained in the search results when I restricted them to geotagged photos (although less marked than for snow). This may be different now, because I assume that most photographs recently uploaded to photo-sharing websites were taken with smartphones, which do not rely only on GPS for geolocation.
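One possible way to check a monthly count series for such a seasonal component is sketched below. It is not the check actually performed here, and the function names `monthly_climatology` and `seasonal_amplitude` are hypothetical. The series is assumed to have one value per consecutive month and to have had any long-term trend removed first, since a strong trend would otherwise distort the monthly means.

```python
def monthly_climatology(counts, first_month=1):
    """Mean count for each calendar month (January..December).

    `counts` holds one value per consecutive month; `first_month` (1..12)
    is the calendar month of the first value.
    """
    sums = [0.0] * 12
    n = [0] * 12
    for i, c in enumerate(counts):
        m = (first_month - 1 + i) % 12
        sums[m] += c
        n[m] += 1
    # Months with no data are reported as NaN.
    return [s / k if k else float("nan") for s, k in zip(sums, n)]


def seasonal_amplitude(counts, first_month=1):
    """Range of the monthly means divided by their average (0 means no seasonality)."""
    clim = [v for v in monthly_climatology(counts, first_month) if v == v]  # drop NaN
    mean = sum(clim) / len(clim)
    return (max(clim) - min(clim)) / mean
```

Comparing this amplitude for the geotagged and non-geotagged "cat" series would show which reference is less seasonal.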
Simon Gascoin · 27 Jan, 2016
I posted some more details here: http://www.cesbio.ups-tlse.fr/multitemp/?p=7317
Simon Gascoin · 13 May, 2016