I think the best option would probably be to add an appendix about the October data set describing the situation and the caveats of the dataset (e.g., not using a mobile UA, so some websites might have done UA sniffing and served a desktop page without mobile-specific meta tags; see the sketch below).
Then we link to that appendix rather than to the gist we currently point to.
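To make the UA-sniffing caveat concrete, here is a minimal sketch of the kind of check that would be affected: detecting a mobile viewport meta tag in the served HTML. The function name and sample documents are hypothetical and not part of the crawl tooling; the point is that a site serving a desktop page to our desktop UA would show up as lacking the tag even if its mobile version declares one.

```python
import re

# Matches <meta name="viewport" ...> in the served HTML (attribute order
# and quoting vary in the wild, so this pattern is deliberately loose).
VIEWPORT_RE = re.compile(r'<meta\b[^>]*\bname=["\']viewport["\']', re.IGNORECASE)

def has_viewport_meta(html: str) -> bool:
    """Return True if the document declares a viewport meta tag."""
    return bool(VIEWPORT_RE.search(html))

# Hypothetical responses from the same site to different user agents.
desktop_served = "<html><head><title>Example</title></head><body></body></html>"
mobile_served = ('<html><head><meta name="viewport" '
                 'content="width=device-width, initial-scale=1"></head>'
                 '<body></body></html>')

print(has_viewport_meta(desktop_served))  # False -> counted as "no mobile meta tag"
print(has_viewport_meta(mobile_served))   # True  -> what a mobile UA might have received
```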
We need to say whether the sample is probabilistic or non-probabilistic (it's non-probabilistic, because we don't know how many webpages there are on the Web), hence we cannot generalize from it. However, the sample size of n = 78k is more than appropriate for an exploratory analysis (cf. [1]).
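As a rough sanity check on that sample-size claim, here is a minimal sketch computing the 95% margin of error for an estimated proportion under the usual normal approximation (in the spirit of [1]). It assumes simple random sampling, which this crawl is not, so the number is only indicative; n = 78,000 is the figure from the comment above.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for a proportion,
    using the normal approximation z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case (p = 0.5) for the ~78k pages in the October data set.
n = 78_000
print(f"±{margin_of_error(n):.2%}")  # ≈ ±0.35 percentage points
```

Even at the worst-case proportion the sampling error is well under one percentage point, so the non-probabilistic selection, not the sample size, is the limiting factor for generalization.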
Selection bias: the pages were selected by Alexa's ranking algorithm, so we need to understand how pages end up on this list and whether it is representative of "the world" (i.e., are all countries represented in the set, etc.). There may also be language bias. We don't need to analyze this, just acknowledge it.
We also know some of the data may be bad if processed with grep. I think that's about it, or at least good enough to start.
[1] Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: John Wiley & Sons.