GDPR #152
Comments
We should be OK if we store IPs for security purposes only, even without asking for permission. But if we use the IPs (as I believe we do) to decide which clients are fast, so that we can deliver matches only to the faster clients, then you need consent.
Hmm, I see. The claim would be that using it to prioritize matches to fast clients would not be a "legitimate interest" because it could be achieved by alternative approaches? (This is different than showing the last generated game which can only reasonably be done that way!) I am not entirely sure here. The server could record a token (as we do with matches) and store the send time of the token in the DB. If the client replies with the token, we can measure the delay, and we don't need the IP to associate speed with a client. Because the token is random and immediately discarded it doesn't count as personal information? I'm not entirely sure of the latter, but this does seem to have better privacy properties than IP addresses. (If you need to track over multiple games to get a good speed indication, you need to keep the token longer and this doesn't seem any different from an IP to me)
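The token idea above could look roughly like the sketch below. This is only an illustration and not the actual server code: `SpeedTracker`, `issue_task` and `complete_task` are hypothetical names, and an in-memory dict stands in for the DB. The point is that the delay is measured per token, and the token is dropped as soon as the reply arrives, so no IP is involved.

```python
import secrets
import time


class SpeedTracker:
    """Hypothetical helper: times a task via a one-shot random token."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = {}  # token -> time the task was sent (stand-in for the DB)

    def issue_task(self) -> str:
        # Random token, not derived from the IP or any other client property.
        token = secrets.token_hex(16)
        self._pending[token] = self._clock()
        return token

    def complete_task(self, token: str):
        # pop() discards the token immediately, so it cannot be reused
        # or linked to later requests.
        sent = self._pending.pop(token, None)
        if sent is None:
            return None  # unknown or already-used token
        return self._clock() - sent
```

Tracking speed across several games would require keeping the token longer, which is the trade-off raised in the comment above.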
In the case of server logs you have the "legitimate interest" claim, so you are OK! (Even if you should really encrypt them!)
That has got nothing to do with "legitimate interest". It's specifically allowed for "The processing of personal data to the extent strictly necessary and proportionate for the purposes of ensuring network and information security" which is a different section from legitimate interest provisions.
This does not work at all. Most users are on IPv4 and a hash of IPv4 can be trivially reversed. A keyed/salted hash cannot, but if the server must match up IPs with clients that connect, it needs to keep the key alive, which makes the hash reversible again. So hashing would achieve exactly nothing here.
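To make the point about unkeyed hashes concrete, here is a small sketch. It assumes SHA-256 over the dotted-quad string and searches only a /24 documentation range so that it runs instantly; `hash_ip` and `reverse_hash` are illustrative names. The full IPv4 space has only 2^32 (about 4.3 billion) addresses, so the same loop over everything is well within reach of ordinary hardware.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    # Unsalted hash of the textual address (assumed scheme, for illustration).
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_hash(target: str, network: str) -> Optional[str]:
    # Enumerate every address in the range and compare hashes.
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hash_ip(candidate) == target:
            return candidate
    return None


stored = hash_ip("192.0.2.77")                    # what the server would keep
recovered = reverse_hash(stored, "192.0.2.0/24")  # what anyone could recover
```

A keyed hash blocks this enumeration only for someone who does not hold the key, and the server has to hold it in order to match up connecting clients.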
It's allowed for abuse prevention, and we've used it to cull spammers from the DB. Using it for the last generated game is reasonably a legitimate interest because that feature can't be implemented in a way that doesn't store similar or more PII. I would say that it's also reasonable under "whether a data subject can reasonably expect at the time and in the context of the collection of the personal data that processing for that purpose may take place." I think there's a fair argument that the match scheduling does not pass those, though, if only because that could be implemented with random keys. The relevant difference that makes one work and the other not is that for showing the game you submitted, we need to make a link between your browser and the client you're running, and the connecting IP is the only thing that works for that.
Would a non-IP-address-based token similar to normal web cookies require a disclaimer or other special handling? Also, anything not based on IP address will require users to somehow identify / associate with the autogtp that has been talking to the server before it can "show the last game I generated", e.g., autogtp would need to print out the token (it already prints out the task json, so no special changes are required), and the server page could have an input box for the token.
But then I guess the autogtp patch is mandatory because in that case you do not need to save the IP of the client.
I don't think so. If we're the only ones to ever handle it, it can't become linked to a person. The thing that made IPs work or not work was that websites can store IP->data, and the ISP can store the IP->person mapping, so they could be combined to form a data->person map. Obviously you can't store a map from the token to an IP anywhere. (This makes me realize that using the token over multiple games is probably OK anyway)
That also works, I guess. AutoGTP could generate the token on startup.
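A rough sketch of what a client-generated token might look like, together with a server-side lookup for the "show the last game I generated" page. All names here (`new_client_token`, `LastGameIndex`) are made up for illustration, and AutoGTP itself is not written in Python; the sketch only shows the data flow, in which the server keeps a token-to-game map and never a token-to-IP map.

```python
import secrets


def new_client_token() -> str:
    # Generated once at client startup; random, so it carries no information
    # about the machine or its network address.
    return secrets.token_urlsafe(24)


class LastGameIndex:
    """Hypothetical server-side map from client token to last submitted game."""

    def __init__(self):
        self._last = {}

    def record(self, token: str, game_id: str) -> None:
        self._last[token] = game_id

    def lookup(self, token: str):
        # Used by the page where the user pastes the token printed by the client.
        return self._last.get(token)

    def forget(self, token: str) -> None:
        # Expiry or deletion on request: drop the token and its link to the game.
        self._last.pop(token, None)
```

The user would copy the token printed by the client into an input box on the server page, as suggested earlier in the thread.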
Autogtp would tell the server the last game duration AT JOB REQUEST TIME.
The client can lie though. Maybe we don't care about that? If we want to keep showing the latest game then we might as well use a token.
What would the client gain from lying?
You would be able to force matches towards your client(s), and can then lie about the results, which is exactly what has at least happened in the past. (Does this not exactly make it an argument that throttling matches via IPs is permissible for data integrity/DDOS prevention, thus allowing their use?)
I guess you can always do that, even now. Just generate 100 games without uploading, then upload them all at the same time: you look like a really fast client and you get the match. Or generate random games really fast.
Instead of using IP addresses, why doesn't the server assign a random ID to a client on its first connection and store that somewhere? This way, if you want to recheck the speed of a client, you can expire the ID and force the client to take another one.
The client needs to remember the ID, so it might as well generate it itself - that's what is being proposed.
From some quick reading, it sounds like cookies / tokens / ids (any "identifiers") are "personal data" as it is possible to relate them back to an individual. I suppose at a high level, anything the server can do to show "yours" will mean the server has "personal data." So assuming that's true (i.e., the server will have "personal data"), that just means the data needs to be handled in compliance with GDPR -- it's not wrong to have personal data. I don't really know what it means to comply ;) but I guess there's some aspect of what gcp has already stated for data retention policies, but also probably needing to provide users consent / choice, e.g., "I want to be able to see my submitted games" or "I want to participate in matches"
I don't think so? They are a problem if they are shared across sites, because then you can construct a profile of who the person is. But that does not apply to per-site IDs that aren't linked to any other personal data. For IPs, it's a problem with a single site because someone (the ISP) has the database with the mapping to a person. There was an explicit legal decision setting out that reasoning.
The problem is that "handling in compliance with GDPR" is quite a nuisance so it's better to have nothing at all (which is a philosophy I certainly like...). For one, you need explicit user consent first and foremost.
https://www.privacy-regulation.eu/en/recital-30-GDPR.htm
The first line basically treats IP addresses and cookies equally as identifiers. The second line in our case relates to submitted game data associated with these identifiers. If the server can show a user "your" data, then the server has "personal data."
The second paragraph is critical, no? "leave traces which, combined ... may be used to create profiles and identify them". The games don't help you towards identifying a person. If they did, we would have bugs :-) IP addresses do, which is why they are under discussion. A site-unique identifier that is not linked to other data does not either, as far as I can tell.
A simple example is the BRII cluster generating significantly more games than others, so it's fairly easy to identify the individual associated with that data (whether the games were associated with each other via IP address or cookie or other identifier).
If that reasoning holds (it could be anyone with access to a lot of hardware!) I see no other solution than to either significantly expand the site features (have an option to download all games that are in the current DB, have an option to delete them, have an option to disable "see last game" and match assignment) and have AutoGTP present an opt-in at startup, or remove those features altogether.
I mean, if you modify AutoGTP so that it scribbles "Mardak" in the comments of every game, are you then able to claim that I'm storing personal information about you even after anonymizing the IPs? Or modify the moves such that they spell out MARDAK on the board? Now suddenly the training data has personal information too. I now need to provide you with all training data where that happened? That doesn't seem right.
A similar issue arises with encrypted data: say a file sending service allows anonymously uploading data, which the server doesn't understand, for a limited time for others to download. Does that service need to be able to produce "your data" if it doesn't even realize it has anything from you? It would be interesting to see how they deal with GDPR. My guess is that even if the law doesn't have a "best effort" type clause, a judge would maybe favor the service if it truly didn't know. I suppose other services to compare or inquire about are anonymous / account-less services. Just guessing that GDPR was primarily written for the common case of services that have the usual login / identifier because they want to have a strong / persistent connection to their users.
We could put a disclaimer at the beginning of autogtp that says, 'by using the program you agree that your IP address is saved and used for statistical purposes inherent to the project', and then you really have to explain what you do with their IP. You can also state in the disclaimer that the user agrees, in case they modify the code, to the storage of any additional personal data that the modified program sends to the server. @Mardak if the files are encrypted and the server cannot decrypt them, then they are not considered personal data.
The problem with this is that even though this allows you to store the personal data, you are now liable for all the other obligations such as data takeaway and deletion.
As you probably have noticed (LOL), the GDPR is now in effect in Europe. This affects us, because the server sits in Germany and we have European users.
Based on my reading of the law, the hobby-like, non-commercial nature of the project exempts us from compliance. However, I believe privacy protections are generally a good thing, and we should set an example and follow best practices wherever possible and reasonable.
In the past I've rejected various enhancement proposals that would have meant storing PII on the server, so we have smooth sailing there. The only PII that is ever stored are the IPs in the server logs (like any web server!). This is allowed without opt-in for abuse tracking and defensive purposes, provided they're not stored longer than necessary. I changed the server configs a while ago to rotate and delete logs much faster (14 days). I'll probably also enable encrypting them soon. The data from the games is already cleaned out of the server on a similar time-frame (required anyway because of storage concerns!).
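For illustration only, a 14-day retention rule like the one described could be expressed as the sketch below. The real server presumably relies on its web server's or the system's log rotation settings, which are not shown in this thread; `purge_old_logs` and its parameters are hypothetical.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def purge_old_logs(log_dir, max_age_days=14, now=None):
    """Delete regular files in log_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in sorted(Path(log_dir).iterdir()):
        # Only plain files are considered; subdirectories are left alone.
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Run daily, something like this bounds how long any IP address can remain in the logs.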
The IPs are also used to make the "show the last game I generated" feature work. I suspect that is OK as well - it will expire as above.
The only thing I'm not clear about is whether we need some kind of notice about the IPs in the server logs somewhere, and what it should say.