Privacy risks in MyData

I wrote an essay for the course Privacy and Security for Software Systems (Leppänen & Ruohonen, 2020)

Thu, Mar 11, 2021

Context

I wrote this essay for the course Privacy and Security for Software Systems (Leppänen & Ruohonen, 2020). The assignment was to consider possible weaknesses in the MyData initiative with respect to data protection and privacy. The NIST privacy controls are used as a baseline for comparison. In addition, possible attack vectors in a MyData example use case are discussed. This essay is not an accusation against the initiative, and I am sure the MyData authors would refute some of the points made. The assignment was to discuss perceived threats, so that’s the context.

Privacy risks in MyData

The MyData initiative is an infrastructure-level framework for sharing user data and consents (Poikola et al., 2015). This brief essay takes a critical look at the initiative: it compares MyData to the NIST privacy controls and evaluates one of the MyData reference use cases, research data banks (Poikola et al., 2015), for privacy threats.

The goal of MyData is to change the API economy from an unstructured web of proprietary services (with varying privacy stipulations) to a human-centric system where personal data is a resource the user can control (Poikola et al., 2015). This involves a service registry, a consent management protocol and service, and account provision services. The business case of the initiative is that sales of personal data become transparent, because they are authorized on a MyData operator’s platform. This would benefit the users: they could receive a monetary reward or a service in exchange for their personal data and, most importantly, they would have control. The MyData operators could collect account and transaction fees from service providers or from the user.

The NIST standard (NIST, 2013) provides a catalogue of privacy controls, which are safeguards that an organization should adopt to protect privacy during the whole life cycle of personal data processing. Each is defined as a short “control” text that is supplemented with further guidance and references. The controls cover a wide array of topics, which won’t be enumerated here. Instead, we will next consider the MyData initiative on the basis of the NIST privacy controls (PCs), and see where they overlap or fall short.

MyData has four core operational roles: Account Owners, MyData Operators, Data Sources and Data Sinks (Alén-Savikko et al., 2016). As the architecture focuses on data and authorization transfers between these parties, it’s reasonable to assume the transfers themselves are specified in a secure way. The main privacy risks lie in abuse of the agreements and expectations between the different parties. Consider the PC DI-1 Data Quality in the context of a MyData data transfer from the perspective of a Data Sink. The control asks (to start with) that the organization “Confirms to the greatest extent practicable upon collection or creation of personally identifiable information (PII), the accuracy, relevance, timeliness, and completeness of that information” and “Collects PII directly from the individual to the greatest extent practicable”. Both of these conditions are hard, if not impossible, to meet in a personal data trading platform: the Sink receives the data from a Data Source rather than from the individual, and the Source could feed it inaccurate or invalid data that the Sink has no automatic way to validate.
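To make the DI-1 point concrete, here is a minimal Python sketch of what a Data Sink could check automatically on an incoming record. The record fields (`account_id`, `heart_rate_bpm`, `measured_at`), the thresholds and the function `validate_record` are hypothetical illustrations and not part of any MyData specification. The checks cover completeness, plausibility and timeliness, but nothing in them can tell whether a well-formed value is actually accurate.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical record layout; MyData does not prescribe these fields.
REQUIRED_FIELDS = {"account_id", "heart_rate_bpm", "measured_at"}
MAX_AGE = timedelta(days=30)  # assumed freshness limit


def validate_record(record, now):
    """Return the list of problems a Data Sink can detect automatically."""
    problems = []
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
        return problems
    bpm = record["heart_rate_bpm"]
    if not isinstance(bpm, int) or not 20 <= bpm <= 250:
        problems.append("implausible heart rate")
    if now - record["measured_at"] > MAX_AGE:
        problems.append("stale measurement")
    return problems


now = datetime(2021, 3, 11, tzinfo=timezone.utc)
measured = now - timedelta(days=1)

genuine = {"account_id": "a1", "heart_rate_bpm": 62, "measured_at": measured}
# Invented by a careless or dishonest Data Source, yet perfectly well-formed.
fabricated = {"account_id": "a1", "heart_rate_bpm": 95, "measured_at": measured}

print(validate_record(genuine, now))     # no problems found
print(validate_record(fabricated, now))  # no problems found either
```

Both records pass, which is the problem: the Sink can reject malformed or stale data, but accuracy can only be taken on trust from the Source.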

From the perspective of a Data Source, it’s hard to define a privacy policy for arbitrary Data Sinks. Consider the PC TR-1 Privacy Notice, which asks that the Source “Provides effective notice to the public and to individuals regarding: (i) its activities that impact privacy, including its collection, use, sharing, safeguarding, maintenance, and disposal of personally identifiable information (PII) […]” and “whether the organization shares PII with external entities, the categories of those entities, and the purposes for such sharing”. Since the MyData links are initiated by the Account Owner using standardized transfer protocols, the Source has no way to enumerate the possible Sinks in its privacy policy. In MyData the responsibility is on the user, but users can be mistaken or deceived, and it’s not clear that the Data Source would be free of blame in that case.

These examples merely illustrate how hard it is to make this kind of data sharing platform data protection friendly. The MyData initiative is made in good faith, and it’s possible these issues are not blockers; the model simply relies on agreements between the MyData Operators and the other parties being upheld. The NIST PCs generally concern a single organization, and thus they don’t take into account that other parties may not follow the same procedures. Controls such as AR-4 Privacy Monitoring and Auditing are of little use if they only cover one party in a web of MyData data flows.

Finally, let’s consider the MyData reference use case 3 (Poikola et al., 2015) for research data banks. Here, the authors describe a MyData operator with three external parties: Health Data Sources (HDS), a Data Anonymization Service (DAS), and Researchers. The Researchers would get anonymized data from the DAS, which in turn gets it from the HDS in plain form. Granted, the example is not fully elaborated, but from the description it seems fraught with risks. Let’s evaluate some of the major ones here. First, the HDS would have to register themselves as sources in MyData. There is no guarantee about who will be the Sink, so an impostor Data Anonymization Service is a threat. Even with a good-faith DAS, the possibility of aggregating N sources into one research data set makes it hard to guarantee anonymization, and there is a high risk of a mistake. The mere fact that medical data is available on the public web behind the authorization of some MyData operator increases the attack surface. A compromised MyData operator could issue fake consents from any source to any sink. I could go on; the fundamental mistake in the example case, that non-anonymized health data could be available online at all, is hard to remedy with consent controls.
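The aggregation risk can be illustrated with a small Python sketch. The use case does not name an anonymization criterion, so k-anonymity is used here as one common measure: the size of the smallest group of records that share the same quasi-identifier values. The records, field names and the function `k_anonymity` are made up for illustration, assuming one source contributes the birth year and another the postcode.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical records after a DAS has joined two sources per person
# and stripped the direct identifiers (names, ID numbers).
rows = [
    {"birth_year": 1980, "postcode": "20100", "diagnosis": "asthma"},
    {"birth_year": 1980, "postcode": "20500", "diagnosis": "diabetes"},
    {"birth_year": 1975, "postcode": "20100", "diagnosis": "asthma"},
    {"birth_year": 1975, "postcode": "20500", "diagnosis": "migraine"},
]

# Attributes from one source at a time: every person hides in a group of two.
print(k_anonymity(rows, ["birth_year"]))              # 2
print(k_anonymity(rows, ["postcode"]))                # 2
# Attributes from both sources combined: every record is unique.
print(k_anonymity(rows, ["birth_year", "postcode"]))  # 1
```

Each source looks acceptable on its own, but the joined data set singles out every individual. With N sources the number of attribute combinations grows quickly, which is why a well-meaning DAS can still get this wrong.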

PS

If you read this and liked it, this post is another in this “series” of essays from my university. I’ll try to post another essay from the current course as well.

References

Joint Task Force Transformation Initiative: Security and privacy controls for federal information systems and organizations. NIST Special Publication 800(53), 8–13 (2013)

Poikola, A., Kuikkaniemi, K., Honko, H.: MyData - a Nordic model for human-centered personal data management and processing. Finnish Ministry of Transport and Communications (2015)

Alén-Savikko, A., Byström, N., Hirvonsalo, H., Honko, H., Kallonen, A., Kortesniemi, Y., Kuikkaniemi, K., Paaso, T., Pitkänen, O., Poikola, A., Tuoriniemi, S., Vainikainen, S., Yli-Kantola, J.: MyData architecture: Consent based approach for personal data management. Tech. rep. (2016). https://doi.org/10.5281/zenodo.17628