Introduction
A commitment to academic ethics is essential when statistical analysis is used as a research tool. Two pillars of such ethics are anonymizing the data and obtaining informed consent from respondents. These standards are especially important in clinical research, where patients’ personal information and medical histories serve as data. Understandably, not all patients are willing to provide personal data for publication; at the same time, clinical trials require accurate statistics and cannot be based on hypothetical values. This paper attempts to determine which of two strategies for handling personal data, anonymization or pseudonymization, is more appropriate for clinical research in each of three scenarios.
Definition of Terms
Both anonymization and pseudonymization refer to the processing of clinically significant data used in research. Anonymization should be understood as depersonalizing medical information: personal data are removed, and the material useful for research is presented in an anonymous form that completely eliminates the possibility of linking it to real people (MENTIS, 2019). In contrast, pseudonymization assigns each record a pseudo-identifier that is meaningless to the reader but allows researchers to link the record to a specific respondent. Thus, both practices are forms of data masking, but anonymization completely eliminates the possibility of matching, whereas pseudonymization preserves that possibility for anyone who holds the additional information or the decryption key.
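The distinction can be illustrated with a minimal Python sketch. It is not a method taken from the cited source: the field names, the list of direct identifiers, and the functions `anonymize` and `pseudonymize` are all hypothetical, and dropping direct identifiers alone is a simplification, since combinations of the remaining fields can sometimes still point to a person.

```python
import secrets

# Hypothetical list of direct identifiers; a real study would define its own.
DIRECT_IDENTIFIERS = ("name", "date_of_birth")


def anonymize(records):
    """Drop direct identifiers and keep nothing that links back to a person."""
    return [
        {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def pseudonymize(records):
    """Replace direct identifiers with a random pseudo-identifier.

    Returns the masked records and a key table. Only the researcher keeps
    the key table; without it the pseudo-identifiers are meaningless.
    """
    masked, key_table = [], {}
    for record in records:
        pseudo_id = secrets.token_hex(8)  # random, carries no personal data
        key_table[pseudo_id] = {
            k: record[k] for k in DIRECT_IDENTIFIERS if k in record
        }
        rest = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        masked.append({"pseudo_id": pseudo_id, **rest})
    return masked, key_table


# Made-up example record, for illustration only.
patients = [
    {"name": "Jane Doe", "date_of_birth": "1970-01-01", "diagnosis": "asthma"},
]
anonymous_rows = anonymize(patients)
masked_rows, key_table = pseudonymize(patients)
```

In this sketch the anonymized rows cannot be traced back by anyone, including the researcher, while the pseudonymized rows can be traced back only by whoever holds `key_table`.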
Scenario #1. Using Data Without Obtaining Consent
Obtaining consent from patients to use their data in specific studies takes time, and there is always the possibility of refusal. Neither the time commitment nor a refusal is desirable in a clinical trial. For this reason, the most reasonable practice in this scenario is to anonymize the medical data so that it cannot be linked to real people. The use of data without consent is a sensitive topic: if research data are matched with real people, the risk of litigation costs and reputational losses is high. Pseudonymization does not eliminate this possibility, whereas complete anonymization does, and so it meets the needs of clinicians.
Scenario #2. Use of Financial Data
Patients’ financial data is also a sensitive topic, as few people would want to disclose their socioeconomic status and income to strangers. Moreover, if a patient can afford to spend large sums at medical facilities, identifying him or her creates personal security risks. Therefore, as in the first scenario, the sensible solution is complete anonymization of the data, which eliminates the possibility of identification. This approach is sufficient for clinical trials, since data on insurance costs and the patient’s personal spending do not need to be tied to an identified individual and can be used anonymously. It protects the safety of respondents and the reputation of clinicians, something that pseudonymization cannot guarantee.
Scenario #3. The Need for Patient Connection
Some clinical research is aimed at solving applied problems, so its results are of particular relevance to medical institutions. For example, studies of rare or fatal diseases may yield results that show the effectiveness of specific treatments or palliative care programs, so such work has not only theoretical but also tangible, practical value. The research design proposed in this scenario fits that description, and so the data need to be pseudonymized. In this case, the researcher assigns a code to each respondent and publishes the data in a form that is anonymous to the reader. The reader will never know the patient’s identity, but the researcher retains the ability to link specific data to the person, which makes it possible to contact that patient later and offer any of the solutions found. Anonymization would not be appropriate in this context because even the researcher would lose the ability to decode the data and address the respondent directly.
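How a researcher might trace a published code back to a patient can be sketched as follows. This is only one possible approach, assumed here for illustration: codes are derived with a keyed hash (HMAC-SHA256 from the Python standard library), and the secret key, the patient roster, and the functions `pseudo_id` and `reidentify` are hypothetical names rather than part of the scenario.

```python
import hashlib
import hmac


def pseudo_id(patient_id: str, secret_key: bytes) -> str:
    """Derive a code for a patient; it cannot be recomputed without the key."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def reidentify(published_id: str, roster, secret_key: bytes):
    """Find the patient behind a published code, or return None.

    Requires both the secret key and the internal patient roster,
    neither of which is available to readers of the publication.
    """
    for patient_id in roster:
        candidate = pseudo_id(patient_id, secret_key)
        if hmac.compare_digest(candidate, published_id):
            return patient_id
    return None


# Made-up values for illustration only.
study_key = b"kept-only-by-the-research-team"
roster = ["P-001", "P-002", "P-003"]
published = pseudo_id("P-002", study_key)

found = reidentify(published, roster, study_key)
not_found = reidentify(published, roster, b"some-other-key")
```

With the correct key, the researcher recovers the patient identifier and can offer follow-up treatment; with a different key, the lookup finds no match, which is the position a reader of the published data is in.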
Conclusion
To summarize the analysis above, both anonymization and pseudonymization are established and widely practiced data masking methods. They cannot be ranked by effectiveness because they are different strategies; the choice between them depends on whether the researcher needs to follow up with respondents. With anonymization, there is no way to identify the patient, even for the researcher; independent parties or software can be used to anonymize the information thoroughly. With pseudonymization, in contrast, the researcher retains the ability to identify the respondent from the results of the study. This is done, for example, with the key that was used to encode the raw data. From the reader’s point of view, both options appear largely impersonal, but one cannot rule out the possibility that a reader might one day work out which patient the study describes. Such identification would threaten the personal safety of the patient and the reputation of the clinician. In two of the three scenarios, anonymization was the appropriate strategy because the data were highly sensitive: personal information used without consent in the first and financial information in the second. In the third scenario, the research findings had to be related to actual treatment practice, so anonymization would be useless, whereas pseudonymization would meet the needs of the researcher.
Reference
MENTIS. (2019). Anonymization vs. pseudonymization. Medium. Web.