A recent letter from researchers at the Mayo Clinic to the editor of The New England Journal of Medicine outlined a new challenge in de-identifying, or preserving the de-identified nature of, research and medical records.[1]  The Mayo Clinic researchers described their successful use of commercially available facial recognition software to match digitally reconstructed images of research subjects’ faces, derived from cranial magnetic resonance imaging (“MRI”) scans, with photographs of the subjects.[2]  MRI scans, often considered non-identifiable once metadata (e.g., names and other scan identifiers) are removed, are frequently made publicly available in published studies and databases.  For example, administrators of a national study called the Alzheimer’s Disease Neuroimaging Initiative estimate that other researchers have downloaded millions of MRI scans collected in connection with their study.[3]  The Mayo Clinic researchers assert that the digitally reconstructed facial images, paired with individuals’ photographs, could allow the linkage of other private information associated with the scans (e.g., cognitive scores, genetic data, biomarkers, other imaging results and participation in certain studies or trials) to these now-identifiable individuals.[4]

The problem, the Mayo Clinic researchers note, may be one without a straightforward solution: existing software for removing or blurring faces in MRI scans is infrequently used because it can degrade the quality and research value of the scans, and even when used it does not always prevent re-identification.[5]  And while the researchers’ findings could be described as only a “proof of concept” (they matched against a limited set of candidate photographs), it is not difficult to imagine more powerful facial recognition tools (which both commercial and state actors are known to develop and deploy) being brought to bear to match MRI scans, and the health information associated with them, against the huge swath of identifiable or identified photographs that proliferate through widespread use of social media and video and photographic surveillance.

In the research context, the Federal Policy for the Protection of Human Subjects (a/k/a the “Common Rule”) sets forth rules governing “human subjects research” which, with limited exceptions, is any federally funded research where an investigator (i) obtains information or biospecimens through intervention or interaction with an individual, and uses, studies, or analyzes the information or biospecimens; or (ii) obtains, uses, studies, analyzes, or generates identifiable private information or identifiable biospecimens.[6]  Identifiable private information is private information for which the identity of the subject is or may readily be ascertained by the investigator or associated with the information.[7]  An identifiable biospecimen is a biospecimen for which the identity of the subject is or may readily be ascertained by the investigator or associated with the biospecimen.[8]

Further, in the health care context, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) governs the use and disclosure of “protected health information,” which is essentially any identifiable health information.[9]  There are two methods under HIPAA to de-identify information (statistical analysis and removal of specific identifiers), but both hinge on there being “no reasonable basis to believe that the information can be used to identify an individual.”[10]

Whether a research study involves identifiable private information or biospecimens determines, among other things, whether a researcher must obtain a subject’s written informed consent; the Common Rule provides significantly more flexibility to researchers investigating non-identifiable private information and biospecimens.  Similarly, the stringent requirements of HIPAA, applicable to “covered entities” and their “business associates,” are triggered only when the health information to be accessed, used or disclosed is identifiable.  Notably, both regulatory schemes employ flexible definitions of what is “identifiable.”

The general concern raised by the Mayo Clinic findings, namely that information or biospecimens once considered de-identified or non-identifiable can become identifiable through the advance or new application of technology, is not unique to facial recognition analysis.  For example, the prevalence of faster and cheaper genetic sequencing technology was at least one reason for recent changes to the Common Rule allowing researchers to obtain a more flexible “broad consent” from research subjects for the storage, maintenance and secondary research use of identifiable private information or biospecimens.[11]  Misuse of genetic information also prompted legislators in the United States to act over a decade ago to prohibit certain discriminatory uses of such information.[12]

Policy and rule changes, however, do not alter the underlying dynamic the Mayo Clinic findings confirm: advances in, and the proliferation of, technology (like facial recognition software), paired with widespread algorithm-driven data science, will create ever more privacy “attack vectors.”  Researchers, clinical sponsors, institutional review boards (IRBs) and health care providers should be both forward-thinking in their collection, storage and dissemination of potentially identifiable information and biospecimens and ready to react when changes in technology, law or regulation require action on their part to protect individuals’ privacy.

[1] Christopher G. Schwarz et al., Letter to the Editor, Identification of Anonymous MRI Research Participants with Face-Recognition Software, 381 New Eng. J. Med. 1684 (2019), https://www.nejm.org/doi/full/10.1056/NEJMc1908881.

[2] Id.

[3] Gina Kolata, You Got a Brain Scan at the Hospital. Someday a Computer May Use It to Identify You., N.Y. Times (Oct. 23, 2019), https://www.nytimes.com/2019/10/23/health/brain-scans-personal-identity.html.

[4] Schwarz et al., supra note 1.

[5] Id.

[6] See 45 C.F.R. § 46.102.

[7] 45 C.F.R. § 46.102.

[8] Id.

[9] See 45 C.F.R. § 160.103.

[10] See 45 C.F.R. § 164.514.

[11] See Celia B. Fisher & Deborah M. Layman, Genomics, Big Data, and Broad Consent: a New Ethics Frontier for Prevention Science, 19 Prevention Science 7 (2018), https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6182378/pdf/11121_2018_Article_944.pdf; see also 45 C.F.R. § 46.116.

[12] See Genetic Information Nondiscrimination Act of 2008, Pub. L. No. 110-233, 122 Stat. 881 (2008).