Genes Wide Open: Data Sharing and the Social Gradient of Genomic Privacy

Tobias Haeusermann, Marta Fadda, Alessandro Blasimme, Bastian Greshake Tzovaras, Effy Vayena. AJOB Empirical Bioethics. Volume 9, Issue 4, 2018.

Since the mid-2000s, direct-to-consumer genetic testing (DTC-GT) companies have made genetic information available to private individuals outside of research or clinical settings. These companies claim that their services empower consumers with personal information to which they are entitled and that they are able to use for a variety of personal purposes, including, but not limited to, gaining better control over their health, learning about their ancestry, and contributing to the development of medical knowledge (Curnutte and Testa 2012). To date, DTC-GT companies have analyzed and stored genetic data from millions of individuals and represent a rapidly growing segment of the DNA testing industry (IBISWorld 2017; Kaiwar et al. 2017). DTC-GT companies such as 23andMe (23andme 2015) monetize only aggregate genomic data. Large pharmaceutical companies have bought access to such data sets. Collaborations on specific projects between these companies and academic researchers are also being reported.

Numerous concerns about the concept of DTC-GT have been debated in the literature and by regulators (Covolo et al. 2015; Hall et al. 2017), including, among others, lack of medical supervision and genetic counseling, inadequate informed consent, questionable analytic and clinical validity, risks of misdiagnosis and overdiagnosis, and negative impacts on family members and on public health (Badalato, Kalokairinou, and Borry 2017; Hogarth and Saukko 2017). As a consequence, DTC-GT has given rise to regulatory attempts at limiting the availability of such services (Borry, Cornel, and Howard 2010; Borry et al. 2012; Curnutte and Testa 2012). For instance, although 23andMe (23andme 2018) previously offered tests on hundreds of gene variants, the Food and Drug Administration (FDA) first warned the company to cease offering such services, and then cleared it to market only 10 genetic tests revealing reliable, clinical-grade information about the risk of developing diseases (Curnutte 2017). Yet some scholars argue that DTC-GT offers important opportunities to exercise personal choice, cultivate autonomy, or attain other personal objectives (Chung and Ng 2016; Roberts et al. 2017; Vayena 2014).

What information should be available to consumers and how they will respond to information they receive are still matters of debate, but in most cases DTC-GT consumers have access to their raw data sequences. Consumers may thus choose to share their raw data with others or on online platforms such as openSNP, DNA.Land, and Open Humans. In principle, one of the most important benefits data sharing can bring to scientists is the availability of large data sets for the study of human disease (ACMG Board of Directors 2017; Ball et al. 2014; Blasimme et al. 2018; Scudellari 2010; Vayena and Blasimme 2018). According to proponents of this model, openly sharing genomic data will allow researchers to uncover linkages across millions of samples and lead to tangible advances in medicine (van Schaik et al. 2014; Vayena et al. 2016). Proponents of open data sharing in genomics also contend that, by advancing citizens’ active search for scientific knowledge and improving the public’s genetic literacy, such sharing might increase ordinary citizens’ involvement in science (Angrist 2009; Vayena 2014; Wicks, Vaughan, and Heywood 2014). Open sharing of genomic and phenotypic data, then, could not only serve as a primary research tool for science (Ball et al. 2014; Scudellari 2010), but also pave the way for participant-driven research initiatives (Swan 2012; Woolley et al. 2016). Growing data-intensive research illustrates the need for and value of globally accessible data (Shabani, Knoppers, and Borry 2015). An investigation conducted by DNAdigest identified four bottlenecks for data sharing: finding relevant and usable data (data discovery), obtaining authorization to access data, formatting data, and storing and moving data (van Schaik et al. 2014). According to open data sharing platforms, making DNA data publicly available may help researchers avoid these very bottlenecks.

However, the nature of genomic information and its possible uses and misuses create relevant privacy challenges. Single-nucleotide polymorphisms account for the vast majority of variation in the human genome (Wang et al. 1998) and have a determinant influence on an individual’s physical traits, disease risk, and capacity to respond to environmental factors (Sachidanandam et al. 2001). Since genetic data provide information on key characteristics of individuals, disclosure or misuse of data can lead to serious harm, ranging from embarrassment to stigmatization, abuse, and potential discrimination in employment, insurance, or education (Annas and Elias 2015; Wang et al. 2016). In the field of education, for instance, a child can be denied access to a school based on genetic information (Levenson 2016) or based on presumed correlations between genetic variants and cognitive performance (Haga 2009; Novas and Rose 2000; Nuffield Council on Bioethics 2002). An alleged case of genetic discrimination in education was reported in 2012, when a California public school district was accused of illegitimately denying attendance to a middle-school student based on the results of genetic screening tests for cystic fibrosis (CF) that he underwent as a newborn (Levenson 2016). The child’s parents claimed that the school inappropriately ordered the child to change schools following a complaint by the parents of two other students with CF who were concerned about the risk of cross-infection with bacteria particularly harmful to people with CF (Levenson 2016).

Given the relevance of such risks, privacy is critical for preventing the misuse of genetic information. Since a handful of genetic markers is sufficient to distinguish any two individual DNA sequences from one another (Collins and Mansoura 2001), genetic privacy is particularly difficult to ensure. Genetic privacy refers to the idea that everyone should enjoy protection of his or her genetic information from unauthorized collection, processing, use and distribution, and that certain uses of genomic data must be forbidden because they impact data subjects in ways that are considered unjust, unfair, or outright discriminatory (Annas and Elias 2015; Delgado, Lorente, and Naro 2017; Erlich et al. 2014; Gostin and Hodge 1999; McGonigle and Shomron 2016; Rothstein 1998; Shen and Ma 2017).

Given the likelihood that genetically related people share relevant DNA variants, genetic privacy has implications for family members as well as for individuals (Annas and Elias 2015). Protecting genetic privacy, then, is arguably more complex than safeguarding other types of data. Moreover, some of the techniques usually employed to ensure data privacy are not entirely applicable to genomic data. Anonymization is a case in point. While it is theoretically possible to strip any genomic data set of all personally identifying information, thereby rendering the data nominally anonymous, the combination of genetic variants on any individual human genome is unique to that individual. Gymrek and colleagues, for instance, showed that by cross-referencing anonymized genomic information with publicly available data, individual genetic sequences can be reidentified (Gymrek et al. 2013). Also, the majority of current techniques to prevent unauthorized disclosure of genomic information limit what researchers can learn from the data (Enserink and Chin 2015). These issues create conspicuous privacy concerns regarding genomic data.

Despite the serious harms linked to the misuse of genetic data, some individuals decide to publicly share their genomic data, obtained through DTC-GT companies, on online platforms that offer no privacy protection and no ethical oversight mechanisms (Francis 2014; Vayena, Mastroianni, and Kahn 2012; Vayena and Tasioulas 2013a; 2013b). This article reports on the findings of a qualitative study of users of openSNP, an online nonprofit platform financed through an ongoing crowdfunding campaign and managed by a small group of volunteers. OpenSNP allows individuals to upload their DTC-GT results along with phenotypic annotations about themselves (Vayena 2014). As a result, genomic and phenotypic data are freely and publicly accessible, thus granting unrestricted access to any third party. A previous study analyzed individuals’ motivations for being tested and sharing the results online (Haeusermann et al. 2017). Here, we focus more specifically on how individuals who share their genomes in this way explain their experience with openness and their attitudes toward privacy.

Subjects and Methods

Recruitment of the Participants

This article presents a follow-up investigation of a study conducted between December 2015 and January 2016 with 550 openSNP users. We conducted semistructured, open-ended interviews to gather more in-depth insights on the attitudes, thoughts, and actions of a select group of openSNP users who had filled out the questionnaire used for our previous quantitative study (Haeusermann et al. 2017). To recruit interview participants, we initially applied sequential mixed-methods sampling. At the end of the questionnaire, respondents were asked for permission to be contacted for a follow-up interview at a later date. Of the 550 users returning the questionnaire, 196 users indicated willingness to participate. Subsequently, stratified purposive sampling (quota sampling) was employed to recruit suitable interview participants among the 196 volunteers. The stratification was based on (1) age, (2) gender, (3) geography, (4) participants with or without offspring, and (5) participants with or without previous or current professional experience in genetic research. We calculated a quota for each stratum. We continued to invite participants until the quota for each stratum was met, in order to gain access to a sample that was representative of, and proportional to, the survey respondents. The final sample included 13 participants who were contacted by a member of the research team via e-mail. The e-mail invitation included a description of the study and reference to its ethical safeguards, including anonymity and confidentiality. No financial incentives were offered for participation. Once they confirmed their availability for the interview, participants were directed to an online consent form to be signed before the interview. We recruited our participants over a 2-month period during May and June 2017.
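The quota procedure described above can be illustrated with a minimal sketch. It is not the authors' actual procedure: it assumes a single stratifying variable (the study used five), proportional allocation by the largest-remainder rule, and invented respondent and volunteer counts per stratum. The function names `proportional_quotas` and `recruit_until_quotas_met` are hypothetical.

```python
import random
from collections import Counter


def proportional_quotas(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum_sizes.

    Uses the largest-remainder rule so the quotas sum exactly to sample_size.
    """
    total = sum(stratum_sizes.values())
    exact = {s: sample_size * n / total for s, n in stratum_sizes.items()}
    quotas = {s: int(share) for s, share in exact.items()}
    leftover = sample_size - sum(quotas.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - quotas[s], reverse=True)
    for s in by_remainder[:leftover]:
        quotas[s] += 1
    return quotas


def recruit_until_quotas_met(volunteers, stratum_of, quotas, accepts, rng):
    """Invite volunteers in random order until every stratum quota is filled."""
    order = list(volunteers)
    rng.shuffle(order)
    filled = Counter()
    recruited = []
    for person in order:
        stratum = stratum_of(person)
        if filled[stratum] >= quotas.get(stratum, 0):
            continue  # this stratum is already full, so no invitation is sent
        if accepts(person):
            recruited.append(person)
            filled[stratum] += 1
        if all(filled[s] >= q for s, q in quotas.items()):
            break
    return recruited


# Hypothetical counts: the paper reports 550 respondents, 196 volunteers, and
# 13 interviewees, but NOT the per-stratum numbers used here.
survey_respondents = {"women": 280, "men": 270}
quotas = proportional_quotas(survey_respondents, sample_size=13)
volunteers = [("women", i) for i in range(100)] + [("men", i) for i in range(96)]
interviewees = recruit_until_quotas_met(
    volunteers,
    stratum_of=lambda v: v[0],
    quotas=quotas,
    accepts=lambda v: True,  # assumption: every invited volunteer agrees
    rng=random.Random(2017),
)
```

With several stratifying variables and a sample of 13, quotas would more plausibly be tracked per variable than per combination of variables, since most combinations would receive a quota of zero.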

Data Collection

The interviews were conducted in English via Skype by a member of the research team between July and September 2017, and lasted between 35 and 90 minutes. The interview guide consisted of predetermined open-ended questions designed to elicit information on participants’ interest in genomics, their privacy concerns, and their decision to share the results. The format resembled an in-depth conversation, which granted the participants ample liberty and time to describe their experiences and thoughts. At the same time, it allowed the researcher to ask follow-up questions. We followed Holstein and Gubrium’s (1995) “active interview” model, which sets the interviewer and interviewee as equal partners who both construct meaning around an interview event.

With permission, all interviews were digitally recorded and then transcribed verbatim by a professional transcriber with no prior knowledge of the study’s research question. A member of the research team later cross-checked the recorded interviews to guarantee accurate documentation of the discussion and removed any identifying information. To determine the point of “data saturation,” we relied on the concept of information power developed by Malterud and colleagues (Malterud, Siersma, and Guassora 2015). Information power indicates that the more information relevant to the actual study the sample holds, the fewer participants are needed (Malterud, Siersma, and Guassora 2015). This way, we looked at data saturation not in its traditional sense (i.e., thematic redundancy) but rather as capturing powerful reports about a rather unique set of participants. Information power depends upon five elements: (a) a narrow study aim, (b) the specificity of the experiences included in the sample, (c) the use of established theory regarding the topic of the study, (d) the quality of dialogue during the interview, and (e) the limited scope of the analysis. Our study fulfills all these criteria. We explored a niche phenomenon, voluntary public sharing of genetic data through an online platform, within the much broader area of genetic testing. Our target participants share a specific kind of behavior with respect to their genetic data: All had publicly shared their genetic data online on the openSNP platform. Quota sampling allowed us to obtain a representative sample of participants from a highly specific group of individuals. We relied on established theory on privacy, and genetic privacy in particular, in the design of the study, as well as in the analysis of the results. We allowed ample room for open, undirected discussion during the interviews. Finally, we limited the analysis to privacy-related narratives in a selected group of participants. Therefore, while the overall size of our sample is relatively limited, it confers sufficient information power to our study.

Data Analysis

Although some earlier studies have also investigated the motivations and attitudes of individuals toward DTC-GT and genetic and genomic data sharing (Ball et al. 2014; Brown Trinidad et al. 2010; Cheung et al. 2016; Critchley, Nicol, and Otlowski 2015; Lemke et al. 2010; McGuire et al. 2008; Oliver et al. 2012; Vayena 2014; Wallis, Rolando, and Borgman 2013), this is the first study to interview individuals who have decided to share their data publicly without any institutional oversight. We analyzed the interviews by thematic coding analysis, and merged ideas topically (Thomas 2003). The thematic analysis led us from surface codes to deeper level themes. This method allowed us to engage with the data in a more active way, opening up new themes that may not have been covered by established theory (Bryman 2015; Matthew and Sutton 2011).

The coding process followed Braun and Clarke’s (2006) comprehensive thematic analysis approach. To ensure internal validity, two members of the research team coded all 13 transcripts independently. Key phases of the coding process included marking meaningful text regardless of its length (such as expressions, sentences, or full paragraphs) and then condensing the data into key codes. The two researchers shared their codes with each other multiple times, compared them, and discussed discrepancies. The final stage involved condensing the data, linking the themes and code families, and using noteworthy quotations to highlight the frequent similarities, dominant differences, and significant contradicting themes.


Results

Participants’ Characteristics and Experiences with Open Data Sharing

Of the 13 participants who completed the interview, 7 were from the United States, 2 from Canada, 1 from the United Kingdom, 1 from Australia, 1 from Switzerland, and 1 from Russia. About half of the participants were women (n = 7); 7 did not have children; and most had no previous scientific involvement in genetic or genomic research (n = 9). Participants’ ages ranged from 25 to 68 years (M = 51.23; SD = 13.89). The majority of the participants were native English speakers (n = 11), while 2 spoke English as a second language. These sociodemographic characteristics were self-reported by the participants.

In general, participants reported awareness that, unlike for the majority of personal data types, the decision to publicly share genomic data not only affects the individual making the choice but bears potential consequences for family members and future generations and could unveil significant glimpses into a family’s entire health and genealogical history. However, only two participants talked to their family members before openly sharing their genomic data and, in some form, asked for permission. All other participants either informed their family after sharing or never mentioned it to them. In two cases, some family members were explicitly opposed to the participant sharing the data. Others mentioned that their family members were skeptical about data being shared due to fears that the data would reveal genetic links with minority ethnic groups.

Articulating Privacy

Unavoidable Exposure

First, we probed participants’ views about privacy and the risk of unauthorized access to personal information. About half of the participants perceived privacy as “dead,” indicating that it is now impossible to keep any information safe. According to these participants, privacy can be breached so easily that their personal data would not be private even if they did not intentionally share them. These participants described privacy as an “illusion,” explaining that hackers can easily gain access to any kind of information, even institutional, such as government files. They also stated that since insurance companies require their personal information, protecting one’s privacy is an impossible task.

Privacy in the digital age. OK, well, that’s a little broader topic. I can summarize it in three words: privacy-is-dead. And I have a background in, years ago I did computer and networking stuff, pretty low level, but enough to kind of know how computers work. And I just don’t believe the only way you could keep information really private in the digital age is have it on a computer that is completely disconnected from all networks. […] So, I think that worrying about privacy in the Internet age or in the digital age is really closing the barn door after the horse has already gone. (participant #4, male, United States, no scientific involvement, no children)

They’ve hacked into the personnel files in the government and they’ve hacked into all sorts of things. I don’t have any illusions that my information is totally private. (participant #12, female, Australia, no scientific involvement, no children)

And I think […] you’re still, for life insurance for example, you’re still required to tell them about any conditions you have anyway. So, if you don’t, you’re lying and your policy isn’t valid anyway. And so, this kind of … Like, oh what if we’re discriminated for insurance purposes, you’re still compelled to give that information to them anyway. The arguments against sharing weren’t super compelling and it felt nice to share. (participant #3, male, United States, with previous scientific involvement, no children)

Other participants felt they had lost their capacity to control certain information about themselves with respect to third parties due to special circumstances in their lives that allowed medical, insurance, legal, and governmental institutions access to their data.

I think possibly because I’d already had the genetic testing done for medical reasons. So, the information was already going to be available to insurance companies or people like that. I’m already on disability as well so it wasn’t going to affect insurance because it was already affected. So, I’d already ruled out several of the main risks I was concerned about. So, it seemed more useful to share it. (participant #10, female, Canada, with previous scientific involvement, parent)

I don’t know, I’ve unfortunately been arrested in the UK, so my DNA is already written on the police files. So, I don’t think it’s going to make much of a difference. (participant #5, male, United Kingdom, no scientific involvement, no children)

I spent my entire childhood with my entire family being under a federal investigation, thus my concept of privacy is a little different from most people. (participant #7, female, United States, no scientific involvement, no children)

Finally, one participant decided to share her genomic data publicly because her former occupation had already required her to share her private data:

I was in the military […] so my fingerprints are already at the FBI. I felt like using privacy as an excuse to keep from sharing is like kind of cutting off my nose to spite my face. (participant #9, female, United States, no scientific involvement, parent)

The Social Determinants of Genetic Privacy

When discussing privacy risks linked to publicly shared genomic data, almost all participants expressed little concern for the possibility that those risks would affect them directly. For some, not having to worry about privacy when openly sharing their genomic data was a privilege that stemmed from living in countries where discrimination based on genomic information is legally prohibited. As one participant mentioned:

Maybe because I’m in Canada, and not the States, and I don’t have to worry about not being able to get health insurance because of a preexisting condition. Also, we’re a healthy family. And I don’t feel I need to worry if anything important were to show up in my genetic data, as I don’t think that would harm me. (participant #11, female, Canada, with previous scientific involvement, parent)

Other participants perceived themselves at low risk of specific types of privacy-related harm because they did not belong to vulnerable social groups, such as ethnic or sexual minorities, that are more exposed to discrimination. As one participant reported:

I think I would not be so open about it if I was a person of color or a transgender or … So, all of what I said before, I said in full awareness that I am an able hetero white male. (participant #13, male, Switzerland, no scientific involvement, no children)

Socioeconomic factors such as income emerged as crucial determinants of one’s privacy-related risks. Participants reported that wealthy individuals are less likely to perceive privacy-related risks and are therefore more inclined to share their private data. However, this difference between socioeconomic groups means that research data from those less well off are limited.

I feel like, here is some concept of privilege that informs who is willing to share their data, like if you’re wealthy and you’re not worried about finding a new job and you’re much more likely to say what bad thing can happen to be by doing this. And so, I feel like we end up with a lot of sequences that tend to be from people who don’t have to worry about it. (participant #3, male, United States, with previous scientific involvement, no children)

On the basis of the preceding considerations, about half of the participants reported that reducing privacy risks for minority groups would be key to fostering scientific progress. This change might encourage data sharing among disadvantaged people, resulting in more comprehensive studies that may benefit a wider set of people.

Well I know that any kind of minority community or disadvantaged community, getting them to participate in a lot of research is harder. And the important thing of that is that a lot of times that puts them at a further disadvantage. (participant #4, male, United States, no scientific involvement, no children)

If all the sequences in the open data set are white men for example and people are using those in experiments and writing research about them, then everybody else is getting left behind. (participant #3, male, United States, with previous scientific involvement, no children)

Most respondents explained that eliminating the social and health inequalities that make some people more exposed to privacy harms should be a priority for governments.

I guess the first thing would be to enact legislation that prevents there being any consequences of doing DNA testing or sharing results. (participant #10, female, Canada, with previous scientific involvement, parent)

While ad hoc measures might also foster data sharing from vulnerable social groups, participants were aware that eliminating discrimination and other risks linked to the use of genetic information faces challenges in attempting to counteract enduring cultural biases against minorities.

I think with my little magic wand I have here I would try to influence society into the Utopia that people are not able to discriminate against people with genetic diseases and I think this has to be done in many ways, not only antidiscrimination when it comes to genetics, but also everybody has to be a feminist, everybody has to look at transgender rights and disability and anti-racism rights. It’s even weird to say it, but when we look to America it’s kind of a topic again. (participant #13, male, Switzerland, no scientific involvement, no children)

Reasons for Open Sharing of Genetic Data

While participants stated that they valued their privacy, they also valued the benefits of sharing personal genetic data:

So […] I think about privacy, but I am glad to share information about my genetics and other things because it will be helpful for all of us I hope. (participant #8, male, Russia, no scientific involvement, parent)

There are an awful lot of people doing a lot more than I could ever hope to do but I’d like to at least contribute. […] So, I guess I don’t feel like I’m that unique or that much of a pioneer. But it’s a movement I want to be part of it anyway. (participant #1, female, United States, with previous scientific involvement, parent)

Some participants argued that those who are less exposed to the risks of privacy harms should be more open about their genomic data:

I am an able hetero white male. And I think for all the other people it’s more difficult, but I think that’s where responsibility for the able white male comes in because someone has to do it. […] I think first we have to look at all these antidiscrimination things and if people become sensitive about this it’s easy to share genetic data. But half of the people are women and they are discriminated against. I will not share my data if I was a woman probably. And so, first we have to work on these social issues and then afterwards we can go on and do more sharing I guess. And the way to get there is probably by white males sharing their data basically. (participant #13, male, Switzerland, no scientific involvement, no children)

Sharing genomic data for research purposes was widely viewed as a responsibility of those who are socially better off. Other participants asserted that vulnerable groups should receive special attention since they might perceive themselves to be at higher risk for discrimination and, as a result, might be more reluctant to share their data.

I feel like, one thing that I want are stronger protections for genetic data, like we have GINA in the US, but then recently have had a challenge to GINA in the form of, like this really weird workplace wellness program exemption. So, I’d like to see stronger protections for people who share genetic data. Because I think that’s not going to give us 100,000 people who want to share but at least it makes it safer for people who want to share. […] It’s the combination of low risk and for me, what’s the worst that could happen? Other than that, I guess raising awareness, that’s a thing you can even do. (participant #3, male, United States, with previous scientific involvement, no children)

After pointing out that he belongs to a privileged section of society (educated, upper middle class) that has no reason to worry about privacy harms, one participant highlighted the need for safeguards to facilitate sharing:

I don’t have enough at risk. […] If I could ensure that everybody’s information was not going to be abused […], I would make this wide open. […] It’s exciting to me to be able to contribute to this in even the smallest possible way. (participant #6, male, United States, no scientific involvement, parent)

Open sharing is not the only solution participants imagined to solve problems arising from socially determined impediments to data sharing. One-third of the participants suggested that opt-out policies—that is, sharing by default—would increase data sharing.

Just make sure that every time they give a blood sample they get genetic information. My mother was in a study where she gave genetic samples every year, DNA samples every year, that was inspirational to me. Have it so they see it as something positive. Like they did with my mom, something that moves society forward in a positive way, so they clarify it that, oh my God, they will not genetically modify us or something crazy like that. (participant #7, female, United States, no scientific involvement, no children)

What I would do to promote this sharing and development of this is of course to set up programs like 23andMe and to share the data by default and give people the option of opting out as they chose. And I would make it clear and I would make it easy for people to recognize, but I think that a lot of the times there’s that initiative barrier. A lot of the people aren’t willing to do it, but if you kind of set it as the default a lot of people would do it simply because they would say well I might as well. (participant #4, male, United States, no scientific involvement, no children)


Discussion

Previous research on genomic data sharing suggests that, when given the choice, a considerable number of research participants agree to share their genomic data for research purposes (Pereira, Gibbs, and McGuire 2014; Vayena et al. 2014). Most people share such data under certain conditions and with a promise of privacy safeguards. In this study, we have explored a particular subset of people who share their genomic data publicly, unconditionally, and without any privacy safeguards. Despite openSNP’s radically open model, users nonetheless consider privacy important (Haeusermann et al. 2017). Such a phenomenon, commonly defined as the privacy paradox (Barth and de Jong 2017), suggests that individuals possess a subtler understanding of their privacy-related interests than might be presumed. The fact that they decide to publicly share their data does not necessarily imply that they have no privacy concerns (Hallam and Zanella 2017; Kokolakis 2017; Taddicken 2014). Indeed, our study demonstrates that openSNP users recognize socially determined conditions that affect privacy and the likelihood and consequences of privacy-related harms. In particular, our respondents recognized privacy concerns along the following three dimensions.

Institutionally Embedded Privacy

Some of our respondents see privacy as being influenced by social and institutional practices. In particular, they interestingly pointed out that many organizations and institutions rely heavily on collecting and retaining personal information of citizens who interact with them. As such, many socially important activities presuppose handing over personal information and losing control over it. This loss of control over personal information pertains particularly to activities and services that depend upon digital technologies, as noted by privacy theorist Daniel J. Solove and sociologist David Lyon (Lyon 2003; Solove 2006), among others. What study participants stressed, however, is not so much the danger that this loss of control poses to the enjoyment of privacy. Instead, respondents viewed this realization as a precondition for a realistic understanding of privacy. This view resonates with a popular understanding of privacy in the digital age as beyond reach (BBC 2017; Morgan 2014; Preston 2014; Rauhofer 2008), rather than being obtainable by technically improving data security: people, in other words, seem to lack trust in the capacity to keep personal information safe from possible misuse.

The Social Gradient of Privacy

For many people, protecting privacy means primarily protecting vulnerable individuals likely to face discrimination following the release of their private information. Privacy protection efforts have focused mainly on building walls between data subjects and third parties, as well as on limiting unauthorized disclosure of information. These efforts also apply to genetic information, since restricting access to genetic information has long been a major concern in the debate about genetic privacy. However, as Mark A. Rothstein has argued, protecting genetic privacy, rather than focusing mainly on accessibility, entails “preventing the harmful use of this sensitive information” (1998). While the likelihood of privacy-related harms depends somewhat on how easily genetic data can be accessed, limiting unauthorized access is just one aspect of tackling the problem (Erlich and Narayanan 2014). For instance, disclosing one’s own genetic information voluntarily can also lead to harms. What is more, tightly regulating the sharing of genetic data might even be inappropriate in contexts where users wish to disclose their genetic information voluntarily (Rothstein 1998), as in the case of openSNP.

Understandably, the academic debate on genetic privacy has focused consistently on the need to prevent harmful uses of genetic information (Rothstein 1998). Distinct challenges remain in domains such as health insurance, life insurance, and equality of opportunity in employment and education (Erlich et al. 2014). Protecting genetic privacy thus closely connects to broader issues regarding the structure of social welfare and the legal protection of fundamental social rights, such as access to health care (Raisaro, Ayday, and Hubaux 2014). OpenSNP users are generally aware that privacy-related harms are less likely to affect people enjoying stronger protections against misuses of genetic information, such as citizens of countries with universal healthcare coverage. This awareness is reflected in studies of the way genetic privacy is understood in countries offering universal health care (Akgün et al. 2015; Parthasarathy 2004). Also, belonging to a dominant or advantaged social group is understood as a form of protection against discrimination (Annas, Glantz, and Roche 1995; Hudson, Holohan, and Collins 2008; Miller 1998; Parthasarathy 2004).

Abuse of genetic information has been the focus of intense legislative debate in the United States. Despite the protection offered to citizens by the Genetic Information Nondiscrimination Act (GINA 2008; National Human Genome Research Institute 2012), this law does not alleviate fear of genetic discrimination (Hudson, Holohan, and Collins 2008). For example, the provision stating that insurance companies cannot require genetic testing or use the results of a genetic test to deny coverage or increase insurance rates applies only to asymptomatic individuals (Rothstein 2008). Individuals with clinical symptoms are instead more vulnerable. In some cases, though, fear of genetic discrimination by insurance companies might be alleviated by better educating the public about existing regulations (Allain, Friedman, and Senter 2012).

OpenSNP users showed awareness of such issues. Moreover, they showed a clear understanding of the way genetic privacy risks are distributed across a social gradient, including determinants such as gender, ethnicity, sexual orientation, health status, employment status, and nationality. Participants clearly articulated these risks as policy issues, saying that these forms of inequality can, and indeed should, be tackled by public authorities through appropriate legislation. In this respect, since our participants represent a range of nationalities, their views may reflect or be influenced by the specific privacy-related regulations that apply in each country, such as the UK insurance industry’s concordat and moratorium on predictive health tests (HM Government 2014), or Canada’s 2017 Genetic Non-Discrimination Act (OpenParliament 2018).

A Duty to Openness

Our study further highlights that, according to the participants, acknowledging a social gradient of genetic privacy generates duties on the part of those enjoying stronger protections against privacy risks. A duty to openness, they suggest, would increase the number of people sharing their data and contributing to science. Those who are exposed to a higher risk of privacy-related harms are seen as less likely to contribute their data for research. As a consequence of this social disadvantage, so the argument goes, vulnerable people or groups are not sufficiently represented in medical research, which can lead to further health-related disadvantages for this population. Underrepresentation of minorities in medical research is an established fact, depending more on a lack of access to health research than on a lack of willingness to participate (Wendler et al. 2006). According to the World Health Organization, health research involving minorities, including indigenous people, should be promoted and carried out, so as to take into due account cultural differences; it should be grounded in mutual respect, and should be perceived to be beneficial and acceptable to both the subjects and their communities (WHO 2018). The same issues of fairness apply to the marketing of ancestry tests for Native American heritage, often criticized for being misleading and deceptive (Scodari 2017).

Bioethicist John Harris locates the basis of a duty to volunteer for research in the idea of reciprocity (Harris 2005). Each of us has a duty to contribute to the benefits he or she receives from medical knowledge by volunteering for the research that allows such knowledge to exist and progress. Those who fail to reciprocate in this way, then, are free riders. Participants in our study expressed similar commitments to advancing medical knowledge by sharing their genetic data (Haeusermann et al. 2017). Yet our participants saw the duty to openness mainly as a form of activism, raising awareness about the social determinants of privacy. This form of activism echoes elements of the so-called “communitarian turn” in the ethics of genetic research, emphasizing solidarity, reciprocity, citizenry, and the democratization of research (Chadwick 1999; Knoppers and Chadwick 2005; Lee and Crawley 2009). Interestingly, our participants envision research participation as a site for broad narratives about equality and social justice, as well as an opportunity to actively promote fairness and equal rights to participation in scientific research.


Limitations

The reported findings represent the views and voices of a select group of individuals who take a decidedly open stance towards their genetic privacy. A first limitation of our study concerns generalizability. Although the study was not designed to allow statistical-probabilistic generalizability to the broader social group of DTC-GT users or to other open sharing platforms, it sheds light on a novel social phenomenon, the open sharing of genetic data, that has not previously been addressed.

Furthermore, the study may have a self-selection bias, since openSNP users may be more interested in the topic of genetic privacy than the general population. Moreover, openSNP users have less variability in ethnicity, language, nationality, and geographic origin than a statistically representative sample of all people openly sharing their genetic data, let alone of all individuals involved in DTC-GT. In addition, participants from socially disenfranchised groups appear to be overrepresented in our sample (about half of the total participants). We extracted the present study sample from a previous survey study and, since this study was the first of its kind at that time, we had no reason to assume that disenfranchised people would be so present and willing to be interviewed. For this reason, our initial survey included no questions on previous “unconventional” experiences (such as having been arrested or under federal investigation). It was not possible, therefore, to create a quota corresponding to this criterion in the present sampling. The literature shows that DTC-GT consumers are a rather homogeneous group of well-educated “dominant group members” (Levenson 2016). OpenSNP users are a subgroup of DTC-GT consumers who have chosen to publicly upload their genomic data online. We therefore assumed that they would share most relevant features with DTC-GT consumers in general. However, this study shows that this assumption is not necessarily true, and further research is needed to understand the social composition of openSNP users.

Finally, our findings refer to the users of one specific platform (openSNP), but different views may emerge in other online communities.


Conclusions

This study explored the concept of privacy among individuals who share their genetic data publicly online. Although the qualitative nature of the study does not allow statistical generalizability of our findings beyond openSNP users, our results suggest that open sharing of one’s genetic data is not driven solely by the altruistic desire to contribute to science. Moreover, we showed that genetic privacy, for such users, means more than simply protecting personal information from unauthorized access (see also Phillips and Charbonneau 2015). Instead, the notion of genetic privacy points to the need to tackle social and health inequalities that might lead to discrimination and other privacy-related harms. The right to privacy, even if formally guaranteed to all citizens, is not enjoyed by all individuals in equal measure. Discussing how privacy risks and privacy-related harms are distributed across a gradient of socioeconomic conditions reveals an important dimension of social inequality that warrants further exploration.

To our knowledge, this is the first qualitative study conducted on the experiences, motivations, and beliefs of users of online data sharing platforms who decide to openly share their genomic data. More research is needed to further investigate how privacy is articulated in other online data sharing platforms. In particular, the question of how to assess the benefits of open genetic data sharing against its potential risks—how to strike a balance between genomic utility and privacy—will benefit from more qualitative and quantitative studies exploring the motivations, attitudes, and behaviors of individuals who voluntarily contribute their genetic data to online open repositories such as openSNP. As online open data sharing is conducted primarily for research purposes without the supervision of a researcher or research institution that can ensure this type of research is conducted in a scientifically and ethically sound manner, it is crucial to gather users’ perspectives on how privacy interests can best be protected while also preserving the bottom-up approach characterizing these activities (Vayena and Tasioulas 2013a).