Valerie Racine

Valerie Racine's Commentary on "Big Data & Conservation Biology"

Commentary Content

Some of the main goals in conservation biology are to track changes in large-scale ecosystems and to conserve biodiversity. Defining and assessing ‘biodiversity’ presents many epistemological challenges to which many scientists attend (cf. Sarkar 2002; Sarkar et al. 2006). Moreover, conservation biologists must collect, maintain, and analyze large sets of data. And, with better technology to track and measure biological and environmental variables, and the ability to share or create open access databases, conservation biology faces emerging ethical issues concerning its reliance on big data.

As with other sciences, the use of big data in conservation biology has led to ethical considerations about how to best balance basic scientific virtues, like the open flow of information and collaborations across borders, with the need to protect participant privacy and to maintain confidentiality in certain contexts (Bowser et al. 2014).

In the hypothetical scenario described above, the context is a citizen science project in which amateur birders share records of their observations, which are then curated and annotated by experts to become data made available on an open access platform. Because private citizens are sharing information, the eBird website clearly outlines its privacy policy to inform participants that, although no personal contact information is publicly available, the details of an observation (species, numbers, etc.) and its corresponding location are available to all users who are registered with eBird (Cornell Lab of Ornithology 2018). Therefore, it is possible that information about sightings on private property becomes publicly available. In the scenario, the interactions between Andrei and his neighbor, Anna, illustrate one particular ethical issue that might arise from such circumstances.

The potential costs of reporting on the presence of sensitive or endangered species present another concern related to confidentiality, which also arises in the hypothetical scenario. The eBird website includes guidelines for reporting on sensitive species (Team eBird 2012).

Conservationists worry that publicizing the explicit coordinates or directions to the locations of sensitive or rare species might encourage more traffic in the area, which may increase the risk of human disturbance to a vulnerable species’ habitats. There are also potential negative effects that reports of rare birds might have on the quality of the databases to which they are submitted. For example, the reports may lead to the phenomenon of “twitching” – “the act of making trips specifically to see previously reported rare birds” (Straka & Turner 2013, 40). Twitching can lead to biased samples of checklists or misleading data on bird abundance in open access databases from citizen science projects (Straka & Turner 2013; Kelling et al. 2009).

There are additional concerns about confidentiality and security with respect to publicly available data in conservation biology, sustainability, and environmental sciences (Keeso 2014). For example, poachers may gain access to the locations of endangered species and cause harm. Governments are sometimes hesitant to disclose detailed geographical maps – which might be very useful to scientists in tracking a region’s biodiversity – for reasons of national security. And, some corporations and scientists are worried about confidentiality because they view their data as proprietary.

Moreover, new technologies used by conservation biologists in the field to gather data, such as biotelemetry, might require interventions in natural habitats, which raise some ethical concerns, especially in the context of research on endangered species or sensitive ecosystems (Cooke 2008; Jewell 2013). The use of biotelemetry often requires tagging individuals of a species. This generates valuable information that may be useful to inform conservation priorities and meet conservation goals, and the assumption is that such interventions will not harm the welfare of individuals or populations, but the risk of harm is still a possibility. To mitigate these harms, biologists have made efforts to weigh the relative benefits of the research and any costs to individuals and populations. Researchers also investigate the impacts of tagging activities and test tagging techniques to develop better intervention practices (Cooke 2008, 172).

Furthermore, researchers have considered some of the large-scale effects of big data biodiversity projects, such as the Global Biodiversity Information Facility (GBIF), on the priorities and practices of ecological sciences (Devictor & Bensaude-Vincent 2016). They argue that the conversion of records and observations into data – what they call the process of datafication – results in the loss of information (e.g. de-contextualization) about particular environments or ecosystems, which in turn transforms the science of ecology from one centered on environmental management to one centered on providing and managing data for environmental management. They refer to this shift of focus as a transformation of ecology into a “technoscience” (Devictor & Bensaude-Vincent 2016, 20). This transformation might have harmful implications if it leads to a situation where scientists feel justified in accumulating data and monitoring global diversity without any concern for consequences occurring at smaller, local scales, or concern about the lack of political action needed to protect local environments or ecosystems (Devictor & Bensaude-Vincent 2016, 19-20).

While an emphasis on the accumulation of big data for conservation biology and environmental science might lead to a neglect of local contexts, some researchers have instead emphasized potential ethical upshots and societal benefits of big data, and data sharing in particular, within these fields. For example, Soranno et al. (2015) claim that “the issue of data sharing holds the potential for environmental scientists to align their practice with the discipline’s growing interest in issues of social consciousness, the democratization of science, inclusion, and scientific literacy” (Soranno et al. 2015, 71). According to these authors, the increasing reliance on public participation in, and sponsorship of, research creates an ethical obligation for scientists to promote and facilitate data sharing.


1. Cornell Lab of Ornithology. 2018. “Home: Privacy Policy and Cookie Policy.” Accessed 14 May 2021.

2. Team eBird. 2012. “Guidelines for Reporting Sensitive Species.” Accessed 14 May 2021.

Valerie Racine's Commentary on "Big Data & Neuroscience"

Commentary Content

Current discussions concerning recently launched large-scale data collection projects in neuroscience, such as the US’s BRAIN Initiative and the EU’s Human Brain Project, raise both epistemological and ethical questions. Concerning the former type of questions, many have asked what, if anything, “bottom-up” strategies of large-scale collection of data about the brain can really tell us about the human mind, consciousness and behaviour. Those sorts of concerns (e.g. about faulty inferences, false positives, etc.) often steer the ethical questions about the implications that the collection of brainwave data may have on our notions of personal identity, privacy, property, the capacity for consent, and the control of behaviour. Additionally, the novel uses of neurotechnologies raise some of the typical issues in the ethics of emerging technologies, such as dual-use dilemmas and governance.

First, the issues of personal identity, privacy, and property in big data neuroscience projects are similar to those that have emerged in the context of genetics and genomics (Choudhury et al. 2014; Illes & Lombera 2008). If data from EEGs, for example, can be used as a biometric signature that can identify individuals, then the identifying data may include sensitive information about the mental health or capacities of individuals. That kind of sensitive information must be protected to avoid its misuse and the potential profiling of individuals (Rodenburg 2014). So, safeguards must be taken to protect the confidentiality of research participants. But researchers also have a duty to research participants to be clear about the purpose of data collection, and about the use and accessibility of the data.

Also similar to the context of genomics databases, researchers and scientists think it is important to safeguard individuals’ mental privacy in a way that does not impede scientific and technological developments (Choudhury et al. 2014; Illes & Lombera 2008; Rose 2014). In this sense, there is a tension in whether to prioritize the principle of autonomy in research involving human subjects, or whether the principles of beneficence and justice ought to become more important in guiding the moral duties and responsibilities of researchers.

With respect to neuromarketing, many have questioned whether the appeal to consumers’ unconscious brain signals might be an invasion of privacy or an unethical manipulation of our affective states. Others have proposed that perhaps the field is exploiting useful medical equipment for frivolous and shallow purposes (Ulman et al. 2015). In light of these sorts of concerns, France revised its 2004 rules on bioethics in 2011 to include a section on the appropriate use of brain-imaging technologies. It states: “Brain-imaging methods can be used only for medical and scientific research purposes or in the context of court expertise” (Oullier 2012; Ulman et al. 2015). With this revision, the commercial use of brainwave technologies is currently banned in France.

Ethicists have also noted that using these data to sell goods and services might lead to exploitation of vulnerable groups (e.g. children) who cannot understand or consent to the practices of neuromarketing. As with other research with human subjects, ethicists have argued vulnerable groups should be protected (Ulman et al. 2015).

Lastly, some fear that the potential manipulation of our cognitive and affective states for profit in commercial contexts might spill over to the political realm, where individuals can be manipulated to vote one way or another. Here again there is concern over whether the use of data gathered from neurotechnologies might interfere with our capacity for consent (Rodenburg 2014; Gutmann 2015).

Valerie Racine's Commentary on "Big Data & Public Health"

Commentary Content

The fictional scenario described above is loosely based on a recent initiative by Google. In 2009, research scientists at Google published a study in Nature, describing their methods for tracking seasonal and pandemic influenza outbreaks using data generated from monitoring health-seeking behaviour on Internet search engines (Ginsberg et al. 2009). They had developed tools to track outbreaks in real time in order to improve upon the traditional methods used by the Centers for Disease Control and Prevention (CDC), which take approximately two weeks to gather and analyze data. The algorithms developed by the scientists at Google led to the creation of Google Flu Trends (GFT), a web service launched in 2008 to track flu outbreaks. The service is no longer publishing its results, but its data are made available to other researchers.

The 2009 Nature paper is often used as a paradigm example to illustrate the emergence of a new field referred to as digital epidemiology, or digital disease detection (DDD) (Brownstein et al. 2009; Salathe et al. 2012; Vayena et al. 2015). This field shares the goals and objectives of traditional epidemiology (e.g. public health surveillance, disease outbreak detection, etc.), but makes use of electronic information sources, such as internet search engines, mobile devices, and other social media platforms, which can generate data related to public health but are not explicitly designed for collecting public health-related data. The motivation behind DDD initiatives, like Google Flu Trends, is to mine large datasets in order to accelerate the process of tracking and responding to outbreaks of infectious diseases.

In 2013, Google’s program to track influenza outbreaks was heavily criticized for mis-estimating the prevalence of influenza outbreaks (Butler 2013; Lazer et al. 2014; Lazer & Kennedy 2015). Its first big mistake occurred in 2009, when it underestimated the Swine Flu (H1N1) pandemic (Butler 2013; White 2015), due to changes in people’s search behaviour with respect to the categories of “influenza complications” and “term for influenza” given the non-typical seasonal outbreak of H1N1 during the summer months (Cook et al. 2011). Then, in 2013, Nature reported that GFT significantly over-estimated outbreaks of influenza (Butler 2013; Lazer et al. 2014). In a comment published in Science in 2014, Lazer et al. reported that GFT had been consistently over-estimating the prevalence of flu outbreaks before then, inaccurately predicting the prevalence of flu cases in 100 of 108 weeks during the 2011-2012 flu seasons (Lazer et al. 2014).

GFT’s track record of mis-estimations has been described as “big data hubris” – “the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis” (Lazer et al. 2014, 1203). In epidemiology, traditional data collection and analysis involve gathering data from structured interviews, archives, censuses, and surveys, and then looking for patterns and trends in the data. However, most scientists commenting on the case of GFT have insisted that, despite its failures, the use of big data in epidemiology can be extremely valuable for public health surveillance (Lazer et al. 2014; Lazer & Kennedy 2015; White 2015).

The GFT case has raised many epistemological questions about how to improve Google’s flu algorithms, and big data analytics more generally, and how public health policy and decision-makers ought to use these tools. But it has also engendered ethical concerns at “the nexus of ethics and methodology” (Vayena et al. 2015).

For example, there can be harmful consequences when such models are woefully inaccurate or imprecise. False identification of outbreaks or inaccurate and imprecise predictions of outbreak trajectories could place undue stress on limited health resources (Vayena et al. 2015). Wrong results or predictions might also undermine the public’s trust in scientific findings, and worse, might lead to the public’s dismissal of public health warnings.

In addition to worries about maintaining the public’s trust on issues of public health, researchers developing models aimed at detecting outbreaks must consider that their results risk harming individuals, businesses, communities, and even entire regions or countries (Vayena et al. 2015). This harm may take the form of stigmatization of groups, and financial loss due to prejudice or restrictions on travel to tourist destinations. It can also restrict the freedom of individuals in the form of imposed travel restrictions or quarantines. Consequently, ethicists have stressed that “methodological robustness” with respect to digital epidemiology is “an ethical, not just a scientific, requirement” (Vayena et al. 2015, 4).

As with other instances of big data collection and use in the life sciences, the use of big data gathered online in social or commercial contexts for public health purposes raises ethical issues about an individual’s right to privacy and notions of informed consent when that data is used for research purposes. However, in this context, it has been suggested that private corporations that have access to relevant data might have a moral obligation to share that data for matters related to public health and public health research. This consideration raises questions about how to regulate private-public partnerships with regard to data ownership within a global context in order to uphold the values of transparency, global justice, and the common good in public health research (Vayena et al. 2015).

Valerie Racine's Commentary on "Big Data & Genetic Privacy: Re-identification of Anonymized Data"

Commentary Content

The study by Gymrek et al. (2013), and others like it, generated demands for additional restrictions in database sharing policies, changes to how and what kinds of data were collected and anonymized, and worries about some of the foundational concepts in research ethics, including the notions of informed consent, privacy, confidentiality, and the nature of the researcher/clinician – subject/patient relationship. This short commentary will focus on those concepts in biomedical research ethics.

Most researchers and ethicists agree that it is important to safeguard privacy and confidentiality for patients and research subjects, but to do so in a way that does not impede scientific progress. This “sweet spot” between the competing goals of scientific research and the individual’s right to privacy is especially relevant for current genomic and genetic analyses using big data. For instance, Genome Wide Association Studies (GWAS) capitalize on correlated sets of large databases of individuals’ genetic variants to determine whether certain variants are important contributors to complex diseases or disorders. There is also much optimism about the prospects of personalized medicine, in which medical professionals would access and integrate patients’ personal genomic data into targeted and tailored treatments. The success of personalized medicine, however, requires knowledge about which sorts of treatments will be effective for certain genetic variants, which depends on genomic analyses of big data.

While there are clear potential benefits of biomedical research analyses of large sets of genomic and genetic data, that information is also particularly sensitive as it can accurately reveal subjects’ identity in the same way as social security numbers can. It can also reveal the identity of an individual’s relatives. Because of the way this information can serve as accurate individual identifiers, some researchers have taken the notion of genetic privacy to denote a special instance of privacy (e.g. Rothstein 1997), based on the notion of “genetic exceptionalism” – i.e. “the view that being genetic makes information, traits, and properties qualitatively different and deserving of exceptional consideration” (Lunshof et al. 2008).

If we accept a concept of genetic privacy, based on genetic exceptionalism, then there are implications for the way we think about infringement of privacy and breach of confidentiality within the biomedical research context. For instance, Lunshof et al. (2008) argue that because some violations of privacy are beyond the control of individuals or institutions (as in the above case scenario), they do not necessarily signal a moral failure even though those violations may cause harm in some instances. However, they note that the promise of confidentiality implies a relationship of trust and, with it, moral responsibilities on those who promise confidentiality. For that reason, a breach in confidentiality does entail a moral failure with respect to the relation of trust between the researcher/clinician and subject/patient.

These moral considerations have led research scientists and ethicists to rethink the model of informed consent that typically guides the relationships of trust between clinician/researcher and patient/subject in the biomedical context, and to reconsider what, if any, sense of privacy and anonymity should be promised to patients and research subjects.

Informed consent is typically used in cases of specific research studies. It is problematic in research that makes use of big data because it does not, and cannot, explicitly cover all future investigations, or future instances of sharing and aggregating data across research communities. Because of these elements in big data science, the traditional notion of informed consent cannot be implemented in the usual way.

Consequently, some have proposed more liberal notions of consent, such as “open,” “broad,” or “blanket” consent (Mittelstadt & Floridi 2015). These notions of consent require research participants to consent to all future research activities that make use of their data. However, those approaches have been criticized for limiting patients’ or subjects’ autonomy (Mittelstadt & Floridi 2015; Master et al. 2014). An alternative proposal to the models of general consent is the notion of “tiered” consent. That notion of consent would enable patients and subjects to choose to limit future access to their data to only some kinds of research, or to require researchers to re-consent patients and subjects for specific kinds of future research. That approach has been criticized for creating too many difficulties for researchers and the management of large databanks.

Another alternative has been to emphasize the concept of solidarity rather than consent. This approach relies on the participation of “information altruists” concerned with the public good. It is mainly concerned with how research can be pursued and harms can be mitigated, “by providing data subjects with a ‘mission statement’, information on potential areas of research, future uses, risks and benefits, feedback procedures and the potential commercial value of the data, so as to establish a ‘contractual’ rather than consent basis for the research relationship” (Mittelstadt & Floridi 2015; Prainsack & Buyx 2013). The proposed reliance on solidarity and public sentiment has been criticized for placing undue burdens on individuals to participate in research. However, it might also serve to emphasize the ethical responsibilities of big data researchers and database managers, and encourage scientists to be more proactive in the disclosure and transparency of risks of harm that might occur as a consequence of the loss of privacy (Lunshof et al. 2008; Barocas & Nissenbaum 2014). In this way, genomic and genetic research dependent on large sets of data has the potential to shift the moral responsibilities of researchers from protecting the privacy of individuals to ensuring the just distribution of any benefits from the outcomes of their research (Fairfield & Shtein 2014).

The emerging concepts of consent under negotiation within this research context, and the emphasis on researchers’ duty to benefit research participants and their communities more widely as well as the research participants’ duty to contribute to the public good, are areas of ethical deliberation intended to maintain the public’s trust in the medical profession, and scientific institutions more broadly. These ethical concepts and proposals, therefore, ought to be evaluated by how well they are able to do so.

Valerie Racine's Commentary on "Genome Editing & the Ethics of CRISPR-Cas9"

Commentary Content

The publication of Huang and colleagues’ research caused a stir in the scientific community and generated many editorials and opinion pieces in scientific publications warning about the ethical issues that must be addressed before this research is pursued any further.

Scientists were quick to call for a moratorium on all genome editing of human embryos, and invoked similarities to the technological innovation that led to recombinant DNA in the 1970s and the meeting at Asilomar in 1975, where molecular biologists met to discuss and set guidelines to ensure that genetic research would develop in a safe and ethical manner (Vogel 2015).

However, many are critical of the comparisons with the Asilomar meeting and the attempt to use that conference as a model on which to build bioethical guidelines for future research with genome editing technologies (Jasanoff et al. 2015). Critics claim that the 1975 Asilomar conference was not an inclusive meeting because many of the stakeholders were not invited, such as ethicists, politicians, religious groups, and representatives of human-rights organizations or patient-interest groups (Reardon 2015b). Because of the lack of representation from non-scientists in the discussions, critics claim that Asilomar was merely an effort by scientists to resist government restrictions and promote public trust in the idea that scientists are able to regulate themselves (Reardon 2015b).

In response to calls for a moratorium, the US National Academy of Sciences (NAS) and the National Academy of Medicine (NAM) launched an initiative to develop new guidelines to address the use of technology which makes germ line genetic modification possible, and called for members of the scientific community to attend an international summit on the topic set for December 2015 (Reardon 2015b).

The International Summit on Human Gene Editing, held in Washington, D.C., in December 2015, was hosted by the National Academy of Sciences, the National Academy of Medicine, the Chinese Academy of Sciences, and the U.K.'s Royal Society. Members of the Summit’s organizing committee issued a public statement shortly after the meeting, outlining four recommendations. First, basic and preclinical research on gene-editing technologies is needed and should proceed. Second, clinical use of the technologies on somatic cells should be explored. Third, it is irresponsible to pursue clinical applications of gene-editing technologies on germline cells at this time. And, fourth, there is a need for ongoing discussions regarding the clinical use of germline gene editing, so the national academies should create a forum to allow for discussions which are inclusive and which engage with a variety of perspectives and expertise.

Some science policy experts have argued that the complexity of the issues surrounding germ line genetic modification cannot be adequately addressed from a scientific perspective. For example, Daniel Sarewitz, co-director of Arizona State University’s Consortium for Science, Policy, and Outcomes, argues:

“The idea that the risks, benefits and ethical challenges of these emerging technologies are something to be decided by experts is wrong-headed, futile and self-defeating. It misunderstands the role of science in public discussions about technological risk. It seriously underestimates the democratic sources of science's vitality and the capacities of democratic deliberation. And it will further delegitimize and politicize science in modern societies” (Sarewitz 2015).

Sarewitz’s comment highlights the importance of a democratic deliberative process when identifying and addressing ethical issues about emerging technologies, as well as developing guidelines that will help to decide how these technologies will be further developed and used. In this particular case, there is worry that germ line genetic modification on human embryos to replace defective genes may lead to a slippery slope to eugenics, or attempts to create perfect designer babies.

Lastly, the decision by Science and Nature to decline to publish the research paper because of undisclosed ethical objections raised further ethical issues about the dissemination of scientific research within a global context. The managing editor of Protein & Cell, Xiaoxue Zhang, has claimed that their editorial board was not blind to the potential ethical objections to the research, but decided to publish the article as a way to “sound an alarm” to begin discussions about the future direction of genome editing technologies (Cressey & Cyranoski 2015). Whether these discussions should come before or after the scientific research is conducted or published raises important questions about how best to regulate innovative scientific research with uncertain outcomes or potential dual-use applications.