Data Management Bibliography

Includes guidelines, reports, books, and articles discussing ethical issues that arise in collaborative research, including international and interdisciplinary research.

Data Management and Data Sharing Bibliography

See also bibliographies on Reproducibility and Big Data (Under Review).

 

Policy/Guidance


American Psychological Association. 2015. Data Sharing: Principles and Considerations for Policy Development
A brief discussion of the benefits of data sharing in the field of psychology, along with a list of ethical considerations that researchers and institutions should weigh to protect the privacy of individuals.

American Statistical Association. 2016. Ethical Guidelines for Statistical Practice
These ethical guidelines are intended for statistical practitioners and for anyone who uses statistics in their professional work.

Association of Internet Researchers. 2012. Ethical Decision-Making and Internet Research
These recommendations are intended to assist researchers from a wide range of disciplines, countries, and contexts who work in this field. The document guides researchers through a series of key questions about how they gather, use, present, and dispose of data, as a way to help them develop sound internet research protocols.

Michigan State University. 2016. Life Cycle Data Management Planning.
Along with a useful list of policies from the major U.S. agencies that fund research, this site is an excellent source of guidance for developing a data management plan that covers the entire life cycle of your scientific data.

National Institutes of Health Data Sharing Policy
Sets out the guidelines that all NIH-funded research must follow to speed the translation of research results into knowledge, products, and procedures that improve human health.

National Institutes of Health Sharing Policies and Related Guidance on NIH-Funded Research Resources
A site that pulls together data sharing policies from programs across the NIH. It includes specific guidelines for sharing genomic data, a table of NIH data-sharing repositories, and example Data Sharing Plans for researchers preparing a proposal.

National Science Foundation. Dissemination and Sharing
Guidelines from the National Science Foundation, including specific guidelines by Directorate and Program, on how to share and disseminate data coming from NSF-funded research.

 

Web Sites

Northern Illinois University. 2005. Data Management. United States, Office of Research Integrity.
An online module covering the ethical issues involved in data ownership, management, and the sharing of data.

Guidelines for Responsible Data Management in Scientific Research
Developed by Clinical Tools Inc. for the U.S. Office of Research Integrity, this educational “course” covers the general rules of data management, how to define the roles of research staff in data management, and how to implement a communication plan for handling data management among team members.

Resources for Research Ethics Education: Data Management
A concise introduction to best practices on data management, including background, an introduction to regulations and guidelines, as well as discussion tools, cases and further resources.

 

Whitbeck, Caroline. 2006. The Responsible Collection, Retention, Sharing and Interpretation of Data. Online Ethics Center.
A training module using scenarios and other methods to facilitate discussion about the key ethical issues that arise in data management in research.

 

Books/Reports

Corti, Louise, Veerle Van den Eynden, Libby Bishop, and Matthew Woollard. 2014. Managing and Sharing Research Data: A Guide to Good Practice. Los Angeles: SAGE.
Written by experts from the UK Data Archive with over 20 years' experience, this book gives postgraduate students, researchers, and research support staff the data management skills required in today’s changing research environment.

Council on Library and Information Resources (U.S.). 2013. Research Data Management: Principles, practices and prospects. Washington, D.C.: Council on Library and Information Resources.
This report examines how research institutions are responding to data management requirements of the National Science Foundation, National Institutes of Health, and other federal agencies. It also considers what role, if any, academic libraries and the library and information science profession should have in supporting researchers’ data management needs.


Howard Hughes Medical Institute. 2006. “Chapter 8. Data Management and Laboratory Notebooks.” A Practical Guide to Scientific Management for Postdocs and New Faculty. pp. 143-152.
A brief but excellent overview of best practices for day-to-day recordkeeping and for keeping laboratory notebooks.

Mascalzoni, Deborah (ed.) 2015. Ethics, Law and Governance of Biobanking: National, European, and International Approaches. Dordrecht: Springer.
Biobank research and genomic information are changing the way we look at health and medicine. Genomics challenges our values and has always been controversial and difficult to regulate. This book provides an overview of the existing regulatory landscape for biobank research in the Western world, along with critical chapters showing how regulations and ethical frameworks are developed and how they work.

National Academy of Sciences (U.S.), National Academy of Engineering, & Institute of Medicine (U.S.). 2012. The Case for International Sharing of Scientific Data: A Focus on Developing Countries: Proceedings of a Symposium.
The theme of this international symposium was the promotion of greater sharing of scientific data for the benefit of research and broader development, particularly in the developing world. The symposium was co-organized by two boards of the NRC's Policy and Global Affairs Division: the Board on International Scientific Organizations (BISO) and the Board on Research Data and Information (BRDI). These proceedings summarize the symposium.

National Academy of Sciences (U.S.), National Academy of Engineering, & Institute of Medicine (U.S.). 2010. Conducting Biosocial Surveys: Collecting, Storing, Accessing, and Protecting Biospecimens and Biodata. Washington, D.C.: National Academy Press.
This report offers findings and recommendations on the best approaches in the biosocial field. The topics covered include informed consent, privacy issues, and best practices, as well as additional legal, ethical, and social issues and practical issues related to the storage, retrieval, and sharing of data.

National Academy of Sciences (U.S.), National Academy of Engineering, & Institute of Medicine (U.S.). 2009. Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. Washington, D.C.: National Academy Press.
As digital technologies expand the power and reach of research, they also raise complex issues. These include complications in ensuring the validity of research data; standards that do not keep pace with the high rate of innovation; restrictions on data sharing that reduce the ability of researchers to verify results and build on previous research; and huge increases in the amount of data being generated, which create severe challenges in preserving that data for long-term use.

Olsen, Wendy Kay. 2012. Data Collection: Key Debates and Methods in Social Research. Thousand Oaks, Calif.: SAGE.
Provides an overview of key issues and practical methods needed for data collection in social science research, as well as ethical issues that can arise.

 

“Planning and Preparing for Research.” 2016. In Doing Global Science: A Guide to Responsible Conduct in the Global Research Enterprise, pp. 32-37. Princeton University Press.
This concise introductory guide explains the values that should inform the responsible conduct of scientific research in today's global setting. It includes a section on data collection, analysis, and sharing.

Xiao, Norah, and Leah Rae McEwen (eds.) 2012. Special Issues in Data Management. Washington, D.C.: American Chemical Society.
Focusing on the field of chemistry, this book first discusses data management in general and then turns to the challenges of supporting e-science research and data management, especially from the perspective of information science and research libraries.

 

Journal Articles

Benos, Dale J., and Sara H. Vollmer. 2010. "Generalizing on Best Practices in Image Processing: A Model for Promoting Research Integrity."  Science & Engineering Ethics 16 (4):669-673. doi: 10.1007/s11948-010-9226-2.
Modifying images for scientific publication is now quick and easy due to changes in technology. This has created a need for new image processing guidelines and attitudes, such as those offered to the research community by Doug Cromey (Cromey 2010). The authors suggest that related changes in technology have simplified the task of detecting misconduct for journal editors as well as researchers, and that this simplification has shifted the responsibility for reporting misconduct. They also argue that the concept of best practices in image processing can serve as a general model for education in best practices in research.

Buehl, Jonathan. 2014. "Toward an Ethical Rhetoric of the Digital Scientific Image: Learning From the Era When Science Met Photoshop."  Technical Communication Quarterly 23 (3):184-206. doi: 10.1080/10572252.2014.914783.
Over the past two decades, scientific editors have attempted to correct “mistaken” assumptions about scientific images and to curb unethical image-manipulation practices. Reactions to the advent and abuse of image-adjustment software (such as Adobe Photoshop) reveal the complex relations among visual representations, scientific credibility, and epistemic rhetoric. Perelman and Olbrechts-Tyteca's model of argumentation provides a flexible system for understanding these relations and for teaching students to use scientific images ethically and effectively.

Charbonneau, Deborah H. 2013. "Strategies for Data Management Engagement."  Medical Reference Services Quarterly 32 (3):365-374. doi: 10.1080/02763869.2013.807089.
The research landscape is growing dramatically, and librarians are examining new roles, services, and types of collaborations to support data-intensive research. This column describes curricular enhancements at one School of Library and Information Science in the United States. Several key areas of data management in which health sciences librarians may wish to build or enhance their skills are outlined. Possible roles and opportunities for health sciences librarians to strategically engage in data management initiatives are also presented.

Chen, Jinchuan, Yueguo Chen, Xiaoyong Du, Cuiping Li, Jiaheng Lu, Suyun Zhao, and Xuan Zhou. 2013. "Big data challenge: a data management perspective."  Frontiers of Computer Science 7 (2):157-164. doi: 10.1007/s11704-013-3903-7.
Virtually everyone, from big Web companies to traditional enterprises to physical science researchers to social scientists, is either already experiencing or anticipating unprecedented growth in the amount of data available in their world, along with new opportunities and great untapped value. This paper reviews big data challenges from a data management perspective. In particular, the authors discuss big data diversity, big data reduction, big data integration and cleaning, big data indexing and querying, and finally big data analysis and mining. The survey gives a brief overview of big-data-oriented research and problems.

Corti, Louise, and Veerle Van den Eynden. 2015. "Learning to manage and share data: jump-starting the research methods curriculum."  International Journal of Social Research Methodology 18 (5):545-559. doi: 10.1080/13645579.2015.1062627.
Researchers’ responsibilities towards their research data are changing across all domains of social scientific endeavour. Government, funders, and publishers expect greater transparency of, open access to, and re-use of research data, and fears over data loss call for more robust information security practices. Researchers must develop, enhance, and professionalize their research data management skills to meet these challenges and to deal with a rapidly changing data sharing environment. This paper sets out how the authors have contributed to jump-starting the research methods training curriculum in this field by translating high-level needs into practical guidance and training activities. Their pedagogical approach uses applicable, easy-to-digest modules based on best practice guidance for managing and sharing research data. In line with recent findings on successful practices in methods teaching, they work on the principle of embedding grounded learning activities within existing narratives of research design and implementation.

Cromey, Douglas W. 2010. "Avoiding Twisted Pixels: Ethical Guidelines for the Appropriate Use and Manipulation of Scientific Digital Images."  Science & Engineering Ethics 16 (4):639-667. doi: 10.1007/s11948-010-9201-y.
Digital imaging has provided scientists with new opportunities to acquire and manipulate data using techniques that were difficult or impossible to employ in the past. Because digital images are easier to manipulate than film images, new problems have emerged. One growing concern in the scientific community is that digital images are not being handled with sufficient care. The problem is twofold: (1) the very small, yet troubling, number of intentional falsifications that have been identified, and (2) the more common unintentional, inappropriate manipulation of images for publication. Journals and professional societies have begun to address the issue with specific digital imaging guidelines. Unfortunately, the guidelines provided often do not come with instructions to explain their importance. Thus they deal with what should or should not be done, but not the associated 'why' that is required for understanding the rules. This article proposes 12 guidelines for scientific digital image manipulation and discusses the technical reasons behind these guidelines. These guidelines can be incorporated into lab meetings and graduate student training in order to provoke discussion and begin to bring an end to the culture of 'data beautification'.

De Vries, Jantina, Thomas N. Williams, and Kalifa Bojang. 2014. "Knowing Who to Trust: Exploring the Role of 'Ethical Metadata' in Mediating Risk of Harm in Collaborative Genomics Research in Africa."  BMC Medical Ethics 15:e1-E10.
The authors studied one of the first collaborative genomics research projects in Africa and examined how good management practices can help limit the potential for harm to ethnic groups involved in the project.

Dove, Edward S., David Townend, Eric M. Meslin, Martin Bobrow, Katherine Littler, Dianne Nicol, Jantina de Vries, Anne Junker, Chiara Garattini, Jasper Bovenberg, Mahsa Shabani, Emmanuelle Lévesque, and Bartha M. Knoppers. 2016. "Ethics review for international data-intensive research."  Science 351 (6280):1399-1400. doi: 10.1126/science.aad5269.
Historically, research ethics committees (RECs) have been guided by ethical principles regarding human experimentation intended to protect participants from physical harms and to provide assurance as to their interests and welfare. But research that analyzes large aggregate data sets, possibly including detailed clinical and genomic information about individuals, may require a different kind of assessment. This article examines five such efforts involving multiple countries around the world, including resource-poor settings, to identify models that could inform a framework for mutual recognition of international ethics review.

Frugoli, Julia, Anne M. Etgen, and Michael Kuhar. 2010. "Developing and Communicating Responsible Data Management Policies to Trainees and Colleagues."  Science & Engineering Ethics 16 (4):753-762. doi: 10.1007/s11948-010-9219-1.
The basic components of data management include data ownership, collection, selection, recording, analysis, storage, retention, destruction, and sharing. A number of important principles underlie best practices for each of these components; these include recording details so that another researcher can repeat the experiment, keeping the data safe, managing storage to allow easy retrieval for the period required by regulatory agencies, and establishing data sharing principles with colleagues before collaborations begin. The authors' experience as practicing scientists and teachers has helped them develop strategies and approaches for communicating these principles, policies, and practices to trainees and colleagues. They recommend discipline-focused didactic instruction combined with the use of 'teachable moments' in a student's career, as well as teaching principles rather than rules, because changing methods of data collection and storage have implications for data management practices.

Geller, Lisa. 2010. "Data Management in Academic Settings: An Intellectual Property Perspective."  Science & Engineering Ethics 16 (4):769-775. doi: 10.1007/s11948-010-9240-4.
Intellectual property can be an important asset for academic institutions, and good data management practices are important for capturing, developing, and protecting intellectual property assets. The article reviews selected issues in the relationship between data management and intellectual property and argues that academic institutions and scientists should honor their obligations to manage data responsibly.

Grand, Ann, Clare Wilkinson, Karen Bultitude, and Alan F. T. Winfield. 2016. "Mapping the hinterland: Data issues in open science."  Public Understanding of Science 25 (1):88-103. doi: 10.1177/0963662514530374.
Open science is a practice in which the scientific process is shared completely and in real time. It offers the potential to support information flow, collaboration and dialogue among professional and non-professional participants. Using semi-structured interviews and case studies, this research investigated the relationship between open science and public engagement. This article concentrates on three particular areas of concern that emerged: first, how to effectively contextualise and narrate information to render it accessible, as opposed to simply available; second, concerns about data quantity and quality; and third, concerns about the skills required for effective contextualisation, mapping and interpretation of information.

 

Hardy, Lisa J., Amy Hughes, Elizabeth Hulen, and Anna L. Schwartz. 2016. "Implementing Qualitative Data Management Plans to Ensure Ethical Standards in Multi-Partner Centers."  Journal of Empirical Research on Human Research Ethics 11 (2):191-198. doi: 10.1177/1556264616636233.
Thorough data management is crucial for protecting people who participate in research and for enabling researchers to share results with the public. The impact of inadequate adherence to data management is particularly evident in small field sites and among vulnerable populations participating in Community-Based Participatory Research (CBPR). CBPR presents exciting opportunities for multimedia and multi-sectoral dissemination of research results and policy change, especially concerning the impact of research on health equity for underrepresented populations. In this article, the authors discuss how they defined data boundaries and protections to adhere to ethical standards, while also prioritizing data dissemination, in CBPR with American Indians in Arizona. Although complex partnerships can introduce additional risks to data oversight, data management practices can also increase opportunities for wide-reaching dissemination. The authors hope to contribute to the literature on data sharing in multi-partner projects, bolstering the impact of dissemination while also protecting participants and populations who chose to collaborate in research and policy practices.

Joshi, Margi, and Sharon S. Krag. 2010. "Issues in Data Management."  Science & Engineering Ethics 16 (4):743-748. doi: 10.1007/s11948-010-9223-5.
Data management raises a number of issues, both regulatory and non-regulatory. Researchers should understand how data are defined by their particular institutions and regulatory authorities. Data are the bases of scientific communication and provide a strong defense against allegations of scientific misconduct. Authorization is often necessary before collection of data can commence. Proper handling, retention, and storage of data, especially that involving humans, are crucial for the researcher. Data ownership by the institution leads to a responsibility by the institution to educate all its researchers in responsible data management practices.

Kaplan, Bonnie. 2016. "How Should Health Data Be Used?"  Cambridge Quarterly of Healthcare Ethics 25 (2):312-29. doi: 10.1017/s0963180115000614.
Electronic health records, data sharing, big data, data mining, and secondary use are enabling exciting opportunities for improving health and healthcare while also exacerbating privacy concerns. Two court cases about selling prescription data, the Sorrell case in the U.S. and the Source case in the U.K., raise questions of what constitutes "privacy" and "public interest"; they present an opportunity for ethical analysis of data privacy, commodifying data for sale and ownership, combining public and private data, data for research, and transparency and consent. These interwoven issues involve discussion of big data benefits and harms and touch on common dualities of the individual versus the aggregate or the public interest, research (or, more broadly, innovation) versus privacy, individual versus institutional power, identification versus identity and authentication, and virtual versus real individuals and contextualized information. Transparency, flexibility, and accountability are needed for assessing appropriate, judicious, and ethical data uses and users, as some are more compatible with societal norms and values than others.

King, Gary. 2011. “Ensuring the Data-Rich Future of the Social Sciences.” Science 331 (6018): 719-721. doi: 10.1126/science.1197872.
Massive increases in the availability of informative social science data are making dramatic progress possible in analyzing, understanding, and addressing many major societal problems. Yet the same forces pose severe challenges to the scientific infrastructure supporting data sharing, data management, informatics, statistical methodology, and research ethics and policy, and these challenges are collectively holding back progress. The author addresses these changes and challenges and suggests what can be done.

Lounsbury, David W., Thomas C. Reynolds, Bruce D. Rapkin, Mark E. Robson, and Jamie Ostroff. 2007. "Protecting the privacy of third-party information: Recommendations for social and behavioral health researchers."  Social Science & Medicine 64 (1):213-222. doi: 10.1016/j.socscimed.2006.08.035.
In psychosocial and health-behavioral research, researchers often ask participants to provide information on significant individuals in their lives, so-called “third parties”. Recently there has been greater recognition of the privacy issues and risks that research poses to third parties. The reaction of U.S. federal regulatory authorities to one study, which attempted to collect survey data about the psychiatric history of respondents’ parents, has generated such concern and caution that longstanding practices for collecting data on the social determinants of health are being questioned and are at risk of being disallowed by Institutional Review Boards (IRBs). In this paper, the authors consider third-party research rights and risks from the perspective of social and behavioral scientists.

Lunnay, Belinda, Joseph Borlagdan, Darlene McNaughton, and Paul Ward. 2015. "Ethical Use of Social Media to Facilitate Qualitative Research."  Qualitative Health Research 25 (1):99-109. doi: 10.1177/1049732314549031.
Increasingly, qualitative health researchers might consider using social media to facilitate communication with participants. Ambiguity surrounding the potential risks intrinsic to social media could hinder ethical conduct and discourage use of this innovative method. The authors used core principles of traditional human research ethics, namely respect, integrity, and beneficence, to design their photo elicitation research, which explored the social influences on alcohol consumption among 34 underage women in metropolitan South Australia. Facebook aided their communication with participants, including correspondence ranging from recruitment to feeding back results and sharing research data. This article outlines the ethical issues they encountered when using Facebook to interact with participants and provides guidance to researchers planning to incorporate social media as a tool in their qualitative studies.

Lustgarten, Samuel D. 2015. "Emerging Ethical Threats to Client Privacy in Cloud Communication and Data Storage."  Professional Psychology: Research & Practice 46 (3):154-160. doi: 10.1037/pro0000018.
This article reviews current APA ethics and record-keeping guidelines, the Health Insurance Portability and Accountability Act, and the Health Information Technology for Economic and Clinical Health Act, and contends that they do not adequately account for cloud communication and data storage or for other emerging threats to client confidentiality. Because psychologists bear the responsibility for staying informed, protecting and maintaining client records, and preventing breaches, it is vital that the field establish specific best practices and present regular security updates to colleagues.

McEwen, Jean E., Joy T. Boyer, and Kathie Y. Sun. 2013. "Evolving approaches to the ethical management of genomic data."  Trends in Genetics 29 (6):375-382. doi: 10.1016/j.tig.2013.02.001.
The ethical landscape in the field of genomics is rapidly shifting. Plummeting sequencing costs, along with ongoing advances in bioinformatics, now make it possible to generate an enormous volume of genomic data about vast numbers of people. The informational richness, complexity, and frequently uncertain meaning of these data, coupled with evolving norms surrounding the sharing of data and samples and persistent privacy concerns, have generated a range of approaches to the ethical management of genomic information. As calls increase for the expanded use of broad or even open consent, and as controversy grows about how best to handle incidental genomic findings, these approaches, informed by normative analysis and empirical data, will continue to evolve alongside the science.

Zhang, Nan, and Wei Zhao. 2007. "Privacy-Preserving Data Mining Systems."  Computer 40 (4):52-58.
The article discusses privacy concerns relating to data mining. Data mining extracts knowledge to support a variety of domains, including marketing, medical diagnosis, and national security, but it remains a challenge to mine certain types of data without violating the data owner's privacy. The authors argue that any design of a privacy-preserving data mining system requires a clear definition of privacy, and that such systems should operate on a "minimum-necessary" design principle. "Minimum" captures the idea that no unnecessary private information should be disclosed.

Navarro, Robert. 2008. "An ethical framework for sharing patient data without consent."  Informatics in Primary Care 16 (4):257-262.
There is no consensus on how to share patient records privately. Data privacy concepts are surveyed and a framework is presented for the safe sharing of sensitive data. It is argued that tailoring the data sharing to the privacy breach risks of each project holds out the best compromise for keeping the trust of the public and providing for the best quality data where detailed patient consent is not possible. Any harm arising from data sharing must come from the data being identified, either fully or partially. The first step is an agreement on an acceptable privacy breach risk. Next, proceed to measure that risk for the proposed data when held by a given recipient. Finally, select from a menu of mitigation strategies (people, process and technical) to achieve acceptable risk. The framework is tested against the current UK approach administered by the Patient Information Advisory Group. The hard problem of non-consented data sharing should be divided into the easier (though non-trivial) ones of data and recipient breach risk measurement. Directed research in these two areas will help move the data sharing problem into the 'solved' pile.

Nyika, Aceme, Wencelaus Kilama, Godfrey B. Tangwa, Roma Chilengi, and Paulina Tindana. 2009. "Capacity building of ethics review committees across Africa based on the results of a comprehensive needs assessment survey."  Developing World Bioethics 9 (3):149-56. doi: 10.1111/j.1471-8847.2008.00243.x.
A needs assessment survey of ethics review committees (ERCs) across Africa was conducted to establish their major needs and areas of weakness in terms of ethical review capacity. Overall, the survey identified the major constraints on ERCs as lack of office equipment, lack of electronic data management systems, inadequate resources, lack of or insufficient expertise on the committees, and poor recognition of the importance of the committees' role. The authors are addressing the identified needs and weaknesses through a capacity building project funded by the Bill and Melinda Gates Foundation. The impact of the intervention will be assessed during and at the end of the four-year longitudinal project.

Ossorio, Pilar N. 2011. "Bodies of Data: Genomic Data and Bioscience Data Sharing."  Social Research 78 (3):907-932.
The article discusses the ethical, legal, and social implications (ELSI) of sharing data from genomics projects and the biosciences, examining the use of controlled-access mechanisms meant to protect data sources, biobanking and the use of human biological materials, and research regulations. The author discusses ELSI research on genome projects, looking at differences between the treatment of biological materials and the data that describe them, and examines how genome scientists either comply with data access restrictions to protect their careers or fail to comply with regulations out of a conviction that genetic data will not be used against sources.

Pankajdeep, Kaur, and Awal Monga. 2015. "Big Data Management."  International Journal of Advance Foundation And Research In Science & Engineering 1 (Special Issue).
Big Data Management (BDM) is the governance and management of huge volumes of all types of data. The authors present big data management as a drastic technological change that will help build a better society and industrial sector. The paper discusses the key factors to keep in mind when managing big data so that organizations can obtain secure, accurate, error-free data, sets out steps for managing big data, and describes the role of Hadoop in doing so. Different types of data are increasing and realigning so that industries can fully leverage big data, not just manage it. At the same time, big data must find a permanent and important place in enterprise data management. Big data management deals with the integration, manipulation, quality, governance, and management of big data, including its key factors: volume, velocity, and variety.

Pascal, Chris B. 2006. "Managing Data for Integrity: Policies and Procedures for Ensuring the Accuracy and Quality of the Data in the Laboratory."  Science & Engineering Ethics 12 (1):23-39.
Management of the research data is an extremely important responsibility of the Principal Investigator (PI) and other members of the research team. Without accurate data, no worthwhile conclusions can be drawn from the research study. Integrity in data management is critical to the success of the research group and to public trust in the research outcomes. One of the primary responsibilities of the PI is to provide proper training to the junior members of the lab. This effort can be buttressed by institutional data policies that are implemented at the group level. Extensive and frequent guidance in good research practices by the PI and other senior research staff is critical to the proper training of new scientists.

Pryor, Graham, and Martin Donnelly. 2009. "Skilling Up to Do Data: Whose Role, Whose Responsibility, Whose Career?"  International Journal of Digital Curation 4 (2):158-170. doi: 10.2218/ijdc.v4i2.105.
What roles are necessary for effective data management, and what kinds of expertise do the researchers and data specialists filling those roles need? These questions were posed at a workshop of data creators and curators whose delegates challenged the Digital Curation Centre (DCC) and the Research Information Network (RIN) to identify the training needs and career opportunities for the broad cohort that finds itself working in data management, sometimes by design but more often by accident. This paper revisits previous investigations into the roles and responsibilities required by a "data workforce", presents a representative spectrum of informed opinion from the DCC Research Data Management Forum, and makes recommendations for raising capability, capacity, and status.

Rajaretnam, Thilla. 2014. "Data Mining and Data Matching: Regulatory and ethical considerations relating to privacy and confidentiality in medical data."  Journal of International Commercial Law & Technology 9 (4):294-310.
The application of data mining techniques to health-related data is beneficial to medical research. However, the use of data mining (knowledge discovery in databases) and of data matching and profiling techniques raises ethical concerns relating to consent and undermines the confidentiality of medical data. Data mining and data matching require active collaboration between the medical practitioner and the data miner. This article examines the ethical management of medical data, including personal and sensitive information, in the healthcare sector. It offers ethical and legal perspectives on privacy and the confidentiality of medical data.

Richardson, Julie, and Diane Hoffman-Kim. 2010. "The Importance of Defining 'Data' in Data Management Policies."  Science & Engineering Ethics 16 (4):749-751. doi: 10.1007/s11948-010-9231-5.
What constitutes 'data' varies from one institution to another, depending on the information each institution deems important. To produce, collect, and retain data effectively and efficiently, an organization develops specific defining characteristics of data to meet its informational needs. Procedures to maintain and retain knowledge among laboratory members and principal investigators improve the efficiency of data collection. Optimizing communication, maintaining inventories, keeping records, and updating relevant training programs are all critical to supporting the quality and integrity of an organization's data. Ongoing revision of such procedures will ensure that the definition of data, as well as the means by which data are collected and maintained, remains appropriate to the needs of the individual organization.

Stodden, Victoria, Peixuan Guo, and Zhaokun Ma. 2013. "Toward Reproducible Computational Research: An Empirical Analysis of Data and Code Policy Adoption by Journals."  PLoS ONE 8 (6):1-8. doi: 10.1371/journal.pone.0067111.
Journal policy on research data and code availability is an important part of the ongoing shift toward publishing reproducible computational science. This article extends the literature by studying the data sharing policies of a referent set of 170 journals in each of 2011 and 2012. The authors also evaluated code sharing policies, supplemental materials policies, and open access status for these 170 journals in each year. As part of the project, they built a predictive model of open data and code policy adoption as a function of impact factor and publisher, and found that higher-impact journals were more likely to have open data and code policies, as were journals published by scientific societies compared with those from commercial publishers.

Tal, Aner. 2016. "Rigor Mortis: Statistical thoroughness in reporting and the making of truth."  Public Understanding of Science 25 (2):141-145. doi: 10.1177/0963662515619836.
Should a uniform checklist be adopted for methodological and statistical reporting? The current article discusses this notion, with particular attention to the use of old versus new statistics, and a consideration of the arguments brought up by Von Roten. The article argues that an overly exhaustive checklist that is uniformly applied to all submitted papers may be unsuitable for multidisciplinary work, and would further result in undue clutter and potentially distract reviewers from pertinent considerations in their evaluation of research articles.

Vieira, Ricardo, Filipe Ferreira, José Barateiro, and José Borbinha. 2014. "Data Management with Risk Management in Engineering and Science Projects."  New Review of Information Networking 19 (2):49-66. doi: 10.1080/13614576.2014.918519.
Engineering and science projects face new data management challenges, which are currently being addressed through the development of data management plans. However, project stakeholders are unable to properly assess whether their plan reflects good data management practices. To address this issue, the authors propose a risk management framework to assist in defining a data management plan for engineering and science projects. To validate their proposal, they present an application to a civil engineering project concerning dam safety.

Vollmer, Sara H., and George Howard. 2010. "Statistical Power, the Belmont Report, and the Ethics of Clinical Trials."  Science & Engineering Ethics 16 (4):675-691. doi: 10.1007/s11948-010-9244-0.
Achieving a good clinical trial design increases the likelihood that a trial will take place as planned, including that data will be obtained from a sufficient number of participants and that the total number of participants will be the minimum required to gain the knowledge sought. A good trial design also increases the likelihood that the experiment will yield the knowledge sought. Achieving such a design is more than good sense; it is ethically required in experiments when participants are at risk of harm. This paper argues that performing a power analysis contributes effectively to ensuring that a trial design is good. The ethical importance of good trial design has long been recognized for trials in which there is a risk of serious harm to participants. However, whether the quality of a trial design is an ethical issue when the risk to participants is only minimal is rarely discussed. This paper argues that even when the risk is minimal, the quality of the trial design is an ethical issue, and that this is reflected in the emphasis the Belmont Report places on the benefit to society of the knowledge gained. The paper also argues that good trial design is required for true informed consent.

Zarate, Oscar A., Julia Green Brody, Phil Brown, Monica D. Ramirez-Andreotta, Laura Perovich, and Jacob Matz. 2016. "Balancing Benefits and Risks of Immortal Data."  Hastings Center Report 46 (1):36-45. doi: 10.1002/hast.523.
An individual's health, genetic, or environmental-exposure data, placed in an online repository, create a valuable shared resource that can accelerate biomedical research and even open opportunities for crowd-sourced discoveries by members of the public. But these data become 'immortalized' in ways that may create lasting risk as well as benefit. The ethical challenges of online data sharing are heightened as so-called big data becomes an increasingly important research tool and driver of new research structures. Big data is shifting research to include large numbers of researchers and institutions as well as large numbers of participants providing diverse types of data, so the participants' consent relationship is no longer with a single person or even a single research institution. In this article, the authors consider the new challenges posed by expanded data sharing, illustrated by genomics and the transition to new models of consent. They use the case study of participants in an open data platform, the Personal Genome Project, to allow study participants to contribute their voices to inform ethical consent practices and protocol reviews for big-data research.

 

Data Sharing

Allison, John R., and William W. Cooper. 1992. "Data Disclosure and Data Sharing in Scientific Research."  Accountability in Research 2 (2):93-132. doi: 10.1080/08989629208573808.
Data sharing is examined for its bearing on (i) quality assurance, (ii) extensions of results in scientific research, and (iii) a tradition of openness in science. The authors suggest that sharing can be accomplished in a simple manner, flexible enough to fit varying individual situations, by asking authors of data-dependent articles and grant proposals to state in a footnote (a) whether they are willing to make their data available to others and, if so, (b) how the data may be accessed. Appendices report results from a survey of current policies and practices in professional societies and in federal grant-making agencies. Emphasis is on the social and management sciences.

Bishop, Libby. 2009. "Ethical Sharing and Reuse of Qualitative Data."  Australian Journal of Social Issues (Australian Council of Social Service) 44 (3):255-272.
This paper attempts to move beyond the impasse of ethical objections to reusing qualitative data. In doing so, it does not dismiss the importance of ethical debates; quite the opposite, the debate about ethical reuse needs to be deepened and broadened. First, the current terrain of research ethics is summarised and situated in the context of broader philosophical ethical frameworks. In contrast, debates around the ethics of archiving have often focused narrowly on participants' rights. The framework of debate should be broadened first by recognising other entities traditionally deemed within the scope of research ethics, namely the scholarly community and the public. A second useful broadening of the framework is provided by a deontological ethical stance with its focus on duties. In the final section, this expanded framework is used to rebut several common ethical arguments against archiving qualitative data: archiving violates confidentiality; informed consent for reusing data is impossible; reusing data violates

Brakewood, Beth, and Russell A. Poldrack. 2013. "The ethics of secondary data analysis: Considering the application of Belmont principles to the sharing of neuroimaging data."  NeuroImage 82:671-676. doi: 10.1016/j.neuroimage.2013.02.040.
The sharing of data is essential to increasing the speed of scientific discovery and maximizing the value of public investment in scientific research. However, the sharing of human neuroimaging data poses unique ethical concerns. The authors outline how data sharing relates to the Belmont principles of respect for persons, justice, and beneficence. Whereas regulators of human subjects research often view data sharing solely in terms of potential risks to subjects, the authors argue that the principles of human subjects research require an analysis of both risks and benefits, and that such an analysis suggests researchers may have a positive duty to share data in order to maximize the contribution that individual participants have made.

Bull, Susan, Phaik Yeong Cheah, Spencer Denny, Irene Jao, Vicki Marsh, Laura Merson, Neena Shah More, Le Nguyen Thanh Nhan, David Osrin, Decha Tangseefa, Douglas Wassenaar, and Michael Parker. 2015. "Best practices for ethical sharing of individual-level health research data from low- and middle-income settings."  Journal of Empirical Research on Human Research Ethics 10 (3):302-313. doi: 10.1177/1556264615594606.
Sharing individual-level data from clinical and public health research is increasingly seen as a core requirement for effective and efficient biomedical research. This article reports the results of a systematic review and a multisite qualitative study of key stakeholders' perspectives on best practices for ethical data sharing in low- and middle-income settings. The findings suggest that for data sharing to be effective and sustainable, multiple social and ethical requirements must be met. An effective model of data sharing will require considered judgments about how best to achieve scientific progress, minimize risks of harm, promote fairness and reciprocity, and build and sustain trust.

Check Hayden, Erika. 2015. "Ebola teaches tough lessons about rapid research."  Nature 521 (7553):405-406. doi: 10.1038/521405a.
The article focuses on the World Health Organization's (WHO) work on a framework that identifies pathogens meriting advance research and catalogues the requirements for preparing possible vaccines, treatments, and diagnostics for large-scale efficacy testing in case of a disease outbreak. It describes the WHO's failure to carry out large-scale vaccine testing during the 2014-2015 Ebola outbreak and examines the factors behind the delays, including disagreements over ethics, logistics, and data sharing.

Duke, Clifford S., and John H. Porter. 2013. "The Ethics of Data Sharing and Reuse in Biology."  BioScience 63 (6):483-489. doi: 10.1525/bio.2013.63.6.10.
Recent increases in capabilities for gathering, storing, accessing, and sharing data are creating corresponding opportunities for scientists to use data generated by others in their own research. Although sharing data and crediting sources are among the most basic of scientific ethical principles, formal ethical guidelines for data reuse have not been articulated in the biological sciences community. This article offers a framework for developing ethical principles on data reuse, addressing issues such as citation and coauthorship, with the aim of stimulating a conversation in the science community and with the goal of having professional societies formally incorporate considerations of data reuse into their codes of ethics.

Eschenfelder, Kristin R., and Andrew Johnson. 2014. "Managing the Data Commons: Controlled Sharing of Scholarly Data."  Journal of the Association for Information Science & Technology 65 (9):1757-1774. doi: 10.1002/asi.23086.
This paper describes the range and variation in access and use control policies and tools used by 24 web-based data repositories across a variety of fields. It also describes the rationale provided by repositories for their decisions to control data or provide means for depositors to do so.

Ferguson, Adam R., Jessica L. Nielson, Melissa H. Cragin, Anita E. Bandrowski, and Maryann E. Martone. 2014. "Big data from small data: data-sharing in the 'long tail' of neuroscience."  Nature Neuroscience 17:1442-1447.
The launch of the US BRAIN Initiative and the European Human Brain Project coincides with growing international efforts toward transparency and increased access to publicly funded research in the neurosciences. The need for data-sharing standards and neuroinformatics infrastructure is more pressing than ever. However, 'big science' efforts are not the only drivers of data-sharing needs, as neuroscientists across the full spectrum of research grapple with the overwhelming volume of data generated daily and a scientific environment increasingly focused on collaboration. In this commentary, the authors consider the sharing of the richly diverse and heterogeneous small data sets produced by individual neuroscientists, so-called long-tail data. They discuss the utility of these data, the diversity of repositories and options available for sharing them, and emerging best practices, and they provide use cases in which aggregating and mining diverse long-tail data converts numerous small data sources into big data, improving knowledge about neuroscience-related disorders.

Giffels, Joe. 2010. "Sharing Data is a Shared Responsibility."  Science & Engineering Ethics 16 (4):801-803. doi: 10.1007/s11948-010-9230-6.
Research data should be made readily available. A robust data-sharing plan, led by the principal investigator of the research project, requires considerable administrative and operational resources. Because external support for data sharing is minimal, principal investigators should consider engaging existing institutional information experts, such as librarians and information systems personnel, to participate in data-sharing efforts.

Lemke, Amy A., Maureen E. Smith, Wendy A. Wolf, and Susan Brown Trinidad. 2011. "Broad data sharing in genetic research: Views of institutional review board professionals."  IRB: Ethics & Human Research 33 (3):1-5.
Given the central role of IRBs in facilitating data sharing in genetic research and the important ethical and regulatory issues data sharing raises, the authors conducted a survey to investigate the views, experiences, and attitudes of IRB professionals regarding broad data sharing in genetic research. Results suggest a need for increased education for IRB professionals regarding the NIH policy for sharing GWAS (genome-wide association studies) data; collaboration among IRB professionals and researchers to define best practices and perhaps harmonize standards; and further empirical research to determine information needs and preferences of prospective research participants in the context of wide data sharing.

Mascalzoni, Deborah, Edward S. Dove, Yaffa Rubinstein, Hugh J. S. Dawkins, Anna Kole, Pauline McCormack, Simon Woods, Olaf Riess, Franz Schaefer, Hanns Lochmüller, Bartha M. Knoppers, and Mats Hansson. 2015. "International Charter of principles for sharing bio-specimens and data."  European Journal of Human Genetics 23 (6):721-728. doi: 10.1038/ejhg.2014.197.
There is growing international agreement on the need to provide greater access to research data and bio-specimen collections to optimize their long-term value and exploit their potential for health discovery and validation. This is especially evident for rare disease research. Currently, the rising value of data and bio-specimen collections is not matched by an equal increase in data/sample sharing and data/sample access. Contradictory legal and ethical frameworks across national borders are obstacles to effective sharing; more specifically, the absence of an integrated model proves to be a major logistical obstruction. The Charter aims to overcome this obstacle by providing both the ethical foundations on which data sharing should be based and a general Material and Data Transfer Agreement. The Charter is the result of a careful negotiation of different stakeholders' interests and builds on earlier consensus documents and position statements, which provided the general international legal framework. In addition, the Charter provides tools that may help accelerate sharing. It has been formulated to serve as an enabling tool for effective and transparent data and bio-specimen sharing and constitutes a mechanism to ensure uniformity of access across projects and countries.

McCormack, Pauline, Anna Kole, Sabina Gainotti, Deborah Mascalzoni, Caron Molster, Hanns Lochmüller, and Simon Woods. 2016. "'You should at least ask'. The expectations, hopes and fears of rare disease patients on large-scale data and biomaterial sharing for genomics research."  European Journal of Human Genetics doi: 10.1038/ejhg.2016.30.
Among the many articles about participants' opinions of genomics research, the views of one distinct group, people with a rare disease (RD), are unknown. It is important to understand whether their opinions differ from those of the general public because they have a rare disease and the vulnerabilities inherent in it. This article explores RD patients' attitudes to participation in genomics research, particularly around large-scale, international data and biosample sharing.

Merson, Laura, Tran Viet Phong, Le Nguyen Thanh Nhan, Nguyen Thanh Dung, Ta Thi Dieu Ngan, Nguyen Van Kinh, Michael Parker, and Susan Bull. 2015. "Trust, respect, and reciprocity: Informing culturally appropriate data-sharing practice in Vietnam."  Journal of Empirical Research on Human Research Ethics 10 (3):251-263. doi: 10.1177/1556264615592387.
International science funders and publishers are driving a growing trend in data sharing. There is mounting pressure on researchers in low- and middle-income settings to conform to new sharing policies, despite minimal empirically grounded accounts of the ethical challenges of implementing the policies in these settings. This study used in-depth interviews and focus group discussions with 48 stakeholders in Vietnam to explore the experiences, attitudes, and expectations that inform ethical and effective approaches to sharing clinical research data. Distinct views on the role of trust, respect, and reciprocity were among those that emerged to inform culturally appropriate best practices. The authors conclude by discussing the challenges that authors of data-sharing policies should consider in this unique context.

Parker, Michael, and Susan Bull. 2015. "Sharing public health research data: Toward the development of ethical data-sharing practice in low- and middle-income settings."  Journal of Empirical Research on Human Research Ethics 10 (3):217-224. doi: 10.1177/1556264615593494.
It is increasingly recognized that effective and appropriate data sharing requires models of good data-sharing practice that take seriously both the potential benefits to be gained and the importance of ensuring that participants' rights and interests are respected and that the risk of harm is minimized. Despite its potential importance, data sharing presents significant ethical, social, and institutional challenges in low-income settings. In this article, the authors report on qualitative research conducted in five low- and middle-income countries exploring the experiences of key research stakeholders and their views about what constitutes good data-sharing practice.

Poldrack, Russell A., and Krzysztof J. Gorgolewski. 2014. "Making big data open: data sharing in neuroimaging."  Nature Neuroscience 17 (11):1510-1517. doi: 10.1038/nn.3818.
In the last decade, major advances have been made in the availability of shared neuroimaging data, such that there are more than 8,000 shared MRI (magnetic resonance imaging) data sets available online.  The authors outline the state of data sharing for task-based functional MRI (fMRI) data, with a focus on various forms of data and their relative utility for subsequent analyses. They also discuss challenges to the future success of data sharing and highlight the ethical argument that data sharing may be necessary to maximize the contribution of human subjects.

Sieber, Joan E. 1988. "Data sharing: Defining problems and seeking solutions."  Law and Human Behavior 12 (2):199-206. doi: 10.1007/BF01073128.
Describes a range of meanings or contexts for sharing research data, in which some kinds of sharing are more problematic than others. Objections to data sharing draw their strength from examples at the more difficult end of this range. Constrained sharing, including possible contractual and licensing agreements rather than merely "handing over the data," is likely to emerge as the norm at the more difficult end of the spectrum. A case example suggests that even the most difficult cases may yield to constrained-sharing solutions that benefit all parties involved. The author concludes that if research is conducted with sharing in mind, and if the scientist is acquainted with constrained sharing, then the dilemmas that are raised as arguments against sharing will rarely occur.

Soranno, Patricia A., Kendra S. Cheruvelil, Kevin C. Elliott, and Georgina M. Montgomery. 2015. "It's Good to Share: Why Environmental Scientists’ Ethics Are Out of Date."  BioScience 65 (1):69-73.
Although there have been many recent calls for increased data sharing, the majority of environmental scientists do not make their individual data sets publicly available in online repositories. There has not been a strong ethical impetus for sharing data within the current culture, behaviors, and practices of environmental scientists. In this article, the authors describe a critical shift that is happening in both society and the environmental science community that makes data sharing not just good but ethically obligatory. This is a shift toward the ethical value of promoting inclusivity within and beyond science. An essential element of a truly inclusionary and democratic approach to science is to share data through publicly accessible data sets.

Verlinden, M., H. Nys, N. Ectors, and I. Huys. 2016. "Qualitative study on custodianship of human biological material and data stored in biobanks."  BMC Medical Ethics 17 (1):15. doi: 10.1186/s12910-016-0098-0.
Balancing the rights and obligations of custodians and applicants in relation to access to biobanks is of utmost importance to guarantee trust and confidence. This study aimed to reveal which issues divide different stakeholders in an attempt to determine the rights and/or obligations held on human biological materials (HBM) and data.

Wallis, Jillian C., Elizabeth Rolando, and Christine L. Borgman. 2013. "If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology."  PLoS ONE 8 (7):1-17. doi: 10.1371/journal.pone.0067332.
Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. The authors report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center. They found that CENS researchers are willing to share their data, but few are asked to do so, and only in a few domain areas do funders or journals require them to deposit data. Practices of releasing, sharing, and reusing data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.

Welsh, Robert K., Craig R. Lareau, Jeanne K. Clevenger, and Mark A. Reger. 2008. "Ethical and Legal Considerations Regarding Disputed Authorship with the Use of Shared Data."  Accountability in Research: Policies & Quality Assurance 15 (2):105-131. doi: 10.1080/08989620801946925.
Little guidance is currently available for handling disputes between research mentors and students when working with shared data. This article analyzes how the ethical guidelines from the American Psychological Association (APA), the Office of Research Integrity (ORI), and the American Educational Research Association (AERA) can inform common disputes in this area.

Wessels, Bridgette, Rachel L. Finn, Peter Linde, Paolo Mazzetti, Stefano Nativi, Susan Riley, Rod Smallwood, Mark J. Taylor, Victoria Tsoukala, Kush Wadhwa, and Sally Wyatt. 2014. "Issues in the development of open access to research data."  Prometheus 32 (1):49-66. doi: 10.1080/08109028.2014.956505.
This paper explores key issues in the development of open access to research data. The use of digital means for developing, storing and manipulating data is creating a focus on ‘data-driven science’. One aspect of this focus is the development of ‘open access’ to research data. The development of open access requires attention to the ways data are considered in different areas of research. The authors identify how open access is being unevenly developed across the research environment and the consequences this has in terms of generating data gaps. Data gaps refer to the way data becomes detached from published conclusions. To address these issues, they examine four main areas in developing open access to research data: stakeholder roles and values; technological requirements for managing and sharing data; legal and ethical regulations and procedures; institutional roles and policy frameworks.

Williams, Garrath, and Iris Pigeot. 2016. "Consent and confidentiality in the light of recent demands for data sharing."  Biometrical Journal.  doi: 10.1002/bimj.201500044.
This opinion paper critically examines conventional approaches to research ethics that emphasize consent and data protection. Recent demands, such as the sharing of individual-level data, pose problems for standard understandings of consent and privacy. The authors argue that these interpretations tend to be overdemanding: they do not really protect research subjects, and they hinder the research process. Accordingly, they argue that individual consent must be situated alongside the wider distribution of knowledge created when the actions, commitments, and procedures of researchers and their institutions are opened to scrutiny. And instead of simply emphasizing privacy or data protection, researchers should understand confidentiality as a principle that facilitates the sharing of information while upholding important safeguards.

Zinner, Darren E., Genevieve Pham-Kanter, and Eric G. Campbell. 2016. "The Changing Nature of Scientific Sharing and Withholding in Academic Life Sciences Research: Trends From National Surveys in 2000 and 2013."  Academic Medicine 91 (3):433-440. doi: 10.1097/acm.0000000000001028.
Since 2000, federal funders and many journals have established policies requiring more open sharing of data and materials post-publication, primarily through online supplements and third-party repositories. This study examined changes in sharing and withholding practices among academic life scientists, particularly geneticists, between 2000 and 2013.