Open Data MOOC

Lesson 1.2: Ethics in the open data lifecycle

Photo by Neil Palmer (CIAT) licensed under CC BY-SA 2.0

Aims and learning outcomes

The lesson aims to:
  • list the stages of the open data lifecycle
  • define responsible use of data
  • provide an overview of the principles of responsible data
  • define responsible data concepts
  • provide an overview of responsible data challenges in agriculture
  • list ethics questions to ask at stages of the open data lifecycle.
After studying this lesson, you should be able to:
  • understand the stages of the open data lifecycle
  • define challenges in the use of data
  • understand responsible data principles
  • become aware of responsible data challenges in agriculture
  • identify ethics questions to ask at stages of the open data lifecycle.

1. Ethical and responsible use of open data

From its generation to its use in administrative practice, data passes through different stages, which we refer to here as the data lifecycle. There are various models of the open data lifecycle, such as that of Van den Broek et al., which comprises the steps: (1) identification, (2) preparation, (3) publication, (4) reuse, and (5) evaluation[1]. Ethical considerations should be part of every step of the open data lifecycle.
Sharing data enables the creation of new insights from existing data, but it also has the potential to introduce ethical risks. We can’t assume that ‘open’, ‘shared’ and ‘public’ automatically mean ‘for the public good’, particularly when a dataset combines other datasets for which we don’t know when the data was first collected or for what purposes. Data has the potential to empower new voices and approaches, but it can also expose the vulnerable and marginalized. For example, when data on the productivity of a plot of land for which the farmer lacks secure documentation becomes available to a wide audience, this can lead to competition over that plot and potentially to the displacement of smallholder farmers or vulnerable communities and, ultimately, the loss of their homes and livelihoods. Responsible data ethics can often make the difference between these two extremes[2].
These challenges place an even bigger responsibility in the hands of intermediaries, who need to be aware of the opportunities and risks of the data they work with and to embed sensitivity and responsibility in their data-handling practices. Adopting a critical approach that avoids collecting ‘data for data’s sake’ will go a long way towards ensuring fairness and balance. Future-proofing is also an issue: data that seems unproblematic now may turn out to be very sensitive in the future. Being mindful of the particular vulnerabilities and circumstances of the worst-off in the communities you work for and with can help avert or contain harm.
Engaging with responsible data practices means upholding a set of ethical practices in the way you use data, so that projects enhance the good they aim to do and avoid inadvertent harm[3].

2. Responsible data in agriculture

Later in this lesson we will see that most of the discourse about the ethical use of data relates to the privacy of individuals, often in a medical context. In the agricultural sector, however, it may also involve the socio-economic relationships between stakeholders in the value chain.
America’s first legal case on agricultural data is presented here as a case study in the responsible use of data. The lawsuit concerned the data intermediary AgriStats, which anonymizes farm data. Although AgriStats’ mission statement is to ‘improve the bottom line profitability for our participants by providing accurate and timely comparative data while preserving confidentiality of individual companies’, it is alleged to have shared individual production data with Tyson Foods, Perdue Farms and other integrators in the value chain. The data provided by AgriStats allegedly revealed crucial data points about individual farms, which these value-chain partners used to drive prices down at the expense of the growers. The growers had shared their data with AgriStats in order to obtain comparative data about their efficiency and so improve broiler chicken production. They trusted AgriStats to keep this data confidential and, although its business model depends on precisely this trust, in this case the data intermediary failed to deliver[4].
Actors in the agricultural, nutrition and land data sectors face responsible data challenges. GODAN’s report on Responsible Data in Agriculture (Ferris and Rahman) provides an overview of these challenges, focusing on power imbalances. One of its examples concerns precision agriculture, which provides farmers with information and farm management advice to improve their decision-making and optimize their activities. Most precision farming applications are used in highly capital-intensive farming systems, and access to the technologies and data remains largely in the hands of a few large-scale farmers and service providers. Specialized companies offer access to software and data for precision agriculture, but only better-resourced companies and farmers can take advantage of these offerings. The report lists several prominent challenges in the agricultural sector that are equally relevant to the nutrition sector:
  • Potential for data breaches. It is difficult to assess accurately the consequences of data breaches at companies or other actors that handle large amounts of data within the sector. Data breaches are nonetheless not uncommon and are growing in number, so this issue seems certain to become more significant in the future. The concern also applies to research institutions that may not yet have developed secure data repositories.
  • Sensitive data. When considering vulnerable communities and contexts, it seems commonly understood within the sector that certain types of health, nutrition and agricultural data are sensitive in and of themselves, and that precautions should therefore be taken in deciding whether to collect and share such data at all. For example, HIV status may be included in an individual’s anthropometric and health data. To address these sensitivities, data can be published in aggregated form so that individuals’ sensitive information is not exposed.
  • Data ownership. With increasing amounts of data being created about farming and by farmers, one key issue is data ownership. The ownership of data generated through new agricultural technologies remains relatively unexplored.
  • Vulnerable communities. The issues and tensions mentioned above become even starker for particularly vulnerable communities, such as indigenous populations, migrant farmers and displaced smallholder farmers who lack basic land rights; women are especially vulnerable in such circumstances.
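Publishing data in aggregated form, as suggested under ‘Sensitive data’, can be illustrated with a short sketch. The Python example below is a hypothetical illustration, not a method taken from the GODAN report: it groups individual farm records by an assumed `district` field, publishes only the count and mean of an assumed `yield_t_ha` value per group, and suppresses any group with fewer than an assumed minimum of five records, because an average over very few farms can reveal individual figures (the kind of exposure at the heart of the AgriStats case above).

```python
from collections import defaultdict
from statistics import mean

# Assumed minimum group size; real thresholds depend on context and policy.
MIN_GROUP_SIZE = 5


def aggregate_with_suppression(records, group_key, value_key, min_size=MIN_GROUP_SIZE):
    """Summarize records per group, hiding groups that are too small.

    records: iterable of dicts (hypothetical schema, e.g. one dict per farm).
    Returns {group: {"count", "mean", "suppressed"}} in sorted group order.
    Groups with fewer than min_size records get None for count and mean,
    so that no individual record can be inferred from a tiny group.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])

    summary = {}
    for group, values in sorted(groups.items()):
        if len(values) < min_size:
            # Too few records: publishing the figures could expose individuals.
            summary[group] = {"count": None, "mean": None, "suppressed": True}
        else:
            summary[group] = {
                "count": len(values),
                "mean": round(mean(values), 2),
                "suppressed": False,
            }
    return summary


if __name__ == "__main__":
    # Hypothetical example data: six farms in district A, two in district B.
    farms = [{"district": "A", "yield_t_ha": y} for y in (2.0, 2.5, 3.0, 3.5, 4.0, 4.5)]
    farms += [{"district": "B", "yield_t_ha": y} for y in (1.0, 5.0)]
    for district, stats in aggregate_with_suppression(farms, "district", "yield_t_ha").items():
        print(district, stats)
```

Threshold-based suppression is only a first safeguard: combining several published tables, or comparing releases over time, can still single out individual farms, so the threshold and any additional protections should be decided case by case.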

3. Principles of responsible data reuse

Responsible data has been given a working definition of: ‘The duty to ensure people’s rights to consent, privacy, security and ownership around the information processes of collection, analysis, storage, presentation and reuse of data, while respecting the values of transparency and openness.’[5]
Two basic principles underlie the need for responsible data: empowerment – to empower users to be active participants rather than passive data ‘subjects’; and harm avoidance – to ensure that we do no harm and that the way in which we use data and technology does not facilitate or exacerbate harm done by others. Responsible data is not just about technical security and encryption, but also about prioritizing the dignity, respect and privacy of the people we work with, and making sure that the people reflected in the data we use are counted and heard, and able to make informed decisions about their lives[6].
The Responsible Data Forum outlined four underlying reasons why responsible data reuse matters:
  1. Legal implications. Among the many key regulations and directives that apply at EU level is the General Data Protection Regulation (GDPR). The Regulation applies to organizations (data controllers or processors) established in the EU. It also applies to organizations based outside the European Union if they process the personal data of people in the EU in connection with offering them goods or services or monitoring their behaviour. According to the European Commission, "Personal data is any information that relates to an identified or identifiable living individual. …The law protects personal data regardless of the technology used for processing that data. ...It also doesn’t matter how the data is stored – in an IT system, through video surveillance, or on paper; in all cases, personal data is subject to the protection requirements set out in the GDPR."[7]
  2. Ethics and integrity. High ethical standards, respect for dignity and organizational integrity are among the key motivators for employees. If an organization shows a lack of care for the privacy and dignity of others, this either leads to a culture of double standards or may be seen as a lack of care for employees’ rights.
  3. Rights and dignity of others. Having responsible data policies sends a clear signal to all stakeholders that an organization does in fact care about its affected groups, especially those that are more vulnerable.
  4. Reputation in front of donors, partners and customers. Having data reuse policies in place sends a clear signal to donors, partners, customers and other stakeholders that an organization treats its activities with care and high ethical standards. Increasingly, the donor community is demanding that such policies be in place if organizations are to receive funding.
Data reusers must do everything within their power to avoid causing harm to stakeholders as a direct or indirect result of open data reuse. Many conversations are framed as a confrontation between the ‘do no harm’ principle and the concepts of transparency and accountability. Although these may appear to be opposites, the rule for balancing them is that ‘do no harm’ applies to the powerless, while transparency and accountability apply to the powerful[8].

4. Responsible data concepts

Returning to the definition of responsible data, the first concept it includes is the right to consent. Informed consent is the mechanism through which people agree to provide information for research or data collection projects. Consent has generally been understood as something given by individuals during direct interaction with researchers or surveyors, and it has three components: disclosure of the research objectives and of any risks or negative consequences of participating; the capacity of individuals to understand the implications of participating; and the voluntariness of their participation. Organizations are strongly advised to set out their consent policies in writing, both to support planning and decision-making and as an indication of their commitment to responsibility and high ethical standards.
The largest part of the conversation about responsible and ethical data reuse revolves around the concept of privacy. Privacy is concerned with control over information: who can access it and how it is used. Privacy is generally linked to individuals, families or community groups, and the concept is often used to draw a line between a ‘private’ and a ‘public’ sphere. It is useful to have (institutional) policies in place that describe the possible risks of publishing data and plans for mitigating them. The key risk is publishing data that can reveal particular individuals’ private information (Granickas). The guidelines on open data and privacy issued by the Government of South Australia list the key privacy risks as follows:
  • causing humiliation, embarrassment or anxiety for the individual; for example, from a release of health data, it might be concluded that an individual accessed treatment for a sensitive sexual health condition;
  • impact on the employment or relationships of individuals;
  • affecting decisions made about an individual or their ability to access services, such as their ability to obtain insurance;
  • resulting in financial loss or detriment;
  • posing a risk to safety, such as identifying a victim of violence or a witness to a crime[9].
Organizations are starting to adopt policies that consider the security of their employees in light of the possible dangers of working with information. Increasing attention is being paid to organizational policies that adapt security protocols and tactics to encompass: (1) digital information security; (2) physical and operational security; and (3) the psychosocial well-being required for good security implementation[10].
The Open Data Institute (ODI) developed its Data Ethics Canvas to help organizations identify and manage potential data ethics considerations, including principles, policies and processes. In a white paper, the ODI explored the relationship between data ethics and legal compliance, some existing data ethics frameworks, and ethical considerations in data collection, sharing and use[12]. The Data Ethics Canvas is recommended for use at the start of any project, involving any type of data, where data collection is likely to have an impact on individuals or wider society. It guides organizations to consider the following questions:
  • What are your data sources?
  • Are there any limitations in your data sources?
  • Who has rights over your data sources?
  • What policies/ laws shape your use of this data?
  • Are you going to be sharing the data with other organizations?
  • What is your core purpose for using this data?
  • Do people understand your purpose?
  • Who could be negatively affected?
  • How are you minimizing negative impact?
  • Who will be positively affected by this project?
  • Are you communicating potential risks/issues, if any?
  • How can people engage with you?
  • When is your next review?
  • What are your actions?
These questions can help you identify the ethical risks in any project involving open data. For more on this topic, see the ODI resources cited in the footnotes.