Open Data MOOC

Lesson 1.1: What is open data?

Photo by Vostoc 91 licensed under CC BY-SA 2.0

Aims and learning outcomes

The lesson aims to:
  • introduce open data and its principles
  • identify benefits and challenges of open data for the agricultural sector
  • provide an overview of the key actors involved in open data publishing
  • present use cases in agriculture related to benefits of open data.
After studying this lesson, you should be able to:
  • explain the definition of open data and its principles
  • recognize the value of open data in agriculture
  • define challenges of moving to an open data landscape
  • understand the benefits of publishing and using open data.

1. Introduction to open data

In today’s world, it is possible to forecast the future much better than ever or to answer seemingly complicated questions much more quickly based on data. Such questions might be: Where does our food come from? Can we manage risks in our farm and take control measures against droughts or pests? Are we able to predict problems such as floods or low yields? Can we make informed decisions on what to grow, what treatment to apply, when to plant, treat or harvest? Technologies today allow us to build services to answer these questions, but data only offers these opportunities when it is usable.
The notion of open data has been around for some years. The term ‘open data’ was first used in 1995 and the concept of open public data was defined in 2007 in a meeting of leading internet thinkers and activists including Tim O’Reilly and Lawrence Lessig [1]. Considerable amounts of data today are generated by the public sector, e.g. soil surveys, cultivar registrations, pesticide residues, health care, defence industries, infrastructure, public education, and telecommunications. In 2009 various governments, such as those of the USA, the UK and Canada, launched open government initiatives to open up their public information.
Open access to research and open publication of data are vital resources for food security and nutrition, driven by farmers, researchers, extension experts, policy makers, governments, international agencies and other private-sector and civil-society stakeholders participating in ‘innovation systems’ and along value chains. A lack of institutional, national and international policies and of openness of data limits the effectiveness of agricultural and nutritional data from research and innovation. Making open data work for agriculture and nutrition requires a shared agenda to increase the supply, quality, and interoperability of data, alongside action to build capacity for the use of data by all stakeholders [2].
The agriculture and nutrition sectors are creating increasing amounts of data, from many different sources. From mobile technology used by health workers to open data released by government ministries, data is becoming ever more valuable, as agricultural business development and global food policy decisions are being made based upon it. But the sector is also home to severe resource inequality. The largest agricultural companies make billions of dollars per year, in contrast to subsistence farmers growing just enough to feed themselves, or smallholder farmers who grow enough to sell on a year-by-year basis [3].
In recent years there have been calls for a data revolution for nutrition. The scarcity of available data prevents us from identifying and learning from real progress at the global and national levels. It also hides inequalities within countries, making it more difficult for governments to know about them and for others to hold governments fully accountable [4]. National averages are not enough to see who is being left behind, as nutritional levels can vary even within households. Beyond just being collected, data should be used actively to make better choices and to inform and advocate for decision-making from household level all the way up to policy level.
The food and nutrition data collected by governments, international agencies, CSOs and others need to be collected using internationally recommended indicators and released as standardised open data to address the nutrition data gap.

2. What is open data?

The open data movement has been advocated strongly by governments to allow others to benefit from their data and their desire to be transparent, but research institutions and the private sector also generate data which they are willing to share as a common good [5].
Open data is data that can be freely used, reused (modified) and redistributed (shared) by anyone[6]. The Open Data Handbook emphasizes the importance of the definition of open and highlights key features about open data:
Availability and Access: The data must be available as a whole, and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form. Managing data can be costly in terms of time and resources needed. An example of costing for data management can be seen at UK Data Service[7].
Reuse and Redistribution: The data must be provided under terms that permit reuse and redistribution including intermixing with other datasets.
Universal Participation: Everyone must be able to use, reuse and redistribute - there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed[8].
Data exists on a spectrum. It can be closed, shared or open. A dataset may include information that is sensitive for security, personal or commercial reasons. For instance, health records may contain sensitive data which raise privacy issues. Personal nutrition data is often considered sensitive. For these reasons, data can be closed, or shared with limited persons or groups, without being licensed to permit anyone to access, use and share it. Whether big, medium or small, whether state, commercial or personal, the important thing about data is how it is licensed. For data to be considered open, it must be:
  • accessible, which usually means published on the web
  • available in a machine-readable format
  • with a licence that permits anyone to access, use and share it – commercially and non-commercially.
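The three conditions above can be sketched as a simple check. This is only an illustration, not an official test of openness: the `Dataset` record, the `is_open` function and the two lookup sets are hypothetical names introduced here, and the short lists of licences and formats are assumptions chosen for the example rather than a complete or authoritative catalogue.

```python
from dataclasses import dataclass

# Assumed, deliberately short lists used only for this sketch.
# A real check would consult a maintained list of conformant licences.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}
MACHINE_READABLE_FORMATS = {"csv", "json", "xml"}


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    title: str
    published_on_web: bool
    file_format: str
    licence: str  # empty string means no licence is stated


def is_open(ds: Dataset) -> bool:
    """Return True only if all three conditions from the list hold."""
    accessible = ds.published_on_web
    machine_readable = ds.file_format.lower() in MACHINE_READABLE_FORMATS
    openly_licensed = ds.licence in OPEN_LICENCES
    return accessible and machine_readable and openly_licensed


# A web-published CSV under an attribution licence meets all three conditions.
survey = Dataset("Soil survey", True, "csv", "CC-BY-4.0")
print(is_open(survey))  # True

# A non-commercial licence restricts who may use the data, so it is not open.
restricted = Dataset("Market prices", True, "csv", "CC-BY-NC-4.0")
print(is_open(restricted))  # False
```

A dataset that fails the check is not necessarily closed: it may sit in the shared part of the spectrum, available to some people under specific terms.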
The Data Spectrum in Figure 1, developed by the Open Data Institute (ODI), illustrates the degree of openness of data and helps to understand the language of data[9].
Figure 1 The Data Spectrum by the ODI licensed under CC BY
Many individuals and organisations collect a broad range of different types of data in order to perform their tasks. Government is particularly significant in this respect, both because of the quantity and centrality of the data it collects and because most of that government data is public data by law, and therefore could be made open and available for others to use[10].
There are many kinds of open data that have potential uses and applications:
  • Culture: data about cultural works and artefacts, for example titles and authors, generally collected and held by galleries, libraries, archives and museums
  • Science: data that is produced as part of scientific research, from astronomy to zoology
  • Finance: data such as government accounts (expenditure and revenue) and information on financial markets (stocks, shares, bonds etc.)
  • Statistics: data produced by statistical offices such as the census and key demographic and health indicators
  • Weather: many types of information used to understand and predict the weather and climate
  • Environment: information related to the natural environment, such as the presence and level of pollutants and the quality of rivers and seas[11].

3. Open data principles

The Open Definition makes precise the meaning of ‘open’ with respect to knowledge, promoting a robust commons in which anyone may participate, and interoperability is maximised. Knowledge is open if anyone is free to access, use, modify and share it – subject at most to measures that preserve provenance and openness[12].
Open data must carry an open licence or status: it must be in the public domain or under an open licence. Without a licence, the data can’t be reused.
It must be accessible and downloadable via the internet. Any additional information necessary for licence compliance must also accompany the work, such as an attribution to say that people who use the data must credit whoever is publishing it or a share-alike requirement to say that people who mix the data with other data have to also release the results as open data.
Open data must be in a machine-readable form which is processable by a computer and where the individual elements of the work can be easily accessed and modified. It must also be in an open format which places no restrictions, monetary or otherwise, upon its use and can be fully processed with at least one free/libre/open-source software tool.
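To illustrate what machine-readable means in practice, here is a minimal sketch using CSV, an open format that Python's standard library can process without any proprietary software. The column names and values are made up for illustration and are not real measurements; `read_rows` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample data for illustration only (not real measurements).
SAMPLE_CSV = """district,crop,yield_t_per_ha
North,rice,4.2
South,rice,3.1
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE_CSV)

# Each individual element can be accessed and modified directly.
yields = [float(row["yield_t_per_ha"]) for row in rows]
rows[0]["crop"] = "maize"
```

The same table published only as a scanned image or a formatted PDF would need manual transcription before any of these steps could be done, which is why such publication does not count as machine-readable.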
The licence used for the open data should be compatible with other open licences. It should permit free use, redistribution, creation of derivatives, and compilation of the licensed work. It must allow any part of the work to be freely used, distributed, or modified separately from any other part of the work or from any collection of works in which it was originally distributed. The licence must not discriminate against any person or group.
The Open Data Charter, which is a collaboration between over 70 governments, agrees on six principles for how governments should be publishing information. Each of them is explained below briefly. On their site, the Charter also provides detailed action items to achieve each of these principles[13].
Open by Default: Free access to and use of government data (data held by national, regional, local, and city governments, international governmental bodies, and other types of institutions in the wider public sector) brings a significant value to society and the economy, and the government data should, therefore, be open by default. Resources, standards, and policies for the creation, use, exchange, and harmonisation of open data should be globally developed, adopted and promoted as long as citizens are confident that open data will not compromise their right to privacy.
Timely and Comprehensive: Data may require time, human and technical resources to be released and published. It is important to identify which data to prioritize for release by consulting with the data users. The data must be comprehensive, accurate, and of high quality.
Accessible and Usable: Opening up data enables stakeholders to make informed decisions. The data should be easily discoverable and accessible, and made available without any barriers.
Comparable and Interoperable: The data should be published in structured and standardised formats to support interoperability, traceability and reuse. It should also be easy to compare within and between sectors, across geographic locations, and over time in order to be the most effective and useful.
For Improved Governance and Citizen Engagement: Open data strengthens governance and provides a transparent and accountable foundation to improve decision-making and how land markets operate. It enables civic participation and better-informed engagement between governments and citizens.
For Inclusive Development and Innovation: Openness stimulates creativity and innovation. Open data by its nature offers an equitable resource for all people regardless of where they come from or who they are and provides a less digitally divided environment to access and use the data.

4. Benefits of open data

The benefits of open data are diverse and range from improved efficiency of public administrations, economic growth in the private sector to wider social welfare and citizen empowerment.
Open data can enhance performance and contribute to improving the efficiency of public services in health and nutrition. Greater efficiency in processes and delivery of public services can be achieved thanks to cross-sector sharing of data, which can, for example, provide an overview of unnecessary spending. Resources can be better targeted thanks to local-level, disaggregated data, showing which areas and populations have the greatest needs.
The economy can benefit from easier access to information, content and knowledge in turn contributing to the development of innovative services and the creation of new business models.
Social welfare can be improved as society benefits from information that is more transparent and accessible. Open Data enhances collaboration, participation and social innovation[14].
Figure 2 The data lifecycle by Mushonz licensed under CC BY-SA 4.0 via Wikimedia Commons
The availability of detailed open data is essential to improve delivery of services at the local level. Examples include mySociety, the Hungarian 'right to know' portal, and Fix my Street Norway. To support the emergence of new data-driven businesses and the growth of existing ones, governments need to publish key datasets. By growing economies and improving services, open data allows governments to make savings in key areas, like provision of healthcare, education and utilities. In the UK, open data helped reveal £200 million of savings in the health service. In France, energy data is being used to drive more efficient energy generation practices[15].
GODAN’s report on ‘How can we improve agriculture, food and nutrition with open data?’ specifies three ways that open data can help solve practical problems in the agriculture and nutrition sectors.
  • Enabling more efficient and effective decision making. Open data enables computers to pull data from various sources and process it for us and does not rely on humans to interpret and integrate information contained in web pages. Open data underpins new products and services by presenting information from a wide range of sources that helps everyone from policy-makers to smallholders find gaps in markets or fine-tune their products or services. A good example comes from fisheries. In South Africa, the fisher community, in collaboration with the University of Cape Town, co-designed a suite of apps to support and improve the small-scale fisheries industry. Abalobi Fisher is one of those apps; it provides valuable information about the weather and climate from open sources, and records data about fisher practice and catch information. The hope is that it will showcase small-scale fisheries as a vital and valuable resource as well as a legitimised livelihood, not just to the local communities, but to the country as a whole. A relevant use case can be watched at Open Water – GODAN Documentary Web Series.
  • Fostering innovation that everyone can benefit from. As a raw material for creating tools, services, insights and applications, open data makes it inexpensive and easy to create new innovations. When data is open for all to experiment with, there is no need to invest large amounts in repeating already completed trials. When data is openly licensed, it also allows for novel combinations with other data to gain new insights. The story of Andrew from Allington, UK, who works alongside his family on their arable and dairy farm, is a good example of using data to apply precision farming tactics to the land. The family has taken advantage of new tools and technology that allow them to easily view satellite data of their land, which has been opened up by the European Space Agency. His story can be watched at Open Skies – GODAN Documentary Web Series.
  • Driving organisational and sector change through transparency. Transparency around targets, subsidy distribution and pricing, for example, creates incentives which affect the behaviour of producers, regulators and consumers. By requiring companies, government departments and other organisations to publish key datasets – performance data, spend data or supply-chain data, for example – governments, regulators and companies can monitor, analyse and respond to trends in that sector. More importantly, publishing this data across a sector can ultimately transform how products and services are delivered. The Abalobi Fisher example from Open Water, described in the first point above, applies here too: data about fisher practice is used to make informed decisions in the small-scale fisheries industry.
Providing farmers with more accurate, accessible, timely information – from large agriculture groups to the individual smallholders – will help to ensure food commodity markets function well in future. Progress will be driven largely by providing better access to accurate, timely information for individual smallholder farmers, businesses and policy-makers alike. Open data can and should be part of the solution. Open data promotes transparency across the sector to accelerate progress, identify areas for improvement and help create new insights[16].
Agmarknet in India is a good example: it provides market information from more than 2700 data sources to farmers, traders, policy makers and other stakeholders for better production and marketing decisions. The rice producers' federation of Colombia maintains historical datasets and helps small and medium-scale farmers by measuring climate, yields and farming practices related to rice-growing in the country. You can watch the story of Blanca, who runs her farm outside the town of Ibagué with the assistance of early-warning systems and weather data, at Open Climate – GODAN Documentary Web Series.

5. Challenges

Open data acts as a change agent. Implementing an open data initiative often involves cultural and institutional change. Opening data goes far beyond putting data on a website under an open licence. Applying the technology is relatively easy when compared with bringing about a cultural change, which can be much harder[17]. It requires consulting with potential data users within an institution as well as with external stakeholders.
However, the difficulty of adopting change has not stopped a growing amount of data from becoming openly available. There are still challenges related to data management, licensing, interoperability and exploitation. There is a need to evolve policies, practices and ethics around closed, shared, and open data. Detailed guidance on responding to these challenges at the policy and technical levels is given in the following units and lessons. A rather unusual categorisation of the open data challenges was made by the Open Data Institute (ODI). These challenges[18] are identified in the following paragraphs, and each item briefly considers how the challenge can be overcome.
Free is not Always Open. Making data freely available is only the first step in making it open for all to reuse. There are layers of barriers between information that anyone can get and data that anyone can use. This is because making the data open is not always about licences or the format issues, but it is also about comprehension and access. Open data should not require additional time, resources and expertise to be used.
Open is not Always Free. This means that open data should not be assumed to be free of charge. High standards in open data releases and lowering the barriers for data users can be costly to the data publishers. It is known that open data business models exist to help public sector and third-sector organisations such as associations and social enterprises save money and deliver or make money by publishing open data[19].
Analysis is not Always Easy. The diversity of tools available can make open data easy to visualise. However, easily made analyses should not stop us from questioning the figures. Such analyses should be examined carefully to establish whether or not they represent comprehensive and accurate figures. This matters when people make decisions based on analyses.
Open Data is not Always ‘Good’. Open data helps people to make better informed decisions, but it can equally mislead people into making poor decisions, or enable individuals to make decisions that lead to a more divided society. However, open data at least provides an equitable opportunity for everyone to access and criticise the data, which can hopefully help to resolve disputes. Privacy rights, completeness, accuracy, reliability and relevance should be considered when releasing and publishing data.
These challenges are provided here in broad terms that apply across every sector. When it comes to land, agriculture and nutrition in particular, they no doubt face these same challenges. However, these challenges are best addressed at the level of a particular problem in a specific field, where standards can be identified or developed, and data released as part of solving a problem. This is especially true when advocates can point to a clear theory of change. GODAN addresses this issue with care and sets out five strategic steps[20] for pursuing solution-focused open data initiatives for agriculture and nutrition:
  • Engage with the growing open data community, including key problem owners and experts at GODAN, to identify the challenges that open data can help solve.
  • Build open data strategies and projects with a focus on finding solutions to land tenure, agriculture and nutrition problems.
  • Develop the infrastructure, assets and capacities for open data in relevant organisations and networks.
  • Use open data and support users of relevant data.
  • Learn through ongoing evaluation, reflection and sharing to ensure we can all continue to improve our practice.

6. Open data actors

The agriculture, nutrition and land data ecosystem involves many private and public-sector actors collecting, analysing and using different bits of data to inform their actions internally, and sometimes also externally with other stakeholders. In the context of this course, it is important to identify the players in the field and see how their role is related to open data.
Public sector actors, such as agricultural, economic and statistical agencies, collect, aggregate and share relevant data within this sector. This data may be simply collected and held, but also may be shared with other government agencies to assist in policy-making decisions or opened and disseminated widely. Some of the key datasets collected by governmental institutions include registry data, land use data, production yields, livestock, weather, market prices and farmer registries. Intergovernmental institutions are also important players in open data. While they might not collect the data first-hand, they are important aggregators of a lot of that information - nutrition, land cover, population, agricultural censuses, etc.
Researchers from universities, think tanks, institutes, organisations and companies collect and analyse data on subjects from customary tenure regimes and practices, soil and agricultural land management practices, climate and weather, plant sciences, to animal sciences, and many more. Others collect data from farmers via surveys or interviews to understand local markets and farmer constraints. Institutions range from ultra-specialised, commercially focused ones to large, international organisations working on global issues, such as conservation and food security. Bigger agricultural companies often have dedicated Research & Development departments.
Figure 3 Data overview by GODAN (Responsible Data in Agriculture)
Agribusinesses collect, analyse and use data to inform changes to the services or products they intend to market to clients. Data may be aggregated from a variety of sources, be they in-house, from government agencies or from their clients themselves. Businesses market products, such as agricultural equipment, fertiliser and seeds, or services, such as satellite imagery and financial services.
Civil society organisations, such as non-governmental organizations, farmer associations and other networks of global, regional and grassroots institutions that work on agriculture, nutrition and land rights issues, are also an important source of (often grey) literature and data. These institutions also collect data on malnutrition and other types of data as a means for advocacy or to counter the official governmental data perspective.
Farmers produce primary-source agriculture data on their own farms. It may be collected by the previously identified entities or, in large-scale enterprises, analysed in-house. Farmers may use the information produced from this data, or from outside sources, such as the public sector, service providers or research institutions, to inform their farming practices. According to the interviews with farmers held by GODAN, the most valuable information to growers includes data on weather, soil and land, property ownership and markets[21].