Open Data MOOC

Lesson 5.2: Licensing Open Data

Photo by NASA/Kathryn Hansen licensed under CC BY 2.0

Aims and learning outcomes

This lesson aims to:
  • provide an overview of licensing and the permissions that come with a licence
  • define what an open licence is
  • introduce standard open licences in relation to open data
  • explain how to apply open licences
After studying this lesson, you should be able to:
  • understand the reasons why open data should come with an open licence
  • understand the implications of the different open licences that may be applicable to the data that you (intend to) use
  • have an overview of the different open licences that can be used
  • apply open licences to the data that you produced

1. Licensing and reuse

Licensing means that the copyright owner retains ownership but authorises a third party to carry out certain acts covered by the economic rights, generally for a specific period of time and for a specific purpose.[1] In order to facilitate the reuse of data, it is indispensable that others know the terms of use for the database and the data content. To ensure that, the rights holder should mark the data with associated permissions. There are two ways of communicating permissions to potential reusers of data. The rights holder can license a second party to do things that would otherwise infringe on the rights held. Alternatively, the rights holder can give up the rights to a resource so that infringement becomes a non-issue[2]. In both cases, only the rights holder can grant permissions or waive the rights with a licence.

2. What is an open licence?

An ‘open licence’ may sound like a contradiction. In general, a licence on a certain piece of content is an agreement between two parties: the licensor and the licensee. It usually comes with provisions for terms, territory and renewal conditions.
  • The terms lay down what the licensor allows the licensee to do with the content. For example, the licensee may be granted the right to use software that the licensor owns.
  • The territory is the geographical area where the licence is valid. For example, a distributor may have the right to distribute books in Europe, but not in the USA.
  • A renewal clause is customary because an agreement usually has a duration and can (or cannot) be renewed after expiration of the licence.
In the early days, some people mistakenly assumed that they could do anything with the content that they found on the web. If you do not have a licence, you are not allowed to do anything with data or other content beyond what is considered ‘fair use’. If a provider wants data to be open, that is, to be used, redistributed and mixed with other content, the data should come with an appropriate licence. Such open licences differ from many other content licences:
  • to achieve universal participation no licensee is specified
  • to make all uses possible the rights holder waives most or all rights so no specific terms apply
  • open data is distributed via the Internet, so the licence is not limited to a specific territory
  • the duration is the same as the duration of the rights that are being waived (we have seen that copyrights expire after a certain time) so there are no renewal clauses
The following open licences are defined as complying with principles set by the Open Definition:
  • public domain licence, which has no restrictions at all
  • attribution licence, which requires credit to the rights holder
  • attribution and share-alike licence, which requires attribution and requires any derived content or data to be shared under the same licence[3].

3. Standard open licences

Theoretically, providers could write their own bespoke open licence. But that is quite complex: because the data can be reused anywhere in the world, the licence should be valid in many different jurisdictions. Fortunately, there are numerous standard open licences that exist in many languages and for many different jurisdictions. These licences come with statements on different levels:
  • a machine readable version
  • the ‘commons deed’, a text that is meant to be understandable for everyone, not just legal experts
  • the ‘legal code’, a text that contains the legal statements that are formulated in such a way that they can be used in court proceedings; there are legal code documents for different national legislations.
Standard open licences are:
  • Creative Commons (CC)
  • Open Data Commons (ODC)
  • Government licences, such as the UK Open Government Licence or the French Licence Ouverte
There are debates about the differences between Creative Commons and Open Data Commons. Creative Commons licences can be applied to many different things that creators want to make available for reuse, like music and music recordings, pictures, or texts. Open Data Commons licences deal with collections held in databases, and the structure of databases, but not the individual content items in the database.
Both CC and ODC licences are used for open data. Government licences are often used to deal with legal requirements that government organisations have to meet, such as a Freedom of Information Act. However, a CC or ODC licence is also often used for government data.

3.1. Creative Commons licences for creative content

Creative Commons (CC) is a non-profit organisation established in 2001. CC saves the time and effort of granting and obtaining permission by providing tools for attaching the relevant licence to a work in a digital environment. CC licences are available in English by default, but they have also been translated into other languages and adapted to other national legal systems. CC licences are built from four conditions, which yield six main combinations.
Here are the four main conditions for CC open licences:
Attribution (BY): All CC licences require you to give credit to the rights holder in the way that was requested.
ShareAlike (SA): You are allowed to copy, distribute, display, perform, and modify the work, as long as you distribute any modified work on the same terms.
NonCommercial (NC): You are allowed to copy, distribute, display, perform, and (unless NoDerivatives is chosen) modify and use the work for any non-commercial purpose.
NoDerivatives (ND): You are allowed to copy, distribute, display and perform only original copies of the work.
In addition to these four conditions, CC also provides public domain tools for which copyright interests and database rights are waived, allowing the data to be used as freely as possible:
CC Zero (CC0): The author waives all copyright and neighbouring and related rights in the work; the rights waived include database rights, so CC0 is suitable for data.
CC Public Domain Mark (PDM): CC provides a Public Domain Mark that anyone can use to assert that a work is already in the public domain.
The six main licence combinations and their details are given below:
  1. Attribution (CC BY): This licence lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation.
  2. Attribution-ShareAlike (CC BY-SA): This licence lets others remix, tweak, and build upon your work even for commercial purposes, as long as they credit you and license their new creations under the identical terms.
  3. Attribution-NonCommercial (CC BY-NC): This licence lets others remix, tweak, and build upon your work non-commercially, and although their new works must also acknowledge you and be non-commercial, they don’t have to license their derivative works on the same terms.
  4. Attribution-NoDerivs (CC BY-ND): This licence allows for redistribution, commercial and non-commercial, as long as the work is passed along unchanged and in whole, with credit to you.
  5. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA): This licence lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.
  6. Attribution-NonCommercial-NoDerivs (CC BY-NC-ND): This licence is the most restrictive of the six main licences, only allowing others to download your works and share them with others as long as they credit you, but they can’t change them in any way or use them commercially.
‘Non-commercial’ and ‘no derivative works’ rights are seldom or never reserved for open data. If no derivative works were allowed, combinations with other datasets or their use in apps would be blocked. There is also a grey area between commercial and non-commercial distribution, and if commercial use is excluded there is no universal participation.
It is recommended to use the latest version of the CC licences, which are international. The versions of the licences prior to version 4 were not specifically aimed at data, so using them for data may present some problems. The most significant is that they do not explicitly cover sui generis database rights such as the one in force in the European Union.[4]
All versions of the licences treat datasets and databases as a whole: they do not treat the individual data themselves differently from the collection/database. Therefore, they should be applied carefully in certain complex cases, such as collections of variously copyrighted works.[5] The degree of openness of CC licences also matters. Some of the CC tools are more ‘free’ than others: CC0, PDM, CC BY and CC BY-SA are described as free culture licences.
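The reasoning about which of the six combinations fit open data can be sketched in a few lines of Python. This is only an illustration, not a legal tool: the class name `CCLicence` and the method `suitable_for_open_data` are hypothetical, and the rule encoded is simply the one stated in this section, namely that NonCommercial and NoDerivatives conditions rule a licence out for open data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCLicence:
    """One of the six main CC combinations (hypothetical helper class)."""
    code: str
    attribution: bool = True       # every CC licence requires attribution
    share_alike: bool = False
    non_commercial: bool = False
    no_derivatives: bool = False

    def suitable_for_open_data(self) -> bool:
        # NC blocks universal participation; ND blocks combining datasets.
        return not (self.non_commercial or self.no_derivatives)


SIX_MAIN_LICENCES = [
    CCLicence("CC BY"),
    CCLicence("CC BY-SA", share_alike=True),
    CCLicence("CC BY-NC", non_commercial=True),
    CCLicence("CC BY-ND", no_derivatives=True),
    CCLicence("CC BY-NC-SA", share_alike=True, non_commercial=True),
    CCLicence("CC BY-NC-ND", non_commercial=True, no_derivatives=True),
]

open_data_choices = [
    licence.code for licence in SIX_MAIN_LICENCES
    if licence.suitable_for_open_data()
]
print(open_data_choices)  # ['CC BY', 'CC BY-SA']
```

Together with the public domain tools CC0 and PDM, the two codes that remain are the ones this lesson describes as free culture licences.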

3.2. Open data licences for databases

The Open Data Commons project started in 2007 and was transferred to the Open Knowledge Foundation in 2009. It produced licences similar to those of CC but designed specifically for databases. Open Data Commons has three licences:
  1. Public Domain Dedication and Licence (PDDL) – ‘Public domain for data/databases’: It allows users to copy, distribute and use the database (share); to produce works from the database (create); and to modify, transform and build upon the database (adapt). The PDDL imposes no restrictions on the use of the PDDL-licensed database. It accomplishes the same thing in the same way as CC0 but is worded specifically in database terms.
  2. Attribution Licence (ODC-By) – ‘Attribution for data/databases’: It allows users to copy, distribute and use the database (share); to produce works from the database (create); and to modify, transform and build upon the database (adapt), as long as they attribute any public use of the database, or works produced from the database, in the manner specified in the licence.
  3. Open Database Licence (ODC-ODbL) – ‘Attribution Share-Alike for data/databases’: It gives the same permissions as ODC-By. In addition, (i) any adapted version of the database, or works produced from an adapted database, should also be offered under the ODbL; (ii) a reuser can apply technical restrictions to a new work as long as an alternative copy without the restrictions is made equally available.
Table 1 Standard open licences compliant with Open Definition
Licence | Attribution (BY) | Share Alike (SA) | Notes
CC Zero (CC0) | No | No | All rights waived. Recommended for scientific data to make data mining and meta-analyses possible
Public Domain Dedication and Licence (PDDL) | No | No | All rights waived. Recommended for scientific data to make data mining and meta-analyses possible
Creative Commons Attribution 4.0 (CC BY)[17] | Yes | No |
Open Data Commons Attribution Licence (ODC BY) | Yes | No |
Creative Commons Attribution Share Alike (CC BY SA) | Yes | Yes |
Open Database Licence (ODbL) | Yes | Yes |

4. How to use open licences?

Open licences usually come in layers, including human-readable and machine-readable versions. Both should clearly indicate which licence applies to your content or data and how it can be reused by others. Creative Commons and Open Data Commons define on their websites which statements and marks should be used for each of their licences.
Creative Commons offers a web-based tool, the license chooser, to help you select the right licence for your needs. Open Data Commons similarly provides instructions on how to apply its licences (Figures 1 and 2).
Figure 1 Example of a CC licence
Figure 2 Example of an ODC licence
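As a rough illustration of a licence statement that is both human-readable and machine-detectable, the sketch below builds an HTML fragment whose link carries the standard `rel="license"` link type. The function name `licence_notice_html`, the dataset title and the rights holder are hypothetical; in practice, the wording recommended by Creative Commons or Open Data Commons for the chosen licence should be used.

```python
from html import escape


def licence_notice_html(title: str, creator: str,
                        licence_name: str, licence_url: str) -> str:
    """Return an HTML fragment naming the work, its creator and its licence.

    Hypothetical helper: the rel="license" attribute lets crawlers detect
    the licence, while the visible text informs human readers.
    """
    return (
        f"{escape(title)} by {escape(creator)} is licensed under "
        f'<a rel="license" href="{escape(licence_url)}">'
        f"{escape(licence_name)}</a>"
    )


notice = licence_notice_html(
    "Example rainfall dataset",      # hypothetical title
    "Example Research Institute",    # hypothetical rights holder
    "CC BY 4.0",
    "https://creativecommons.org/licenses/by/4.0/",
)
print(notice)
```

The fragment can be placed on the page from which the data is downloaded, so that the licence travels with the data.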
A machine-readable licence statement, together with a complete metadata description, is important so that machines such as search engines and web APIs can harvest your content and data correctly, and so that the licensed work can be searched, browsed or filtered correctly. The ODI’s Publisher's Guide to the Open Data Rights Statement Vocabulary is a useful resource on the topic, which was also discussed at length in the previous units.
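One possible form of such a machine-readable description is sketched below. It assumes the schema.org `Dataset` type with its `license` property, serialised as JSON-LD, rather than the Open Data Rights Statement Vocabulary mentioned above; the dataset name, description and creator are made up for illustration.

```python
import json

# Minimal dataset description using schema.org terms (an assumption of this
# sketch; ODRS or DCAT could be used instead). All values are hypothetical
# except the licence URL, which points at CC BY 4.0.
dataset_metadata = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example rainfall dataset",
    "description": "Daily rainfall measurements (illustrative only).",
    "creator": {
        "@type": "Organization",
        "name": "Example Research Institute",
    },
    "license": "https://creativecommons.org/licenses/by/4.0/",
}

# Serialise to JSON-LD text that can be embedded in the dataset's web page.
json_ld = json.dumps(dataset_metadata, indent=2)
print(json_ld)
```

Identifying the licence by its URL rather than by a free-text name gives harvesters an unambiguous value to filter on.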

Further readings