Data Format

This documentation is a version of the official data format description. For more information, consult the official version.


In the Unpaywall schema, there are two types of objects. Every OA paper is represented by a “DOI object”, which can have multiple “OA locations”.

  • DOI objects. These contain information about the paper, such as the title, publication date, and authors.
  • OA location objects. Each DOI can have many OA locations, each with features such as PDF links and licenses.
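The nesting described above can be sketched as a plain Python dictionary. This is only an illustration assembled from the example values in the tables on this page. It is not an actual API response, and the variable name doi_object is an assumption.

```python
# Illustrative record built from the example values on this page;
# it is NOT a real Unpaywall response and omits most fields.
doi_object = {
    "doi": "10.7717/peerj.4375",
    "journal_name": "PeerJ",
    "is_oa": True,
    "oa_status": "gold",
    # One DOI object holds a list of OA location objects.
    "oa_locations": [
        {
            "host_type": "publisher",
            "license": "cc-by",
            "version": "publishedVersion",
            "is_best": True,
        },
    ],
}

# best_oa_location is itself an OA location object with the same keys.
doi_object["best_oa_location"] = doi_object["oa_locations"][0]

for location in doi_object["oa_locations"]:
    print(location["host_type"], location["license"])
```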

DOI Object

Key | Example | Meaning
best_oa_location | See OA Location below | The best available OA location object.
data_standard | 1 | Whether the data was found using only Crossref (1) or other sources too (2).
doi | ‘10.7717/peerj.4375’ | The DOI of the article in question.
doi_url | | A URL to the paper, via the DOI.
first_oa_location | See OA Location below | The OA location object with the earliest oa_date.
genre | ‘journal-article’ | The type of resource (not necessarily a paper).
is_paratext | false | Whether the document is paratext rather than an article, e.g. true for a table of contents.
is_oa | true | Whether an OA copy could be found.
journal_is_in_doaj | true | Whether the journal is indexed by DOAJ.
journal_is_oa | true | Whether all articles in the journal are OA.
journal_issns | ‘2167-8359’ | ISSNs for the print and/or electronic versions of the journal.
journal_issn_l | ‘2167-8359’ | ISSN serving as primary key in case there is more than one ISSN.
journal_name | ‘PeerJ’ | The name of the journal.
oa_locations | | A list of OA location objects like best_oa_location.
oa_status | ‘gold’ | One of gold, hybrid, bronze, green, or closed.
published_date | ‘2018-02-13’ | Date of publication.
publisher | ‘PeerJ’ | Publisher of the resource.
title | ‘The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles’ | The title of the resource.
updated | ‘2020-04-18T11:3…’ | Last time Unpaywall updated this record.
year | 2018 | Year of publication.
z_authors | JSON description of authors | Contributors, as listed in Crossref.
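As a rough sketch of how the DOI object fields might be read, the snippet below parses a JSON string and pulls out a few of the keys listed above. The helper name summarize_doi_object and the sample record are assumptions made for illustration. Rejecting an oa_status outside the five documented values is a design choice of this sketch, not something the format requires.

```python
import json

# The five oa_status values listed in the table above.
OA_STATUSES = {"gold", "hybrid", "bronze", "green", "closed"}


def summarize_doi_object(raw_json):
    """Return a small summary dict for one DOI object (hypothetical helper)."""
    record = json.loads(raw_json)
    status = record.get("oa_status")
    if status not in OA_STATUSES:
        raise ValueError(f"unexpected oa_status: {status!r}")
    return {
        "doi": record["doi"],
        # .get() tolerates keys that are absent from a record.
        "journal": record.get("journal_name"),
        "year": record.get("year"),
        "is_oa": bool(record.get("is_oa")),
        "oa_status": status,
    }


# Sample input assembled from the example column; not a real response.
sample = json.dumps({
    "doi": "10.7717/peerj.4375",
    "journal_name": "PeerJ",
    "year": 2018,
    "is_oa": True,
    "oa_status": "gold",
})
summary = summarize_doi_object(sample)
print(summary)
```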

OA Location

Key | Example | Meaning
host_type | ‘publisher’ | Either publisher or repository (e.g. a preprint server).
license | ‘cc-by’ | License used.
oa_date | ‘2018-03-23’ | When this document first became available at this location.
updated | ‘2019-10-21T21:…’ | Last time Unpaywall updated this record.
url | | URL to the article (PDF or HTML).
url_for_landing_page | | URL to the landing page, which may contain a link to the full text.
url_for_pdf | | URL to a PDF copy of the text (may redirect).
version | ‘publishedVersion’ | One of submittedVersion, acceptedVersion, or publishedVersion.
is_best | true | Whether this location is the best OA location for the DOI.
pmh_id | null | Unpaywall internal debugging field.
evidence | ‘open (via page says license)’ | Unpaywall internal debugging field.
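One possible use of the OA location fields is choosing a PDF link from a list of locations. The sketch below prefers the location flagged is_best and falls back to any other location that has url_for_pdf. The helper name pick_pdf_url and the example.org URLs are hypothetical, and the preference order is an assumption of this sketch rather than documented Unpaywall behavior.

```python
from typing import Any, Dict, List, Optional


def pick_pdf_url(oa_locations: List[Dict[str, Any]]) -> Optional[str]:
    """Return a PDF URL, preferring the is_best location (hypothetical helper)."""
    # False sorts before True, so locations with is_best=True come first;
    # sorted() is stable, so the remaining locations keep their order.
    ordered = sorted(oa_locations, key=lambda loc: not loc.get("is_best", False))
    for loc in ordered:
        pdf_url = loc.get("url_for_pdf")
        if pdf_url:  # skips both missing keys and null values
            return pdf_url
    return None


# Made-up locations for illustration; the URLs are placeholders.
locations = [
    {"host_type": "repository", "is_best": False,
     "url_for_pdf": "https://example.org/repository/paper.pdf"},
    {"host_type": "publisher", "is_best": True,
     "url_for_pdf": "https://example.org/publisher/paper.pdf"},
]
print(pick_pdf_url(locations))
```

If the best location has no url_for_pdf, the loop simply moves on to the next location, and the function returns None when no location offers a PDF.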