Data Format

This documentation is a version of the official data format description. For more information, consult the official version.

Objects

In the Unpaywall schema, there are two types of objects. Every paper is represented by a “DOI object”, which can have multiple “OA locations”.

  • DOI objects. Each one contains information about the paper, such as the title, publication date, and authors.
  • OA location objects. There can be many OA locations for each DOI, each with fields such as PDF links and licenses.
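
The one-to-many relationship described above can be sketched with two small Python dataclasses. This is only an illustration: the class names OALocation and DOIObject are hypothetical, and only a handful of the documented fields are modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OALocation:
    """One place where an open-access copy was found (hypothetical class)."""
    url: str
    host_type: str                      # 'publisher' or 'repository'
    license: Optional[str] = None
    url_for_pdf: Optional[str] = None


@dataclass
class DOIObject:
    """Paper-level record that owns its OA locations (hypothetical class)."""
    doi: str
    title: Optional[str] = None
    is_oa: bool = False
    oa_locations: List[OALocation] = field(default_factory=list)


# Values taken from the examples used in this document.
paper = DOIObject(
    doi="10.7717/peerj.4375",
    title=("The state of OA: a large-scale analysis of the prevalence "
           "and impact of Open Access articles"),
    is_oa=True,
    oa_locations=[
        OALocation(
            url="https://peerj.com/articles/4375.pdf",
            host_type="publisher",
            license="cc-by",
            url_for_pdf="https://peerj.com/articles/4375.pdf",
        )
    ],
)

# Each DOI object can be walked to reach every location it owns.
for location in paper.oa_locations:
    print(paper.doi, location.host_type, location.url)
```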

DOI Object

Key | Example | Meaning
best_oa_location | See OA Location below | The best available OA Location object.
data_standard | 1 | Data collection approach: 1 if only Crossref was used, 2 if other sources were used too.
doi | ‘10.7717/peerj.4375’ | The DOI of the article in question.
doi_url | https://doi.org/10.7717/peerj.4375 | A URL to the paper, via the DOI.
first_oa_location | See OA Location below | The OA Location object with the earliest oa_date.
genre | ‘journal-article’ | The type of resource (not necessarily a paper).
is_paratext | false | Whether the item is paratext rather than an article, e.g. true for a table of contents.
is_oa | true | Whether an OA copy could be found.
journal_is_in_doaj | true | Whether the journal is indexed by DOAJ.
journal_is_oa | true | Whether all articles in the journal are OA.
journal_issns | ‘2167-8359’ | ISSNs for the print and/or electronic versions of the journal.
journal_issn_l | ‘2167-8359’ | Linking ISSN (ISSN-L), which serves as the primary key when a journal has more than one ISSN.
journal_name | ‘PeerJ’ | The name of the journal.
oa_locations | See OA Location below | A list of all OA Location objects for this DOI, in the same format as best_oa_location.
oa_status | ‘gold’ | One of gold, hybrid, bronze, green, or closed.
published_date | ‘2018-02-13’ | Date of publication.
publisher | ‘PeerJ’ | Publisher of the resource.
title | ‘The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles’ | The title of the resource.
updated | ‘2020-04-18T11:3…’ | Last time Unpaywall updated this record.
year | 2018 | Year of publication.
z_authors | JSON description of authors | Contributors, as listed in Crossref.
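
As a rough illustration of how these keys might be read, the sketch below parses a hand-written JSON record and builds a one-line summary. The SAMPLE record is assembled from the example values in this document and is not real API output, and summarize is a hypothetical helper. The code also assumes that best_oa_location may be missing or null when no OA copy was found.

```python
import json

# Hand-written record using the example values from this document.
# It is NOT real API output and omits most keys.
SAMPLE = """
{
  "doi": "10.7717/peerj.4375",
  "is_oa": true,
  "oa_status": "gold",
  "journal_name": "PeerJ",
  "year": 2018,
  "best_oa_location": {
    "url": "https://peerj.com/articles/4375.pdf",
    "host_type": "publisher",
    "license": "cc-by"
  }
}
"""


def summarize(record: dict) -> str:
    """Return a one-line summary of a DOI object (hypothetical helper)."""
    # Assumption: best_oa_location can be absent or null for closed papers.
    best = record.get("best_oa_location") or {}
    where = best.get("url", "no OA copy found")
    return (
        f"{record['doi']} ({record.get('year')}, "
        f"{record.get('journal_name')}): "
        f"{record.get('oa_status')} -> {where}"
    )


record = json.loads(SAMPLE)
print(summarize(record))
# prints: 10.7717/peerj.4375 (2018, PeerJ): gold -> https://peerj.com/articles/4375.pdf
```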

OA Location

Key | Example | Meaning
host_type | ‘publisher’ | Either publisher or repository (e.g. a preprint server).
license | ‘cc-by’ | The license under which this copy is available.
oa_date | ‘2018-03-23’ | When this document first became available at this location.
updated | ‘2019-10-21T21:…’ | Last time Unpaywall updated this record.
url | https://peerj.com/articles/4375.pdf | URL to the article (PDF or HTML).
url_for_landing_page | https://doi.org/10.7717/peerj.4375 | URL to the landing page, which may contain a link to the full text.
url_for_pdf | https://peerj.com/articles/4375.pdf | URL to a PDF copy of the text (may redirect).
version | ‘publishedVersion’ | One of submittedVersion, acceptedVersion, or publishedVersion.
is_best | true | Whether this is the best OA location for the DOI.
pmh_id | null | Unpaywall internal debugging field.
evidence | ‘open (via page says license)’ | Unpaywall internal debugging field.
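
One possible use of these fields is choosing a PDF link from a list of OA locations. The sketch below is an assumption about how a consumer might rank locations, not Unpaywall's own logic. The function name pick_pdf_url, the prefer_host parameter, and the repository entry with an example.org URL are all hypothetical.

```python
from typing import List, Optional

LOCATIONS = [
    {   # Hypothetical repository copy, invented for illustration only.
        "host_type": "repository",
        "url_for_pdf": "https://repository.example.org/4375.pdf",
        "version": "acceptedVersion",
        "is_best": False,
    },
    {   # Values taken from the examples in this document.
        "host_type": "publisher",
        "url_for_pdf": "https://peerj.com/articles/4375.pdf",
        "version": "publishedVersion",
        "is_best": True,
    },
]


def pick_pdf_url(locations: List[dict],
                 prefer_host: str = "publisher") -> Optional[str]:
    """Return a PDF URL, or None if no location offers one."""
    # Keep only locations that actually carry a PDF link.
    with_pdf = [loc for loc in locations if loc.get("url_for_pdf")]
    if not with_pdf:
        return None
    # False sorts before True, so locations flagged is_best come first,
    # then locations whose host_type matches the preferred host.
    ranked = sorted(
        with_pdf,
        key=lambda loc: (
            not loc.get("is_best", False),
            loc.get("host_type") != prefer_host,
        ),
    )
    return ranked[0]["url_for_pdf"]


print(pick_pdf_url(LOCATIONS))
```

Ranking on is_best first defers to Unpaywall's own choice. The host_type preference only breaks ties, or applies when no location is flagged as best.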