API

Unpywall Object

class unpywall.Unpywall

Base class that contains useful functions for retrieving information from the Unpaywall REST API (https://api.unpaywall.org). This client uses version 2 of the API.

static doi(dois: list, format: str = 'raw', progress: bool = False, errors: str = 'raise', force: bool = False, ignore_cache: bool = False)

Retrieves information for the given DOIs from the Unpaywall API service and returns it as a pandas DataFrame.

Parameters:
  • dois (list) – A list of DOIs.
  • format (str) – The format of the DataFrame.
  • progress (bool) – Whether the progress of the API call should be printed out or not.
  • errors (str) – Either ‘raise’ or ‘ignore’. If set to ‘ignore’, errors will not raise an exception.
  • force (bool) – Whether to force the cache to retrieve a new entry.
  • ignore_cache (bool) – Whether to bypass the cache instead of using it.
Returns:

A pandas DataFrame that contains information from the Unpaywall API service.

Return type:

DataFrame
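
As a rough usage sketch, the snippet below cleans a list of DOIs and passes it to Unpywall.doi. The helpers clean_dois and fetch_records are hypothetical and not part of unpywall, and the e-mail address is supplied by the caller. The unpywall import is deferred so the cleaning step can be tried without the package or network access.

```python
from typing import Iterable, List


def clean_dois(dois: Iterable[str]) -> List[str]:
    """Strip resolver prefixes and drop duplicates, keeping first-seen order."""
    prefixes = ("https://doi.org/", "http://doi.org/", "doi:")
    seen = set()
    cleaned = []
    for raw in dois:
        doi = raw.strip()
        for prefix in prefixes:
            if doi.lower().startswith(prefix):
                doi = doi[len(prefix):]
                break
        key = doi.lower()  # DOIs are case-insensitive
        if doi and key not in seen:
            seen.add(key)
            cleaned.append(doi)
    return cleaned


def fetch_records(dois: Iterable[str], email: str):
    """Return a DataFrame for the DOIs (needs unpywall and network access)."""
    # Imported lazily so clean_dois() works without unpywall installed.
    from unpywall import Unpywall
    from unpywall.utils import UnpywallCredentials

    UnpywallCredentials(email)  # sets up the e-mail for the Unpaywall service
    # errors='ignore' means failures do not raise an exception.
    return Unpywall.doi(dois=clean_dois(dois), errors="ignore")
```

Calling fetch_records(["10.1038/nature12373"], "you@example.org") would then return the DataFrame described above, assuming the service is reachable.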

static download_pdf_file(doi: str, filename: str, filepath: str = '.', progress: bool = False) → None

This function downloads a PDF from a given DOI.

Parameters:
  • doi (str) – The DOI of the requested paper.
  • filename (str) – The filename for the PDF.
  • filepath (str) – The path to store the downloaded PDF.
  • progress (bool) – Whether the progress of the API call should be printed out or not.
static download_pdf_handle(doi: str) → _io.BytesIO

This function returns a file-like object containing the requested PDF.

Parameters: doi (str) – The DOI of the requested paper.
Returns: The handle of the PDF file.
Return type: BytesIO
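
Because the result is an in-memory buffer, it can be inspected before anything is written to disk. The sketch below checks whether a BytesIO handle starts with the PDF signature; looks_like_pdf is a hypothetical helper, not part of unpywall.

```python
import io


def looks_like_pdf(handle: io.BytesIO) -> bool:
    """Return True if the buffer starts with the PDF signature b'%PDF-'."""
    position = handle.tell()   # remember the caller's read position
    handle.seek(0)
    header = handle.read(5)
    handle.seek(position)      # restore it so later reads are unaffected
    return header == b"%PDF-"
```

With unpywall installed, the check could be applied as looks_like_pdf(Unpywall.download_pdf_handle(doi=some_doi)), where some_doi is a DOI string chosen by the caller.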

static get_all_links(doi: str) → list

This function returns a list of URLs for all open-access copies listed in Unpaywall.

Parameters: doi (str) – The DOI of the requested paper.
Returns: A list of URLs leading to open-access copies.
Return type: list

static get_doc_link(doi: str) → str

This function returns a link to the best OA location (not necessarily a PDF).

Parameters: doi (str) – The DOI of the requested paper.
Returns: The URL of the best OA location (not necessarily a PDF).
Return type: str
static get_json(doi: str = None, query: str = None, is_oa: bool = False, errors: str = 'raise', force: bool = False, ignore_cache: bool = False)

This function returns all information in Unpaywall about the given DOI.

Parameters:
  • doi (str) – The DOI of the requested paper.
  • query (str) – The text to search for.
  • is_oa (bool) – A boolean value indicating whether the returned records should be Open Access or not.
  • errors (str) – Either ‘raise’ or ‘ignore’. If set to ‘ignore’, errors will not raise an exception.
  • force (bool) – Whether to force the cache to retrieve a new entry.
  • ignore_cache (bool) – Whether to bypass the cache instead of using it.
Returns:

A JSON data structure containing all information returned by Unpaywall about the given DOI.

Return type:

JSON object

Raises:

AttributeError – If the Unpaywall API did not respond with JSON.
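
The returned structure can be treated as a nested dictionary. The sketch below pulls link fields out of such a record, assuming the field names of the Unpaywall data format (best_oa_location, oa_locations, url, url_for_pdf). The helpers pdf_url and all_urls are hypothetical, and the sample record is made up for illustration.

```python
from typing import List, Optional


def pdf_url(record: Optional[dict]) -> Optional[str]:
    """Return the PDF URL of the best OA location, or None if there is none."""
    best = (record or {}).get("best_oa_location") or {}
    return best.get("url_for_pdf")


def all_urls(record: Optional[dict]) -> List[str]:
    """Return the URL of every OA location listed in the record."""
    locations = (record or {}).get("oa_locations") or []
    return [loc["url"] for loc in locations if loc.get("url")]


# Made-up record that mimics the shape of an Unpaywall response.
sample = {
    "doi": "10.1234/example",
    "is_oa": True,
    "best_oa_location": {
        "url": "https://example.org/paper",
        "url_for_pdf": "https://example.org/paper.pdf",
    },
    "oa_locations": [
        {"url": "https://example.org/paper"},
        {"url": "https://repo.example.org/copy"},
    ],
}
```

Both helpers accept None, so they can be chained directly after a lookup that did not yield a record.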

static get_pdf_link(doi: str) → str

This function returns a link to an OA PDF (if available).

Parameters: doi (str) – The DOI of the requested paper.
Returns: The URL of an OA PDF (if available).
Return type: str
static init_cache(cache=None) → None

This method initializes a cache that is used to store records from the Unpaywall database.

Parameters: cache (UnpywallCache) – A custom cache to be used instead of the standard cache.
Raises: AttributeError – If the custom cache is not of type UnpywallCache.
static query(query: str, is_oa: bool = False, format: str = 'raw', errors: str = 'raise') → pandas.core.frame.DataFrame

Parses information for a given query from the Unpaywall API service and returns it as a pandas DataFrame.

Parameters:
  • query (str) – The text to search for.
  • is_oa (bool) – A boolean value indicating whether the returned records should be Open Access or not.
  • format (str) – The format of the DataFrame.
  • errors (str) – Either ‘raise’ or ‘ignore’. If set to ‘ignore’, errors will not raise an exception.
Returns:

A pandas DataFrame that contains information from the Unpaywall API service.

Return type:

DataFrame

static view_pdf(doi: str, mode: str = 'viewer', progress: bool = False) → None

This function opens a local copy of a PDF from a given DOI.

Parameters:
  • doi (str) – The DOI of the requested paper.
  • mode (str) – The mode for viewing a PDF.
  • progress (bool) – Whether the progress of the API call should be printed out or not.

Cache Object

class unpywall.cache.UnpywallCache(name: str = None, timeout=None)

This class stores query results from Unpaywall. It has a configurable timeout that can also be set to never expire.

name

The filename used to save and load the cache by default.

Type: string
content

A dictionary mapping DOIs to requests.Response objects.

Type: dict
access_times

A dictionary mapping each DOI to the time at which its entry was last updated.

Type: dict
delete(doi: str) → None

Remove an individual DOI from the cache.

Parameters: doi (str) – The DOI to be removed from the cache.
download(doi: str, errors: str)

Retrieve a record from Unpaywall.

Parameters:
  • doi (str) – The DOI to be retrieved.
  • errors (str) – Whether to ignore or raise errors.
get(doi: str, errors: str = 'raise', force: bool = False, ignore_cache: bool = False)

Return the record for the given DOI.

Parameters:
  • doi (str) – The DOI to be retrieved.
  • errors (str) – Whether to ignore or raise errors.
  • force (bool) – Whether to force the cache to retrieve a new entry.
  • ignore_cache (bool) – Whether to bypass the cache instead of using it.
Returns:

record – The response from Unpaywall.

Return type:

requests.Response
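
One plausible reading of how force, ignore_cache and the timeout interact is sketched below. MiniCache is a toy stand-in written for illustration, not unpywall's implementation. The fetch callable, the injectable clock and the choice of seconds as the timeout unit are all assumptions.

```python
import time
from typing import Callable, Dict, Optional


class MiniCache:
    """Toy cache illustrating force / ignore_cache; not unpywall's code."""

    def __init__(self, fetch: Callable[[str], object],
                 timeout: Optional[float] = None,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch        # hypothetical stand-in for download()
        self.timeout = timeout    # assumed seconds; None means never expire
        self.clock = clock        # injectable so the logic can be tested
        self.content: Dict[str, object] = {}
        self.access_times: Dict[str, float] = {}

    def timed_out(self, doi: str) -> bool:
        """Return whether the stored entry is older than the timeout."""
        if self.timeout is None:
            return False
        return self.clock() - self.access_times[doi] > self.timeout

    def get(self, doi: str, force: bool = False, ignore_cache: bool = False):
        if ignore_cache:
            # Bypass: the cache is neither read nor updated.
            return self.fetch(doi)
        if force or doi not in self.content or self.timed_out(doi):
            # Refresh the entry and record when it was updated.
            self.content[doi] = self.fetch(doi)
            self.access_times[doi] = self.clock()
        return self.content[doi]
```

In this reading, force refreshes the stored entry, whereas ignore_cache leaves the stored entry untouched and returns a fresh download.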

load(name=None) → None

Load the cache from a file.

Parameters: name (str or None) – The filename that the cache will be loaded from. If None, self.name will be used.
reset_cache() → None

Set the cache to a blank state.

save(name=None) → None

Save the current cache contents to a file.

Parameters: name (str or None) – The filename that the cache will be saved to. If None, self.name will be used.
timed_out(doi: str) → bool

Return whether the record for the given DOI has expired.

Parameters: doi (str) – The DOI whose cache entry is checked.
Returns: is_timed_out – Whether the given entry has timed out.
Return type: bool

Utils

class unpywall.utils.UnpywallCredentials(email: str)

This class provides tools for setting up an email for the Unpaywall service.

email

An email that is necessary for using the Unpaywall API service.

Type: str
static validate_email(email: str) → str

This method takes an email as input and raises an error if the email is not valid. Otherwise the email will be returned.

Parameters: email (str) – An email that is necessary for using the Unpaywall API service.
Returns: The email that was given as input.
Return type: str
Raises: ValueError – If the email parameter is empty or not valid.
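
The contract described above (return the input unchanged, or raise ValueError) can be sketched as follows. The regular expression is an assumption chosen for illustration; the pattern unpywall actually uses may differ.

```python
import re

# Deliberately loose pattern (an assumption): something@something.tld
_EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_email(email: str) -> str:
    """Return the e-mail unchanged, or raise ValueError if it looks invalid."""
    if not email or not _EMAIL_PATTERN.match(email):
        raise ValueError(f"Invalid e-mail address: {email!r}")
    return email
```

Returning the validated value lets the check be used inline, for example when assigning the email attribute.
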
class unpywall.utils.UnpywallURL(doi: str = None, query: str = None, is_oa: bool = False)

This class provides the Unpaywall URL.

doi

The DOI of the requested paper.

Type: str
query

The text to search for.

Type: str
is_oa

A boolean value indicating whether the returned records should be Open Access or not.

Type: bool
doi_url

The URL for the DOI-Endpoint.

Type: str
query_url

The URL for the Query-Endpoint.

Type: str
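
To illustrate what the two URL attributes might contain, the sketch below assembles request URLs for version 2 of the Unpaywall REST API. The functions doi_url and query_url are hypothetical stand-ins that take the e-mail as an explicit argument. The exact strings produced by UnpywallURL may differ.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.unpaywall.org/v2/"


def doi_url(doi: str, email: str) -> str:
    """Build a lookup URL for a single DOI."""
    params = urlencode({"email": email})
    # quote() keeps '/' intact, so the DOI stays readable in the path.
    return f"{API_ROOT}{quote(doi)}?{params}"


def query_url(query: str, email: str, is_oa: bool = False) -> str:
    """Build a search URL, optionally restricted to Open Access records."""
    params = {"query": query, "email": email}
    if is_oa:
        params["is_oa"] = "true"
    return f"{API_ROOT}search?{urlencode(params)}"
```

Using urlencode rather than string concatenation ensures that spaces in the query and the @ in the e-mail address are escaped.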