API
Unpywall Object
-
class unpywall.Unpywall
Base class with static methods for retrieving information from the Unpaywall REST API (https://api.unpaywall.org). This client uses version 2 of the API.
-
static doi(dois: list, format: str = 'raw', progress: bool = False, errors: str = 'raise', force: bool = False, ignore_cache: bool = False)
Parses information for the given DOIs from the Unpaywall API service and returns it as a pandas DataFrame.
Parameters: - dois (list) – A list of DOIs.
- format (str) – The format of the DataFrame.
- progress (bool) – Whether the progress of the API call should be printed out or not.
- errors (str) – Either ‘raise’ or ‘ignore’. If set to ‘ignore’, errors will not raise an exception.
- force (bool) – Whether to force the cache to retrieve a new entry.
- ignore_cache (bool) – Whether to use or ignore the cache.
Returns: A pandas DataFrame that contains information from the Unpaywall API service.
Return type: DataFrame
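The `errors` switch can be pictured with a stdlib-only sketch. It is not the library's implementation: `fetch_many` and `fake_fetch` are hypothetical names, the fetcher stands in for the HTTP request, plain dicts replace the DataFrame, and skipping a failing DOI under `errors='ignore'` is an assumption about what "ignore" does.

```python
from typing import Callable, List


def fetch_many(dois: List[str], fetch: Callable[[str], dict], errors: str = "raise") -> List[dict]:
    """Fetch one record per DOI (hypothetical stand-in for Unpywall.doi)."""
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    records = []
    for doi in dois:
        try:
            records.append(fetch(doi))
        except Exception:
            if errors == "raise":
                raise  # re-raise the original exception unchanged
            # errors == "ignore": assumed behaviour, skip this DOI and continue
    return records


def fake_fetch(doi: str) -> dict:
    """Hypothetical fetcher: knows one DOI and fails for every other one."""
    if doi == "10.1000/ok":
        return {"doi": doi, "is_oa": True}
    raise LookupError(f"no record for {doi}")


print(fetch_many(["10.1000/ok", "10.1000/missing"], fake_fetch, errors="ignore"))
```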
-
static download_pdf_file(doi: str, filename: str, filepath: str = '.', progress: bool = False) → None
This function downloads the PDF for a given DOI.
Parameters: - doi (str) – The DOI of the requested paper.
- filename (str) – The filename for the PDF.
- filepath (str) – The path to store the downloaded PDF.
- progress (bool) – Whether the progress of the API call should be printed out or not.
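How `filename` and `filepath` might combine into a target path is sketched below. The helper `save_pdf_bytes` is hypothetical, the bytes are placeholders rather than a downloaded PDF, and creating a missing directory is an assumption, not documented behaviour.

```python
import os
import tempfile


def save_pdf_bytes(data: bytes, filename: str, filepath: str = ".") -> str:
    """Write PDF bytes to filepath/filename and return the full path (hypothetical)."""
    os.makedirs(filepath, exist_ok=True)  # assumption: create the directory if missing
    full_path = os.path.join(filepath, filename)
    with open(full_path, "wb") as fh:
        fh.write(data)
    return full_path


with tempfile.TemporaryDirectory() as tmp:
    path = save_pdf_bytes(b"%PDF-1.4\n", "paper.pdf", os.path.join(tmp, "papers"))
    print(os.path.basename(path), os.path.getsize(path))
```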
-
static download_pdf_handle(doi: str) → _io.BytesIO
This function returns a file-like object containing the requested PDF.
Parameters: doi (str) – The DOI of the requested paper.
Returns: The handle of the PDF file.
Return type: BytesIO
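A `BytesIO` handle can be read like an open binary file. This sketch wraps placeholder bytes (not a real PDF) and checks for the `%PDF-` header that PDF files begin with; `looks_like_pdf` is a hypothetical helper, not part of unpywall.

```python
import io


def looks_like_pdf(handle: io.BytesIO) -> bool:
    """Return True if the handle starts with the PDF header, then rewind it."""
    header = handle.read(5)
    handle.seek(0)  # rewind so the caller can still read the whole document
    return header == b"%PDF-"


handle = io.BytesIO(b"%PDF-1.7\n% placeholder body\n")
print(looks_like_pdf(handle), handle.tell())
```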
-
static get_all_links(doi: str) → list
This function returns a list of URLs for all open-access copies listed in Unpaywall.
Parameters: doi (str) – The DOI of the requested paper.
Returns: A list of URLs leading to open-access copies.
Return type: list
-
static get_doc_link(doi: str) → str
This function returns a link to the best OA location (not necessarily a PDF).
Parameters: doi (str) – The DOI of the requested paper.
Returns: The URL of the best OA location (not necessarily a PDF).
Return type: str
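Both link helpers read from the JSON record Unpaywall returns for a DOI. A minimal sketch, assuming the v2 record fields `best_oa_location` and `oa_locations` and each location's `url` field (which field unpywall itself reads is not stated here); the function names and the sample record with its URLs are made up.

```python
def best_location_url(record: dict):
    """URL of the best OA location, or None (assumes the location 'url' field)."""
    best = record.get("best_oa_location") or {}  # null for closed-access records
    return best.get("url")


def all_location_urls(record: dict) -> list:
    """URLs of every listed OA location, deduplicated, in record order."""
    urls = []
    for location in record.get("oa_locations") or []:
        url = location.get("url")
        if url and url not in urls:
            urls.append(url)
    return urls


# Hand-written sample record, trimmed to the fields used above.
sample = {
    "doi": "10.1000/example",
    "best_oa_location": {"url": "https://repo.example.org/paper"},
    "oa_locations": [
        {"url": "https://repo.example.org/paper"},
        {"url": "https://preprints.example.org/paper.pdf"},
    ],
}
print(best_location_url(sample))
print(all_location_urls(sample))
```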
-
static get_json(doi: str = None, query: str = None, is_oa: bool = False, errors: str = 'raise', force: bool = False, ignore_cache: bool = False)
This function returns all information in Unpaywall about the given DOI.
Parameters: - doi (str) – The DOI of the requested paper.
- query (str) – The text to search for.
- is_oa (bool) – A boolean value indicating whether the returned records should be Open Access or not.
- errors (str) – Either ‘raise’ or ‘ignore’. If set to ‘ignore’, errors will not raise an exception.
- force (bool) – Whether to force the cache to retrieve a new entry.
- ignore_cache (bool) – Whether to use or ignore the cache.
Returns: A JSON data structure containing all information returned by Unpaywall about the given DOI.
Return type: JSON object
Raises: AttributeError – If the Unpaywall API did not respond with JSON.
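The documented `AttributeError` for a non-JSON reply can be mimicked as follows. This is a sketch only: `parse_unpaywall_body` is a hypothetical helper that decodes a raw body string, whereas the library works on HTTP responses.

```python
import json


def parse_unpaywall_body(body: str) -> dict:
    """Decode a response body, mapping non-JSON input to AttributeError."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # mirror the documented exception type for a non-JSON reply
        raise AttributeError("Unpaywall did not respond with JSON") from exc


print(parse_unpaywall_body('{"doi": "10.1000/example", "is_oa": false}'))
```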
-
static get_pdf_link(doi: str) → str
This function returns a link to an OA PDF (if available).
Parameters: doi (str) – The DOI of the requested paper.
Returns: The URL of an OA PDF (if available).
Return type: str
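"If available" matters because many records have no PDF URL. A minimal sketch, assuming the `url_for_pdf` field of the Unpaywall `best_oa_location` object; `pdf_link` is a hypothetical name and the URLs are invented.

```python
def pdf_link(record: dict):
    """Return best_oa_location['url_for_pdf'], or None when no OA PDF is known."""
    best = record.get("best_oa_location")
    if not best:
        return None  # closed-access records have best_oa_location = null
    return best.get("url_for_pdf")  # may itself be null for landing-page-only copies


print(pdf_link({"best_oa_location": {"url_for_pdf": "https://repo.example.org/p.pdf"}}))
print(pdf_link({"best_oa_location": None}))
```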
-
static init_cache(cache=None) → None
This method initializes a cache that is used to store records from the Unpaywall database.
Parameters: cache (UnpywallCache) – A custom cache to be used instead of the standard cache.
Raises: AttributeError – If the custom cache is not of type UnpywallCache.
-
static query(query: str, is_oa: bool = False, format: str = 'raw', errors: str = 'raise') → pandas.core.frame.DataFrame
Parses information for a given query from the Unpaywall API service and returns it as a pandas DataFrame.
Parameters: - query (str) – The text to search for.
- is_oa (bool) – A boolean value indicating whether the returned records should be Open Access or not.
- format (str) – The format of the DataFrame.
- errors (str) – Either ‘raise’ or ‘ignore’. If set to ‘ignore’, errors will not raise an exception.
Returns: A pandas DataFrame that contains information from the Unpaywall API service.
Return type: DataFrame
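Turning search hits into table rows might look like the sketch below. It assumes the Unpaywall v2 search payload shape, in which `results` holds hits with a `response` (the DOI record) and a `score`; the chosen columns and the sample payload are illustrative, not what `format='raw'` actually produces. A list of dicts like this can be passed to `pandas.DataFrame(rows)`.

```python
def flatten_search_results(payload: dict) -> list:
    """Turn a search payload into flat rows: DOI, title, OA flag and score."""
    rows = []
    for hit in payload.get("results", []):
        record = hit.get("response", {})  # assumed: the full DOI record per hit
        rows.append({
            "doi": record.get("doi"),
            "title": record.get("title"),
            "is_oa": record.get("is_oa"),
            "score": hit.get("score"),
        })
    return rows


payload = {"results": [
    {"response": {"doi": "10.1000/a", "title": "Example A", "is_oa": True}, "score": 0.9},
    {"response": {"doi": "10.1000/b", "title": "Example B", "is_oa": False}, "score": 0.4},
]}
print(flatten_search_results(payload))
```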
-
static view_pdf(doi: str, mode: str = 'viewer', progress: bool = False) → None
This function opens a local copy of the PDF for a given DOI.
Parameters: - doi (str) – The DOI of the requested paper.
- mode (str) – The mode for viewing a PDF.
- progress (bool) – Whether the progress of the API call should be printed out or not.
-
Cache Object
-
class unpywall.cache.UnpywallCache(name: str = None, timeout=None)
This class stores query results from Unpaywall. It has a configurable timeout that can also be set to never expire.
-
name
The filename used to save and load the cache by default.
Type: str
-
content
A dictionary mapping DOIs to requests.Response objects.
Type: dict
-
access_times
A dictionary mapping DOIs to the datetime when each was last updated.
Type: dict
-
delete(doi: str) → None
Remove an individual DOI from the cache.
Parameters: doi (str) – The DOI to be removed from the cache.
-
download(doi: str, errors: str)
Retrieve a record from Unpaywall.
Parameters: - doi (str) – The DOI to be retrieved.
- errors (str) – Whether to ignore or raise errors.
-
get(doi: str, errors: str = 'raise', force: bool = False, ignore_cache: bool = False)
Return the record for the given DOI.
Parameters: - doi (str) – The DOI to be retrieved.
- errors (str) – Whether to ignore or raise errors.
- force (bool) – Whether to force the cache to retrieve a new entry.
- ignore_cache (bool) – Whether to use or ignore the cache.
Returns: record – The response from Unpaywall.
Return type: requests.Response
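One way `force` and `ignore_cache` could interact is sketched below. `TinyCache` and `counting_download` are hypothetical; the download callable stands in for the request to Unpaywall, timeouts are left out, and not storing anything under `ignore_cache=True` is an assumption of this sketch.

```python
from typing import Callable, Dict


class TinyCache:
    """Hypothetical cache showing how force and ignore_cache could interact."""

    def __init__(self, download: Callable[[str], dict]) -> None:
        self.download = download  # stand-in for the request to Unpaywall
        self.content: Dict[str, dict] = {}

    def get(self, doi: str, force: bool = False, ignore_cache: bool = False) -> dict:
        if ignore_cache:
            return self.download(doi)  # bypass the cache; nothing is read or stored
        if force or doi not in self.content:
            self.content[doi] = self.download(doi)  # fetch (or refetch) and store
        return self.content[doi]


calls = []


def counting_download(doi: str) -> dict:
    calls.append(doi)  # record every simulated network request
    return {"doi": doi}


cache = TinyCache(counting_download)
cache.get("10.1000/a")
cache.get("10.1000/a")              # served from the cache
cache.get("10.1000/a", force=True)  # forces a fresh download
print(len(calls))
```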
-
load(name=None) → None
Load the cache from a file.
Parameters: name (str or None) – The filename that the cache will be loaded from. If None, self.name will be used.
-
reset_cache() → None
Set the cache to a blank state.
-
save(name=None) → None
Save the current cache contents to a file.
Parameters: name (str or None) – The filename that the cache will be saved to. If None, self.name will be used.
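A save/load round trip of the two cache dictionaries is sketched here. The on-disk format is not documented above, so `pickle` is an assumption, as are the helper names `save_cache` and `load_cache`. Since unpickling can execute arbitrary code, only load cache files you created yourself.

```python
import os
import pickle
import tempfile
from datetime import datetime


def save_cache(content: dict, access_times: dict, name: str) -> None:
    """Write both cache dictionaries to one file (pickle format is an assumption)."""
    with open(name, "wb") as fh:
        pickle.dump({"content": content, "access_times": access_times}, fh)


def load_cache(name: str) -> tuple:
    """Read back the (content, access_times) pair written by save_cache."""
    with open(name, "rb") as fh:
        state = pickle.load(fh)  # only unpickle files you created yourself
    return state["content"], state["access_times"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "unpywall_cache")
    save_cache({"10.1000/a": {"is_oa": True}}, {"10.1000/a": datetime(2024, 1, 1)}, path)
    content, access_times = load_cache(path)
    print(content, access_times)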
-
timed_out(doi: str) → bool
Return whether the record for the given DOI has expired.
Parameters: doi (str) – The DOI whose cache entry should be checked.
Returns: is_timed_out – Whether the given entry has timed out.
Return type: bool
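An expiry check against `access_times` can be sketched as a comparison of elapsed time with the timeout. Measuring `timeout` in seconds and using `None` for "never expire" are assumptions; `is_timed_out` is a hypothetical free function, and `now` is injectable so the check is deterministic.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


def is_timed_out(access_times: Dict[str, datetime], doi: str,
                 timeout: Optional[float], now: Optional[datetime] = None) -> bool:
    """True if the entry for doi is older than timeout seconds; None never expires."""
    if timeout is None:
        return False  # assumed: a cache without timeout keeps entries forever
    if now is None:
        now = datetime.now()
    return now - access_times[doi] > timedelta(seconds=timeout)


stamps = {"10.1000/a": datetime(2024, 1, 1, 12, 0)}
print(is_timed_out(stamps, "10.1000/a", 3600, now=datetime(2024, 1, 1, 14, 0)))
```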
-
Utils
-
class unpywall.utils.UnpywallCredentials(email: str)
This class provides tools for setting up an email address for the Unpaywall service.
-
email
The email address required for using the Unpaywall API service.
Type: str
-
static validate_email(email: str) → str
This method takes an email address as input and raises an error if it is not valid. Otherwise, the email address is returned.
Parameters: email (str) – The email address required for using the Unpaywall API service.
Returns: The email address that was given as input.
Return type: str
Raises: ValueError – If the email parameter is empty or not valid.
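The validate-or-raise contract can be illustrated with a deliberately loose check. The rule that unpywall actually applies is not documented here; `check_email` and its regular expression (something@something.tld, no whitespace, far from RFC 5322) are assumptions for illustration only.

```python
import re

# Assumed, deliberately loose pattern: local@domain.tld without whitespace.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_email(email: str) -> str:
    """Return email unchanged, or raise ValueError if it is empty or malformed."""
    if not email:
        raise ValueError("An email address is required for the Unpaywall API.")
    if not _EMAIL_RE.match(email):
        raise ValueError(f"Not a valid email address: {email!r}")
    return email


print(check_email("me@example.com"))
```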
-
class unpywall.utils.UnpywallURL(doi: str = None, query: str = None, is_oa: bool = False)
This class builds the Unpaywall API URLs for the DOI and query endpoints.
-
doi
The DOI of the requested paper.
Type: str
-
query
The text to search for.
Type: str
-
is_oa
A boolean value indicating whether the returned records should be Open Access or not.
Type: bool
-
doi_url
The URL for the DOI endpoint.
Type: str
-
query_url
The URL for the query endpoint.
Type: str
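The two endpoint URLs can be approximated with the standard library. This sketch assumes the public Unpaywall v2 routes `/v2/<doi>` and `/v2/search` with an `email` parameter; `doi_url` and `query_url` are hypothetical helpers, and always sending `is_oa` is a choice of this sketch rather than confirmed library behaviour.

```python
from urllib.parse import quote, urlencode

BASE = "https://api.unpaywall.org/v2/"


def doi_url(doi: str, email: str) -> str:
    """DOI endpoint: /v2/<doi>?email=... (the slash inside a DOI is kept)."""
    return f"{BASE}{quote(doi, safe='/')}?{urlencode({'email': email})}"


def query_url(query: str, email: str, is_oa: bool = False) -> str:
    """Search endpoint: /v2/search?query=...&is_oa=...&email=..."""
    params = {"query": query, "is_oa": str(is_oa).lower(), "email": email}
    return f"{BASE}search?{urlencode(params)}"


print(doi_url("10.1000/xyz123", "me@example.com"))
print(query_url("cell thermometry", "me@example.com", is_oa=True))
```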
-