Skip to contents

This function downloads antimicrobial susceptibility testing (AST) data from the NCBI Pathogen Detection database via the BioSample API. Data are retrieved in batches, parsed from XML, and returned as a tidy tibble with metadata including BioSample ID, Bioproject ID, and organism name.

Usage

download_ncbi_ast(
  species,
  antibiotic = NULL,
  max_records = 15000,
  batch_size = 200,
  sleep_time = 0.34,
  force_antibiotic = FALSE,
  reformat = FALSE,
  interpret_eucast = FALSE,
  interpret_clsi = FALSE,
  interpret_ecoff = FALSE
)

Arguments

species

Character. Organism name for the search query (e.g., "Salmonella enterica"). Required.

antibiotic

Character or vector. Optional antibiotic name/s to filter the returned data. Strings will be processed using the AMR package to standardize names before matching, so e.g. "amikacin" or "Amikacin" or "ami" will be parsed to "amikacin" before matching. This can be turned off by setting force_antibiotic=TRUE. Full list of allowed antibiotic names in NCBI: https://www.ncbi.nlm.nih.gov/biosample/docs/antibiogram/.

max_records

Integer. Maximum number of BioSample records to retrieve. Default is 15000.

batch_size

Integer. Number of records fetched per API request. Default is 200 which is recommended by NCBI.

sleep_time

Numeric. Seconds to pause between batch requests to avoid overloading NCBI servers. Default is 0.34.

force_antibiotic

Logical. If TRUE, turns off standardizing the antibiotic name using the AMR package before filtering, so that matching is done exactly on the input string/s. Default is FALSE.

reformat

Logical. If TRUE, reformats the output using import_ncbi_biosample for compatibility with AMR analysis workflows. Default is FALSE. When set to TRUE, the data can also be interpreted against breakpoints/ECOFF by setting the interpret_*=TRUE.

interpret_eucast

Logical. Passed to interpret_ast via import_ncbi_biosample. If TRUE, interprets MIC values using EUCAST breakpoints. Default is FALSE. Only used if reformat=TRUE.

interpret_clsi

Logical. Passed to interpret_ast via import_ncbi_biosample. If TRUE, interprets MIC values using CLSI breakpoints. Default is FALSE. Only used if reformat=TRUE.

interpret_ecoff

Logical. Passed to interpret_ast via import_ncbi_biosample. If TRUE, interprets MIC values using ECOFF cutoffs. Default is FALSE. Only used if reformat=TRUE.

Value

A tibble with one row per AST measure, with corresponding BioSample metadata.

Details

The function constructs an Entrez query of the form: "<organism> AND antibiogram[filter]". XML records are downloaded in batches, parsed, and combined into a single table. The resulting tibble contains AST test results and associated metadata including:

  • id: BioSample identifier

  • BioProject: BioProject accession ID

  • organism: Organism name

  • Antibiotic, Phenotype, Measurement, Units, Method, System, Manufacturer, Panel, Standard: AST data columns

The function can optionally filter by a one or more antibiotics. It can also optionally reformat data for compatibility with AMRgen functions via import_ncbi_ast, and interpret the raw data measures against breakpoints or ECOFF. See import_ncbi_ast for details of output formats when these options are used.

NCBI API usage

Users are encouraged to set an NCBI API key via rentrez::set_entrez_key() to increase request limits and comply with NCBI usage policies.

Examples

if (FALSE) { # \dontrun{
# Download AST data for Klebsiella quasipneumoniae
ast <- download_ncbi_ast("Klebsiella quasipneumoniae")

# Download Klebsiella quasipneumoniae data, filter to amikacin and ampicillin
ast <- download_ncbi_ast(
  "Klebsiella quasipneumoniae",
  antibiotic = c("amikacin", "Amp")
)

# Download and reformat for AMRgen workflow with EUCAST interpretation
ast <- download_ncbi_ast(
  "Klebsiella quasipneumoniae",
  reformat = TRUE,
  interpret_eucast = TRUE
)
} # }