
Import and process antimicrobial phenotype data from common sources
Source: R/import_pheno.R
This function imports antibiotic susceptibility testing (AST) datasets in formats exported by EBI, NCBI, WHOnet and several automated AST instruments (Vitek, Microscan, Sensititre). It assumes that the input file is a delimited text file, either tab-separated (TSV) or comma-separated (CSV), which may be compressed, and parses the relevant columns (antibiotic names, species names, MIC or disk data) into suitable classes using the AMR package. It can optionally use the AMR package to interpret the susceptibility phenotype (SIR) based on EUCAST or CLSI guidelines (human breakpoints and/or ECOFF). If expected columns are not found, warnings are issued and interpretation may not be possible.
Usage
import_ast(
input,
format = "ebi",
interpret_eucast = FALSE,
interpret_clsi = FALSE,
interpret_ecoff = FALSE,
species = NULL,
ab = NULL,
source = NULL
)
Arguments
- input
A data frame, or a string giving the path to an input file, containing the AST data in a supported format. These files may be downloaded from public sources such as the EBI AMR web browser (https://www.ebi.ac.uk/amr/data/?view=experiments), the EBI FTP site (ftp://ftp.ebi.ac.uk/pub/databases/amr_portal/releases/), or the NCBI browser (e.g. https://www.ncbi.nlm.nih.gov/pathogens/ast#Pseudomonas%20aeruginosa), or retrieved using the functions download_ebi or download_ncbi_ast; alternatively, the files may be exported from supported AST instruments.
- format
A string indicating the format of the data: "ebi" (default), "ebi_web", "ebi_ftp", "ncbi", "vitek", "microscan", "sensititre", or "whonet". This determines which function the data is passed to for processing: import_ebi_ast() (ebi/ebi_web), import_ebi_ast_ftp() (ebi_ftp), import_ncbi_ast() (ncbi), import_vitek_ast() (vitek), import_microscan_ast() (microscan), import_sensititre_ast() (sensititre), or import_whonet_ast() (whonet).
- interpret_eucast
A logical value (default is FALSE). If TRUE, the function will interpret the susceptibility phenotype (SIR) for each row based on the MIC or disk diffusion values, against EUCAST human breakpoints. These will be reported in a new column pheno_eucast, of class 'sir'.
- interpret_clsi
A logical value (default is FALSE). If TRUE, the function will interpret the susceptibility phenotype (SIR) for each row based on the MIC or disk diffusion values, against CLSI human breakpoints. These will be reported in a new column pheno_clsi, of class 'sir'.
- interpret_ecoff
A logical value (default is FALSE). If TRUE, the function will interpret the wildtype vs nonwildtype status for each row based on the MIC or disk diffusion values, against epidemiological cut-off (ECOFF) values. These will be reported in a new column ecoff, of class 'sir' and coded as 'R' (nonwildtype) or 'S' (wildtype).
- species
(optional) Name of the species to use for phenotype interpretation. By default, the organism field in the input file will be assumed to specify the species for each sample, but if this is missing or you want to override it in the interpretation step, you may provide a single species name via this parameter.
- ab
(optional) Name of the antibiotic to use for phenotype interpretation. By default, the antibiotic field in the input file will be assumed to specify the antibiotic for each sample, but if this is missing or you want to override it in the interpretation step, you may provide a single antibiotic name via this parameter.
- source
(optional) A single value to record as the source of these data points, e.g. "EBI_browser". By default, the publications field (for EBI data) or BioProject field (for NCBI data) will be used to indicate the source for each row in the input file, but if this is missing or you want to override it with a single value for all samples, you may provide a source name via this parameter.
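The three interpretation flags can be pictured through the AMR package's as.sir() function. The following is a minimal sketch of that idea, not the internal implementation of import_ast(); it assumes AMR version 2.1 or later (for the breakpoint_type argument), and the MIC values, species, and antibiotic are illustrative only.

```r
library(AMR)

# Illustrative ciprofloxacin MICs (mg/L) for three hypothetical E. coli isolates
mic_values <- as.mic(c("<=0.25", "64", ">=4"))

# S/I/R against EUCAST human breakpoints (compare interpret_eucast = TRUE)
sir_eucast <- as.sir(mic_values, mo = "Escherichia coli", ab = "CIP",
                     guideline = "EUCAST")

# S/I/R against CLSI human breakpoints (compare interpret_clsi = TRUE)
sir_clsi <- as.sir(mic_values, mo = "Escherichia coli", ab = "CIP",
                   guideline = "CLSI")

# Wildtype vs nonwildtype against ECOFF values (compare interpret_ecoff = TRUE);
# the labels used for the result depend on the installed AMR version
wt_status <- as.sir(mic_values, mo = "Escherichia coli", ab = "CIP",
                    guideline = "EUCAST", breakpoint_type = "ECOFF")

# Show the MICs next to each interpretation
print(data.frame(
  mic    = as.character(mic_values),
  eucast = as.character(sir_eucast),
  clsi   = as.character(sir_clsi),
  ecoff  = as.character(wt_status)
))
```

Each call returns a vector of class 'sir', the class documented here for the pheno_eucast, pheno_clsi, and ecoff columns.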
Value
A data frame with the processed AST data, including additional columns:
- id: The biosample identifier.
- spp_pheno: The species phenotype, formatted using the as.mo function.
- drug_agent: The antibiotic used in the test, formatted using the as.ab function.
- mic: The minimum inhibitory concentration (MIC) value, formatted using the as.mic function.
- disk: The disk diffusion measurement (in mm), formatted using the as.disk function.
- method: The AST method (e.g., "broth dilution", "disk diffusion", "Etest", "agar dilution"). Expected values are based on the NCBI/EBI antibiogram specification.
- platform: The AST platform/instrument (e.g., "Vitek", "Phoenix", "Sensititre").
- guideline: The AST standard recorded in the input file as being used for the AST assay.
- pheno_eucast: The phenotype newly interpreted against EUCAST human breakpoint standards (as S/I/R), based on the MIC or disk diffusion data.
- pheno_clsi: The phenotype newly interpreted against CLSI human breakpoint standards (as S/I/R), based on the MIC or disk diffusion data.
- ecoff: The phenotype newly interpreted against the ECOFF (as S/R), based on the MIC or disk diffusion data.
- pheno_provided: The original phenotype interpretation provided in the input file.
- source: The source of each data point (from the publications or bioproject field in the input file, or replaced with a single value passed in as the 'source' parameter).
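To illustrate the column classes listed above, the sketch below applies the same AMR coercion functions to a few hand-written values. It assumes the AMR package is installed; the input values are made up for illustration and do not come from any dataset.

```r
library(AMR)

# Antibiotic name -> standardised antibiotic code (class 'ab'), as in drug_agent
drug <- as.ab("ciprofloxacin")

# Species name -> standardised microorganism code (class 'mo'), as in spp_pheno
species <- as.mo("Escherichia coli")

# MIC strings, with or without an inequality sign (class 'mic'), as in mic
mic <- as.mic(c("<=0.25", ">=4", "64"))

# Disk diffusion zone diameters in mm (class 'disk'), as in disk
disk <- as.disk(c(22, NA, 30))

print(as.character(drug))  # "CIP"
print(mo_name(species))    # "Escherichia coli"
print(mic)
print(disk)
```

Because these columns carry AMR classes rather than plain character or numeric values, they can be passed directly to other AMR functions such as as.sir().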
Examples
# small example E. coli AST data from NCBI
ecoli_ast_raw
#> # A tibble: 10 × 21
#> `#BioSample` `Organism group` `Scientific name` `Isolation type` Location
#> <chr> <chr> <chr> <chr> <chr>
#> 1 SAMN36015110 E.coli and Shigella Escherichia coli clinical India: A…
#> 2 SAMN11638310 E.coli and Shigella Escherichia coli clinical India
#> 3 SAMN05729964 E.coli and Shigella Escherichia coli clinical Brazil: …
#> 4 SAMN10620111 E.coli and Shigella Escherichia coli clinical USA: Roc…
#> 5 SAMN10620168 E.coli and Shigella Escherichia coli clinical USA: Roc…
#> 6 SAMN10620104 E.coli and Shigella Escherichia coli clinical USA: Roc…
#> 7 SAMN10620102 E.coli and Shigella Escherichia coli clinical USA: Roc…
#> 8 SAMN10620129 E.coli and Shigella Escherichia coli clinical USA: Roc…
#> 9 SAMN10620121 E.coli and Shigella Escherichia coli clinical USA: Roc…
#> 10 SAMN10620086 E.coli and Shigella Escherichia coli clinical USA: Roc…
#> # ℹ 16 more variables: `Isolation source` <chr>, Isolate <chr>,
#> # Antibiotic <chr>, `Resistance phenotype` <chr>, `Measurement sign` <chr>,
#> # `MIC (mg/L)` <dbl>, `Disk diffusion (mm)` <lgl>,
#> # `Laboratory typing platform` <chr>, Vendor <chr>,
#> # `Laboratory typing method version or reagent` <chr>,
#> # `Testing standard` <chr>, `Create date` <dttm>, pheno_clsi_mic <sir>,
#> # pheno_clsi_disk <sir>, ecoff_mic <sir>, ecoff_disk <sir>
# import without re-interpreting resistance
pheno <- import_ast(ecoli_ast_raw, format = "ncbi")
#> Reading in as NCBI AST format
#> Warning: Expected AST method column 'Laboratory typing method' not found in input
#> Warning: Expected column 'BioProject' not found in input
head(pheno)
#> # A tibble: 6 × 29
#> id drug_agent mic disk guideline method platform pheno_provided
#> <chr> <ab> <mic> <dsk> <chr> <chr> <chr> <sir>
#> 1 SAMN36015110 CIP <128.00 NA CLSI broth… NA R
#> 2 SAMN11638310 CIP 256.00 NA CLSI broth… NA R
#> 3 SAMN05729964 CIP 64.00 NA CLSI Etest Etest R
#> 4 SAMN10620111 CIP >=4.00 NA CLSI broth… NA R
#> 5 SAMN10620168 CIP >=4.00 NA CLSI broth… NA R
#> 6 SAMN10620104 CIP <=0.25 NA CLSI broth… NA S
#> # ℹ 21 more variables: spp_pheno <mo>, `Organism group` <chr>,
#> # `Scientific name` <chr>, `Isolation type` <chr>, Location <chr>,
#> # `Isolation source` <chr>, Isolate <chr>, Antibiotic <chr>,
#> # `Resistance phenotype` <chr>, `Measurement sign` <chr>, `MIC (mg/L)` <dbl>,
#> # `Disk diffusion (mm)` <lgl>, `Laboratory typing platform` <chr>,
#> # Vendor <chr>, `Laboratory typing method version or reagent` <chr>,
#> # `Testing standard` <chr>, `Create date` <dttm>, pheno_clsi_mic <sir>, …
# import and re-interpret resistance (S/I/R) and WT/NWT (vs ECOFF) using AMR package
pheno <- import_ast(ecoli_ast_raw, format = "ncbi", interpret_eucast = TRUE, interpret_ecoff = TRUE)
#> Reading in as NCBI AST format
#> Warning: Expected AST method column 'Laboratory typing method' not found in input
#> Warning: Expected column 'BioProject' not found in input
head(pheno)
#> # A tibble: 6 × 33
#> id drug_agent mic disk pheno_eucast ecoff guideline method platform
#> <chr> <ab> <mic> <dsk> <sir> <sir> <chr> <chr> <chr>
#> 1 SAMN360… CIP <128.00 NA NI NI CLSI broth… NA
#> 2 SAMN116… CIP 256.00 NA R NWT CLSI broth… NA
#> 3 SAMN057… CIP 64.00 NA R NWT CLSI Etest Etest
#> 4 SAMN106… CIP >=4.00 NA R NWT CLSI broth… NA
#> 5 SAMN106… CIP >=4.00 NA R NWT CLSI broth… NA
#> 6 SAMN106… CIP <=0.25 NA S NI CLSI broth… NA
#> # ℹ 24 more variables: pheno_provided <sir>, spp_pheno <mo>,
#> # `Organism group` <chr>, `Scientific name` <chr>, `Isolation type` <chr>,
#> # Location <chr>, `Isolation source` <chr>, Isolate <chr>, Antibiotic <chr>,
#> # `Resistance phenotype` <chr>, `Measurement sign` <chr>, `MIC (mg/L)` <dbl>,
#> # `Disk diffusion (mm)` <lgl>, `Laboratory typing platform` <chr>,
#> # Vendor <chr>, `Laboratory typing method version or reagent` <chr>,
#> # `Testing standard` <chr>, `Create date` <dttm>, pheno_clsi_mic <sir>, …
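The species and source arguments described above can be combined with CLSI interpretation in the same way. This is a sketch only: it assumes the package that provides import_ast() and ecoli_ast_raw is AMRgen and is installed, and the source label "NCBI_example" is a made-up value. No output is shown because it depends on the installed AMR breakpoint tables.

```r
library(AMRgen)  # assumption: the package providing import_ast() and ecoli_ast_raw

# Interpret against CLSI human breakpoints, forcing a single species for all rows
# and recording one source label instead of the (missing) BioProject column
pheno_clsi <- import_ast(
  ecoli_ast_raw,
  format = "ncbi",
  interpret_clsi = TRUE,
  species = "Escherichia coli",
  source = "NCBI_example"  # hypothetical label
)

# Tabulate the newly interpreted calls against those provided in the input
print(table(
  provided = as.character(pheno_clsi$pheno_provided),
  clsi     = as.character(pheno_clsi$pheno_clsi),
  useNA    = "ifany"
))
```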