This function performs a Positive Predictive Value (PPV) analysis for AMR markers associated with a given antibiotic and drug class. It calculates the PPV for solo markers and visualizes the results using various plots.
Usage
solo_ppv_analysis(
geno_table,
pheno_table,
antibiotic = NULL,
drug_class_list = NULL,
geno_sample_col = NULL,
pheno_sample_col = NULL,
sir_col = NULL,
ecoff_col = "ecoff",
icat = FALSE,
marker_col = "marker",
binary_matrix = NULL,
min = 1,
axis_label_size = 9,
pd = position_dodge(width = 0.8),
excludeRanges = NULL,
colours_SIR = c(S = "#3CAEA3", SDD = "#8FD6C4", I = "#F6D55C", R = "#ED553B"),
colours_ppv = c(R = "maroon", I = "skyblue", NWT = "navy")
)Arguments
- geno_table
(Required if
binary_matrixnot provided) A data frame containing genotype data, formatted withimport_amrfp(). Only used ifbinary_matrixnot provided.- pheno_table
(Required if
binary_matrixnot provided) A data frame containing phenotype data, formatted withimport_ast(). Only used ifbinary_matrixnot provided.- antibiotic
(Required if
binary_matrixnot provided) A character string specifying the antibiotic of interest to filter phenotype data. The value must match one of the entries in thedrug_agentcolumn ofpheno_table. Only used ifbinary_matrixnot provided or if breakpoints required.- drug_class_list
(Required if
binary_matrixnot provided) A character vector of drug classes to filter genotype data for markers related to the specified antibiotic. Markers ingeno_tablewill be filtered based on whether theirdrug_classmatches any value in this list. Only used ifbinary_matrixnot provided.- geno_sample_col
A character string (optional) specifying the column name in
geno_tablecontaining sample identifiers. Defaults toNULL, in which case it is assumed the first column contains identifiers. Only used ifbinary_matrixnot provided.- pheno_sample_col
A character string (optional) specifying the column name in
pheno_tablecontaining sample identifiers. Defaults toNULL, in which case it is assumed the first column contains identifiers. Only used ifbinary_matrixnot provided.- sir_col
A character string specifying the column name in
pheno_tablethat contains the resistance interpretation (SIR) data. The values should be"S","I","R"or otherwise interpretable byAMR::as.sir(). If not provided, the first column prefixed with "phenotype*" will be used if present, otherwise an error is thrown. Only used ifbinary_matrixnot provided.- ecoff_col
A character string specifying the column name in
pheno_tablethat contains resistance interpretations (SIR) made against the ECOFF rather than a clinical breakpoint. The values should be"S","I","R"or otherwise interpretable byAMR::as.sir(). Defaultecoff. Set toNULLif not available. Only used ifbinary_matrixnot provided.- icat
A logical indicating whether to calculate PPV for
"I"(if such a category exists in the phenotype column) (defaultFALSE).- marker_col
A character string specifying the column name in
geno_tablecontaining the marker identifiers. Default"marker". Only used ifbinary_matrixnot provided.- binary_matrix
A data frame containing the original binary matrix output from the
get_binary_matrix()function. If not provided (or set toNULL), user must specifygeno_table,pheno_table,antibiotic,drug_class_listand optionallygeno_sample_col,pheno_sample_col,sir_col,ecoff_col,marker_colto pass toget_binary_matrix().- min
Minimum number of genomes with the solo marker, to include the marker in the plot (default
1).- axis_label_size
Font size for axis labels in the PPV plot (default
9).- pd
A
ggplot2::position_dodge()object controlling horizontal spacing of points and confidence intervals in the PPV plot. Defaultposition_dodge(width = 0.8).- excludeRanges
Vector of phenotype categories (comprised of
"R","I","NWT") for which we should ignore MIC values expressed as ranges when calculating PPVs. To include MICs expressed as ranges set this toNULL.- colours_SIR
A named vector of colours for the percentage bar plot. The names should be the phenotype categories (e.g.,
"R","I","S"), and the values should be valid colour names or hexadecimal colour codes. Default values are those used in the AMR packageAMR::scale_fill_sir().- colours_ppv
A named vector of colours for the plot of PPV estimates. The names should be
"R","I","NWT", and the values should be valid colour names or hexadecimal colour codes.
Value
A list containing the following elements:
solo_stats: A data frame summarizing the PPV for resistance (R vs S/I) and NWT (R/I vs S), including the number of positive hits, sample size, PPV, and 95% confidence intervals for each marker.combined_plot: A combined ggplot object showing the PPV plot for the solo markers, and a bar plot for the phenotype distribution.solo_binary: A dataframe with binary values indicating the presence or absence of the solo markers.solo_binary_norange: A dataframe with binary values indicating the presence or absence of the solo markers, excluding samples for which MIC or disk measures are expressed as ranges.amr_binary: A copy of the genotype-phenotype binary matrix for all markers (either provided as input or generated by the function)plot_order: Ordered list of rows in the output plot (to facilitate alignment with other plots)
Details
The function analyzes the predictive power of individual AMR markers when they are found 'solo' in the genome with no other markers associated with the same class. The phenotype data are matched with genotype presence/absence and then stratified to compute PPV for resistance and non-wild-type interpretations. The function also generates plots to aid in interpretation.
Examples
if (FALSE) { # \dontrun{
geno_table <- import_amrfp(ecoli_geno_raw, "Name")
head(ecoli_ast)
# Generate binary matrix
binary_matrix <- get_binary_matrix(
geno_table = geno_table,
pheno_table = ecoli_ast,
antibiotic = "Ciprofloxacin",
drug_class_list = c("Quinolones"),
sir_col = "pheno_clsi",
keep_assay_values = TRUE,
keep_assay_values_from = "mic"
)
# Run solo PPV analysis plot analysis using this binary_matrix
solo_ppv_analysis(binary_matrix = binary_matrix)
# Alternatively, generate binary matrix and solo PPV analysis in one step
# (note `antibiotic` and `drug_class` list are optional here, and only used
# for titling the plot)
soloPPV_cipro <- solo_ppv_analysis(
binary_matrix = binary_matrix,
antibiotic = "Ciprofloxacin",
drug_class_list = c("Quinolones")
)
soloPPV_cipro$solo_stats
soloPPV_cipro$combined_plot
} # }
