This function generates a set of visualizations to analyze AMR gene combinations, MIC values, and gene prevalence from a given binary matrix. It creates several plots, including MIC distributions, a bar plot for the percentage of strains per combination, a dot plot for gene combinations in strains, and a plot for gene prevalence. It also outputs a table summarizing the MIC distribution (median, lower, upper) and number resistant, for each marker combination.
Usage
amr_upset(
binary_matrix,
min_set_size = 2,
order = "",
plot_set_size = FALSE,
plot_category = TRUE,
print_category_counts = FALSE,
print_set_size = FALSE,
boxplot_colour = "grey",
assay = "mic"
)Arguments
- binary_matrix
A data frame containing the original binary matrix output from the
get_binary_matrixfunction. Expected columns are an identifier (column 1, any name),pheno(class sir, with S/I/R categories to colour points),mic(class mic, with MIC values to plot), and other columns representing gene presence/absence (binary coded, i.e., 1 = present, 0 = absent).- min_set_size
An integer specifying the minimum size for a gene set to be included in the analysis and plots. Default is 2. Only genes with at least this number of occurrences are included in the plots.
- order
A character string indicating the order of the combinations on the x-axis. Options are:
"" (default): decreasing frequency of combinations
"genes": order by the number of genes in each combination
"value": order by the median assay value (MIC or disk zone) for each combination.
- plot_set_size
Logical indicating whether to include a bar plot showing the set size (i.e., number of times each combination of markers is observed). Default is FALSE.
- plot_category
Logical indicating whether to include a stacked bar plot showing, for each marker combination, the proportion of samples with each phenotype classification (specified by the
phenocolumn in the input file). Default is TRUE.- print_category_counts
Logical indicating whether, if
plot_categoryis TRUE, to print the number of strains in each resistance category for each marker combination in the plot. Default is FALSE.- print_set_size
Logical indicating whether, if
plot_set_sizeis TRUE, to print the number of strains with each marker combination on the plot. Default is FALSE.- boxplot_colour
Colour for lines of the box plots summarising the MIC distribution for each marker combination. Default is "grey".
- assay
A character string indicating whether to plot MIC or disk diffusion data. Must be one of:
"mic": plot MIC data stored in column
mic"disk": plot disk diffusion data stored in column
disk
Value
A list containing the following elements:
plot: A grid of plots displaying: (i) grid showing the marker combinations observed, MIC distribution per marker combination, frequency per marker and (optionally) phenotype classification and/or number of samples for each marker combination.summary: A data frame summarizing each marker combination observed, including median MIC (and interquartile range), number of resistant isolates, and positive predictive value for resistance.
Examples
ecoli_geno <- import_amrfp(ecoli_geno_raw, "Name")
binary_matrix <- get_binary_matrix(
geno_table = ecoli_geno,
pheno_table = ecoli_ast,
antibiotic = "Ciprofloxacin",
drug_class_list = c("Quinolones"),
sir_col = "pheno_clsi",
keep_assay_values = TRUE,
keep_assay_values_from = "mic"
)
#> Defining NWT in binary matrix using ecoff column provided: ecoff
amr_upset(binary_matrix, min_set_size = 3, order = "value", assay = "mic")
#> $plot
#>
#> $summary
#> # A tibble: 103 × 14
#> marker_list marker_count n R.ppv R NWT.ppv NWT
#> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 "" 0 2590 0.00386 10 0.0197 51
#> 2 "qnrB" 1 1 1 1 1 1
#> 3 "parE_E460K, gyrA_S83W" 2 1 1 1 1 1
#> 4 "parE_D475E" 1 61 0 0 0 0
#> 5 "qnrA1" 1 2 0 0 1 2
#> 6 "gyrA_S83A" 1 3 0 0 0.667 2
#> 7 "qnrB4" 1 2 1 2 1 2
#> 8 "parE_I355T" 1 24 0 0 0 0
#> 9 "marR_S3N" 1 38 0.105 4 0.263 10
#> 10 "marR_S3N, parE_D475E" 2 4 0 0 0.25 1
#> # ℹ 93 more rows
#> # ℹ 7 more variables: median_excludeRangeValues <dbl>,
#> # q25_excludeRangeValues <dbl>, q75_excludeRangeValues <dbl>,
#> # n_excludeRangeValues <int>, median_ignoreRanges <dbl>,
#> # q25_ignoreRanges <dbl>, q75_ignoreRanges <dbl>
#>
