
Performs logistic regression to analyze the relationship between genetic markers and two phenotype calls, resistant (R) and non-wild-type (NWT), for a specified antibiotic.

Usage

amr_logistic(
  geno_table,
  pheno_table,
  antibiotic = NULL,
  drug_class_list = NULL,
  geno_sample_col = NULL,
  pheno_sample_col = NULL,
  sir_col = "pheno",
  ecoff_col = "ecoff",
  marker_col = "marker.label",
  binary_matrix = NULL,
  maf = 10,
  fit_glm = FALSE,
  single_plot = TRUE,
  colors = c("maroon", "blue4"),
  axis_label_size = 9
)

Arguments

geno_table

(Required if binary_matrix not provided) A data frame containing genotype data, formatted with import_amrfp(). Only used if binary_matrix not provided.

pheno_table

(Required if binary_matrix not provided) A data frame containing phenotype data, formatted with import_ast(). Only used if binary_matrix not provided.

antibiotic

(Required if binary_matrix not provided) A character string specifying the antibiotic of interest to filter phenotype data. The value must match one of the entries in the drug_agent column of pheno_table. Only used if binary_matrix not provided or if breakpoints required.

drug_class_list

(Required if binary_matrix not provided) A character vector of drug classes to filter genotype data for markers related to the specified antibiotic. Markers in geno_table will be filtered based on whether their drug_class matches any value in this list. Only used if binary_matrix not provided.

geno_sample_col

A character string (optional) specifying the column name in geno_table containing sample identifiers. Defaults to NULL, in which case it is assumed the first column contains identifiers. Only used if binary_matrix not provided.

pheno_sample_col

A character string (optional) specifying the column name in pheno_table containing sample identifiers. Defaults to NULL, in which case it is assumed the first column contains identifiers. Only used if binary_matrix not provided.

sir_col

A character string specifying the column name in pheno_table that contains the resistance interpretation (SIR) data. The values should be "S", "I", "R" or otherwise interpretable by AMR::as.sir(). Defaults to "pheno" (see Usage). If not provided, the first column prefixed with "phenotype*" will be used if present; otherwise an error is thrown. Only used if binary_matrix not provided.

ecoff_col

A character string specifying the column name in pheno_table that contains resistance interpretations (SIR) made against the ECOFF rather than a clinical breakpoint. The values should be "S", "I", "R" or otherwise interpretable by AMR::as.sir(). Defaults to "ecoff". Set to NULL if not available. Only used if binary_matrix not provided.

marker_col

(Optional) Name of the column containing the marker identifiers, whose unique values will be treated as predictors in the regression. Defaults to "marker.label".

binary_matrix

A data frame containing the original binary matrix output from the get_binary_matrix() function. If not provided (or set to NULL), user must specify geno_table, pheno_table, antibiotic, drug_class_list and optionally geno_sample_col, pheno_sample_col, sir_col, ecoff_col, marker_col to pass to get_binary_matrix().

maf

(Optional) An integer specifying the minimum allele frequency (MAF) threshold. Markers with a MAF lower than this value will be excluded. Defaults to 10.

fit_glm

(Optional) A logical value. If TRUE, the model is fitted with glm; otherwise it is fitted with logistf. Defaults to FALSE.

single_plot

(Optional) A logical value. If TRUE, a single plot is produced comparing the estimates for resistance (R) and non-wild-type (NWT). Otherwise, two plots are printed side-by-side. Defaults to TRUE.

colors

(Optional) A vector of two colors, to use for R and NWT models in the plots. Defaults to c("maroon", "blue4").

axis_label_size

(Optional) A numeric value controlling the size of axis labels in the plot. Defaults to 9.
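
As a rough illustration of the maf filter described above, the sketch below drops rare markers from a toy binary matrix using base R only. It assumes the threshold is compared against the number of samples carrying each marker; the source does not say whether the value is a count or a percentage, so treat that as an assumption. The data, the column names and the helper filter_markers() are hypothetical and are not part of the package.

```r
# Toy binary matrix: one row per sample, one 0/1 column per marker.
# All names and values are made up for illustration.
toy <- data.frame(
  id      = paste0("s", 1:12),
  R       = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  markerA = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0),
  markerB = c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  markerC = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0)
)

# Hypothetical helper (not the package's implementation): count how many
# samples carry each marker and keep markers seen in at least `min_count`.
filter_markers <- function(mat, marker_cols, min_count) {
  counts <- colSums(mat[, marker_cols, drop = FALSE], na.rm = TRUE)
  kept <- names(counts)[counts >= min_count]
  list(counts = counts, kept = kept)
}

res <- filter_markers(toy, c("markerA", "markerB", "markerC"), min_count = 3)
print(res$counts)  # number of samples carrying each marker
print(res$kept)    # markers that pass the threshold
```

Markers that fall below the threshold would not enter the regression as predictors, which keeps very rare markers from producing unstable estimates.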

Value

A list with the following components:

  • binary_matrix: The binary matrix of genetic data and phenotypic resistance information (either provided as input or generated by the function).

  • modelR: The fitted logistic regression model for resistance (R).

  • modelNWT: The fitted logistic regression model for non-wild-type (NWT).

  • plot: A ggplot object comparing the estimates for resistance (R) and non-wild-type (NWT), with corresponding statistical significance indicators.

Examples

# Example usage of the amr_logistic function
result <- amr_logistic(
  geno_table = import_amrfp(ecoli_geno_raw, "Name"),
  pheno_table = ecoli_ast,
  sir_col = "pheno_clsi",
  antibiotic = "Ciprofloxacin",
  drug_class_list = c("Quinolones"),
  maf = 10
)
#> Generating geno-pheno binary matrix
#>  Defining NWT in binary matrix using ecoff column provided: ecoff 
#> ...Fitting logistic regression model to R using logistf
#>    Filtered data contains 3630 samples (793 => 1, 2837 => 0) and 19 variables.
#> Warning: logistf.fit: Maximum number of iterations for full model exceeded. Try to increase the number of iterations or alter step size by passing 'logistf.control(maxit=..., maxstep=...)' to parameter control
#> ...Fitting logistic regression model to NWT using logistf
#>    Filtered data contains 3630 samples (929 => 1, 2701 => 0) and 19 variables.
#> Warning: logistf.fit: Maximum number of iterations for full model exceeded. Try to increase the number of iterations or alter step size by passing 'logistf.control(maxit=..., maxstep=...)' to parameter control
#> Generating plots

# To access the plot:
print(result$plot)
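
For intuition about the kind of model fitted when fit_glm = TRUE, the following sketch fits an ordinary logistic regression with stats::glm() to simulated data. It is not the function's internal code: the data, the variable names (marker1, marker2, R) and the model formula are assumptions made for illustration. The default logistf path uses Firth's penalized-likelihood logistic regression and is not shown here, so that the sketch needs only base R.

```r
# Simulate a toy data set: two binary markers and a binary resistance call.
# All values are simulated; nothing here comes from the package's data.
set.seed(1)
n <- 200
marker1 <- rbinom(n, 1, 0.3)
marker2 <- rbinom(n, 1, 0.2)

# The log-odds of resistance rise with marker1 and, more weakly, with marker2.
p <- plogis(-1.5 + 2 * marker1 + 0.5 * marker2)
R <- rbinom(n, 1, p)
toy <- data.frame(R, marker1, marker2)

# Ordinary (unpenalized) logistic regression with one predictor per marker.
fit <- glm(R ~ marker1 + marker2, data = toy, family = binomial())

# Coefficient table: estimate, standard error, z value and p-value per term.
est <- coef(summary(fit))
print(est)
```

Each marker's estimate is a log-odds ratio, so exp() of the estimate gives the odds ratio of the phenotype call for samples that carry the marker.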