Normalize TSS counts — normalize_counts • TSRexploreR

edgeR, DESeq2, or CPM normalization of TSS counts

normalize_counts(
  experiment,
  data_type = c("tss", "tss_features"),
  method = "DESeq2",
  threshold = 1,
  n_samples = 1
)

Arguments

experiment	TSRexploreR object.
data_type	Whether TSS counts ('tss') or gene/transcript counts ('tss_features', in development) should be normalized.
method	Either 'edgeR', 'DESeq2', or 'CPM'.
threshold	TSSs or TSRs with a score below this value will not be considered.
n_samples	Filter out TSSs or features that do not meet the selected threshold in at least this number of samples.

Value

TSRexploreR object with normalized counts.

Details

This function performs one of three normalizations on TSS or gene/transcript counts. The simplest is counts per million (CPM), which accounts only for sequencing depth. While CPM is appropriate for comparing replicates, it is considered too simple when substantial differences in RNA composition are expected between samples. For between-sample comparisons, the trimmed mean of M-values (TMM) or median-of-ratios (MOR) approaches, implemented in edgeR and DESeq2, respectively, can be used. Both methods are designed to reduce the impact of library size and RNA composition on such comparisons. Prior to TMM or MOR normalization, it is recommended to remove features with few or no reads, as they may bias the final results. To facilitate this filtering, two arguments are provided: 'threshold' and 'n_samples'. A feature must have at least 'threshold' raw counts in at least 'n_samples' samples to proceed through normalization.
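
The filtering rule and two of the normalizations can be illustrated with a minimal base R sketch. This is not the package's internal code: the count matrix, the TSS and sample names, and the chosen 'threshold' and 'n_samples' values are made up for illustration, and the median-of-ratios step only mirrors the general idea behind DESeq2 size factors rather than calling DESeq2 itself.

```r
# Toy raw counts (hypothetical values): rows are TSSs, columns are samples.
counts <- matrix(
  c(10,  0,  5,
     0,  0,  1,
    90, 50, 94,
    20, 10, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("tss_a", "tss_b", "tss_c", "tss_d"), c("s1", "s2", "s3"))
)

# Illustrative filter settings (assumed values, not the function defaults).
threshold <- 3
n_samples <- 2

# Keep features with at least `threshold` raw counts
# in at least `n_samples` samples.
keep <- rowSums(counts >= threshold) >= n_samples
filtered <- counts[keep, , drop = FALSE]

# CPM: scale each sample (column) so that its counts sum to one million.
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Median-of-ratios size factors:
# 1. per-feature log geometric mean across samples,
# 2. per-sample median of the log ratios to that reference,
#    using only features with no zero counts.
log_geo_means <- rowMeans(log(filtered))
usable <- is.finite(log_geo_means)
size_factors <- apply(filtered, 2, function(sample_counts) {
  exp(median((log(sample_counts) - log_geo_means)[usable]))
})

# Normalized counts: divide each sample by its size factor.
mor_normalized <- sweep(filtered, 2, size_factors, "/")
```

In this toy data, 'tss_b' never reaches the threshold and is dropped, while 'tss_a' passes because it meets the threshold in two of the three samples.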

When clustering TSSs into TSRs using 'tss_clustering', both the raw and normalized counts will be stored in the new TSRs.

Examples

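library("TSRexploreR")
library("magrittr")  # provides the %>% pipe used below

# Load the example TSS data shipped with the package.
data(TSSs)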
sample_sheet <- data.frame(
  sample_name=c(
    sprintf("S288C_D_%s", seq_len(3)),
    sprintf("S288C_WT_%s", seq_len(3))
  ),
  file_1=rep(NA, 6), file_2=rep(NA, 6),
  condition=c(
    rep("Diamide", 3),
    rep("Untreated", 3)
  )
)

exp <- TSSs %>%
  tsr_explorer(sample_sheet=sample_sheet) %>%
  format_counts(data_type="tss")

exp <- normalize_counts(exp, method="CPM")