normalize_counts.Rd
edgeR, DESeq2, or CPM normalization of TSS counts
normalize_counts(
  experiment,
  data_type = c("tss", "tss_features"),
  method = "DESeq2",
  threshold = 1,
  n_samples = 1
)
experiment | TSRexploreR object. |
---|---|
data_type | Whether TSS counts ('tss') or gene/transcript counts ('tss_features', in development) should be normalized. |
method | Either 'edgeR', 'DESeq2', or 'CPM'. |
threshold | TSSs or TSRs with a score below this value will not be considered. |
n_samples | Filter out TSSs or features that do not meet the selected threshold in at least this number of samples. |
TSRexploreR object with normalized counts.
This function performs one of three normalizations on TSS or gene/transcript counts. The simplest of these is counts per million (CPM), which accounts only for sequencing depth. While CPM is appropriate for comparing replicates, it is considered too simple for cases in which substantial differences in RNA composition are expected between samples. For between-sample comparisons, the trimmed mean of M-values (TMM) or median-of-ratios (MOR) approaches, implemented in edgeR and DESeq2, respectively, can be used. Both of these methods are designed to reduce the impact of differences in library size and RNA composition on such comparisons.
Prior to TMM or MOR normalization, it is recommended to remove features with few or no reads, as they may bias the final results. To facilitate this filtering, two arguments are provided: 'threshold' and 'n_samples'. A feature must have at least 'threshold' raw counts in at least 'n_samples' samples to proceed through normalization.
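The filtering rule and the arithmetic behind CPM and median-of-ratios can be sketched in base R on a toy matrix. This is only an illustration of the ideas described above, not the code that normalize_counts runs internally. The matrix, the TSS names, and the chosen 'threshold' and 'n_samples' values are hypothetical, and computing CPM on the unfiltered matrix is an assumption made for simplicity.

```r
# Toy raw count matrix (hypothetical): rows are TSSs, columns are samples.
counts <- matrix(
  c(10, 0, 5, 85,
    20, 1, 0, 179,
    30, 0, 15, 255),
  nrow = 4,
  dimnames = list(
    c("tss_a", "tss_b", "tss_c", "tss_d"),
    c("s1", "s2", "s3")
  )
)

# Filtering: keep a TSS only if it has at least 'threshold' raw counts
# in at least 'n_samples' samples.
threshold <- 3
n_samples <- 2
keep <- rowSums(counts >= threshold) >= n_samples
filtered <- counts[keep, , drop = FALSE]

# CPM: divide each column by its library size, then scale to one million.
# Computed here on the unfiltered matrix (an assumption of this sketch).
cpm <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Median-of-ratios size factors, following the general idea used by DESeq2:
# compare each sample to a per-feature geometric mean across samples,
# then take the median ratio within each sample.
log_geo_means <- rowMeans(log(filtered))
usable <- is.finite(log_geo_means)  # skip features with a zero in any sample
size_factors <- apply(filtered, 2, function(x) {
  exp(median(log(x[usable]) - log_geo_means[usable]))
})

# Divide each sample's counts by its size factor.
mor_normalized <- sweep(filtered, 2, size_factors, "/")
```

In this sketch the CPM values in every column sum to one million by construction, whereas the median-of-ratios size factors depend on how each sample compares to the per-feature reference, which makes them less sensitive to a few highly expressed features.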
When clustering TSSs into TSRs using 'tss_clustering', both the raw and normalized counts will be stored in the new TSRs.
data(TSSs)

sample_sheet <- data.frame(
  sample_name=c(
    sprintf("S288C_D_%s", seq_len(3)),
    sprintf("S288C_WT_%s", seq_len(3))
  ),
  file_1=rep(NA, 6),
  file_2=rep(NA, 6),
  condition=c(
    rep("Diamide", 3),
    rep("Untreated", 3)
  )
)

exp <- TSSs %>%
  tsr_explorer(sample_sheet=sample_sheet) %>%
  format_counts(data_type="tss")

exp <- normalize_counts(exp, method="CPM")