Analyze TSS shifts between samples within a consensus TSR set.

tss_shift(
  experiment,
  sample_1,
  sample_2,
  comparison_name,
  tss_threshold = NULL,
  max_distance = 100,
  min_threshold = 10,
  n_resamples = 1000L,
  fdr_cutoff = 0.05
)

Arguments

experiment

TSRexploreR object.

sample_1

First sample to compare. Named character vector giving the sample names for the TSSs and TSRs, with names 'TSS' and 'TSR'.

sample_2

Second sample to compare. Named character vector giving the sample names for the TSSs and TSRs, with names 'TSS' and 'TSR'.

comparison_name

Name assigned to the results in the TSRexploreR object.

tss_threshold

Minimum number of raw counts required at a TSS for it to be considered in the shifting analysis.

max_distance

TSRs less than this distance apart will be merged.

min_threshold

Minimum number of raw counts a TSR must have in each of the two samples.

n_resamples

Number of resamplings for the permutation test.

fdr_cutoff

Differential features not meeting this significance threshold will not be considered.
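The following base-R sketch illustrates how 'n_resamples' and 'fdr_cutoff' could act. It assumes each sample's reads in a TSR are available as TSS positions, pools them and randomly reassigns them to the two samples to build the null distribution, and uses a difference in mean position as a stand-in for EMS or EMD. `perm_pvalue()` and `mean_shift()` are hypothetical helpers, not TSRexploreR functions, and the package's own resampling scheme may differ. The Benjamini-Hochberg step uses base R's `p.adjust()`.

```r
# Hypothetical helper (not part of TSRexploreR): permutation p-value for a
# two-sample score computed on the TSS positions of reads within one TSR.
perm_pvalue <- function(pos_1, pos_2, score_fun, n_resamples = 1000L) {
  observed <- abs(score_fun(pos_1, pos_2))
  pooled <- c(pos_1, pos_2)
  n_1 <- length(pos_1)
  # Randomly reassign pooled reads to the two samples and recompute the score.
  null_scores <- replicate(n_resamples, {
    idx <- sample.int(length(pooled), n_1)
    abs(score_fun(pooled[idx], pooled[-idx]))
  })
  # Add-one correction keeps the p-value strictly above zero.
  (sum(null_scores >= observed) + 1) / (n_resamples + 1)
}

# Toy score: difference in mean TSS position (a stand-in for EMS or EMD).
mean_shift <- function(a, b) mean(b) - mean(a)

set.seed(1)
# Reads given as TSS positions (one entry per read) for three toy TSRs.
tsrs <- list(
  shifted   = list(s1 = rep(1:5, c(10, 8, 2, 0, 0)), s2 = rep(1:5, c(0, 0, 2, 8, 10))),
  unchanged = list(s1 = rep(1:5, c(2, 5, 6, 5, 2)), s2 = rep(1:5, c(2, 5, 6, 5, 2))),
  noisy     = list(s1 = rep(1:5, c(3, 4, 5, 4, 4)), s2 = rep(1:5, c(4, 4, 5, 4, 3)))
)
p_raw <- vapply(tsrs, function(x) perm_pvalue(x$s1, x$s2, mean_shift, 1000L), numeric(1))

# Benjamini-Hochberg adjustment across TSRs, then apply the FDR cutoff.
fdr_cutoff <- 0.05
p_fdr <- p.adjust(p_raw, method = "BH")
significant <- names(p_fdr)[p_fdr < fdr_cutoff]
significant
```

With these toy reads only the "shifted" TSR passes the 0.05 cutoff; the "unchanged" TSR has a raw p-value of exactly 1 because every resampled score is at least its observed score of 0.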

Value

TSRexploreR object with shifting scores added.

Details

This function assesses the difference between TSS distributions from two distinct samples in a set of consensus TSRs by calculating a signed version of the earth mover's distance (EMD) that we term the earth mover's score (EMS). For this approach, we imagine that the two TSS distributions in question are piles of dirt, and ask how much dirt from one pile would need to be moved, how far, and in which direction, to reproduce the distribution of the other sample. The resulting EMS lies between -1 and 1, with larger magnitudes indicating larger shifts and the sign indicating direction (negative values indicate upstream shifts and positive values indicate downstream shifts). The positive and negative components of the EMS are also reported. Lastly, the function uses a permutation test to calculate a p-value for the null hypothesis that there is no difference between the two samples, and reports an FDR-corrected p-value obtained with the Benjamini-Hochberg procedure.
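The signed score can be illustrated with a minimal base-R sketch. It assumes both samples' TSS counts are given over the same ordered positions (5' to 3') of one consensus TSR, and that the score is scaled by the largest possible transport distance (the TSR width minus one) so that it falls between -1 and 1. `earth_movers_score()` is a hypothetical helper, not a TSRexploreR function, and the package's own normalization may differ.

```r
# Hypothetical helper (not the TSRexploreR implementation): signed earth
# mover's score between two TSS count vectors covering the same TSR positions,
# ordered 5' to 3' on the TSR's strand.
earth_movers_score <- function(counts_1, counts_2) {
  stopifnot(length(counts_1) == length(counts_2), length(counts_1) > 1)
  # Normalize each sample to a probability distribution (a "pile of dirt").
  cdf_1 <- cumsum(counts_1 / sum(counts_1))
  cdf_2 <- cumsum(counts_2 / sum(counts_2))
  # For 1-D distributions, the net mass moved across each gap between adjacent
  # positions equals the difference of the two CDFs at that gap; a positive
  # value means mass moves downstream from sample 1 to sample 2.
  flow <- head(cdf_1 - cdf_2, -1)
  # Divide by the largest possible distance so scores fall in [-1, 1].
  max_dist <- length(counts_1) - 1
  c(
    ems = sum(flow) / max_dist,
    positive = sum(pmax(flow, 0)) / max_dist,
    negative = sum(pmin(flow, 0)) / max_dist
  )
}

# All signal moves from the 5'-most to the 3'-most position: a full downstream shift.
res <- earth_movers_score(c(10, 0, 0, 0, 0), c(0, 0, 0, 0, 10))
res
```

Here the EMS is 1, with a positive component of 1 and a negative component of 0. Swapping the two arguments gives an EMS of -1, a full upstream shift relative to the first argument.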

This function also calculates the unsigned EMD, which indicates how much TSS "mass" has been moved between the two distributions without regard to direction. This is useful for detecting "balanced" shifts, in which approximately equal mass is moved upstream and downstream. Examples include TSR splitting or merging and changes in TSR shape (e.g., peaked to broad). These are generally marked by an EMS near zero, balanced positive and negative scores, and a high EMD. As with EMS, raw and FDR-corrected p-values are reported for EMD.
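A balanced shift can be illustrated with a minimal base-R sketch. It assumes both samples' TSS counts are given over the same ordered positions of one consensus TSR and that both scores are scaled by the TSR width minus one. `earth_movers_distance()` is a hypothetical helper, not a TSRexploreR function; it returns the signed score and the unsigned distance for a peaked TSR that splits into two edge peaks.

```r
# Hypothetical helper (not the TSRexploreR implementation): signed score and
# unsigned earth mover's distance between two TSS count vectors over the same
# TSR positions, both scaled by the largest possible distance.
earth_movers_distance <- function(counts_1, counts_2) {
  stopifnot(length(counts_1) == length(counts_2), length(counts_1) > 1)
  cdf_1 <- cumsum(counts_1 / sum(counts_1))
  cdf_2 <- cumsum(counts_2 / sum(counts_2))
  # Net mass moved across each gap between adjacent positions.
  flow <- head(cdf_1 - cdf_2, -1)
  c(
    ems = sum(flow) / length(flow),      # upstream and downstream flows cancel
    emd = sum(abs(flow)) / length(flow)  # all moved mass counts, whatever the direction
  )
}

# A single central peak splits into two equal peaks at the TSR edges.
peaked <- c(0, 0, 10, 0, 0)
split  <- c(5, 0, 0, 0, 5)
res <- earth_movers_distance(peaked, split)
res
```

Half of the mass moves upstream and half downstream, so the EMS is 0 while the EMD is 0.5: the signature of a balanced shift.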

'sample_1' and 'sample_2' name the two samples to compare: 'sample_1' should be the control and 'sample_2' the treatment sample. The results are stored in the TSRexploreR object under the name given by 'comparison_name'. 'tss_threshold' applies a global threshold that removes TSSs below a certain score, and 'min_threshold' is the minimum score a TSR must have in both samples to be considered. 'max_distance' is the maximum distance between two TSRs for them to be merged and considered together for shifting.
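The following base-R sketch shows one plausible reading of these thresholds on toy data for a single chromosome and strand: TSSs below 'tss_threshold' are dropped, TSRs less than 'max_distance' apart are merged into consensus regions, and consensus TSRs lacking 'min_threshold' counts in either sample are discarded. The order of these steps and the gap measure (end of one TSR to the start of the next) are assumptions, and all object and helper names, including `merge_tsrs()`, are hypothetical rather than TSRexploreR internals.

```r
# Hypothetical helper: merge TSR intervals lying less than max_distance apart
# into consensus regions (single chromosome and strand).
merge_tsrs <- function(start, end, max_distance) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  # Gap from each TSR to the furthest end reached by the earlier TSRs.
  gap <- start[-1] - cummax(end)[-length(end)]
  region <- cumsum(c(TRUE, gap >= max_distance))
  data.frame(
    start = as.numeric(tapply(start, region, min)),
    end = as.numeric(tapply(end, region, max))
  )
}

# Toy TSRs from both samples, and raw TSS counts.
tsrs <- data.frame(start = c(100, 150, 600, 1000), end = c(120, 180, 640, 1010))
tss <- data.frame(
  sample = c("ctrl", "ctrl", "trt", "trt", "ctrl", "trt", "trt"),
  pos    = c(105, 160, 110, 170, 620, 630, 1005),
  count  = c(8, 6, 12, 1, 15, 4, 30)
)

# 1. Global filter: drop TSSs with fewer than tss_threshold raw counts.
tss_threshold <- 2
tss <- tss[tss$count >= tss_threshold, ]

# 2. Merge TSRs lying less than max_distance apart.
consensus <- merge_tsrs(tsrs$start, tsrs$end, max_distance = 100)

# 3. Sum each sample's remaining TSS counts inside each consensus TSR.
sample_total <- function(s) {
  vapply(seq_len(nrow(consensus)), function(i) {
    in_tsr <- tss$sample == s & tss$pos >= consensus$start[i] & tss$pos <= consensus$end[i]
    sum(tss$count[in_tsr])
  }, numeric(1))
}
consensus$ctrl <- sample_total("ctrl")
consensus$trt <- sample_total("trt")

# 4. Keep consensus TSRs with at least min_threshold counts in both samples.
min_threshold <- 10
kept <- consensus[consensus$ctrl >= min_threshold & consensus$trt >= min_threshold, ]
kept
```

With these toy values the first two TSRs merge into one consensus TSR (positions 100 to 180), and only that region survives: the second consensus TSR has just 4 treatment counts and the third has no control counts.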

Examples

library(TSRexploreR)
library(magrittr) # provides the %>% pipe used below

data(TSSs)
assembly <- system.file("extdata", "S288C_Assembly.fasta", package = "TSRexploreR")

# Sample sheet: three diamide-treated and three untreated replicates.
samples <- data.frame(
  sample_name = c(sprintf("S288C_D_%s", seq_len(3)), sprintf("S288C_WT_%s", seq_len(3))),
  file_1 = rep(NA, 6),
  file_2 = rep(NA, 6),
  condition = c(rep("Diamide", 3), rep("Untreated", 3))
)

# Build the object, cluster TSSs into TSRs, and merge replicates by condition.
exp <- TSSs %>%
  tsr_explorer(sample_sheet = samples, genome_assembly = assembly) %>%
  format_counts(data_type = "tss") %>%
  tss_clustering(threshold = 3) %>%
  merge_samples(data_type = "tss", merge_group = "condition") %>%
  merge_samples(data_type = "tsr", merge_group = "condition")
#> Warning: Arguments in '...' ignored
exp <- tss_shift(
  exp,
  sample_1 = c(TSS = "S288C_WT_1", TSR = "S288C_WT_1"),
  sample_2 = c(TSS = "S288C_D_1", TSR = "S288C_D_1"),
  comparison_name = "Untreated_vs_Diamide",
  max_distance = 100,
  min_threshold = 10,
  n_resamples = 1000L
)
#> Warning: Arguments in '...' ignored
#> Warning: Some sequences have fewer than nthresh scores for at least one sample.
#> These are ignored and returned as NA.
#> Joining, by = c("fhash", "sample_indicator")