Dinucleotide Analysis — plot_dinucleotide

Analysis of -1 and +1 dinucleotide frequencies.

plot_dinucleotide_frequencies(
  experiment,
  genome_assembly = NULL,
  samples = "all",
  threshold = NULL,
  use_normalized = FALSE,
  dominant = FALSE,
  data_conditions = NULL,
  ncol = 3,
  return_table = FALSE,
  ...
)

Arguments

experiment	TSRexploreR object.
genome_assembly	Genome assembly in FASTA or BSgenome format.
samples	A vector of sample names to analyze.
threshold	TSSs or TSRs with a score below this value will not be considered.
use_normalized	Whether to use the normalized (TRUE) or raw (FALSE) counts.
dominant	If TRUE, only the highest-scoring TSS per gene, transcript, or TSR (or the highest-scoring TSR per gene or transcript) is considered.
data_conditions	Apply advanced conditions to the data.
ncol	Integer specifying the number of columns to arrange multiple plots.
return_table	Return a table of results instead of a plot.
...	Arguments passed to geom_col.

Value

A ggplot2 object containing the dinucleotide frequency plot. If 'return_table' is TRUE, a data.frame of the underlying data is returned instead.

Details

It has been shown in many organisms that particular base preferences exist at the -1 and +1 positions, where +1 is the TSS and -1 is the position immediately upstream. This plotting function returns a ggplot2 bar plot of -1 and +1 dinucleotide frequencies.
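To make the -1/+1 definition concrete, the following base R sketch counts dinucleotides for a handful of TSSs on a toy sequence. It is an illustration only and not the package's implementation: the sequence, the column names, and the helpers 'revcomp' and 'get_dinucleotide' are all hypothetical, and TSSs at the very ends of a sequence are not handled.

```r
# Toy genome: one hypothetical chromosome stored as a plain character string.
genome <- c(chrI = "ACGTTCAGGCATCATG")

# Hypothetical TSS table: 1-based position and strand (names are illustrative).
tss <- data.frame(
  seqnames = "chrI",
  pos      = c(3, 7, 12, 15),
  strand   = c("+", "+", "+", "-"),
  stringsAsFactors = FALSE
)

# Reverse-complement helper for minus-strand TSSs.
revcomp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}

# Extract the -1/+1 dinucleotide for a single TSS.
# Plus strand:  genomic positions (pos - 1, pos).
# Minus strand: genomic positions (pos, pos + 1), reverse-complemented,
#               because "upstream" is the higher coordinate on that strand.
get_dinucleotide <- function(seq, pos, strand) {
  if (strand == "+") {
    substr(seq, pos - 1, pos)
  } else {
    revcomp(substr(seq, pos, pos + 1))
  }
}

tss$dinucleotide <- mapply(
  get_dinucleotide,
  genome[tss$seqnames], tss$pos, tss$strand,
  USE.NAMES = FALSE
)

# Fraction of TSS positions carrying each dinucleotide.
freqs <- prop.table(table(tss$dinucleotide))
print(freqs)
```

In this toy example two of the four TSSs carry the dinucleotide CA, so its frequency is 0.5. Whether the package weights each TSS equally or by its read count is not stated here; the sketch weights each TSS position equally.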

'genome_assembly' must be a valid genome assembly in either FASTA or BSgenome format. FASTA-formatted genome assemblies should have the file extension '.fasta' or '.fa'. BSgenome assemblies are precompiled Bioconductor packages for common organisms.

Several arguments control how the data are prepared for plotting. 'threshold' sets the minimum number of raw counts a TSS or TSR must have to be considered. 'dominant' specifies whether only the dominant TSS, as determined by the 'mark_dominant' function, should be considered. For TSSs, dominance can be defined either per TSR or per gene. 'data_conditions' allows for advanced filtering, ordering, and grouping of the data.
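The effect of 'threshold' and 'dominant' can be pictured with a small base R sketch on a made-up table. This is only an approximation of the idea, not how the package does it internally: the column names are hypothetical, and applying the threshold before selecting the dominant TSS is an assumption.

```r
# Hypothetical TSS table; column names are illustrative only.
tss <- data.frame(
  gene  = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  pos   = c(101, 105, 130, 400, 412),
  score = c(12, 3, 7, 2, 9),
  stringsAsFactors = FALSE
)

# threshold: drop TSSs whose score is below the cutoff
# (a score equal to the cutoff is kept).
threshold <- 3
kept <- tss[tss$score >= threshold, ]

# dominant: keep only the highest-scoring TSS within each gene.
# ave() returns the per-gene maximum aligned to each row.
is_dominant <- kept$score == ave(kept$score, kept$gene, FUN = max)
dominant_tss <- kept[is_dominant, ]

print(dominant_tss)
```

Here the TSS with a score of 2 is removed by the threshold, and one TSS remains per gene after the dominant filter. Tied scores within a gene would all be retained by this sketch.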

Examples

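library(TSRexploreR)
library(magrittr)  # provides the %>% pipe used below

# Load the example TSS data and locate the bundled S288C assembly.
data(TSSs_reduced)
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")

# Build the TSRexploreR object and format the TSS counts.
exp <- TSSs_reduced %>%
  tsr_explorer(genome_assembly=assembly) %>%
  format_counts(data_type="tss")

# Plot -1/+1 dinucleotide frequencies using the default settings.
p <- plot_dinucleotide_frequencies(exp)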