Make a color map for the sequences around TSSs.

plot_sequence_colormap(
  experiment,
  samples = "all",
  genome_assembly = NULL,
  threshold = NULL,
  use_normalized = FALSE,
  distance = 10,
  dominant = FALSE,
  data_conditions = NULL,
  ncol = 1,
  base_colors = c(A = "#109649", C = "#255C99", G = "#F7B32C", T = "#D62839"),
  font_size = 6,
  rasterize = FALSE,
  raster_dpi = 150,
  ...
)

Arguments

experiment

TSRexploreR object.

samples

A vector of sample names to analyze.

genome_assembly

Genome assembly in FASTA or BSgenome format.

threshold

TSSs or TSRs with a score below this value will not be considered.

use_normalized

Whether to use the normalized (TRUE) or raw (FALSE) counts.

distance

Bases to add on each side of each TSS.

dominant

If TRUE, only the highest-scoring TSS per gene, transcript, or TSR (or the highest-scoring TSR per gene or transcript) is considered.

data_conditions

Advanced conditions to apply to the data (filtering, ordering, and grouping).

ncol

Integer specifying the number of columns in which to arrange multiple plots.

base_colors

Named vector specifying colors for each base.

font_size

Size of text for plots.

rasterize

Whether to rasterize the plot (TRUE) or not (FALSE).

raster_dpi

If 'rasterize' is TRUE, this sets the rasterization DPI.

...

Arguments passed to geom_tile.

Value

ggplot2 object of sequence colormap.

Details

This plotting function generates a ggplot2 base color map for the sequences around TSSs. Color maps represent each base surrounding a TSS as a different color. Since the base composition for every TSS region can be seen in one plot, it's a good companion for sequence logos.

The color of each base is set using the 'base_colors' argument. The argument input should be a named vector, with the base as the name, and the desired color of the base as the vector element.
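As a rough sketch of what such a named vector can look like, the snippet below builds a custom palette in base R and checks it before use. The palette name 'my_colors' and the chosen colors are hypothetical, and the checks assume the vector should carry one entry for each of A, C, G, and T, as the default does.

```r
# Hypothetical palette: names are the bases, values are the colors.
my_colors <- c(A = "darkgreen", C = "steelblue", G = "goldenrod", T = "firebrick")

# Assumption: one named entry per DNA base, mirroring the default vector.
stopifnot(setequal(names(my_colors), c("A", "C", "G", "T")))

# col2rgb() raises an error if any value is not a color R understands.
rgb_values <- grDevices::col2rgb(my_colors)

# The vector would then be supplied through the documented argument:
# plot_sequence_colormap(exp, base_colors = my_colors)
```

Hex strings such as "#109649" work equally well as values, since both forms are ordinary R color specifications.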

'genome_assembly' must be a valid genome assembly in either FASTA or BSgenome format. FASTA-formatted genome assemblies should have the file extension '.fasta' or '.fa'. BSgenome assemblies are precompiled Bioconductor packages for common organisms.
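The extension requirement for FASTA files can be checked before calling the function. The helper below is a hypothetical convenience, not part of TSRexploreR, and it assumes the extension match is case-sensitive.

```r
# Hypothetical helper (not a TSRexploreR function): TRUE when the path
# ends in '.fasta' or '.fa', the two extensions named above.
has_fasta_extension <- function(path) {
  grepl("\\.(fasta|fa)$", path)
}

has_fasta_extension("S288C_Assembly.fasta")
has_fasta_extension("genome.txt")
```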

'distance' controls the length upstream and downstream of the TSS from which the sequence will be retrieved.
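To illustrate the idea of a window around a TSS, the base R sketch below extracts 'distance' bases on each side of a position in a toy sequence. The sequence and position are made up, only the plus strand is considered, and including the TSS base itself in the window (giving 2 * distance + 1 bases) is an assumption rather than a documented detail of the function.

```r
# Toy inputs, purely illustrative.
toy_genome <- "ACGTACGTTAGCCGATAGCTAGGCTA"
tss_position <- 12   # 1-based coordinate of the TSS
distance <- 5        # bases to take on each side

# Window from (TSS - distance) to (TSS + distance), inclusive.
window <- substr(toy_genome, tss_position - distance, tss_position + distance)

# Under the stated assumption the window holds 2 * distance + 1 bases,
# with the TSS base in the middle.
window_length <- nchar(window)
tss_base <- substr(window, distance + 1, distance + 1)
```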

Several arguments control which data are plotted. 'threshold' defines the minimum number of reads a TSS or TSR must have to be considered. 'dominant' specifies whether only the dominant TSS or TSR, as determined by the 'mark_dominant' function, is considered. For TSSs this can be the dominant TSS per TSR or per gene; for TSRs it is the dominant TSR per gene. 'data_conditions' allows advanced filtering, ordering, and grouping of the data.
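The effect of 'threshold' and 'dominant' can be pictured with a small base R sketch on a toy table. The data frame and its column names are hypothetical, and the code only mimics the described behavior; it is not how the package implements these filters internally.

```r
# Toy TSS table (hypothetical columns): gene, position, and read score.
tss <- data.frame(
  gene = c("g1", "g1", "g2", "g2", "g3"),
  position = c(100, 105, 300, 310, 500),
  score = c(10, 2, 5, 7, 1),
  stringsAsFactors = FALSE
)

# 'threshold': drop TSSs whose score is below the cutoff.
threshold <- 3
kept <- tss[tss$score >= threshold, ]

# 'dominant': keep only the highest-scoring TSS within each gene.
dominant <- do.call(
  rbind,
  lapply(split(kept, kept$gene), function(d) d[which.max(d$score), ])
)
```

In this toy case the threshold removes the two low-scoring TSSs (and with them gene g3 entirely), and the dominant step then leaves one TSS for each remaining gene.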

The plot can be rasterized with ggrastr by setting 'rasterize' to TRUE, and the rasterization DPI can be set with 'raster_dpi'.

See also

plot_sequence_logo to plot a sequence logo.

Examples

data(TSSs_reduced)
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")

exp <- TSSs_reduced %>%
  tsr_explorer(genome_assembly=assembly) %>%
  format_counts(data_type="tss")

p <- plot_sequence_colormap(exp, distance=5)