Create a sequence logo for the sequences around TSSs.

plot_sequence_logo(
  experiment,
  samples = "all",
  genome_assembly = NULL,
  threshold = NULL,
  use_normalized = FALSE,
  distance = 10,
  dominant = FALSE,
  data_conditions = NULL,
  ncol = 1,
  font_size = 6,
  base_colors = c(A = "#109649", C = "#255C99", G = "#F7B32C", T = "#D62839"),
  ...
)

Arguments

experiment

TSRexploreR object.

samples

A vector of sample names to analyze.

genome_assembly

Genome assembly in FASTA or BSgenome format.

threshold

TSSs or TSRs with a score below this value will not be considered.

use_normalized

Whether to use the normalized (TRUE) or raw (FALSE) counts.

distance

Bases to add on each side of each TSS.

dominant

If TRUE, only the highest-scoring TSS per gene, transcript, or TSR (or the highest-scoring TSR per gene or transcript) is considered.

data_conditions

Apply advanced conditions (filtering, ordering, and grouping) to the data.

ncol

Integer specifying the number of columns to arrange multiple plots.

font_size

Font size for plots.

base_colors

Colors for each base.

...

Arguments passed to ggseqlogo.

Value

ggplot2 object with sequence logo.

Details

This plotting function uses the ggseqlogo package to make sequence logos from the sequences retrieved by the 'tss_sequences' function. A sequence logo shows which bases are over-represented at each position in a set of centered sequences. This is particularly useful for TSS analysis, since the literature reports strong base preferences at TSSs and in the surrounding sequence.
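To make the idea of positional bias concrete, the following base R sketch computes a position frequency matrix and the per-position information content in bits for a toy set of centered sequences. The sequences are made up, and the calculation uses a common textbook definition (2 minus the Shannon entropy for a four-letter alphabet); it is not taken from ggseqlogo, whose scaling may differ in detail.

```r
# Toy set of equal-length sequences centered on a TSS (made-up data).
seqs <- c("TCAGT", "TCAAT", "CCAGT", "TTAGA")
bases <- c("A", "C", "G", "T")

# Split each sequence into single characters: one row per sequence.
chars <- do.call(rbind, strsplit(seqs, ""))

# Position frequency matrix: rows are bases, columns are positions.
pfm <- sapply(seq_len(ncol(chars)), function(i) {
  as.vector(table(factor(chars[, i], levels = bases))) / nrow(chars)
})
rownames(pfm) <- bases

# Information content per position in bits: 2 minus the Shannon entropy.
entropy <- apply(pfm, 2, function(p) -sum(p[p > 0] * log2(p[p > 0])))
info_bits <- 2 - entropy
print(round(info_bits, 2))
```

A position where every sequence carries the same base reaches the maximum of 2 bits, while a position with uniform base usage scores 0.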

'genome_assembly' must be a valid genome assembly in either FASTA or BSgenome format. FASTA-formatted genome assemblies should have the file extension '.fasta' or '.fa'. BSgenome assemblies are precompiled Bioconductor packages for common organisms.

'distance' sets the number of bases upstream and downstream of each TSS for which the sequence is retrieved.
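As a rough illustration of what 'distance' means, the base R sketch below extracts a window around a single TSS from a toy sequence. The sequence, the position, and the assumption that the TSS base itself sits at the center of the window are all hypothetical; strand handling is ignored, and the package's own retrieval code may differ.

```r
# Hypothetical toy chromosome and 1-based TSS position (not package data).
chrom <- "GATTACATATAAAGGCTCAGTCCGATTG"
tss_pos <- 18
distance <- 5

# Take `distance` bases on each side of the TSS. This sketch assumes the
# TSS base itself is kept at the center of the window.
window <- substr(chrom, tss_pos - distance, tss_pos + distance)

# Under that assumption the window holds 2 * distance + 1 bases.
print(window)
print(nchar(window))
```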

The color of each base is set using the 'base_colors' argument. It should be a named vector in which each name is a base and each element is the color to use for that base.
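A minimal sketch of building such a named vector in base R follows. The palette is an arbitrary choice for illustration, and the checks are optional conveniences rather than something the package is known to require.

```r
# Arbitrary example palette: names are bases, elements are colors.
custom_colors <- c(A = "#1B9E77", C = "#7570B3", G = "#E6AB02", T = "#D95F02")

# Confirm that all four bases are named.
stopifnot(setequal(names(custom_colors), c("A", "C", "G", "T")))

# col2rgb() raises an error if any element is not a valid color.
rgb_values <- grDevices::col2rgb(custom_colors)

# With a prepared TSRexploreR object `exp` (as in the Examples section),
# the vector would be passed like this:
# plot_sequence_logo(exp, base_colors = custom_colors)
```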

A set of arguments is included to control how the data are structured for plotting. 'threshold' defines the minimum number of reads a TSS or TSR must have to be considered. 'dominant' specifies whether only the dominant TSS or TSR, as marked by the 'mark_dominant' function, is considered. For TSSs this can be either dominant per TSR or gene, and for TSRs it is just the dominant TSR per gene. 'data_conditions' allows for advanced filtering, ordering, and grouping of the data.
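The effect of 'threshold' and 'dominant' can be pictured with a small base R sketch on a toy table. The data frame, its column names, and the order of the two steps are assumptions made for illustration; this is not the package's internal implementation, which relies on 'mark_dominant'.

```r
# Hypothetical TSS table: one row per TSS with its gene and read score.
tss <- data.frame(
  gene = c("geneA", "geneA", "geneA", "geneB", "geneB"),
  position = c(100, 105, 112, 500, 530),
  score = c(2, 40, 15, 8, 3)
)
threshold <- 3

# threshold: drop TSSs whose score is below the cutoff.
kept <- tss[tss$score >= threshold, ]

# dominant: keep only the highest-scoring remaining TSS per gene.
dominant_tss <- do.call(rbind, lapply(split(kept, kept$gene), function(g) {
  g[which.max(g$score), ]
}))
print(dominant_tss)
```

Here the TSS at position 100 is removed by the threshold, and positions 105 and 500 remain as the dominant TSSs of their genes.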

See also

plot_sequence_colormap for a sequence color map plot.

Examples

data(TSSs_reduced)
assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR")
exp <- TSSs_reduced %>%
  tsr_explorer(genome_assembly=assembly) %>%
  format_counts(data_type="tss")
p <- plot_sequence_logo(exp, distance=5)