plot_sequence_colormap.Rd
Make a color map for the sequences around TSSs.
plot_sequence_colormap( experiment, samples = "all", genome_assembly = NULL, threshold = NULL, use_normalized = FALSE, distance = 10, dominant = FALSE, data_conditions = NULL, ncol = 1, base_colors = c(A = "#109649", C = "#255C99", G = "#F7B32C", T = "#D62839"), font_size = 6, rasterize = FALSE, raster_dpi = 150, ... )
experiment | TSRexploreR object. |
---|---|
samples | A vector of sample names to analyze. |
genome_assembly | Genome assembly in FASTA or BSgenome format. |
threshold | TSSs or TSRs with a score below this value will not be considered. |
use_normalized | Whether to use the normalized (TRUE) or raw (FALSE) counts. |
distance | Bases to add on each side of each TSS. |
dominant | If TRUE, will only consider the highest-scoring TSS per gene, transcript, or TSR or highest-scoring TSR per gene or transcript. |
data_conditions | Apply advanced conditions to the data. |
ncol | Integer specifying the number of columns to arrange multiple plots. |
base_colors | Named vector specifying colors for each base. |
font_size | Size of text for plots. |
rasterize | Rasterize a ggplot. |
raster_dpi | If rasterization is set, this controls the rasterization DPI. |
... | Arguments passed to geom_tile. |
ggplot2 object of sequence colormap.
This plotting function generates a ggplot2 base color map for the sequences around TSSs. Color maps represent each base surrounding a TSS as a different color. Since the base composition for every TSS region can be seen in one plot, it's a good companion for sequence logos.
The color of each base is set using the 'base_colors' argument. The argument input should be a named vector, with the base as the name, and the desired color of the base as the vector element.
'genome_assembly' must be a valid genome assembly in either fasta or BSgenome format. fasta formatted genome assemblies should have the file extension '.fasta' or '.fa'. BSgenome assemblies are precompiled Bioconductor libraries for common organisms.
'distance' controls the length upstream and downstream of the TSS from which the sequence will be retrieved.
A set of functions to control data structure for plotting are included. 'threshold' will define the minimum number of reads a TSS or TSR must have to be considered. 'dominant' specifies whether only the dominant TSS or TSR is considered from the 'mark_dominant' function. For TSSs this can be either dominant per TSR or gene, and for TSRs it is just the dominant TSR per gene. 'data_conditions' allows for the advanced filtering, ordering, and grouping of data.
The plot can be rasterized using ggrastr using 'rasterize', and the rasterization DPI set using 'raster_dpi'.
plot_sequence_logo
to plot a sequence logo.
data(TSSs_reduced) assembly <- system.file("extdata", "S288C_Assembly.fasta", package="TSRexploreR") exp <- TSSs_reduced %>% tsr_explorer(genome_assembly=assembly) %>% format_counts(data_type="tss") p <- plot_sequence_colormap(exp, distance=5)