Don’t Skip the Repeats
By Jeffrey Hyacinthe
Junk DNA: vestigial remains or the genome’s dark matter? For a long time, genetic repeats and transposable elements were characterized as junk: useless, a nuisance and unknowable. There was some sense to that view. Once the 20 000 genes of the human genome were identified, these being the parts that actually did something, what could explain all the rest? Current research points towards regulation. There are promoters, enhancers and sequences contributing to structural integrity, but what about all the repeated sequences? In recent years, the body of evidence for the role and importance of repeats has grown so much that they should no longer be ignored.
What are repeats?
Repetitive sequences include simple repeats and transposable elements (TEs). These genomic sequences are duplicated throughout the genome and make up more than 50% of the human genome. Simple repeats are short sequences repeated in succession, while TEs are DNA sequences that are able to relocate within the genome. The two main methods of transposition define the two TE classes. Retrotransposons transpose through a “copy and paste” mechanism whereby DNA is transcribed to RNA and that RNA is reverse-transcribed into DNA elsewhere within the genome. DNA transposons use a “cut and paste” mechanism that relies on a DNA intermediate [1]. Their various origins, along with the mutations they undergo, have led to a complex phylogeny of TE families and subfamilies, each with its own properties and features. Through these two mechanisms, TEs have spread many copies of themselves throughout the genome, although most copies have since lost the ability to be expressed or to transpose further.
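To make the difference between the two classes concrete, here is a toy sketch in Python. It assumes a deliberately simplified genome, modelled as an ordered list of labelled segments; the function names and segment labels are hypothetical and exist only for this illustration, and the RNA intermediate of retrotransposons is reduced to a plain copy.

```python
def copy_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """Retrotransposon-like move: the element at `source` stays in place
    and a new copy is inserted at index `target`."""
    new_genome = list(genome)
    new_genome.insert(target, genome[source])
    return new_genome


def cut_and_paste(genome: list[str], source: int, target: int) -> list[str]:
    """DNA-transposon-like move: the element at `source` is excised and
    reinserted at index `target` (counted after the excision)."""
    new_genome = list(genome)
    element = new_genome.pop(source)
    new_genome.insert(target, element)
    return new_genome


# Hypothetical toy genome: three genes and a single TE.
toy_genome = ["geneA", "TE", "geneB", "geneC"]

after_copy = copy_and_paste(toy_genome, source=1, target=3)
after_cut = cut_and_paste(toy_genome, source=1, target=2)

print(after_copy)  # the TE now appears twice
print(after_cut)   # the TE has moved, but there is still one copy
```

In this simplified model, only the copy-and-paste move increases the number of TE copies; the cut-and-paste move changes the position of the element without changing the copy count.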
Why were they overlooked?
While DNA sequences that hop around might seem like obvious elements of interest, their repeated presence makes them a challenge to analyze. Most sequencing approaches rely on mapping reads to a reference genome. Because reads from repeated sequences map equally well to multiple locations, they cannot be placed unambiguously and are often discarded [2].
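The mapping problem can be shown with a minimal Python sketch. It assumes exact string matching against a tiny invented reference that contains the same repeat at two loci; real aligners tolerate mismatches and use indexes rather than a linear search, and all sequences and names here are made up for illustration.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def classify_read(reference: str, read: str) -> str:
    """Label a read by how many places it could have come from."""
    hits = find_all(reference, read)
    if not hits:
        return "unmapped"
    if len(hits) == 1:
        return "unique"
    return "multi-mapped"


# Invented reference: the same repeat sits between unique flanking sequence.
REPEAT = "ACGTTGCA"
TOY_REFERENCE = "GGATCC" + REPEAT + "TTAGGC" + REPEAT + "CATCAT"

reads = {
    "inside_repeat": "ACGTTGCA",  # lies entirely within the repeat
    "spans_flank": "TCCACG",      # overlaps unique flanking sequence
    "not_present": "TTTTTT",
}

for name, read in reads.items():
    print(name, classify_read(TOY_REFERENCE, read), find_all(TOY_REFERENCE, read))

# A unique-only policy keeps just the reads with a single placement.
kept = [name for name, read in reads.items()
        if classify_read(TOY_REFERENCE, read) == "unique"]
```

The read that lies entirely inside the repeat has two equally good placements, so a pipeline that keeps only uniquely mapped reads drops it, while the read that reaches into the flanking sequence can still be placed.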
In addition, TEs transpose simply because they can. An insertion can disrupt normal gene function, but for the most part insertions do not affect anything and become degraded by genetic drift. In summary, TEs are mostly silent genomic sequences that do not code for relevant host genes, degrade over time and challenge our current genomic analysis approaches. They are not the most intuitive elements, are they?
Why are they worth considering?
Current approaches that build on databases such as RepeatMasker [3] enable increasingly accurate TE measurements, and these reveal TE involvement in regulatory activity. In fact, the evidence for their role in regulation is so strong that the question is no longer whether TEs are useful but how far their influence reaches. The clearest impact of TEs is co-option, the integration of TEs into the host’s regulatory genome. For example, a MER41 TE contributed an interferon-inducible enhancer to the absent in melanoma 2 (AIM2) gene, which regulates inflammation in response to viral infection. Some TEs can also still be expressed, and their transcripts act as non-coding RNAs that can regulate distant genes [4]. Furthermore, TE insertions are not random, because TEs have preferences for particular genome features and compartments [1]. TEs can therefore be associated with other features of the genome, such as the epigenome. LINE-1 TEs have been found to be hypomethylated in cancer, and without methylation TEs are more likely to be expressed. The resulting increase in expression could serve as a cancer biomarker and lead to clinical applications [5]. In some of my own preliminary work, I find that TEs tend to be differentially represented across cell and tissue types in histone mark ChIP-seq, suggesting that TEs’ involvement in regulation may be cell type specific. These examples are only a few of the ways in which TEs have shaped, and continue to shape, our genome.
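One simple way to look at TE representation in ChIP-seq data is to count, for each TE family, how many peaks overlap an annotated copy. The Python sketch below illustrates that idea only; it is not the analysis described above. The coordinates, family labels and sample names are invented, intervals are assumed to be half-open, and a real comparison would also need a background model to judge whether a difference is meaningful.

```python
from collections import Counter

# (chromosome, start, end), half-open coordinates; an assumption of this sketch.
Interval = tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peaks_per_family(peaks: list[Interval],
                     te_annotations: list[tuple[str, Interval]]) -> Counter:
    """Count, per TE family, the peaks overlapping at least one copy of it."""
    counts: Counter = Counter()
    for peak in peaks:
        families_hit = {family for family, te in te_annotations
                        if overlaps(peak, te)}
        counts.update(families_hit)  # each peak counts once per family
    return counts


# Invented annotation in the spirit of a RepeatMasker track.
te_annotations = [
    ("L1", ("chr1", 100, 200)),
    ("Alu", ("chr1", 500, 800)),
    ("L1", ("chr2", 50, 150)),
]

# Invented histone mark peaks for two hypothetical samples.
peaks_sample_a = [("chr1", 150, 250), ("chr2", 100, 120), ("chr1", 900, 950)]
peaks_sample_b = [("chr1", 600, 700), ("chr1", 780, 820), ("chr1", 190, 210)]

counts_a = peaks_per_family(peaks_sample_a, te_annotations)
counts_b = peaks_per_family(peaks_sample_b, te_annotations)
print(dict(counts_a))
print(dict(counts_b))
```

In this toy data, the first sample has more peaks over L1 copies and the second has more over Alu copies, which is the kind of difference between cell types that would warrant a closer statistical look.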
It is worth remembering that even if repeats account for the majority of the genome, they are not the answer to everything. They remain largely outside of genes and are mostly inactive. However, they could also be the overlooked component that just might explain your latest genetic discovery.
References
- 1. Bourque, G. et al. Ten things you should know about transposable elements. Genome Biol. 19, (2018).
- 2. Goerner-Potvin, P. & Bourque, G. Computational tools to unmask transposable elements. Nat. Rev. Genet. 19, 688–704 (2018).
- 3. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0. (2013).
- 4. Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
- 5. Ardeljan, D., Taylor, M. S., Ting, D. T. & Burns, K. H. The Human Long Interspersed Element-1 Retrotransposon: An Emerging Biomarker of Neoplasia. Clin. Chem. 63, 816–822 (2017).