UCSC liftover tool
Documentation for UCSC liftover.
Issue: separate peaks map to same coordinates after liftover
- When lifting regions over from hg19 to hg38, peaks that are separate in specific regions of hg19 can map to overlapping coordinates in hg38.
- This is because some contigs were not carried forward from hg19 to hg38, due to problems with the hg19 assembly that were resolved in hg38.
- All lifted-over peaks with overlapping coordinates should be removed from the analysis, because they were called on an uncertain region of the old genome assembly.
- Issues like this likely apply to all genome assembly updates, not only human ones.
- The same issue also occurs with the Ensembl assembly converter.
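To check whether a lifted-over file is affected at all, the overlap test can be sketched with sort and awk alone. This is a minimal illustration, not the filtering procedure itself: the file name toy_lifted.bed, the coordinates, and the peak names are made up, and a fourth name column is assumed.

```shell
#!/bin/bash
# Toy lifted-over peak file (hypothetical coordinates, not real data).
# Columns: chrom, start, end, name (BED is 0-based, half-open).
workdir=$(mktemp -d)
printf 'chr1\t100\t200\tpeakA\n'  > "${workdir}/toy_lifted.bed"
printf 'chr1\t150\t250\tpeakB\n' >> "${workdir}/toy_lifted.bed"
printf 'chr1\t300\t400\tpeakC\n' >> "${workdir}/toy_lifted.bed"
printf 'chr2\t100\t200\tpeakD\n' >> "${workdir}/toy_lifted.bed"

# Sort by chromosome, then start. Report every peak that starts before the
# furthest end seen so far on the same chromosome, i.e. it overlaps an
# earlier peak. Book-ended peaks (start == previous end) are not reported.
collisions=$(sort -k1,1 -k2,2n "${workdir}/toy_lifted.bed" | awk -F'\t' '
    $1 != chrom      { chrom = $1; max_end = $3 + 0; next }
    $2 + 0 < max_end { print $4 }
    $3 + 0 > max_end { max_end = $3 + 0 }
')
echo "peaks overlapping an earlier peak: ${collisions}"
```

Any output here means the file contains colliding peaks. The check only names the later peak of each collision, so it is a diagnostic rather than a replacement for removing both peaks.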
Solution
Remove any peaks with overlapping coordinates after liftover before using the lifted over peak file:
#!/bin/bash
module load bedtools
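# Directory holding the lifted-over peak file (keep the trailing slash)
EXTL="../data/external/"
# Sort the lifted-over peak file by chromosome and start, as bedtools merge requires
sort -k1,1 -k2,2n "${EXTL}wong_fig3c_peaks_GRCh38.bed" > "${EXTL}peaks.tmp" && mv "${EXTL}peaks.tmp" "${EXTL}wong_fig3c_peaks_GRCh38.bed"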
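# Merge overlapping peaks and count the rows contributing to each merged interval
# (count > 1 means peaks overlapped; by default bedtools merge also merges book-ended peaks)
bedtools merge -i "${EXTL}wong_fig3c_peaks_GRCh38.bed" -c 1 -o count > "${EXTL}counted.bed"
# Keep merged intervals built from a single peak (count in column 4 equals 1)
awk -F'\t' '$4 == 1' "${EXTL}counted.bed" > "${EXTL}filtered.bed"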
# Intersect the original file with the non-overlapping intervals to keep the correctly lifted-over peaks
bedtools intersect -wa -a "${EXTL}wong_fig3c_peaks_GRCh38.bed" -b "${EXTL}filtered.bed" > "${EXTL}wong_fig3c_peaks_GRCh38_correct_liftover.bed"
# Invert the match (-v) to write the overlapping peaks that were removed
bedtools intersect -v -a "${EXTL}wong_fig3c_peaks_GRCh38.bed" -b "${EXTL}filtered.bed" > "${EXTL}wong_fig3c_peaks_GRCh38_overlapping.bed"
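A quick sanity check after filtering is that every input peak ends up in exactly one of the two output files, so their line counts should sum to the input count. The sketch below assumes this. check_partition is a hypothetical helper (not part of bedtools), and the toy files stand in for the real input and output files.

```shell
#!/bin/bash
# check_partition INPUT KEPT REMOVED
# Hypothetical helper: succeeds when KEPT and REMOVED together contain
# as many lines as INPUT.
check_partition() {
    local total kept removed
    total=$(( $(wc -l < "$1") ))
    kept=$(( $(wc -l < "$2") ))
    removed=$(( $(wc -l < "$3") ))
    echo "input=${total} kept=${kept} removed=${removed}"
    [ "$((kept + removed))" -eq "$total" ]
}

# Toy stand-ins for the sorted input and the two filtered outputs (made-up data).
demo=$(mktemp -d)
printf 'chr1\t100\t200\nchr1\t150\t250\nchr1\t300\t400\nchr2\t100\t200\n' > "${demo}/all.bed"
printf 'chr1\t300\t400\nchr2\t100\t200\n' > "${demo}/correct.bed"
printf 'chr1\t100\t200\nchr1\t150\t250\n' > "${demo}/overlapping.bed"

partition_summary=$(check_partition "${demo}/all.bed" "${demo}/correct.bed" "${demo}/overlapping.bed")
partition_status=$?
echo "${partition_summary} (exit status ${partition_status})"
```

For the real files, the three arguments would be the sorted GRCh38 peak file, the correct_liftover file, and the overlapping file. A non-zero exit status suggests that peaks were dropped or duplicated somewhere in the filtering.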