Python package to quickly download genomes from the UCSC.
How do I install this package?
As usual, just install it using pip:
pip install ucsc_genomes_downloader
Usage examples
Simply instantiate a new genome
Create a new Genome object for the genome hg19.
from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19")
Lazily downloading a genome’s chromosome
Download the mitochondrial genome “chromosome” of the genome “sacCer3” (chromosomes are downloaded only when required).
from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3")
chrM = sacCer3["chrM"] # Downloads and returns the mitochondrial genome
Eagerly downloading a genome
Download all of the genome’s chromosomes immediately.
from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3", lazy_download=False)
Eagerly loading a genome
Load all of the genome’s chromosomes into RAM immediately, downloading them first if necessary.
from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3", lazy_load=False)
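The difference between `lazy_download` and `lazy_load` can be pictured as two cache tiers: chromosome files on disk and sequences held in RAM. The following is a minimal sketch of that idea, not the package's implementation: the `TwoTierCache` class, the `fetch` callable (standing in for the actual UCSC download) and the one-file-per-chromosome layout are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


class TwoTierCache:
    """Hypothetical sketch: chromosomes cached as files on disk and, once loaded, in RAM."""

    def __init__(
        self,
        chromosomes: Iterable[str],
        fetch: Callable[[str], str],  # hypothetical stand-in for the UCSC download
        cache_dir: Path,
        lazy_download: bool = True,
        lazy_load: bool = True,
    ) -> None:
        self.chromosomes = list(chromosomes)
        self.fetch = fetch
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.memory: Dict[str, str] = {}
        if not lazy_load:
            # Eager load: every chromosome is downloaded if needed and read into RAM.
            for name in self.chromosomes:
                self._load(name)
        elif not lazy_download:
            # Eager download: every chromosome is written to disk, RAM stays empty.
            for name in self.chromosomes:
                self._download(name)

    def _download(self, name: str) -> Path:
        path = self.cache_dir / f"{name}.txt"  # assumed one-file-per-chromosome layout
        if not path.exists():  # reuse the on-disk copy instead of fetching again
            path.write_text(self.fetch(name))
        return path

    def _load(self, name: str) -> str:
        if name not in self.memory:
            self.memory[name] = self._download(name).read_text()
        return self.memory[name]

    def __getitem__(self, name: str) -> str:
        # Lazy path: download and load only the chromosome that is asked for.
        return self._load(name)


calls = []


def fake_fetch(name: str) -> str:
    """Stand-in download: records the request and returns a toy sequence."""
    calls.append(name)
    return "ACGT"


with tempfile.TemporaryDirectory() as tmp:
    lazy = TwoTierCache(["chrI", "chrM"], fake_fetch, Path(tmp))
    fetched_at_init = list(calls)  # nothing has been downloaded yet
    chrM = lazy["chrM"]  # first access downloads and loads chrM only

with tempfile.TemporaryDirectory() as tmp:
    calls.clear()
    eager = TwoTierCache(["chrI", "chrM"], fake_fetch, Path(tmp), lazy_download=False)
    on_disk = sorted(p.name for p in Path(tmp).iterdir())
    in_ram_after_download = dict(eager.memory)
    # A second instance over the same directory loads from disk without refetching.
    loaded = TwoTierCache(["chrI", "chrM"], fake_fetch, Path(tmp), lazy_load=False)
```

In this sketch eager loading implies downloading, mirroring the description above; the on-disk existence check is also the kind of information a cache test relies on.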
Testing if a genome is cached
if hg19.is_cached():
    print("Genome is cached!")
Getting gap regions
If you need a BED file containing the regions with gaps, you can use:
all_gaps = hg19.gaps() # Returns gaps for all chromosomes
chrM_gaps = hg19.gaps(chromosomes=["chrM"]) # Returns gaps for chromosome chrM
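Conceptually, a gap is a stretch of the assembly whose bases are unknown and written as `N`. As a sketch under that assumption (the package may instead rely on UCSC's own gap annotation; we have not checked), gap regions can be computed as BED-style, 0-based, end-exclusive intervals with a regular expression. `find_gaps` is a hypothetical helper, not part of the package.

```python
import re
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, chromStart, chromEnd), 0-based, end-exclusive


def find_gaps(chrom: str, sequence: str) -> List[Interval]:
    """Return BED-style intervals covering every maximal run of N/n bases."""
    return [(chrom, m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


gaps = find_gaps("chrToy", "NNACGTNNNNTTn")
# → [("chrToy", 0, 2), ("chrToy", 6, 10), ("chrToy", 12, 13)]
```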
Getting filled regions
If you need a BED file containing the filled regions, you can use:
all_filled = hg19.filled() # Returns filled regions for all chromosomes
chrM_filled = hg19.filled(chromosomes=["chrM"]) # Returns filled regions for chromosome chrM
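Assuming filled regions are simply the complement of the gap regions within each chromosome (an assumption, not something the package documents here), they can be derived by sweeping over the sorted gaps. The `complement` helper below is hypothetical and tolerates unsorted or overlapping gap intervals.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, chromStart, chromEnd), 0-based, end-exclusive


def complement(chrom: str, length: int, gaps: List[Interval]) -> List[Interval]:
    """Return the parts of [0, length) not covered by any gap interval."""
    filled = []
    cursor = 0  # first position not yet known to be covered by a gap
    for _, start, end in sorted(gaps, key=lambda g: (g[1], g[2])):
        if start > cursor:
            filled.append((chrom, cursor, start))
        cursor = max(cursor, end)  # merge overlapping gaps
    if cursor < length:
        filled.append((chrom, cursor, length))
    return filled


filled = complement("chrToy", 13, [("chrToy", 6, 10), ("chrToy", 0, 2), ("chrToy", 12, 13)])
# → [("chrToy", 2, 6), ("chrToy", 10, 12)]
```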
Getting BED sequences
Given a BED-like pandas dataframe, you can get the corresponding sequences as follows:
import pandas as pd

my_bed = pd.read_csv("path/to/my.bed", sep="\t")
sequences = hg19.bed_to_sequence(my_bed)
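Extracting sequences from BED coordinates boils down to slicing each chromosome with 0-based, end-exclusive indices. The sketch below shows that on a toy in-memory genome; `bed_to_sequences` and the `(chrom, chromStart, chromEnd, strand)` row layout are hypothetical, and the reverse-complement handling of `-` strand rows is an illustrative addition: whether the package's `bed_to_sequence` honours a strand column has not been checked.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def bed_to_sequences(
    chromosomes: Dict[str, str],
    rows: List[Tuple[str, int, int, str]],  # (chrom, chromStart, chromEnd, strand)
) -> List[str]:
    """Slice each BED interval (0-based, end-exclusive) out of in-memory chromosomes."""
    sequences = []
    for chrom, start, end, strand in rows:
        seq = chromosomes[chrom][start:end]
        if strand == "-":
            # Minus-strand features are read as the reverse complement.
            seq = seq.translate(COMPLEMENT)[::-1]
        sequences.append(seq)
    return sequences


toy_genome = {"chrToy": "AACCGGTTAC"}
seqs = bed_to_sequences(toy_genome, [("chrToy", 0, 4, "+"), ("chrToy", 5, 9, "-")])
# → ["AACC", "TAAC"]
```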
Removing a genome’s cache
hg19.delete()
Utilities
Retrieving a list of the available genomes
You can get a complete list of the genomes available from the UCSC website with the following method:
from ucsc_genomes_downloader import get_available_genomes
all_genomes = get_available_genomes()
Hashes for ucsc_genomes_downloader-1.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 97660430aca70769e7c75decdf0c21f3bc8d659346a9f16aa7569b858fff5635
MD5 | 47b6ef74392c938116ef5d41b6d33e98
BLAKE2b-256 | f83204300e97a33ba146a7ec4fcaa62f4aff16918ed0f36ef781dc1c5790225d