Python package to quickly download genomes from the UCSC.
How do I install this package?
As usual, just install it using pip:
pip install ucsc_genomes_downloader
Tests Coverage
Since different coverage tools sometimes report slightly different results, here are three of them:
Usage examples
Simply instantiate a new genome
Create a new Genome object for the given genome hg19.
from ucsc_genomes_downloader import Genome
hg19 = Genome("hg19")
Lazily downloading a genome’s chromosome
Download the mitochondrial “chromosome” of the genome “sacCer3”. Chromosomes are downloaded only when they are first requested.
from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3")
chrM = sacCer3["chrM"] # Downloads and returns mitochondrial genome
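To illustrate what "only when required" means, here is a minimal sketch of the lazy-loading pattern in plain Python. It is not the package's implementation: `LazyChromosomes` and `fake_fetch` are hypothetical names, and the fetch function is a placeholder that records requests instead of downloading anything.

```python
from typing import Callable, Dict, List


class LazyChromosomes:
    """Hypothetical stand-in for a lazily populated genome (not the package's class)."""

    def __init__(self, fetch: Callable[[str], str]):
        self._fetch = fetch  # called only when a chromosome is not cached yet
        self._cache: Dict[str, str] = {}

    def __getitem__(self, chromosome: str) -> str:
        if chromosome not in self._cache:
            self._cache[chromosome] = self._fetch(chromosome)
        return self._cache[chromosome]


calls: List[str] = []


def fake_fetch(chromosome: str) -> str:
    # Placeholder for a real download: it only records which chromosome was asked for.
    calls.append(chromosome)
    return "ACGT"


genome = LazyChromosomes(fake_fetch)
first = genome["chrM"]   # triggers the fetch
second = genome["chrM"]  # served from the in-memory cache
```

The second lookup does not call the fetch function again, which is the behaviour the lazy mode aims for.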
Eagerly downloading a genome
Download all of the genome’s chromosomes immediately.
from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3", lazy_download=False)
Eagerly loading a genome
Load all of the genome’s chromosomes into RAM immediately, downloading them first if necessary.
from ucsc_genomes_downloader import Genome
sacCer3 = Genome("sacCer3", lazy_load=False)
Testing if a genome is cached
if hg19.is_cached():
    print("Genome is cached!")
Getting gap regions
If you need a BED file containing the regions with gaps, you can use:
all_gaps = hg19.gaps() # Returns gaps for all chromosomes
chrM_gaps = hg19.gaps(chromosomes=["chrM"]) # Returns gaps for chromosome chrM
Getting filled regions
If you need a BED file containing the filled regions, you can use:
all_filled = hg19.filled() # Returns filled for all chromosomes
chrM_filled = hg19.filled(chromosomes=["chrM"]) # Returns filled for chromosome chrM
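The relationship between gaps and filled regions can be sketched on a toy sequence. This example assumes that a gap is a run of `N` nucleotides and that filled regions are the complement of the gaps within each chromosome; the package may compute them differently. `gap_regions` and `filled_regions` are hypothetical helpers, not part of the package.

```python
import re
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chrom, chromStart, chromEnd), 0-based and half-open


def gap_regions(chrom: str, sequence: str) -> List[Region]:
    """Return one BED-style interval for each run of N (assumed gap definition)."""
    return [(chrom, m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def filled_regions(chrom: str, sequence: str) -> List[Region]:
    """Return the intervals of the sequence that are not covered by a gap."""
    filled = []
    position = 0
    for _, start, end in gap_regions(chrom, sequence):
        if start > position:
            filled.append((chrom, position, start))
        position = end
    if position < len(sequence):
        filled.append((chrom, position, len(sequence)))
    return filled


toy = "NNACGTNNNNGG"
gaps = gap_regions("chrToy", toy)
filled = filled_regions("chrToy", toy)
```

Together, the two lists cover the toy chromosome exactly once, with no overlaps.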
Getting BED sequences
Given a BED-like pandas dataframe, you can get the corresponding sequences as follows:
import pandas as pd
my_bed = pd.read_csv("path/to/my.bed", sep="\t")
sequences = hg19.bed_to_sequence(my_bed)
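As a rough picture of what turning BED rows into sequences involves, the sketch below slices toy chromosome strings using BED's 0-based, half-open coordinates. `extract_sequences` is a hypothetical helper written for illustration; it does not reproduce `bed_to_sequence` or the column names that method expects.

```python
from typing import Dict, List, Tuple


def extract_sequences(
    chromosomes: Dict[str, str],
    regions: List[Tuple[str, int, int]],
) -> List[str]:
    """Slice each (chrom, start, end) region out of the matching chromosome string."""
    # BED coordinates are 0-based and half-open, which matches Python slicing.
    return [chromosomes[chrom][start:end] for chrom, start, end in regions]


chromosomes = {"chrToy": "AACCGGTT"}
regions = [("chrToy", 0, 2), ("chrToy", 4, 8)]
sequences = extract_sequences(chromosomes, regions)
```

Because the end coordinate is exclusive, a region's length is simply `end - start`.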
Removing a genome’s cache
hg19.delete()
Utilities
Retrieving a list of the available genomes
You can get a complete list of the genomes available from the UCSC website with the following method:
from ucsc_genomes_downloader import get_available_genomes
all_genomes = get_available_genomes()
Source Distribution
Hashes for ucsc_genomes_downloader-1.1.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | d5ff856a00ed048df5755efd0ffa89fab7d81caa23014370acfa34aff3185338
MD5 | ee0fbfc7978f92d99a923cc5705bf906
BLAKE2b-256 | c199304ce00f394093dc5245ea9b442c4730f0f55aebbe02561018e860d0ac41