The genomes package is written in Python. It provides easy access to metadata about genome assemblies: you can retrieve the list of chromosomes for any assembly, along with the length in base pairs of each chromosome.
To install it, simply type:
$ sudo easy_install genomes
That’s it. If that fails because you lack sufficient permissions, you can install it somewhere else instead (for instance, in your home directory):
$ cd ~
$ pip install -e git+https://github.com/xapple/genomes#egg=genomes
Here are all the things you can do with it:
from genomes import Assembly
a = Assembly('sacCer2')
print a.chrmeta
print a.guess_chromosome_name('chr1')
Assembly is the only object provided by the library.
Parameters: assembly (string) – A valid assembly name.
The chrmeta attribute is a dictionary of chromosome metadata, keyed by chromosome name:
>>> from genomes import Assembly
>>> a = Assembly('TAIR10')
>>> print a.chrmeta
{u'c': {'length': 154478, 'refseq': u'NC_000932.1'}, u'm': {'length': 366924, 'refseq': u'NC_001284.2'}, u'1': {'length': 30427671, 'refseq': u'NC_003070.9'}, u'3': {'length': 23459830, 'refseq': u'NC_003074.8'}, u'2': {'length': 19698289, 'refseq': u'NC_003071.7'}, u'5': {'length': 26975502, 'refseq': u'NC_003076.8'}, u'4': {'length': 18585056, 'refseq': u'NC_003075.7'}}
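Because chrmeta is a plain dictionary, ordinary Python is enough to summarize it. As a minimal sketch, the snippet below copies the TAIR10 output shown above into a literal, so it runs without the package installed, and computes the total assembly length and the chromosome names ordered from longest to shortest. The variable names are illustrative only.

```python
# Copy of the TAIR10 chrmeta output shown above, so this runs standalone.
chrmeta = {
    'c': {'length': 154478, 'refseq': 'NC_000932.1'},
    'm': {'length': 366924, 'refseq': 'NC_001284.2'},
    '1': {'length': 30427671, 'refseq': 'NC_003070.9'},
    '3': {'length': 23459830, 'refseq': 'NC_003074.8'},
    '2': {'length': 19698289, 'refseq': 'NC_003071.7'},
    '5': {'length': 26975502, 'refseq': 'NC_003076.8'},
    '4': {'length': 18585056, 'refseq': 'NC_003075.7'},
}

# Total number of base pairs across all chromosomes.
total_length = sum(meta['length'] for meta in chrmeta.values())

# Chromosome names ordered from longest to shortest.
by_length = sorted(chrmeta, key=lambda name: chrmeta[name]['length'], reverse=True)

print(total_length)
print(by_length)
```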
The guess_chromosome_name method searches the assembly for chromosome synonym names and returns the canonical name of the chromosome.
Parameters: chromosome_name (string) – Any given name for a chromosome in this assembly.
Returns: The same name or another name for the chromosome. Returns None if the chromosome is unknown.
>>> from genomes import Assembly
>>> a = Assembly('sacCer2')
>>> print a.guess_chromosome_name('chrR')
2micron
>>> a = Assembly('hg19')
>>> print a.guess_chromosome_name('NC_000023.9')
chrX
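The lookup described above can be pictured as a search through a table of synonyms. The sketch below is not the library's implementation. It is a hypothetical standalone function, `guess_name`, working on a hand-written table that contains only the two synonym pairs from the examples above. It follows the same contract: a canonical name comes back unchanged, a known synonym is mapped to its canonical name, and an unknown name yields None.

```python
# Hypothetical synonym table: canonical name -> list of known synonyms.
# Only the two pairs from the examples above are included; a real table
# would be kept per assembly rather than mixed like this.
SYNONYMS = {
    '2micron': ['chrR'],
    'chrX': ['NC_000023.9'],
}

def guess_name(name, synonyms=SYNONYMS):
    """Return the canonical name for `name`, or None if it is unknown."""
    # A canonical name maps to itself.
    if name in synonyms:
        return name
    # Otherwise look for the name among each chromosome's synonyms.
    for canonical, others in synonyms.items():
        if name in others:
            return canonical
    # The name is neither canonical nor a known synonym.
    return None

print(guess_name('chrR'))
print(guess_name('chrX'))
print(guess_name('chr99'))
```

When calling the real method, checking the result against None before using it avoids silently passing an unknown chromosome name along.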