The track object itself is iterable and yields the names of all chromosomes.
import track
with track.load('tracks/all_genes.sql') as t:
    for chrom in t: print chrom
    if 'chrY' in t: print 'Male'
    if len(t) != 23: print 'Aneuploidy'
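The three idioms above (iteration, membership test, and length) rely on Python's standard container protocols. As a rough illustration of how an object can support them, here is a minimal sketch; the class `ChromosomeContainer` is hypothetical and is not part of the track package.

```python
class ChromosomeContainer(object):
    """Hypothetical stand-in for a track object; not the real track class."""

    def __init__(self, chromosomes):
        self._chromosomes = list(chromosomes)

    def __iter__(self):
        # Iterating over the object yields chromosome names one at a time.
        return iter(self._chromosomes)

    def __contains__(self, name):
        # Supports the "'chrY' in t" membership test.
        return name in self._chromosomes

    def __len__(self):
        # Supports "len(t)".
        return len(self._chromosomes)


t = ChromosomeContainer(['chr1', 'chr2', 'chrY'])
names = [chrom for chrom in t]
has_y = 'chrY' in t
count = len(t)
```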
A list of the value types that each feature in the track contains. For instance:
['start', 'end', 'name', 'score', 'strand']
Setting this attribute will influence the behaviour of all future read() and write() calls.
A list of all available chromosomes. For instance:
['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chrC', 'chrM']
You cannot set this attribute. To add new chromosomes, just write() to them.
A dictionary of metadata associated with the track (information such as the source). For instance:
{'datatype': 'signal', 'source': 'SGD', 'orig_name': 'splice_sites.bed'}
Giving a name to your track is optional. The default name is the filename. This attribute is stored inside the info dictionary.
Giving a datatype to your track is optional. The default datatype is None. Other possible datatypes are features, signal or relational. Changing the datatype imposes some conditions on the entries that the track contains. This attribute is stored inside the info dictionary.
import track
with track.new('tmp/track.sql') as t:
    t.datatype = 'signal'
Giving an assembly to your track is optional. However, if you set this variable for your track, you should input a valid assembly name such as ‘sacCer2’. Doing so will set the chrmeta attribute and rename all the chromosomes to their canonical names if a correspondence is found. This attribute is also stored inside the info dictionary.
import track
track.convert('tracks/genes.bed', 'tracks/genes.sql')
with track.load('tracks/genes.sql') as t:
    t.assembly = 'hg19'
Contains extra chromosomal metadata such as chromosome length information. chrmeta is a dictionary where each key is a chromosome name. For instance:
{'chr1': {'length': 197195432}, 'chr2': {'length': 129993255}}
You would hence use it like this:
import track
with track.load('tmp/track.sql') as t:
    print t.chrmeta['chr1']['length']
Of course, genomic formats such as bed cannot store this kind of meta data. Hence, when loading tracks in these text formats, this information is lost once the track is closed.
Read data from the track.
Parameters: | |
---|---|
Returns: | a generator object yielding rows. A row can be referenced like a tuple or like a dictionary. |
selection can be the name of a chromosome, in which case all the data on that chromosome will be returned.
selection can be left empty, in which case the data from all chromosomes is returned.
selection can also be a dictionary specifying regions, score intervals, or strands. If you specify a region, only features contained in that region are returned. You can also give a tuple specifying a score interval, in which case only features whose score lies within those boundaries are returned. You can even specify a strand. The dictionary can contain one or several of these arguments. See the code example for more details.
Adding the parameter 'inclusion':'strict' to a region dictionary will return only features exactly contained inside the interval instead of features simply included in the interval. To combine multiple selections you can specify a list including chromosome names and region dictionaries. As expected, if such is the case, the joined data from those selections will be returned with an added chr field in front since the results may span several chromosomes.
import track
with track.load('tracks/example.sql') as t:
    data = t.read()
    data = t.read('chr2')
    data = t.read('chr3', ['name', 'strand'])
    data = t.read(['chr1','chr2','chr3'])
    data = t.read({'chr':'chr1', 'start':100})
    data = t.read({'chr':'chr1', 'start':10000, 'end':15000})
    data = t.read({'chr':'chr1', 'start':10000, 'end':15000, 'inclusion':'strict'})
    data = t.read({'chr':'chr1', 'strand':1})
    data = t.read({'chr':'chr1', 'score':(10,100)})
    data = t.read({'chr':'chr1', 'start':10000, 'end':15000, 'strand':-1, 'score':(10,100)})
    data = t.read({'chr':'chr5', 'start':0, 'end':200}, ['strand', 'start', 'score'])
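To make the selection dictionaries more concrete, here is a minimal sketch of how such a dictionary might be applied to a list of features. It does not use the track package: the function `matches` is hypothetical, and the rules it encodes are assumptions (half-open intervals, overlap as the default region test, and full containment when 'inclusion' is 'strict'). The library itself may behave differently.

```python
def matches(feature, selection):
    """Return True if a feature dict satisfies a selection dict.

    Assumption: by default a feature matches a region when it overlaps it;
    with 'inclusion': 'strict' it must lie entirely inside the region.
    """
    start = selection.get('start')
    end = selection.get('end')
    if selection.get('inclusion') == 'strict':
        if start is not None and feature['start'] < start:
            return False
        if end is not None and feature['end'] > end:
            return False
    else:
        if start is not None and feature['end'] <= start:
            return False
        if end is not None and feature['start'] >= end:
            return False
    if 'strand' in selection and feature['strand'] != selection['strand']:
        return False
    if 'score' in selection:
        low, high = selection['score']
        if not (low <= feature['score'] <= high):
            return False
    return True


# Made-up features standing in for the rows of one chromosome.
features = [
    {'start': 9000, 'end': 11000, 'name': 'A', 'score': 50, 'strand': 1},
    {'start': 12000, 'end': 13000, 'name': 'B', 'score': 5, 'strand': -1},
    {'start': 14000, 'end': 16000, 'name': 'C', 'score': 80, 'strand': -1},
]

region = {'chr': 'chr1', 'start': 10000, 'end': 15000}
strict_region = dict(region, inclusion='strict')
combined = dict(region, strand=-1, score=(10, 100))

loose_names = [f['name'] for f in features if matches(f, region)]
strict_names = [f['name'] for f in features if matches(f, strict_region)]
combined_names = [f['name'] for f in features if matches(f, combined)]
```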
Write data to a genomic file. Writes many features at once into a given chromosome.
Parameters: | |
---|---|
Returns: | None |
import track
with track.load('tracks/example.sql') as t:
    t.write('chr1', [(10, 20, 'A', 0.0, 1), (40, 50, 'B', 0.0, -1)])
with track.load('tracks/example.sql') as t:
    def example_generator():
        for i in xrange(5):
            yield (10, 20, 'X')
    t.write('chr2', example_generator(), fields=['start','end','name'])
with track.load('tracks/new.sql') as t2:
    with track.load('tracks/orig.sql') as t1:
        t1.write('chr1', t2.read('chr1'))
Store the changes that were applied to the track on the disk. If the track was loaded from a text file such as ‘bed’, the file is rewritten with the changes included. If the track was loaded as an SQL file, the changes are committed to the database. Calling rollback will revert all changes to the track since the last call to save(). By default, when the track is closed, all changes are saved.
Returns: | None |
---|
import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.save()
Revert all changes to the track since the last call to save().
Returns: | None |
---|
import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.export('tmp/clean.bed')
    t.rollback()
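For SQL-backed tracks, saving and reverting correspond to committing and rolling back a database transaction. The sketch below shows that underlying behaviour with Python's standard sqlite3 module only; the table layout is made up for illustration and is not necessarily the schema the track package uses.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("create table chr1 (name text, score real)")
conn.executemany("insert into chr1 values (?, ?)",
                 [('A', 0.0), ('B', 0.0)])
conn.commit()    # comparable to save(): the two rows are now permanent

conn.execute("delete from chr1")    # an uncommitted change
during = conn.execute("select count(*) from chr1").fetchone()[0]

conn.rollback()  # comparable to rollback(): undo everything since the commit
after = conn.execute("select count(*) from chr1").fetchone()[0]
conn.close()
```

Here `during` counts the rows while the deletion is still pending, and `after` counts them once the deletion has been rolled back.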
Rebuilds the database making it shrink in file size. This method is useful when, after having executed many inserts, updates, and deletes, the SQLite file is fragmented and full of empty space.
Returns: | None |
---|
import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.vacuum()
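The effect described above can be observed directly with SQLite's own VACUUM command. The following sketch uses only the standard sqlite3 module on a temporary file; the table and row contents are made up, and it assumes SQLite's default setting in which deleted pages are kept in the file until it is rebuilt.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'fragmented.sql')
conn = sqlite3.connect(path)
conn.execute("create table chr1 (name text)")
conn.executemany("insert into chr1 values (?)",
                 (('feature_%06d' % i,) for i in range(50000)))
conn.commit()

conn.execute("delete from chr1")
conn.commit()
# The freed pages are kept inside the file, so it is still large here.
size_before = os.path.getsize(path)

conn.execute("vacuum")   # rebuild the database file without the free pages
conn.close()
size_after = os.path.getsize(path)
```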
Close the current track. This method is useful when, for some special reason, you are not using the with ... as form for loading tracks.
Returns: | None |
---|
import track
t = track.load('tracks/rp_genes.bed')
t.remove('chr19_gl000209_random')
t.close()
Create a new sqlite3 cursor object connected to the track database. You can use this cursor to make your own SQL queries and fetch the results. More information is available in the sqlite3 documentation.
Returns: | A new sqlite3 cursor object |
---|
import track
with track.load('tracks/rp_genes.sql') as rpgenes:
    cursor = rpgenes.cursor()
    cursor.execute("select name from sqlite_master where type='table'")
    results = cursor.fetchall()
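To see what such a query returns without a track file at hand, the sketch below runs the same kind of query against an in-memory SQLite database. The one-table-per-chromosome layout is an assumption made for illustration. The sketch also uses sqlite3.Row to show one standard way a row can be read both by position and by column name; whether the track package relies on sqlite3.Row is likewise an assumption.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row    # rows support both index and key access

# Assumed layout: one table per chromosome.
for chrom in ('chr1', 'chr2'):
    conn.execute('create table "%s" (name text, score real)' % chrom)
conn.execute('insert into "chr1" values (?, ?)', ('YCCA3', 4.5))

cursor = conn.cursor()
cursor.execute("select name from sqlite_master where type='table' order by name")
tables = [row['name'] for row in cursor.fetchall()]

cursor.execute('select name, score from "chr1"')
row = cursor.fetchone()
by_index, by_key = row[0], row['name']
conn.close()
```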
Export the current track to a given format. A new file is created at the specified path. The current track object is unchanged.
Parameters: | |
---|---|
Returns: | None |
import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.export('tmp/clean.bed')
    t.rollback()
Remove data from a given chromosome.
Parameters: | chromosome (string) – is the name of the chromosome that one wishes to delete or a list of chromosomes to delete. |
---|---|
Returns: | None. |
import track
with track.load('tracks/example.sql') as t:
    t.remove('chr1')
with track.load('tracks/example.sql') as t:
    t.remove(['chr1', 'chr2', 'chr3'])
Rename a chromosome from previous_name to new_name.
Parameters: | |
---|---|
Returns: | None. |
import track
with track.load('tracks/rp_genes.bed') as t:
    t.rename('chr4', 'chrIV')
Search for parameters inside your track. You can specify several parameters.
Parameters: | |
---|---|
Returns: | a generator object yielding rows. A row can be referenced like a tuple or like a dictionary. |
import track
with track.load('tracks/rp_genes.bed') as t:
    results = t.search({'gene_id':3})
    results = t.search({'gene_id':3, 'gene_name':'YCCA3'}, 'chr1')
Count the number of features or entries in a given selection.
Parameters: | selection – is the name of a chromosome, a list of chromosomes, a particular span or a list of spans. In other words, a value similar to the selection parameter of the read method. If left empty, every feature in the track is counted. |
---|---|
Returns: | an integer. |
import track
with track.load('tracks/example.sql') as t:
    num = t.count('chr1')
    num = t.count(['chr1','chr2','chr3'])
    num = t.count({'chr':'chr1', 'start':10000, 'end':15000})
Remove the given fields from all chromosomes. This is equivalent to dropping full columns in the database.
Parameters: | fields (list) – A list of fields such as ['score','strand']. |
---|---|
Returns: | None. |
import track
with track.load('tracks/example.sql') as t:
    print t.fields
    t.delete_fields(['score','strand'])
    print t.fields
Set the chrmeta attribute of the track by loading a chromosome file. The chromosome file is structured as a tab-separated text file containing two columns: the first specifies a chromosome's name and the second its length as an integer.
Parameters: | path (string) – is the file path to the chromosome file to load. |
---|---|
Returns: | None. |
Output the information contained in the chrmeta attribute into a plain text file. The chromosome file is structured as a tab-separated text file containing two columns: the first specifies a chromosome's name and the second its length as an integer.
Parameters: | path (string) – is the file path to the chromosome file to create. |
---|---|
Returns: | None. |
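The two-column chromosome file format described above is simple enough to illustrate directly. The sketch below parses such text into a chrmeta-style dictionary and writes it back out; the helper names `parse_chr_file` and `format_chr_file` are hypothetical and are not functions of the track package, and sorting the output by chromosome name is an arbitrary choice.

```python
import io


def parse_chr_file(handle):
    """Read 'name<TAB>length' lines into a chrmeta-style dictionary."""
    chrmeta = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue    # skip blank lines
        name, length = line.split('\t')
        chrmeta[name] = {'length': int(length)}
    return chrmeta


def format_chr_file(chrmeta):
    """Produce the tab-separated text for a chrmeta-style dictionary."""
    lines = ['%s\t%d' % (name, meta['length'])
             for name, meta in sorted(chrmeta.items())]
    return '\n'.join(lines) + '\n'


text = u'chr1\t197195432\nchr2\t129993255\n'
chrmeta = parse_chr_file(io.StringIO(text))
round_trip = format_chr_file(chrmeta)
```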
Create an iterable with as many elements as there are base pairs in the chromosome specified by the chromosome parameter. Every element of the iterable is a float indicating the score at that position. If the track has no score associated, ones are inserted where features are present.
Parameters: | chromosome (string) – is the name of the chromosome from which to create a score vector. |
---|---|
Returns: | an iterable yielding floats. |
import track
with track.new('tmp/track.sql') as t:
    scores = t.get_full_score_vector('chr1')
Create an iterable with as many elements as there are base pairs in the interval between start and end. Every element of the iterable is a float indicating the score at that position. If the track has no score associated, ones are inserted where features are present.
Parameters: | |
---|---|
Returns: | an iterable yielding floats. |
import track
with track.new('tmp/track.sql') as t:
    scores = t.get_partial_score_vector('chr1', 100, 200)
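The idea of a score vector can be sketched in a few lines of plain Python. The function `score_vector` below is hypothetical and independent of the track package. It assumes half-open feature intervals, a score of 0.0 at positions not covered by any feature, and that later features overwrite earlier ones where they overlap; none of these details is confirmed by the text above.

```python
def score_vector(features, start, end):
    """Return an iterator with one float per base pair in [start, end).

    features is an iterable of (start, end) or (start, end, score) tuples.
    Features without a score contribute 1.0 at the positions they cover.
    """
    scores = [0.0] * (end - start)
    for feature in features:
        f_start, f_end = feature[0], feature[1]
        score = float(feature[2]) if len(feature) > 2 else 1.0
        # Clip the feature to the requested window before filling it in.
        for pos in range(max(f_start, start), min(f_end, end)):
            scores[pos - start] = score
    return iter(scores)


with_scores = list(score_vector([(2, 4, 0.5), (6, 7, 3.0)], 0, 8))
without_scores = list(score_vector([(1, 3)], 0, 5))
```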
Convert all entries of a track from the UCSC standard to the Ensembl standard, effectively adding one to every start position.
Returns: | None. |
---|
import track
with track.load('tracks/example.sql') as t:
    t.ucsc_to_ensembl()
Convert all entries of a track from the Ensembl standard to the UCSC standard, effectively subtracting one from every start position.
Returns: | None. |
---|
import track
with track.load('tracks/rp_genes.bed') as t:
    t.ensembl_to_ucsc()
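The two conversions differ only in the direction of the shift: UCSC start positions are 0-based while Ensembl start positions are 1-based, and the end positions stay the same. A minimal sketch on plain tuples follows; the functions here are standalone illustrations that happen to share names with the track methods, not the library's implementation.

```python
def ucsc_to_ensembl(features):
    """Shift (start, end, ...) tuples from 0-based to 1-based starts."""
    return [(f[0] + 1,) + tuple(f[1:]) for f in features]


def ensembl_to_ucsc(features):
    """Shift (start, end, ...) tuples from 1-based to 0-based starts."""
    return [(f[0] - 1,) + tuple(f[1:]) for f in features]


ucsc = [(0, 10, 'A'), (99, 200, 'B')]
ensembl = ucsc_to_ensembl(ucsc)
back = ensembl_to_ucsc(ensembl)
```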
Convert the name of all chromosomes from the roman numeral standard to the arabic numeral standard. For instance, ‘chrI’ will become ‘chr1’ while ‘chrII’ will become ‘chr2’, etc.
Parameters: | names (dict) – an optional dictionary specifying how to translate particular cases. Example: {'chrM':'chrQ', '2micron':'chrR'} |
---|---|
Returns: | None. |
import track
with track.new('tmp/track.sql') as t:
    t.roman_to_integer()
Convert the name of all chromosomes from the arabic numeral standard to the roman numeral standard. For instance, ‘chr1’ will become ‘chrI’ while ‘chr2’ will become ‘chrII’, etc.
Parameters: | names (dict) – an optional dictionary specifying how to translate particular cases. Example: {'chrQ':'chrM', 'chrR':'2micron'} |
---|---|
Returns: | None. |
import track
with track.new('tmp/track.sql') as t:
    t.integer_to_roman()
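The renaming logic of both conversions, including the optional names dictionary for special cases, can be sketched as follows. All function names here are hypothetical and the code does not use the track package. It assumes chromosome names of the form 'chr' followed by a numeral, and it only handles the numerals I, V and X, which is enough for chromosome numbers below 40. Note that under this scheme a name such as 'chrX' is read as chromosome 10.

```python
import re

# Value/symbol pairs in descending order; enough for numbers below 40.
ROMAN = [(10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]


def to_roman(number):
    out = ''
    for value, symbol in ROMAN:
        while number >= value:
            out += symbol
            number -= value
    return out


def to_integer(roman):
    total, i = 0, 0
    for value, symbol in ROMAN:
        # Consume each symbol greedily from the current position.
        while roman.startswith(symbol, i):
            total += value
            i += len(symbol)
    return total


def rename_roman_to_integer(chromosomes, names=None):
    names = names or {}
    result = []
    for chrom in chromosomes:
        match = re.match(r'^chr([IVX]+)$', chrom)
        if chrom in names:              # explicit special cases win
            result.append(names[chrom])
        elif match:
            result.append('chr%d' % to_integer(match.group(1)))
        else:                           # leave unrecognised names untouched
            result.append(chrom)
    return result


def rename_integer_to_roman(chromosomes, names=None):
    names = names or {}
    result = []
    for chrom in chromosomes:
        match = re.match(r'^chr([1-9][0-9]*)$', chrom)
        if chrom in names:
            result.append(names[chrom])
        elif match:
            result.append('chr' + to_roman(int(match.group(1))))
        else:
            result.append(chrom)
    return result


as_integers = rename_roman_to_integer(
    ['chrI', 'chrII', 'chrIV', 'chrXVI', 'chrM'], {'chrM': 'chrQ'})
as_romans = rename_integer_to_roman(
    ['chr1', 'chr4', 'chr16', 'chrQ'], {'chrQ': 'chrM'})
```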