Using the Track object

class track.Track(path, readonly=False, autosave=True, orig_path=None, orig_format=None)[source]

The track object itself is iterable and will yield the names of all chromosomes.

import track
with track.load('tracks/all_genes.sql') as t:
    for chrom in t: print chrom
    if 'chrY' in t: print 'Male'
    if len(t) != 23: print 'Aneuploidy'
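
The iteration, `in` and `len` behaviour shown above corresponds to Python's standard container protocol. The following is a minimal sketch of how an object can support it. `ChromosomeCollection` is a hypothetical stand-in used only for illustration, not the actual implementation of `Track`.

```python
class ChromosomeCollection(object):
    """Hypothetical stand-in for a track; it only holds chromosome names."""

    def __init__(self, chromosomes):
        self._chromosomes = list(chromosomes)

    def __iter__(self):
        # Iterating over the object yields the chromosome names one by one.
        return iter(self._chromosomes)

    def __contains__(self, name):
        # Supports the `'chrY' in t` test.
        return name in self._chromosomes

    def __len__(self):
        # Supports `len(t)`, the number of chromosomes.
        return len(self._chromosomes)


t = ChromosomeCollection(['chr1', 'chr2', 'chrY'])
names = [chrom for chrom in t]
has_y = 'chrY' in t
count = len(t)
```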

Track attributes

Track.fields[source]

A list of the value types that each feature in the track will contain. For instance:

['start', 'end', 'name', 'score', 'strand']

Setting this attribute will influence the behaviour of all future read() and write() calls.

Track.chromosomes[source]

A list of all available chromosomes. For instance:

['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chrC', 'chrM']

You cannot set this attribute. To add new chromosomes, just write() to them.

Track.info[source]

A dictionary of metadata associated with the track (such as its source). For instance:

{'datatype': 'signal', 'source': 'SGD', 'orig_name': 'splice_sites.bed'}

Track.name[source]

Giving a name to your track is optional. The default name is the filename. This attribute is stored inside the info dictionary.

Track.datatype[source]

Giving a datatype to your track is optional. The default datatype is None. Other possible datatypes are features, signal or relational. Changing the datatype imposes some conditions on the entries that the track contains. This attribute is stored inside the info dictionary.

import track
with track.new('tmp/track.sql') as t:
    t.datatype = 'signal'
Track.assembly[source]

Giving an assembly to your track is optional. However, if you set this variable for your track, you should input a valid assembly name such as ‘sacCer2’. Doing so will set the chrmeta attribute and rename all the chromosomes to their canonical names if a correspondence is found. This attribute is also stored inside the info dictionary.

import track
track.convert('tracks/genes.bed', 'tracks/genes.sql')
with track.load('tracks/genes.sql') as t:
    t.assembly = 'hg19'
Track.chrmeta[source]

Contains extra chromosomal metadata such as chromosome length information. chrmeta is a dictionary where each key is a chromosome name. For instance:

{'chr1': {'length': 197195432}, 'chr2': {'length': 129993255}}

You would hence use it like this:

import track
with track.load('tmp/track.sql') as t:
    print t.chrmeta['chr1']['length']

Of course, genomic formats such as bed cannot store this kind of meta data. Hence, when loading tracks in these text formats, this information is lost once the track is closed.

Track.modified[source]

A boolean value which indicates whether the track has been changed since it was opened. This value is set to False when you load a track and is set to True as soon as you write, rename or remove. Changing the info or chrmeta attributes will also set this value to True.

Track methods

Track.read(selection=None, fields=None, order='')[source]

Read data from the track.

Parameters:
  • selection – A chromosome name, or a dictionary specifying a region, see below.
  • fields (list of strings) – is an optional list of fields which will influence the length of the tuples returned and the way in which the information is returned. The default is to read every field available for the given chromosome. If the track.fields attribute is set, that will be used.
  • order (comma-separated string) – is an optional sublist of fields which will influence the order in which the tuples are yielded. By default results are not sorted.
Returns:

a generator object yielding rows. A row can be referenced like a tuple or like a dictionary.

selection can be the name of a chromosome, in which case all the data on that chromosome will be returned.

selection can be left empty, in which case the data from all chromosomes is returned.

selection can also be a dictionary specifying a region, a score interval, a strand, or any combination of these. If you specify a region, only features located in that region are returned. If you specify a score interval as a tuple, only features whose score lies within those boundaries are returned. If you specify a strand, only features on that strand are returned. See the code example for more details.

Adding the parameter 'inclusion':'strict' to a region dictionary will return only features entirely contained inside the interval, which is stricter than the default region selection. To combine multiple selections, you can specify a list including chromosome names and region dictionaries. In that case, the joined data from those selections is returned with an added chr field in front, since the results may span several chromosomes.

import track
with track.load('tracks/example.sql') as t:
    data = t.read()
    data = t.read('chr2')
    data = t.read('chr3', ['name', 'strand'])
    data = t.read(['chr1','chr2','chr3'])
    data = t.read({'chr':'chr1', 'start':100})
    data = t.read({'chr':'chr1', 'start':10000, 'end':15000})
    data = t.read({'chr':'chr1', 'start':10000, 'end':15000, 'inclusion':'strict'})
    data = t.read({'chr':'chr1', 'strand':1})
    data = t.read({'chr':'chr1', 'score':(10,100)})
    data = t.read({'chr':'chr1', 'start':10000, 'end':15000, 'strand':-1, 'score':(10,100)})
    data = t.read({'chr':'chr5', 'start':0, 'end':200}, ['strand', 'start', 'score'])
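
The difference between a plain region selection and one with 'inclusion':'strict' can be sketched in plain Python. This is only an illustration under stated assumptions: the helper `select` is hypothetical, intervals are treated as half-open, and the default mode is assumed to keep any feature that overlaps the region. The library's actual default rule is not spelled out in this page.

```python
def select(features, start, end, inclusion=None):
    """Filter (start, end, name) tuples against the region [start, end).

    Assumption: without 'strict', any overlap with the region qualifies;
    with 'strict', the feature must lie entirely inside the region.
    """
    for f_start, f_end, name in features:
        if inclusion == 'strict':
            keep = f_start >= start and f_end <= end
        else:
            keep = f_start < end and f_end > start
        if keep:
            yield (f_start, f_end, name)


features = [(9000, 10500, 'A'), (11000, 12000, 'B'),
            (14500, 16000, 'C'), (20000, 21000, 'D')]
overlapping = list(select(features, 10000, 15000))
strict = list(select(features, 10000, 15000, inclusion='strict'))
```

Under these assumptions, features A and C straddle the region boundaries, so they appear only in the non-strict result.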
Track.write(chromosome, data, fields=None)[source]

Write data to a genomic file. Will write many features at once into a given chromosome.

Parameters:
  • chromosome (string) – is the name of the chromosome on which one wants to write. For instance, if one is using the BED format this will become the first column, while if one is using the SQL format this will become the name of the table to be created.
  • data (an iterable) – must be an iterable object that yields tuples or rows of the correct length. As an example, the read function of this class produces such objects. data can have a fields attribute describing what the different elements of the tuple represent. data can also simply be a list of tuples.
  • fields (list of strings) – is a parameter describing what the different elements in data represent. It is optional and is used only if data doesn’t already have a fields attribute.
Returns:

None

import track
with track.load('tracks/example.sql') as t:
    t.write('chr1', [(10, 20, 'A', 0.0, 1), (40, 50, 'B', 0.0, -1)])
with track.load('tracks/example.sql') as t:
    def example_generator():
        for i in xrange(5):
            yield (10, 20, 'X')
    t.write('chr2', example_generator(), fields=['start','end','name'])
with track.load('tracks/new.sql') as t2:
    with track.load('tracks/orig.sql') as t1:
        t1.write('chr1', t2.read('chr1'))
Track.save()[source]

Store the changes that were applied to the track on the disk. If the track was loaded from a text file such as ‘bed’, the file is rewritten with the changes included. If the track was loaded as an SQL file, the changes are committed to the database. Calling rollback will revert all changes to the track since the last call to save(). By default, when the track is closed, all changes are saved.

Returns:None
import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.save()
Track.rollback()[source]

Revert all changes to the track since the last call to save().

Returns:None
import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.export('tmp/clean.bed')
    t.rollback()
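
For tracks stored as SQL files, saving and rolling back are described above as committing to, or reverting, the database. The sketch below shows the same idea directly with the standard sqlite3 module on an in-memory database. The table name and its single column are assumptions made for the example and do not reflect the track file layout.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("create table chr1 (name text)")
conn.executemany("insert into chr1 values (?)", [('A',), ('B',)])
conn.commit()                        # comparable to save()

conn.execute("delete from chr1")     # a change that is not yet committed
during = conn.execute("select count(*) from chr1").fetchone()[0]

conn.rollback()                      # comparable to rollback()
after = conn.execute("select count(*) from chr1").fetchone()[0]
conn.close()
```

Inside the open transaction the table looks empty, but the rollback restores the two rows that were present at the last commit.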
Track.vacuum()[source]

Rebuilds the database making it shrink in file size. This method is useful when, after having executed many inserts, updates, and deletes, the SQLite file is fragmented and full of empty space.

Returns:None
import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.vacuum()
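
The effect described above can be observed with the standard sqlite3 module alone. The following sketch assumes a default SQLite build (auto-vacuum disabled) and uses a throwaway file with an invented table layout. It illustrates the SQLite `VACUUM` command, not the internals of `Track.vacuum()`.

```python
import os
import shutil
import sqlite3
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, 'example.sql')

conn = sqlite3.connect(path)
conn.execute("create table chr1 (start integer, name text)")
conn.executemany("insert into chr1 values (?, ?)",
                 ((i, 'feature_%d' % i) for i in range(20000)))
conn.commit()
conn.execute("delete from chr1")
conn.commit()
size_before = os.path.getsize(path)   # freed pages still occupy the file

conn.execute("vacuum")                # rebuild the file without the free pages
conn.close()
size_after = os.path.getsize(path)

shutil.rmtree(directory)              # remove the temporary directory
```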
Track.close()[source]

Close the current track. This method is useful when for some special reason you are not using the with ... as form for loading tracks.

Returns:None
import track
t = track.load('tracks/rp_genes.bed')
t.remove('chr19_gl000209_random')
t.close()
Track.cursor()[source]

Create a new sqlite3 cursor object connected to the track database. You can use this cursor to make your own SQL queries and fetch the results. More information is available in the sqlite3 documentation pages.

Returns:A new sqlite3 cursor object
import track
with track.load('tracks/rp_genes.sql') as rpgenes:
    cursor = rpgenes.cursor()
    cursor.execute("select name from sqlite_master where type='table'")
    results = cursor.fetchall()
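
To try out this kind of query without a track file, the sketch below builds a small in-memory database with one table per chromosome, as the write() documentation describes for the SQL format. The column layout (start, end, name) is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
for chrom in ('chr1', 'chr2'):
    # "end" is quoted because it is also an SQL keyword.
    conn.execute('create table "%s" (start integer, "end" integer, name text)' % chrom)
conn.execute("insert into chr1 values (10, 20, 'A')")

cursor = conn.cursor()
# List the tables, which here correspond to chromosomes.
cursor.execute("select name from sqlite_master where type='table' order by name")
tables = [row[0] for row in cursor.fetchall()]

# Fetch the features stored for one chromosome.
cursor.execute('select start, "end", name from chr1')
features = cursor.fetchall()
conn.close()
```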
Track.export(path, format=None)[source]

Export the current track to a given format. A new file is created at the specified path. The current track object is unchanged.

Parameters:
  • path (string) – is the path to the track file to create.
  • format (string) – is an optional parameter specifying the format of the track to create when it cannot be guessed from the file extension.
Returns:

None

import track
with track.load('tracks/rp_genes.bed') as t:
    t.remove('chr19_gl000209_random')
    t.export('tmp/clean.bed')
    t.rollback()
Track.remove(chromosome)[source]

Remove data from a given chromosome.

Parameters:chromosome (string) – is the name of the chromosome that one wishes to delete or a list of chromosomes to delete.
Returns:None.
import track
with track.load('tracks/example.sql') as t:
    t.remove('chr1')
with track.load('tracks/example.sql') as t:
    t.remove(['chr1', 'chr2', 'chr3'])
Track.rename(previous_name, new_name)[source]

Rename a chromosome from previous_name to new_name.

Parameters:
  • previous_name (string) – is the name of the chromosome that one wishes to rename.
  • new_name (string) – is the new name by which that chromosome will be referred to.
Returns:

None.

import track
with track.load('tracks/rp_genes.bed') as t:
    t.rename('chr4', 'chrIV')
Track.search(query_dict, fields=None, chromosome=None, exact_match=False)[source]

Search for parameters inside your track. You can specify several parameters.

Parameters:
  • fields – list of the fields you want to have in the result (to ensure that all results have the same number of columns).
  • query_dict (dict) – A dictionary specifying keys and values to search for. See examples.
  • chromosome (string) – Optionally, the name of the chromosome on which one wants to search. If None, the search is performed on all chromosomes and every feature contains a new field specifying its chromosome.
  • exact_match (bool) – By default, will find all entries which contain the query. If set to True, will only find entries that exactly match the query.
Returns:

a generator object yielding rows. A row can be referenced like a tuple or like a dictionary.

import track
with track.load('tracks/rp_genes.bed') as t:
    results = t.search({'gene_id':3})
    results = t.search({'gene_id':3, 'gene_name':'YCCA3'}, chromosome='chr1')
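
The difference between the default "contains" search and exact_match=True, and rows that can be read like a tuple or like a dictionary, can both be illustrated with the standard sqlite3 module. This is a sketch under assumptions: the helper `search`, the table and its columns are hypothetical, and the use of `LIKE` is one possible way to implement a substring match rather than a statement about how the library does it.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row      # rows usable by index or by column name
conn.execute("create table chr1 (start integer, gene_name text)")
conn.executemany("insert into chr1 values (?, ?)",
                 [(10, 'YCCA3'), (40, 'YCCA30'), (90, 'RPL1')])


def search(conn, field, value, exact_match=False):
    """Hypothetical helper: match `value` exactly or as a substring.

    The field name is pasted into the SQL text, so it must come from
    trusted code; only the value is passed as a bound parameter.
    """
    if exact_match:
        sql = "select * from chr1 where %s = ? order by start" % field
        arg = value
    else:
        sql = "select * from chr1 where %s like ? order by start" % field
        arg = '%' + value + '%'
    return conn.execute(sql, (arg,)).fetchall()


loose = search(conn, 'gene_name', 'YCCA3')
exact = search(conn, 'gene_name', 'YCCA3', exact_match=True)
first = loose[0]
by_index, by_name = first[1], first['gene_name']
```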
Track.count(selection=None)[source]

Count the number of features or entries in a given selection.

Parameters:selection – is the name of a chromosome, a list of chromosomes, a particular span or a list of spans. In other words, a value similar to the selection parameter of the read method. If left empty, will count every feature in a track.
Returns:an integer.
import track
with track.load('tracks/example.sql') as t:
    num = t.count('chr1')
    num = t.count(['chr1','chr2','chr3'])
    num = t.count({'chr':'chr1', 'start':10000, 'end':15000})
Track.delete_fields(fields)[source]

Remove the given fields from all chromosomes. This is equivalent to dropping full columns in the database.

Parameters:fields (list) – A list of fields such as ['score','strand'].
Returns:None.
import track
with track.load('tracks/example.sql') as t:
    print t.fields
    t.delete_fields(['score','strand'])
    print t.fields
Track.load_chr_file(path)[source]

Set the chrmeta attribute of the track by loading a chromosome file. The chromosome file is structured as a tab-separated text file containing two columns: the first specifies a chromosome's name and the second its length as an integer.

Parameters:path (string) – is the file path to the chromosome file to load.
Returns:None.
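
The file layout described above is simple enough to parse by hand. The sketch below reads such a file into a dictionary shaped like the chrmeta example shown earlier. `parse_chr_file` is a hypothetical helper, not part of the library, and an in-memory buffer stands in for a real file.

```python
import io


def parse_chr_file(handle):
    """Read 'name<TAB>length' lines into a chrmeta-like dictionary."""
    chrmeta = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        name, length = line.split('\t')
        chrmeta[name] = {'length': int(length)}
    return chrmeta


example = io.StringIO(u"chr1\t197195432\nchr2\t129993255\n")
chrmeta = parse_chr_file(example)
```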
Track.export_chr_file(path)[source]

Output the information contained in the chrmeta attribute into a plain text file. The chromosome file is structured as a tab-separated text file containing two columns: the first specifies a chromosome's name and the second its length as an integer.

Parameters:path (string) – is the file path to the chromosome file to create.
Returns:None.
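
Going the other way, a chrmeta-like dictionary can be written out in the same two-column layout. `write_chr_file` is a hypothetical helper, and sorting the chromosome names is a choice made here for reproducible output, not something the library is known to do.

```python
import io


def write_chr_file(chrmeta, handle):
    """Write one 'name<TAB>length' line per chromosome."""
    for name in sorted(chrmeta):
        handle.write(u"%s\t%d\n" % (name, chrmeta[name]['length']))


chrmeta = {'chr2': {'length': 129993255}, 'chr1': {'length': 197195432}}
out = io.StringIO()
write_chr_file(chrmeta, out)
text = out.getvalue()
```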
Track.get_full_score_vector(chromosome)[source]

Create an iterable with as many elements as there are base pairs in the chromosome specified by the chromosome parameter. Every element of the iterable is a float indicating the score at that position. If the track has no score associated, ones are inserted where features are present.

Parameters:chromosome (string) – is the name of the chromosome for which to create a score vector.
Returns:an iterable yielding floats.
import track
with track.new('tmp/track.sql') as t:
    scores = t.get_full_score_vector('chr1')
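
What a full score vector looks like can be sketched without the library. The helper below is hypothetical and rests on several assumptions: intervals are half-open, positions not covered by any feature score 0.0, and where features overlap the later one wins. It builds the whole vector in memory, which is fine for a toy example but costly for real chromosome lengths.

```python
def full_score_vector(features, length):
    """Return an iterator of one float per base pair.

    `features` holds (start, end) or (start, end, score) tuples.
    Features without a score contribute 1.0 where they are present.
    """
    scores = [0.0] * length
    for feature in features:
        start, end = feature[0], feature[1]
        score = float(feature[2]) if len(feature) > 2 else 1.0
        # Clip the feature to the chromosome boundaries.
        for pos in range(max(start, 0), min(end, length)):
            scores[pos] = score
    return iter(scores)


with_scores = list(full_score_vector([(2, 4, 0.5), (6, 7, 3)], 8))
without_scores = list(full_score_vector([(1, 3)], 5))
```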
Track.get_partial_score_vector(chromosome, start, end)[source]

Create an iterable with as many elements as there are base pairs in the interval between start and end. Every element of the iterable is a float indicating the score at that position. If the track has no score associated, ones are inserted where features are present.

Parameters:
  • chromosome (string) – is the name of the chromosome for which to create a score vector.
  • start (int) – The base pair position where scores will start being read from. Defaults to 0.
  • end (int) – The base pair position where scores will stop being read from. Defaults to the length of the chromosome.
Returns:

an iterable yielding floats.

import track
with track.new('tmp/track.sql') as t:
    scores = t.get_partial_score_vector('chr1', 100, 200)
Track.ucsc_to_ensembl()[source]

Convert all entries of a track from the UCSC standard to the Ensembl standard, effectively adding one to every start position.

Returns:None.
import track
with track.load('tracks/example.sql') as t:
    t.ucsc_to_ensembl()
Track.ensembl_to_ucsc()[source]

Convert all entries of a track from the Ensembl standard to the UCSC standard, effectively subtracting one from every start position.

Returns:None.
import track
with track.load('tracks/rp_genes.bed') as t:
    t.ensembl_to_ucsc()
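
The two conversions exist because UCSC coordinates are zero-based and half-open while Ensembl coordinates are one-based and closed, so only the start position differs. A minimal sketch on plain tuples follows. The two helper functions are hypothetical and assume that the start position is the first element of each feature.

```python
def ucsc_to_ensembl(features):
    """Shift every start position up by one; other fields are untouched."""
    for feature in features:
        yield (feature[0] + 1,) + tuple(feature[1:])


def ensembl_to_ucsc(features):
    """Shift every start position down by one; other fields are untouched."""
    for feature in features:
        yield (feature[0] - 1,) + tuple(feature[1:])


ucsc = [(0, 10, 'A'), (99, 200, 'B')]
ensembl = list(ucsc_to_ensembl(ucsc))
round_trip = list(ensembl_to_ucsc(ensembl))
```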
Track.roman_to_integer(names=None)[source]

Convert the name of all chromosomes from the roman numeral standard to the arabic numeral standard. For instance, ‘chrI’ will become ‘chr1’ while ‘chrII’ will become ‘chr2’, etc.

Parameters:names (dict) – an optional dictionary specifying how to translate particular cases. Example: {'chrM':'chrQ', '2micron':'chrR'}
Returns:None.
import track
with track.new('tmp/track.sql') as t:
    t.roman_to_integer()
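
The renaming rule, including the optional names dictionary, can be sketched as follows. Both helpers are hypothetical and assume chromosome names of the form 'chr' followed by a roman numeral. Note that under this naive rule a name such as 'chrM' would parse as the numeral for 1000, which is one reason an explicit override can be useful.

```python
ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def roman_to_int(numeral):
    """Convert a roman numeral such as 'XIV' to an integer."""
    total = 0
    for i, char in enumerate(numeral):
        value = ROMAN_VALUES[char]
        # A smaller value before a larger one is subtracted (IV, IX, ...).
        if i + 1 < len(numeral) and value < ROMAN_VALUES[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def rename_roman_to_integer(chromosomes, names=None):
    """Return new names; entries in `names` take precedence."""
    names = names or {}
    result = []
    for chrom in chromosomes:
        suffix = chrom[3:]
        if chrom in names:
            result.append(names[chrom])
        elif (chrom.startswith('chr') and suffix
              and all(c in ROMAN_VALUES for c in suffix)):
            result.append('chr%d' % roman_to_int(suffix))
        else:
            result.append(chrom)          # leave unrecognised names alone
    return result


renamed = rename_roman_to_integer(
    ['chrI', 'chrII', 'chrIV', 'chrIX', 'chrM', '2micron'],
    {'chrM': 'chrQ', '2micron': 'chrR'})
```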
Track.integer_to_roman(names=None)[source]

Convert the name of all chromosomes from the arabic numeral standard to the roman numeral standard. For instance, ‘chr1’ will become ‘chrI’ while ‘chr2’ will become ‘chrII’, etc.

Parameters:names (dict) – an optional dictionary specifying how to translate particular cases. Example: {'chrQ':'chrM', 'chrR':'2micron'}
Returns:None.
import track
with track.new('tmp/track.sql') as t:
    t.integer_to_roman()
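
The reverse renaming can be sketched in the same way. Again, both helpers are hypothetical and assume names of the form 'chr' followed by a positive integer, with the names dictionary taking precedence over the numeral rule.

```python
NUMERALS = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
            (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
            (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]


def int_to_roman(number):
    """Convert a positive integer to a roman numeral."""
    result = ''
    for value, symbol in NUMERALS:
        # Greedily take the largest symbol that still fits.
        while number >= value:
            result += symbol
            number -= value
    return result


def rename_integer_to_roman(chromosomes, names=None):
    """Return new names; entries in `names` take precedence."""
    names = names or {}
    result = []
    for chrom in chromosomes:
        suffix = chrom[3:]
        if chrom in names:
            result.append(names[chrom])
        elif chrom.startswith('chr') and suffix.isdigit() and int(suffix) > 0:
            result.append('chr' + int_to_roman(int(suffix)))
        else:
            result.append(chrom)          # leave unrecognised names alone
    return result


renamed = rename_integer_to_roman(
    ['chr1', 'chr2', 'chr4', 'chr16', 'chrQ', 'chrR'],
    {'chrQ': 'chrM', 'chrR': '2micron'})
```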
