seqenv.seqsearch package

Submodules

seqenv.seqsearch.blast module

class seqenv.seqsearch.blast.BLASTquery(query_path, db_path, params=None, algorithm='blastn', version='plus', out_path=None, executable=None)

Bases: object

A BLAST job. It can run the standard nucleotide BLAST or another variant such as BLASTP or BLASTX. Typical usage:

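```python
import os
import sys

import parallelblast  # source of BLASTdb in the original example
from seqenv.seqsearch.blast import BLASTquery

records_path = os.path.expanduser(sys.argv[1])
centers_path = 'centers.fasta'
db = parallelblast.BLASTdb(centers_path)
db.makeblastdb()  # format centers.fasta as a BLAST database
params = {'-outfmt': 0, '-evalue': 1e-2, '-perc_identity': 97, '-num_threads': 16}
# The executable is its own keyword argument, not a params entry
search = BLASTquery(records_path, db, params,
                    executable=os.path.expanduser('~/share/blastplus/blastn'))
search.run()
```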

You can also call search.non_block_run() to run several searches in parallel.

command
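The command property is not described here. As a rough illustration only, a BLAST+ command line for a job like this could be assembled as below. `build_blast_command` is a hypothetical helper, and defaulting the executable to the `algorithm` name is an assumption, not a description of seqenv's code.

```python
import shlex


def build_blast_command(query_path, db_path, out_path, params=None,
                        algorithm="blastn", executable=None):
    """Hypothetical sketch of a BLAST+ command line (not seqenv's code).

    Assumes BLAST+ ("plus") style flags and that the executable
    defaults to the algorithm name when none is given.
    """
    cmd = [executable or algorithm,
           "-query", query_path, "-db", db_path, "-out", out_path]
    # Flatten the params dict into alternating flag/value arguments
    for flag, value in (params or {}).items():
        cmd += [flag, str(value)]
    return cmd


cmd = build_blast_command("reads.fasta", "centers", "hits.tsv",
                          {"-evalue": 1e-2, "-num_threads": 16})
print(shlex.join(cmd))
# blastn -query reads.fasta -db centers -out hits.tsv -evalue 0.01 -num_threads 16
```

Returning an argument list rather than a single string lets it be handed to a process launcher without shell quoting issues.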
filter(filtering)

Apply additional filtering to the results. Currently, only a minimum coverage filter is supported.
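A minimal sketch of a minimum-coverage filter, assuming the results are tab-separated `-outfmt 6` rows whose last column is the query coverage (for example the `qcovs` specifier). The helper name `filter_min_coverage` and this column layout are assumptions for illustration.

```python
def filter_min_coverage(lines, min_coverage, cov_column=-1):
    """Keep tabular BLAST rows whose query coverage is >= min_coverage.

    Assumes tab-separated rows with a coverage column, by default the
    last one; this layout is a hypothetical choice.
    """
    kept = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[cov_column]) >= min_coverage:
            kept.append(line)
    return kept


rows = ["q1\tsubjA\t99.1\t100\n", "q1\tsubjB\t97.5\t42\n", "q2\tsubjC\t98.0\t85\n"]
print(filter_min_coverage(rows, 80))  # keeps the subjA and subjC rows
```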

non_block_run()

Run the query in a separate thread without blocking.

results

Parse the results.

run()
wait()

If the query was started with non_block_run(), call this method to block until it has finished.
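The non_block_run() / wait() pair follows a common threading pattern. The sketch below uses a hypothetical `NonBlockingJob` class with a toy workload in place of a real BLAST run; it is not seqenv's implementation.

```python
import threading


class NonBlockingJob:
    """Hypothetical stand-in for a query with run(), non_block_run() and wait()."""

    def __init__(self, work):
        self.work = work      # any zero-argument callable
        self.result = None
        self.thread = None

    def run(self):
        # Blocking run: do the work in the calling thread
        self.result = self.work()

    def non_block_run(self):
        # Start run() in a background thread and return immediately
        self.thread = threading.Thread(target=self.run)
        self.thread.start()

    def wait(self):
        # Block until the background thread, if any, has finished
        if self.thread is not None:
            self.thread.join()


jobs = [NonBlockingJob(lambda n=n: n * n) for n in range(4)]
for job in jobs:
    job.non_block_run()   # all jobs now run concurrently
for job in jobs:
    job.wait()
print([job.result for job in jobs])  # [0, 1, 4, 9]
```

Plain threads fit this pattern when the heavy work happens in an external process such as blastn, since the Python thread mostly just waits for that process to exit.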

seqenv.seqsearch.parallel module

class seqenv.seqsearch.parallel.ParallelSeqSearch(input_fasta, seq_type, database, algorithm='blast', num_threads=None, filtering=None, out_path=None)

Bases: seqenv.seqsearch.SeqSearch

The same as SeqSearch, but splits the input into smaller pieces, runs the algorithm on each piece separately, and finally joins the outputs. The pieces can be run on the local machine or distributed to different compute nodes using the SLURM system.
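A local-only sketch of the split, run and join strategy. `read_records`, `split_into_pieces`, `parallel_search` and `fake_search` are hypothetical names; `fake_search` stands in for a real BLAST or VSEARCH run on one piece, and SLURM distribution is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def read_records(text):
    """Split FASTA text into a list of records, each starting with '>'."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(">") and current:
            records.append("".join(current))
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    return records


def split_into_pieces(records, num_pieces):
    """Cut the record list into at most num_pieces contiguous chunks."""
    size = max(1, -(-len(records) // num_pieces))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_search(text, search_piece, num_pieces=4):
    """Run search_piece on each chunk in local threads, then join the outputs."""
    pieces = ["".join(p) for p in split_into_pieces(read_records(text), num_pieces)]
    with ThreadPoolExecutor(max_workers=num_pieces) as pool:
        outputs = list(pool.map(search_piece, pieces))  # results keep input order
    return "".join(outputs)


def fake_search(piece):
    """Stand-in for BLAST/VSEARCH on one piece: report each record's ID."""
    return "".join(line[1:].split()[0] + "\thit\n"
                   for line in piece.splitlines() if line.startswith(">"))


fasta = ">r1 first\nACGT\n>r2\nGGCC\n>r3\nTTAA\n>r4\nCCGG\n>r5\nAAAA\n"
print(parallel_search(fasta, fake_search, num_pieces=2), end="")
```

Contiguous chunks (rather than round-robin) keep the joined output in the same order as the input records, because `ThreadPoolExecutor.map` returns results in submission order.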

blast_queries

Make all BLAST search objects.

join_outputs()

Join the outputs of the individual pieces.

queries

A list of all the queries to run.

run()

Run the search

splitable

The input FASTA file, wrapped so that it can be split into pieces.

vsearch_queries

Make all VSEARCH search objects.

seqenv.seqsearch.vsearch module

class seqenv.seqsearch.vsearch.VSEARCHquery(query_path, db_path, params=None, out_path=None, executable=None)

Bases: object

A VSEARCH job.

command
run()
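For comparison with the BLAST case, a VSEARCH global search can be expressed as an argument list like the one below. The helper `build_vsearch_command` is hypothetical and the exact flags seqenv passes are an assumption; `--usearch_global`, `--db`, `--id`, `--blast6out` and `--threads` are standard VSEARCH options.

```python
def build_vsearch_command(query_path, db_path, out_path, min_identity=0.97,
                          threads=1, executable="vsearch"):
    """Hypothetical VSEARCH command (not necessarily what seqenv runs)."""
    return [executable,
            "--usearch_global", query_path,  # align each query globally
            "--db", db_path,                 # FASTA database to search against
            "--id", str(min_identity),       # minimum identity, fraction 0-1
            "--blast6out", out_path,         # BLAST-like tabular output
            "--threads", str(threads)]


print(" ".join(build_vsearch_command("reads.fasta", "db.fasta", "hits.tsv",
                                     min_identity=0.97, threads=8)))
```

Note that VSEARCH takes identity as a fraction (0.97), whereas BLAST+ `-perc_identity` takes a percentage (97).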

Module contents

class seqenv.seqsearch.SeqSearch(input_fasta, seq_type, database, algorithm='blast', num_threads=None, filtering=None, out_path=None)

Bases: object

A sequence similarity search. It is designed to support different algorithms, such as BLAST, VSEARCH, or BLAT; this package provides BLAST and VSEARCH query objects.

Input:
  • A list of sequences in a FASTA file
  • The type of the sequences
  • A database to search against
  • The type of algorithm to use
  • The number of threads to use
  • The desired output path
  • The filtering options:
      • BLAST supports:
          • Minimum identity
          • E-value
          • Maximum targets
          • Minimum query coverage (via a custom output format)
      • VSEARCH supports: ?

Output:
  • A sorted list of identifiers in the database (each an object with its significance value and identity attached)
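A sketch of turning tabular BLAST output into a sorted list of that shape, assuming the 12 default columns of `-outfmt 6`. The `Hit` dataclass and `parse_hits` are hypothetical; for simplicity, hits from all queries are pooled into one list.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_id: str   # identifier in the database
    evalue: float     # significance value
    identity: float   # percent identity


def parse_hits(lines):
    """Parse default '-outfmt 6' rows into Hits, most significant first.

    Assumes the 12 standard columns: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed rows
        hits.append(Hit(fields[1], float(fields[10]), float(fields[2])))
    # Lower E-value first; ties broken by higher identity
    return sorted(hits, key=lambda h: (h.evalue, -h.identity))


rows = [
    "q1\tsubjA\t97.50\t200\t5\t0\t1\t200\t1\t200\t1e-50\t350\n",
    "q1\tsubjB\t99.00\t200\t2\t0\t1\t200\t1\t200\t1e-80\t380\n",
]
for hit in parse_hits(rows):
    print(hit.subject_id, hit.evalue, hit.identity)
```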

blast_params

A dictionary of options to pass to the BLAST executable. These options are derived from the filtering options.
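One possible mapping from filtering options to BLAST+ flags, as a sketch. The filtering keys (`min_identity`, `e_value`, `max_targets`, `min_coverage`) are hypothetical; the flags `-perc_identity` (a blastn option), `-evalue`, `-max_target_seqs` and `-outfmt` are real BLAST+ options.

```python
def blast_params_from_filtering(filtering):
    """Translate hypothetical filtering keys into BLAST+ command-line options.

    The key names are assumptions for illustration, not seqenv's own.
    """
    params = {}
    if "min_identity" in filtering:
        params["-perc_identity"] = filtering["min_identity"]  # percent, 0-100
    if "e_value" in filtering:
        params["-evalue"] = filtering["e_value"]
    if "max_targets" in filtering:
        params["-max_target_seqs"] = filtering["max_targets"]
    if "min_coverage" in filtering:
        # Request a tabular format that includes query coverage (qcovs),
        # so the coverage threshold can be applied to the results later
        params["-outfmt"] = ("6 qseqid sseqid pident length mismatch gapopen "
                             "qstart qend sstart send evalue bitscore qcovs")
    return params


print(blast_params_from_filtering({"min_identity": 97, "e_value": 1e-2}))
```

The coverage threshold itself is not passed to BLAST here; it would be applied afterwards to the qcovs column, matching the output-format approach in the filtering list.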

blast_query

Make a BLAST search object.

filter()

Filter the results according to the filtering options.

query

The similarity search object with all the relevant parameters.

results

Parse the results.

run()

Run the search

vsearch_query

Make a VSEARCH search object.