seqenv.seqsearch package
Submodules
seqenv.seqsearch.blast module
- class seqenv.seqsearch.blast.BLASTquery(query_path, db_path, params=None, algorithm='blastn', version='plus', out_path=None, executable=None)
  Bases: object
A BLAST job. Possibly the standard BLAST algorithm or BLASTP or BLASTX etc. Typically you could use it like this:

    import sys, os
    records_path = os.path.expanduser(sys.argv[1])
    centers_path = 'centers.fasta'
    db = parallelblast.BLASTdb(centers_path)
    db.makeblastdb()
    params = {'executable': "~/share/blastplus/blastn",
              '-outfmt': 0,
              '-evalue': 1e-2,
              '-perc_identity': 97,
              '-num_threads': 16}
    search = parallelblast.BLASTquery(records_path, db, params)
    search.run()

You can also call search.non_block_run() to run many searches in parallel.
- command
- filter(filtering)
  We can do some special filtering on the results. For the moment only minimum coverage.
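As a rough illustration of what a minimum-coverage filter could look like, the sketch below assumes tabular BLAST output (`-outfmt 6` with its twelve default columns) and a dictionary of known query lengths. The names `query_coverage`, `filter_by_coverage` and `min_coverage` are hypothetical and are not taken from seqenv.

```python
def query_coverage(qstart, qend, query_length):
    """Fraction of the query spanned by one hit (1-based, inclusive coordinates)."""
    return (abs(qend - qstart) + 1) / query_length


def filter_by_coverage(lines, query_lengths, min_coverage):
    """Keep tabular hits whose query coverage is at least min_coverage (0 to 1).

    Assumes the default outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        qseqid = fields[0]
        qstart, qend = int(fields[6]), int(fields[7])
        if query_coverage(qstart, qend, query_lengths[qseqid]) >= min_coverage:
            kept.append(line)
    return kept


# Two hits for the same 100 bp query: one full-length, one covering 40 bases.
hits = [
    "q1\ts1\t99.0\t100\t1\t0\t1\t100\t5\t104\t1e-50\t180",
    "q1\ts2\t98.0\t40\t1\t0\t1\t40\t5\t44\t1e-10\t70",
]
good_hits = filter_by_coverage(hits, {"q1": 100}, min_coverage=0.9)
```

Only the first hit survives a 90% threshold here. This computes coverage per hit; coverage summed over several hits to the same subject would need a different calculation.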
- results
  Parse the results.
seqenv.seqsearch.parallel module
- class seqenv.seqsearch.parallel.ParallelSeqSearch(input_fasta, seq_type, database, algorithm='blast', num_threads=None, filtering=None, out_path=None)
  Bases: seqenv.seqsearch.SeqSearch
The same thing as a SeqSearch but operates by chopping the input up into smaller pieces and running the algorithm on each piece separately, finally joining the outputs. In addition, the pieces can be run separately on the local machine, or distributed to different compute nodes using the SLURM system.
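The split, run, and join pattern described above can be sketched in a few lines. This is a minimal illustration that runs the pieces in local threads only; it does not cover SLURM distribution, and the names `read_fasta_records`, `chunk`, `parallel_search` and `search_fn` are hypothetical and not part of seqenv.

```python
from concurrent.futures import ThreadPoolExecutor


def read_fasta_records(text):
    """Split FASTA text into a list of records, each starting with a '>' header."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def chunk(records, num_pieces):
    """Cut records into at most num_pieces contiguous pieces, preserving order."""
    size = max(1, -(-len(records) // num_pieces))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def parallel_search(text, search_fn, num_pieces=4):
    """Run search_fn on every piece concurrently, then join outputs in piece order."""
    pieces = chunk(read_fasta_records(text), num_pieces)
    with ThreadPoolExecutor(max_workers=len(pieces) or 1) as pool:
        outputs = list(pool.map(search_fn, pieces))
    return [hit for output in outputs for hit in output]


# Stand-in for a real search: report the identifier of each record in the piece.
def fake_search(piece):
    return [record.splitlines()[0][1:] for record in piece]


fasta = ">a\nACGT\n>b\nGG\nTT\n>c\nA\n"
joined = parallel_search(fasta, fake_search, num_pieces=2)
```

Because `pool.map` returns results in submission order, the joined output keeps the order of the input file even when pieces finish at different times.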
- blast_queries
  Make all BLAST search objects.
- queries
  A list of all the queries to run.
- splitable
  The input fasta file, but with the ability to split it.
- vsearch_queries
  Make all VSEARCH search objects.
seqenv.seqsearch.vsearch module
Module contents
- class seqenv.seqsearch.SeqSearch(input_fasta, seq_type, database, algorithm='blast', num_threads=None, filtering=None, out_path=None)
  Bases: object
A sequence similarity search. Could use different algorithms such as BLAST, VSEARCH, BLAT etc.
Input:
  - List of sequences in a FASTA file
  - The type of the sequences
  - A database to search against
  - The type of algorithm to use
  - Number of threads to use
  - The desired output path
  - The filtering options:
      * BLAST supported:
          - Minimum identity
          - E value
          - Maximum targets
          - Minimum query coverage (via manual output format)
      * VSEARCH supported:
          - ?
Output:
  - Sorted list of identifiers in the database (object with significance value and identity attached)
- blast_params
  A dictionary of options to pass to the blast executable. The params should depend on the filtering options.
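One way such a dictionary could be derived from the filtering options is sketched below. The filtering keys (`min_identity`, `e_value`, `max_targets`, `min_coverage`) and the function names are assumptions made for illustration; only the command-line flags themselves (`-perc_identity`, `-evalue`, `-max_target_seqs`, `-num_threads`, `-outfmt`) are standard BLAST+ options, and `-perc_identity` applies to `blastn`.

```python
def make_blast_params(filtering=None, num_threads=1):
    """Translate a filtering dict (hypothetical keys) into BLAST+ options."""
    filtering = filtering or {}
    params = {"-num_threads": num_threads}
    # Hypothetical filtering keys mapped onto BLAST+ flags.
    flag_for = {
        "min_identity": "-perc_identity",   # assumed to be a percentage, 0 to 100
        "e_value": "-evalue",
        "max_targets": "-max_target_seqs",
    }
    for key, flag in flag_for.items():
        if key in filtering:
            params[flag] = filtering[key]
    # Coverage is handled after the search, so request a custom tabular
    # format that also reports query coverage per subject (qcovs).
    if "min_coverage" in filtering:
        params["-outfmt"] = ("6 qseqid sseqid pident length mismatch gapopen "
                             "qstart qend sstart send evalue bitscore qcovs")
    return params


def to_command(executable, params):
    """Flatten the options into an argument list suitable for subprocess."""
    command = [executable]
    for flag, value in params.items():
        command += [flag, str(value)]
    return command


params = make_blast_params({"min_identity": 97, "e_value": 1e-5}, num_threads=4)
command = to_command("blastn", {"-evalue": 0.01})
```

Passing the arguments as a list rather than a single shell string keeps the multi-word `-outfmt` value intact as one argument.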
- blast_query
  Make a BLAST search object.
- query
  The similarity search object with all the relevant parameters.
- results
  Parse the results.
- vsearch_query
  Make a VSEARCH search object.