seqmagick
Motivation
We often have to convert between sequence formats and do little tasks on them, and it’s not worth writing scripts for that. Seqmagick is a kickass little utility built in the spirit of imagemagick to expose the file format conversion in Biopython in a convenient way. Instead of having a big mess of scripts, there is one that takes arguments:
seqmagick convert a.fasta b.phy # convert from fasta to phylip
seqmagick mogrify --ungap a.fasta # remove all gaps from a.fasta, in place
seqmagick info *.fasta # describe all FASTA files in the current directory
And more.
Installation
First, you’ll need to install Biopython. NumPy (which parts of Biopython depend on) is not required for seqmagick to function. Once Biopython is installed, install the latest release with:
pip install seqmagick
Or install the bleeding edge version:
pip install git+https://github.com/fhcrc/seqmagick.git@master#egg=seqmagick
Use
Seqmagick can be used to query information about sequence files, convert between types, and modify sequence files. All functions are accessed through subcommands:
seqmagick <subcommand> [options] arguments
Supported File Extensions
By default, seqmagick infers the file type from the file extension. Currently mapped extensions are:
Extension | Format |
---|---|
.afa | fasta |
.aln | clustal |
.fa | fasta |
.faa | fasta |
.fas | fasta |
.fasta | fasta |
.fastq | fastq |
.ffn | fasta |
.fna | fasta |
.fq | fastq |
.frn | fasta |
.gb | genbank |
.gbk | genbank |
.needle | emboss |
.nex | nexus |
.phy | phylip |
.phylip | phylip |
.phyx | phylip-relaxed |
.qual | qual |
.sff | sff-trim |
.sth | stockholm |
.sto | stockholm |
Note
NEXUS-format output requires the --alphabet flag.
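The extension lookup described above can be sketched in a few lines of Python. This is an illustration only, not seqmagick's actual implementation: the names `EXTENSION_TO_FORMAT` and `infer_format` are hypothetical, and the case-sensitive match is an assumption. The mapping itself is copied from the table.

```python
import os

# Extension-to-format mapping, copied from the table above.
EXTENSION_TO_FORMAT = {
    ".afa": "fasta",
    ".aln": "clustal",
    ".fa": "fasta",
    ".faa": "fasta",
    ".fas": "fasta",
    ".fasta": "fasta",
    ".fastq": "fastq",
    ".ffn": "fasta",
    ".fna": "fasta",
    ".fq": "fastq",
    ".frn": "fasta",
    ".gb": "genbank",
    ".gbk": "genbank",
    ".needle": "emboss",
    ".nex": "nexus",
    ".phy": "phylip",
    ".phylip": "phylip",
    ".phyx": "phylip-relaxed",
    ".qual": "qual",
    ".sff": "sff-trim",
    ".sth": "stockholm",
    ".sto": "stockholm",
}


def infer_format(path):
    """Return the format name mapped to the extension of `path`.

    Hypothetical helper: the lookup is case-sensitive, which is an
    assumption rather than documented seqmagick behavior.
    """
    extension = os.path.splitext(path)[1]
    try:
        return EXTENSION_TO_FORMAT[extension]
    except KeyError:
        raise ValueError(
            f"unknown extension {extension!r}; specify the format explicitly"
        )
```

For example, `infer_format("b.phy")` returns `"phylip"`, while a file such as `notes.txt` raises `ValueError` because its extension is not in the table.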
Default Format
When reading from stdin or writing to stdout, seqmagick defaults to fasta format. This behavior may be overridden with the --input-format and --output-format flags.
If an extension is not listed, you can either rename the file to use a supported extension or specify the format manually via --input-format or --output-format.
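The order in which these rules apply can be summarized with a small Python sketch. It is a sketch under stated assumptions rather than seqmagick's code: `resolve_format` is a hypothetical name, `"-"` is assumed here to stand for stdin or stdout, and the mapping is deliberately truncated to three entries.

```python
import os

# Truncated mapping, for illustration only.
KNOWN_EXTENSIONS = {".fasta": "fasta", ".phy": "phylip", ".sto": "stockholm"}

# Format used for stdin/stdout when no format flag is given.
DEFAULT_STREAM_FORMAT = "fasta"


def resolve_format(path, explicit_format=None):
    """Pick a format for `path` (hypothetical helper).

    Precedence:
      1. an explicit format (the role of --input-format / --output-format)
      2. the fasta default for a stream, written here as "-" (an assumption)
      3. the file extension
    """
    if explicit_format is not None:
        return explicit_format
    if path == "-":
        return DEFAULT_STREAM_FORMAT
    extension = os.path.splitext(path)[1]
    if extension in KNOWN_EXTENSIONS:
        return KNOWN_EXTENSIONS[extension]
    raise ValueError(f"cannot infer a format for {path!r}; specify one explicitly")
```

With this ordering, an unlisted extension such as `a.dat` fails on its own but resolves once an explicit format is supplied, which mirrors the two workarounds described above.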
Compressed file support
Most commands support gzip (files ending in .gz) and bzip2 (files ending in .bz2 or .bz) compressed inputs and outputs. The file type of a compressed file is inferred from the extension that remains after stripping the compression extension, so input.fasta.gz would be inferred to be in FASTA format.
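As a rough illustration of the stripping step, the following Python sketch separates the compression suffix from the rest of the file name and picks a matching opener from the standard library. The names `COMPRESSION_OPENERS` and `split_compression` are hypothetical, and this is not necessarily how seqmagick does it internally.

```python
import bz2
import gzip
import os

# Compression suffixes from the paragraph above, mapped to stdlib openers.
COMPRESSION_OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".bz": bz2.open,
}


def split_compression(path):
    """Return (path without compression suffix, opener function).

    Hypothetical helper: uncompressed paths are returned unchanged,
    paired with the built-in open().
    """
    base, extension = os.path.splitext(path)
    if extension in COMPRESSION_OPENERS:
        return base, COMPRESSION_OPENERS[extension]
    return path, open


# The remaining name keeps the extension used for format inference.
stripped, opener = split_compression("input.fasta.gz")
format_extension = os.path.splitext(stripped)[1]  # ".fasta"
```

The returned opener can then be called in text mode, for example `opener(path, "rt")`, so the rest of the code reads compressed and uncompressed files the same way.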
Acknowledgements
seqmagick is written and maintained by the Matsen Group at the Fred Hutchinson Cancer Research Center.
Contributing
We welcome contributions! Simply fork the repository on GitHub and send a pull request.