
seqmagick

Motivation

We often have to convert between sequence formats and do little tasks on sequence files, and it’s not worth writing a one-off script each time. Seqmagick is a kickass little utility built in the spirit of ImageMagick to expose the file format conversion in Biopython in a convenient way. Instead of a big mess of scripts, there is one command that takes arguments:

seqmagick convert a.fasta b.phy    # convert from fasta to phylip
seqmagick mogrify --ungap a.fasta  # remove all gaps from a.fasta, in place
seqmagick info *.fasta             # describe all FASTA files in the current directory

And more.

Installation

Install the latest release with:

pip install seqmagick

This should also install Biopython. NumPy (which parts of Biopython depend on) is not required for seqmagick to function, but may be installed as a dependency of Biopython.

To install the bleeding edge version:

pip install git+https://github.com/fhcrc/seqmagick.git@master#egg=seqmagick

Note that as of version 0.8.0, this package requires Python 3.5+. If you want to use the most recent version compatible with Python 2.7:

pip install seqmagick==0.6.2

Use

Seqmagick can be used to query information about sequence files, convert between types, and modify sequence files. All functions are accessed through subcommands:

seqmagick <subcommand> [options] arguments

Supported File Extensions

By default, seqmagick infers the file type from the file extension. Currently mapped extensions are:

Extension Format
.afa fasta
.aln clustal
.fa fasta
.faa fasta
.fas fasta
.fasta fasta
.fastq fastq
.ffn fasta
.fna fasta
.fq fastq
.frn fasta
.gb genbank
.gbk genbank
.needle emboss
.nex nexus
.phy phylip
.phylip phylip
.phyx phylip-relaxed
.qual qual
.sff sff-trim
.sth stockholm
.sto stockholm

Note

NEXUS-format output requires the --alphabet flag.

Default Format

When reading from stdin or writing to stdout, seqmagick defaults to fasta format. This behavior may be overridden with the --input-format and --output-format flags.

If an extension is not listed, you can either rename the file to use a supported extension, or specify the format manually via --input-format or --output-format.
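
The resolution order described above (explicit flag first, then the fasta default for streams, then the extension table) can be sketched in a few lines of Python. This is only an illustration, not seqmagick's actual code: the function name resolve_format, the use of "-" to denote stdin/stdout, the abbreviated mapping, and the ValueError for unmapped extensions are all assumptions made for the example.

```python
import os

# A subset of the extension table above, for illustration only.
EXTENSION_TO_FORMAT = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".fastq": "fastq",
    ".gb": "genbank",
    ".nex": "nexus",
    ".phy": "phylip",
    ".phyx": "phylip-relaxed",
    ".sto": "stockholm",
}

# Format assumed when reading stdin or writing stdout.
DEFAULT_STREAM_FORMAT = "fasta"


def resolve_format(path, explicit_format=None):
    """Hypothetical helper: choose a format name for `path`.

    `explicit_format` stands in for --input-format / --output-format.
    Treating "-" as stdin/stdout is an assumption of this sketch.
    """
    if explicit_format is not None:
        # An explicit format always wins over inference.
        return explicit_format
    if path == "-":
        return DEFAULT_STREAM_FORMAT
    extension = os.path.splitext(path)[1]
    if extension not in EXTENSION_TO_FORMAT:
        raise ValueError(
            "unknown extension %r; specify the format explicitly" % extension
        )
    return EXTENSION_TO_FORMAT[extension]


print(resolve_format("b.phy"))                               # phylip
print(resolve_format("-"))                                   # fasta
print(resolve_format("reads.txt", explicit_format="fastq"))  # fastq
```

Calling resolve_format("reads.txt") without an explicit format raises an error in this sketch, which mirrors the two options given above: rename the file or name the format yourself.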

Compressed file support

Most commands support gzip (files ending in .gz) and bzip2 (files ending in .bz2 or .bz) compressed inputs and outputs. The file type is inferred from the extension that remains after the compression extension is stripped, so input.fasta.gz is inferred to be in FASTA format.
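
As a rough sketch of the idea, the following Python strips a compression suffix before looking at the extension and picks a matching opener from the standard library. It is an illustration under assumptions, not seqmagick's implementation: split_compression and open_text are hypothetical names.

```python
import bz2
import gzip
import os
import tempfile

# Compression suffixes named above, mapped to standard-library openers.
COMPRESSION_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".bz": bz2.open}


def split_compression(path):
    """Hypothetical helper: return (path without compression suffix, opener)."""
    base, extension = os.path.splitext(path)
    if extension in COMPRESSION_OPENERS:
        return base, COMPRESSION_OPENERS[extension]
    # Not compressed: keep the path and use the built-in open().
    return path, open


def open_text(path, mode="r"):
    """Hypothetical helper: open `path` in text mode, decompressing if needed."""
    _, opener = split_compression(path)
    return opener(path, mode + "t")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.fasta.gz")
    with open_text(path, "w") as handle:
        handle.write(">seq1\nACGT\n")

    # The extension left after stripping .gz is what drives format inference.
    inner_name, _ = split_compression(path)
    print(os.path.splitext(inner_name)[1])  # .fasta

    with open_text(path) as handle:
        print(handle.read())
```

Keeping the suffix handling separate from the format lookup means the same extension table serves compressed and uncompressed files alike.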

Acknowledgements

seqmagick is written and maintained by the Matsen Group at the Fred Hutchinson Cancer Research Center.

Contributing

We welcome contributions! Simply fork the repository on GitHub and send a pull request.