Friday, April 30, 2010

FASTQC

This Java app reads a FASTQ file and generates various quality metrics.
It can be run from either the command line or a GUI. It is also pretty fast, processing over 4M reads in under 8 seconds, though I ran it on a beast of a machine with 256 GB of RAM.
It is available for download from this site:

Here is the command line:
/usr/local/bin/java-1.6.0 -Xmx250m -cp /usr/local/devel/BCIS/external_software/FastQC uk.ac.bbsrc.babraham.FastQC.FastQCApplication < input.fastq >

Here the path given to -cp is the installation location.

The output will be a set of HTML and image files zipped inside a folder named < input >_fastqc.
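
To run the same command over several files, a small loop along these lines might do. This is only a sketch: JAVA and FASTQC_DIR simply repeat the paths from the command above, while the run_fastqc function name and the DRY_RUN switch are my own additions for illustration and are not part of FastQC.

```shell
# Sketch: run the FastQC command shown above on each FASTQ file given.
# JAVA and FASTQC_DIR repeat the paths from the post; adjust for your site.
JAVA="${JAVA:-/usr/local/bin/java-1.6.0}"
FASTQC_DIR="${FASTQC_DIR:-/usr/local/devel/BCIS/external_software/FastQC}"
FASTQC_MAIN="uk.ac.bbsrc.babraham.FastQC.FastQCApplication"

# run_fastqc is a hypothetical helper name.
# With DRY_RUN=1 it only prints each command instead of running it.
run_fastqc() {
    local fq
    for fq in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$JAVA -Xmx250m -cp $FASTQC_DIR $FASTQC_MAIN $fq"
        else
            # Keep going if one file fails, but report it on stderr.
            "$JAVA" -Xmx250m -cp "$FASTQC_DIR" "$FASTQC_MAIN" "$fq" \
                || echo "FastQC failed on $fq" >&2
        fi
    done
}
```

Something like `DRY_RUN=1 run_fastqc *.fastq` shows what would be run before committing to it.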

Sun Grid Engine submission

Here's the typical command I use for making grid submissions:

SGE submission:
qsub -P < project > -m a -M $USER\@jcvi.org -l < fast|medium|default queue > -V -cwd -shell n -b y < command >
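
Since I type this so often, it could be wrapped in a function. The sketch below is an assumption-laden convenience, not anything SGE ships: sge_cmd is a made-up name, the project and queue values are whatever your site uses, and DRY_RUN is my own switch for printing the command instead of submitting it. Note that -P expects a project name and -l a resource request.

```shell
# Sketch: fill in the placeholders of the qsub command from the post.
# sge_cmd is a hypothetical helper; project and queue are site-specific.
# Usage: sge_cmd <project> <queue> <command> [args...]
sge_cmd() {
    local project queue
    project=$1
    queue=$2
    shift 2
    # -P project     project the job belongs to
    # -m a -M addr   send mail to addr if the job is aborted
    # -l queue       resource request (the post uses it to pick a queue)
    # -V -cwd        export the environment, run in the current directory
    # -shell n -b y  do not wrap in a shell; treat the command as a binary
    set -- qsub -P "$project" -m a -M "$USER@jcvi.org" -l "$queue" \
        -V -cwd -shell n -b y "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"      # print only
    else
        "$@"           # actually submit
    fi
}
```

For example, `DRY_RUN=1 sge_cmd myproject fast ./my_job` prints the full qsub line so it can be checked first (myproject and ./my_job are placeholders).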

Using CLC Bio's command-line tools

This is what I've learned:

In order to run CLC Bio programs from the command line, you need to have a license.properties file in your current working directory. Not just any license.properties file, though: it has to be the one associated with the specific version of the software you are using. For example, if you are using a command that is on the path, you need to track down which license.properties file it is associated with.
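
One way to do that tracking down might look like the sketch below. It rests on an assumption I have not confirmed for every install: that license.properties sits in the same directory as the real binary. The function name find_clc_license is made up, and readlink -f is the GNU version.

```shell
# Sketch: find the license.properties that belongs to a CLC command on the PATH.
# Assumption: the file lives in the same directory as the real binary.
# find_clc_license is a hypothetical helper name.
find_clc_license() {
    local bin real dir
    bin=$(command -v "$1") || { echo "$1 not found on PATH" >&2; return 1; }
    real=$(readlink -f "$bin")     # follow symlinks to the real install
    dir=$(dirname "$real")
    if [ -f "$dir/license.properties" ]; then
        echo "$dir/license.properties"
    else
        echo "no license.properties next to $real" >&2
        return 1
    fi
}
```

Something like `cp "$(find_clc_license clc_ref_assemble_long)" .` would then put the matching file in the current working directory before running the tool.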

Mapping reads (longer than 50 bp):
To map reads back to a reference, requiring 90% identity over 90% of the query length:
/usr/local/packages/clc-ngs-cell-2.0.5-linux_64/clc_ref_assemble_long -o < output.cas > -q < query.fasta > -d < database.fasta > -l 0.9 -s 0.9

The output of this mapping, or reference-based assembly, is a .cas file. To get a table of hits, a la nucmer, you need to process the .cas file with another CLC tool, assembly_table.

Processing .cas files:
/usr/local/packages/clc-ngs-cell-2.0.5-linux_64/assembly_table -n -s < output.cas > > < table.tsv >
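
The two steps can be chained so the table is only built if the mapping succeeds. This is a sketch under my own naming: CLC_DIR repeats the install path used above, while map_and_tabulate and the prefix-based output names are made up for illustration. The flags are exactly the ones from the two commands above.

```shell
# Sketch: map reads, then turn the resulting .cas file into a table.
# CLC_DIR repeats the path from the post; adjust for your install.
CLC_DIR="${CLC_DIR:-/usr/local/packages/clc-ngs-cell-2.0.5-linux_64}"

# map_and_tabulate is a hypothetical helper name.
# Usage: map_and_tabulate <query.fasta> <database.fasta> <prefix>
# Writes <prefix>.cas and <prefix>.tsv.
map_and_tabulate() {
    local query db prefix
    query=$1
    db=$2
    prefix=$3
    # Step 1: reference-based assembly (90% identity over 90% of the query).
    "$CLC_DIR/clc_ref_assemble_long" -o "$prefix.cas" -q "$query" -d "$db" \
        -l 0.9 -s 0.9 || return 1
    # Step 2: table of hits from the .cas file.
    "$CLC_DIR/assembly_table" -n -s "$prefix.cas" > "$prefix.tsv"
}
```

Remember that the right license.properties still has to be in the directory you run this from.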

Dear Diary,

This blog is mostly for me, to have an easy way to find the often-used command lines and pipelines, with all their nifty little parameters, that I have to use on a daily basis.
But if this blog offers any insight or suggestions to resolve a problem you're having, then by all means, buy me a beer.