Monday, July 19, 2010

Modify a GenBank file for mapping

At the tail of your GenBank file, insert only the nucleotide sequence from the FASTA file between the following tags:
ORIGIN
(sequence goes here)
//
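
The edit above can be scripted. Here is a minimal sketch, assuming a single-record GenBank file and a single-sequence FASTA file; the function name add_origin and the demo file names are made up for illustration. It drops any existing ORIGIN section and terminator, then appends the tags with the FASTA sequence lines (header removed) in between.

```shell
#!/usr/bin/env bash
# Sketch: put the sequence lines of a FASTA file between ORIGIN and //
# at the tail of a GenBank file. Assumes one record per file.
set -euo pipefail

# add_origin RECORD.gbk SEQUENCE.fasta  (hypothetical helper, prints to stdout)
add_origin() {
    local gbk="$1" fasta="$2"
    # Keep everything before an existing ORIGIN line or the terminating //.
    awk '/^ORIGIN/ || /^\/\// { exit } { print }' "$gbk"
    echo "ORIGIN"
    # Sequence only: skip the FASTA header line.
    grep -v '^>' "$fasta"
    echo "//"
}

# Tiny demo with made-up data.
workdir=$(mktemp -d)
printf 'LOCUS       demo\nFEATURES             Location/Qualifiers\n//\n' > "$workdir/demo.gbk"
printf '>demo\nACGTACGT\nTTGA\n' > "$workdir/demo.fasta"
add_origin "$workdir/demo.gbk" "$workdir/demo.fasta" > "$workdir/demo.with_seq.gbk"
cat "$workdir/demo.with_seq.gbk"
```

The result goes to a new file rather than overwriting the original, so a bad input is easy to spot and discard.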

Thursday, June 10, 2010

CLC mapping

At this point, the most current version of the CLC command-line utils is located here:

/usr/local/packages/clc-ngs-cell/

with the mapper for 454 / Sanger reads being clc_ref_assemble_long.

As usual, you need to copy the license.properties file from that folder to your working directory to perform any sort of analysis.

Here are some of the pitfalls:
- Making a soft link to the mapping executable will throw a missing license error. Forget the soft link; you need to use the fully qualified path of the executable.

- Mapping to a GenBank file didn't work. However, mapping to a FASTA file completed with blazing quickness.


Once the mapping has produced a CAS output, you need to process it to get it in table format:
/usr/local/packages/clc-ngs-cell/assembly_table -n -s input.cas > output.table
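
Put together, the steps and pitfalls above might look like the sketch below. The helper names (run, map_reads), the input file names and the DRY_RUN switch are my own inventions; the -o, -q and -d flags are the ones from my older CLC notes on this blog. By default it only prints the commands, so it can be tried on a machine without CLC installed.

```shell
#!/usr/bin/env bash
# Sketch of the mapping workflow. DRY_RUN=1 (the default here) only prints
# the commands; set DRY_RUN=0 on a machine where CLC is installed.
set -euo pipefail

CLC_DIR="${CLC_DIR:-/usr/local/packages/clc-ngs-cell}"
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it unless this is a dry run.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# map_reads READS.fasta REFERENCE.fasta OUTPUT_PREFIX  (hypothetical helper)
map_reads() {
    local reads="$1" ref="$2" prefix="$3"
    # The license file must sit in the working directory.
    run cp "$CLC_DIR/license.properties" .
    # Call the mapper by its full path (no soft link), with a FASTA reference.
    run "$CLC_DIR/clc_ref_assemble_long" -o "$prefix.cas" -q "$reads" -d "$ref"
    # Convert the CAS output to a table.
    echo "+ $CLC_DIR/assembly_table -n -s $prefix.cas > $prefix.table"
    if [ "$DRY_RUN" = "0" ]; then
        "$CLC_DIR/assembly_table" -n -s "$prefix.cas" > "$prefix.table"
    fi
}

map_reads reads.fasta reference.fasta sample1
```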


Thursday, May 13, 2010

SVN stuff

This is how you create a repository (usually done by IT):
svnadmin create (repo_name)

This is how you add content to a repository:
svn import (files | folder) (repo_name) -m "initial import"

This is how you check out the latest version of something existing in SVN into the current working directory:
svn checkout (repo_name/path)

This is how you commit your changes, to update the repository with a more current version.
Run it from within the root folder of the working copy created by svn checkout (e.g. /workspace/genomeQC):
svn commit -m "update to ver 1.5"
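
To try the whole cycle safely, here is a sketch that runs all four commands against a throwaway repository in a temp directory. The file:// URL, the VERSION file and the directory names are assumptions made up for the demo; a real repository would normally live on a server set up by IT. It does nothing if svn is not installed.

```shell
#!/usr/bin/env bash
# Sketch: create, import, checkout and commit against a scratch repository.
set -euo pipefail

if command -v svnadmin >/dev/null 2>&1 && command -v svn >/dev/null 2>&1; then
    work=$(mktemp -d)

    # Create the repository (normally done by IT).
    svnadmin create "$work/repo"
    repo_url="file://$work/repo"

    # Add content to the repository.
    mkdir "$work/genomeQC"
    echo "v1.4" > "$work/genomeQC/VERSION"
    svn import -q "$work/genomeQC" "$repo_url/genomeQC" -m "initial import"

    # Check out a working copy.
    svn checkout -q "$repo_url/genomeQC" "$work/wc"

    # Change something, then commit from the root of the working copy.
    echo "v1.5" > "$work/wc/VERSION"
    (cd "$work/wc" && svn commit -q -m "update to ver 1.5")

    # Show what the repository now holds.
    svn cat "$repo_url/genomeQC/VERSION"
else
    echo "svn not found; nothing to demonstrate"
fi
```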


Wednesday, May 12, 2010

FASTQ tools

Here are my favorite FASTQ processing utils:

FASTQC:
To run a QC on your data.

To run in batch mode with zipped output:
java* -Xmx250m -cp < FastQC install dir > uk.ac.bbsrc.babraham.FastQC.FastQCApplication < input.fastq >

*: make sure that java is at least version 1.6
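
The version requirement can be checked before launching a batch. This is a rough sketch with made-up function names; it assumes the first line of java -version carries a quoted version string such as "1.6.0_20", which is what the JVMs I have seen print.

```shell
#!/usr/bin/env bash
# Sketch: check that a Java version string is at least 1.6.
set -euo pipefail

# Succeeds if a version string such as "1.6.0_20" or "11.0.2" is >= 1.6.
java_version_ok() {
    local version="$1" major rest minor
    major=${version%%.*}          # text before the first dot
    major=${major%%[!0-9]*}       # keep leading digits only
    rest=${version#*.}            # text after the first dot
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}
    if [ "${major:-0}" -gt 1 ]; then return 0; fi
    [ "${major:-0}" -eq 1 ] && [ "${minor:-0}" -ge 6 ]
}

# Pull the quoted version out of the first line of `java -version`,
# which is written to stderr. Pass another java binary as $1 if needed.
current_java_version() {
    "${1:-java}" -version 2>&1 | head -n 1 | sed 's/.*"\(.*\)".*/\1/'
}

# Demo on fixed strings, so it runs even without Java installed.
for v in 1.5.0_22 1.6.0_20 11.0.2; do
    if java_version_ok "$v"; then echo "$v: ok"; else echo "$v: too old"; fi
done
```

On a real machine the two pieces combine as java_version_ok "$(current_java_version)".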

FASTX Toolkit:
A bunch of scripts for processing and QC'ing FASTQ files, but FastQC is better for QC.


Friday, April 30, 2010

FASTQC

This Java app reads a FASTQ file and generates various quality metrics.
It can be run from either the command line or a GUI. It is also pretty fast, processing over 4M reads in under 8 seconds, but I ran it on a beast of a machine with 256 GB of RAM.
It is available for download from this site:

Here is the command-line:
/usr/local/bin/java-1.6.0 -Xmx250m -cp /usr/local/devel/BCIS/external_software/FastQC uk.ac.bbsrc.babraham.FastQC.FastQCApplication < input.fastq >

Where /usr/local/devel/BCIS/external_software/FastQC represents the installation location.

The output will be a set of HTML and image files zipped inside a folder named < input >_fastqc.

Sun grid engine submission

Here's the typical command I use for making grid submissions:

SGE submission:
qsub -P < project code > -m a -M $USER\@jcvi.org -l < fast|medium|default queue > -V -cwd -shell n -b y < command >
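
Since I type this so often, it is worth wrapping in a function. A sketch, assuming that -P expects a project code (the 0000 below is a placeholder, not a real one) and that the queue is requested with -l as above; grid_submit and DRY_RUN are names I made up. By default it only prints the qsub line, so nothing is submitted by accident.

```shell
#!/usr/bin/env bash
# Sketch: build the usual qsub command line from a few arguments.
set -euo pipefail

# grid_submit PROJECT QUEUE COMMAND [ARGS...]  (hypothetical helper)
grid_submit() {
    local project="$1" queue="$2"
    shift 2
    local user="${USER:-$(id -un)}"
    local cmd=(qsub -P "$project" -m a -M "$user@jcvi.org" -l "$queue"
               -V -cwd -shell n -b y "$@")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be submitted.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Demo: prints the qsub line instead of submitting it.
grid_submit 0000 fast echo hello
```

Set DRY_RUN=0 to actually submit.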

Using CLC Bio's command-line tools

This is what I've learned:

In order to run CLC Bio programs from the command line, you need to have a license.properties file in your current working directory. Not just any license.properties file, though: it has to be the one associated with the specific version of the software you are using. For example, if you are using a command that is on the path, you need to track down which license.properties file it is associated with.
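
One way to do that tracking down is sketched here. It assumes the license file sits in the same folder as the real executable (true for the installs I have used) and that readlink -f is available (GNU coreutils); find_license and the fake tool in the demo are made-up names.

```shell
#!/usr/bin/env bash
# Sketch: locate the license.properties that belongs to a command on the PATH.
set -euo pipefail

# find_license COMMAND  (hypothetical helper)
find_license() {
    local exe real dir
    exe=$(command -v "$1") || { echo "not on PATH: $1" >&2; return 1; }
    # Follow soft links to the real install location.
    real=$(readlink -f "$exe")
    dir=$(dirname "$real")
    if [ -f "$dir/license.properties" ]; then
        echo "$dir/license.properties"
    else
        echo "no license.properties next to $real" >&2
        return 1
    fi
}

# Demo with a fake install: the tool is on the PATH only through a soft link.
demo=$(mktemp -d)
mkdir -p "$demo/install" "$demo/bin"
printf '#!/bin/sh\n' > "$demo/install/fake_clc_tool"
chmod +x "$demo/install/fake_clc_tool"
touch "$demo/install/license.properties"
ln -s "$demo/install/fake_clc_tool" "$demo/bin/fake_clc_tool"
PATH="$demo/bin:$PATH"

find_license fake_clc_tool
```

The printed path is the file to copy into the working directory.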

Mapping reads - (greater than 50 bp in length):
To map reads back to a reference, keeping only the reads that have 90% identity over 90% of the length of the query:
/usr/local/packages/clc-ngs-cell-2.0.5-linux_64/clc_ref_assemble_long -o < output.cas > -q < query.fasta > -d < database.fasta > -l 0.9 -s 0.9

The output of this mapping, or reference-based assembly, is a .cas file. To get a table of hits, a la nucmer, you need to process the .cas file with another CLC tool: assembly_table

Processing .cas files:
/usr/local/packages/clc-ngs-cell-2.0.5-linux_64/assembly_table -n -s < output.cas > > < table.tsv >

Dear Diary,

This blog is mostly for me, to have an easy way to find the often-used command lines and pipelines, with all their nifty little parameters, that I have to use on a daily basis.
But if this blog offers any insight or suggestions to resolve a problem you're having, then by all means, buy me a beer.