A pipeline for Genome Assembly using SOAPdenovo2

SOAPdenovo2 is a short-read assembly method that can build a de novo draft assembly for human-sized genomes.

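Step1: Check the Insert Size

The mean insert size reported by Picard is the value to use for avg_ins in the configuration file of Step2.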

java -jar picard.jar CollectInsertSizeMetrics \
I=.sorted.markdup.bam \
O=.insert_size_metrics.txt \
H=.insert_size_histogram.pdf \
M=0.5
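
The mean insert size has to be read out of the metrics file before it can go into the configuration file. The following is a minimal sketch of one way to do that, assuming the usual Picard layout (a "## METRICS CLASS" line, then a tab-separated header row, then the values). The function name mean_insert_size is hypothetical and not part of Picard. The column is looked up by name because its position can differ between Picard versions.

```bash
#!/bin/bash
# Sketch only: mean_insert_size is a hypothetical helper, not a Picard tool.
# Prints MEAN_INSERT_SIZE from a CollectInsertSizeMetrics output file,
# rounded to the nearest integer.
mean_insert_size() {
    awk -F'\t' '
        # The metrics table starts right after this marker line.
        /^## METRICS CLASS/ { in_metrics = 1; next }
        # Header row: find the column called MEAN_INSERT_SIZE.
        in_metrics && !col {
            for (i = 1; i <= NF; i++) if ($i == "MEAN_INSERT_SIZE") col = i
            next
        }
        # First data row: print the value and stop.
        in_metrics && col { printf "%.0f\n", $col; exit }
    ' "$1"
}

# Usage:
#   mean_insert_size .insert_size_metrics.txt
```

Only the first data row is read, so if the metrics file lists several pair orientations, the value comes from the first one.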

Step2: Make the configuration file

max_rd_len=150  
[LIB]
avg_ins=320
reverse_seq=0
asm_flags=3
rank=1
pair_num_cutoff=3
map_len=32
q1=._1_clean.fq.gz
q2=._2_clean.fq.gz
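
In this configuration, reverse_seq=0 marks a forward-reverse paired-end library, asm_flags=3 uses the reads for both contig and scaffold assembly, and pair_num_cutoff and map_len set the minimum number of read pairs and the minimum alignment length for a reliable link. As a sketch, the file could be generated from a small shell function so that the insert size and read files are filled in per sample. The function name write_soap_config and its argument order are assumptions made for illustration; the fixed values are the ones shown above.

```bash
#!/bin/bash
# Sketch only: write_soap_config is a hypothetical helper.
# $1 = config file to write, $2 = average insert size,
# $3 = read 1 FASTQ, $4 = read 2 FASTQ
write_soap_config() {
    cat > "$1" <<EOF
max_rd_len=150
[LIB]
avg_ins=$2
reverse_seq=0
asm_flags=3
rank=1
pair_num_cutoff=3
map_len=32
q1=$3
q2=$4
EOF
}

# Usage:
#   write_soap_config file_configure 320 ._1_clean.fq.gz ._2_clean.fq.gz
```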

Step3: Run SOAPdenovo2

#!/bin/bash
time SOAPdenovo-63mer all -s file_configure -K 63 -o output

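Repeat the run with -K set to 29, 55 and 63, writing each assembly to its own directory (29mer/, 55mer/, 63mer/) so that the results can be compared in Step4.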
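
The three runs can be scripted as a loop. This is a sketch under a few assumptions: the output prefix ${k}mer/file_${k}mer is inferred from the file paths used in the QUAST step, and run_soap, run_all_kmers and the DRY_RUN switch are hypothetical names introduced here, not part of SOAPdenovo2.

```bash
#!/bin/bash
# Sketch only: run_soap and run_all_kmers are hypothetical helper names.
# Set DRY_RUN=1 to print the commands instead of executing them.

run_soap() {
    local k=$1 config=$2
    # One directory per k-mer size, matching the 29mer/file_29mer.* paths used for QUAST.
    ${DRY_RUN:+echo} mkdir -p "${k}mer"
    ${DRY_RUN:+echo} SOAPdenovo-63mer all -s "$config" -K "$k" -o "${k}mer/file_${k}mer"
}

run_all_kmers() {
    local config=$1 k
    for k in 29 55 63; do
        # Stop at the first failed assembly.
        run_soap "$k" "$config" || return 1
    done
}

# Usage:
#   DRY_RUN=1 run_all_kmers file_configure   # preview the commands
#   run_all_kmers file_configure             # run the three assemblies
```

SOAPdenovo2 appends suffixes such as .contig and .scafSeq to the -o prefix, which is why the prefix rather than a full file name is passed.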

Step4: Evaluate the assembly using QUAST

  • QUAST
# for contigs
## without reference

quast.py -o compare 29mer/file_29mer.contig 55mer/file_55mer.contig 63mer/file_63mer.contig

## with reference

quast.py -R reference.fa -o compare 29mer/file_29mer.contig 55mer/file_55mer.contig 63mer/file_63mer.contig
quast.py -R reference.fa -g top_level.gff3 -o compare_r_g 29mer/file_29mer.contig 55mer/file_55mer.contig 63mer/file_63mer.contig

# for scaffolds

quast.py -o compare 29mer/file_29mer.scafSeq 55mer/file_55mer.scafSeq 63mer/file_63mer.scafSeq
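## -s (--split-scaffolds) also splits each scaffold at runs of N and adds the resulting contigs to the comparison
quast.py -s -o compare_scafSeq 29mer/file_29mer.scafSeq 55mer/file_55mer.scafSeq 63mer/file_63mer.scafSeq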
quast.py -R reference.fa -o compare_scaf_ref 29mer/file_29mer.scafSeq 55mer/file_55mer.scafSeq 63mer/file_63mer.scafSeq
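
To choose between the three k-mer sizes, a single metric can be pulled out of the tab-separated report that QUAST writes into each output directory. This is a sketch that assumes the report is named report.tsv, that its first row starts with "Assembly", and that the metric name appears verbatim in the first column. The function name quast_metric is hypothetical.

```bash
#!/bin/bash
# Sketch only: quast_metric is a hypothetical helper, not part of QUAST.
# $1 = path to a QUAST report.tsv, $2 = metric name as written in column 1
# Prints the assembly-name row followed by the row for the requested metric.
quast_metric() {
    awk -F'\t' -v metric="$2" '$1 == "Assembly" || $1 == metric' "$1"
}

# Usage:
#   quast_metric compare_scaf_ref/report.tsv N50
```

A higher N50 alone does not make an assembly better, so it is worth checking the misassembly counts in the same report when a reference is available.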