Download and align single cell RNA-seq data: Part II

January 6, 2019 One minute read

Here is the code I use to align downloaded RNA-seq datasets.

I use the popular STAR aligner, which is fast and simple to implement. Before the actual aligning step, the genome needs to be generated with the following command:

#!/bin/bash
#PBS -q hotel
#PBS -N genGen
#PBS -l nodes=1:ppn=8
#PBS -l walltime=168:00:00
#PBS -M z4li@ucsd.edu
#PBS -V
#PBS -m abe

STAR --genomeDir ~/zhen/Reference_genome/ \
--runMode genomeGenerate \
--genomeFastaFiles ~/zhen/Reference_genome/hg38.fa \
--runThreadN 7 \

The reference genome (in this case hg38) can be downloaded from UCSC.

#!/bin/bash
#PBS -q hotel
#PBS -N alignment_test
#PBS -l nodes=1:ppn=10
#PBS -l walltime=168:00:00
#PBS -M z4li@ucsd.edu
#PBS -V
#PBS -m abe

cd ~/zhen/data/raw_data/GSE81475/fastq/
names=$(ls -S | head -2 | paste -sd "," -)
#echo $names

genomeDir=~/zhen/Reference_genome/hg38/
runDir=../test/1pass/

STAR --outFileNamePrefix $runDir --genomeDir $genomeDir --readFilesIn $names --runThreadN 9

wait
echo All processes completed!