Bioinformatics Tools

These programs are currently available in the Taito supercluster computing environment. Programs marked with C can be used with the Chipster graphical user interface, and programs marked with W can also be used with a web browser.

What if my software is not on the list? See further instructions here. Note that you can always suggest a software installation or ask for help (helpdesk@csc.fi).

See also the list of services for data management!

Next generation sequencing data analysis

  • ABySS Assembler for very short reads.
  • ALLPATHS-LG Assembler for very short reads.
  • BEDTools Toolkit for comparing genomic features. C
  • Bowtie Short read aligner. C
  • Bowtie2 Short read aligner. C
  • BWA Burrows-Wheeler aligner for aligning short nucleotide sequences against a reference genome C
  • Canu A fork of the Celera Assembler, designed for high-noise single-molecule sequencing.
  • Chipster Provides a graphical user interface to over 160 tools for NGS data analysis. It also offers interactive visualizations such as a genome browser. C
  • Cufflinks RNA-seq analysis tool. C
  • Falcon A set of tools for fast alignment of long reads.
  • FastQC A quality control tool for high throughput sequence data. C
  • Freebayes A genetic variant detector designed to find small polymorphisms.
  • GATK4 Toolkit offering a wide variety of tools with a primary focus on variant discovery and genotyping.
  • GSNAP Short read aligner.
  • ipyrad RAD sequence analysis.
  • Kallisto A program for quantifying abundances of transcripts from RNA-seq data, or more generally of target sequences, using high-throughput sequencing reads.
  • MACS ChIP-Seq analysis tool. C
  • MIRA Whole genome shotgun and EST sequence Assembler.
  • MISO Tool to estimate expression levels of alternatively spliced genes.
  • PANNZER/SANSPANZ Protein annotation tool
  • Picard Tools A set of command line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF
  • Preseq A package for predicting and estimating the complexity of a genomic sequencing library. This is equivalent to predicting and estimating the number of redundant reads at a given sequencing depth, and how many will be expected from additional sequencing, based on an initial sequencing experiment
  • Prinseq A quality control tool for high throughput sequence data. C
  • R / Bioconductor A statistical environment with support for genomics C
  • Salmon A wicked-fast program for producing highly accurate, transcript-level quantification estimates from RNA-seq data
  • SAMtools Utilities for managing SAM/BAM formatted alignment files C
  • SHRiMP Short read aligner
  • SOAPdenovo Assembler for very short reads
  • STAR Short read aligner
  • TopHat Splice junction mapper for RNA-Seq reads C
  • Trimmomatic Tool for trimming Illumina data. C
  • Trinity Transcriptome assembly
  • VCFtools Program package designed for working with VCF files
  • Velvet Assembler for very short reads
  • VirusDetect A tool to identify viruses using sRNA datasets C

 

Sequence analysis

  • EMBOSS Programs for analysing DNA and amino acid sequence data. C
  • CD-HIT Sequence clustering and redundancy removal
  • ClustalW Multiple sequence alignment C
  • MAFFT Multiple sequence alignment C
  • Muscle A fast and accurate multiple sequence alignment tool

 

Database searching and sequence alignment

  • BLAST Sequence database homology search tool. C
  • BLAT Sequence database homology search tool.
  • Exonerate Generic sequence alignment tool
  • HMMER Sequence database search based on profile-HMM. C
  • InterProScan Protein signature search tool
  • Minimap2 Fast general-purpose alignment program to map DNA or long mRNA sequences against a large reference database

 

Metagenomics, Phylogenetics and Population Genomics

  • Metagenomics toolkit Selection of metagenomics analysis tools
  • Mothur Package for microbial community analysis of amplicon sequencing data C
  • Qiime Package for microbial community analysis of amplicon sequencing data
  • BEAST Program for Bayesian MCMC analysis of molecular sequences
  • ExaML Maximum likelihood code for phylogenetic inference
  • MrBayes Program for inferring phylogenies using Bayesian methods
  • MSMC Software that implements MSMC, a method to infer population size and gene flow from multiple genome sequences
  • Pagan Tool for generating and extending phylogenetic multiple sequence alignments
  • PHYLIP Package for inferring phylogenies C
  • Phyml Software that estimates maximum likelihood phylogenies from alignments of nucleotide or amino acid sequences
  • POY Program for inferring phylogenies
  • RAxML Fast program for inferring phylogenies with likelihood
  • Stacks Pipeline for building loci from short-read sequences (e.g. RAD-seq data)

 

Microarray data analysis

  • Chipster Provides a graphical user interface to over 150 tools for microarray data analysis based on R/Bioconductor, and results can be viewed using several interactive visualizations. C
  • R / Bioconductor A statistical environment with support for genomics C

 

Structural Biology

  • Discovery Studio Molecular modeling package D
  • Maestro Molecular modeling package
  • Rosetta Protein structure prediction and protein docking tool
  • VMD Molecular graphics visualisation

 

RNA secondary structure prediction

  • Vienna Package (within EMBOSS) 

 

Gene mapping

  • GUIDANCE A powerful and user-friendly tool for assigning a confidence score for each residue, column, and sequence in an alignment and for projecting these scores onto the MSA.
  • Mega2 Converts Linkage-format files to other formats
  • MERLIN A fast program for non-parametric linkage and haplotyping
  • PedCheck Detects marker typing incompatibilities in pedigree data
  • Pseudomarker Joint linkage and LD analysis
  • SimWalk2 Haplotyping and non-parametric analysis
  • Stampy Package for mapping short reads from Illumina sequencing machines onto a reference

 

Other tools

  • Bioperl Bioinformatics tools for Perl programming
  • Bioconda Package manager that allows easy installation of a wide selection of bioinformatics tools
  • Biopython Bioinformatics tools for Python programming
  • cPouta Cloud for running users' own servers and software installations
  • LAMMPS Classical molecular dynamics code. The name is an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator
  • NAMD Parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems
  • Zonation Spatial conservation prioritization framework for large-scale conservation planning

 

Archived training material

Please check the collected material on the Bioscience learning materials page.

The guides below are provided for archival purposes. They may contain outdated instructions and references to software and databases that are no longer available at CSC. For up-to-date information, please refer to the software list and the web pages of each software package.