Clockgen
Molecular Clock Simulator

home · download · quickstart · references · index


Introduction

Clockgen was developed to create a database of nucleotide sequences that evolve in a clock-like fashion, so that these sequences can be used to simulate complete phylogenies and compare them with those obtained from real-life data. For that reason its output is in the FASTA format, which can be read by almost all phylogenetic software.

Following the model of evolution and the parameters described below, Clockgen creates a full phylogeny from a first (ancestral) sequence, which may be randomly generated or supplied by the user. Each run of the program generates a valid FASTA file containing all the generated sequences. The output is written to standard output, which can easily be redirected to any file you wish.

Clockgen can read in any Model of Evolution derived from the General Reversible Model. The software package provides most of the commonly used (and computationally tractable) models of DNA evolution. The simple file format for the Model input is described below.

Models of Evolution

All models of nucleotide substitution compatible with Clockgen are Markov models, and assume that evolution is independent and identical at each site and along each lineage. Almost all models used in the maximum likelihood reconstruction of phylogenies from nucleotide sequences are processes of this type (but see Yang, 1994).
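As a rough illustration of what a site-independent Markov substitution looks like, the sketch below picks one site at random and replaces its nucleotide, using the matching row of a rate matrix as sampling weights. The function name `substitute_once` and the dictionary layout of the matrix are assumptions made for this example; it is not Clockgen's actual code.

```python
import random

NUCLEOTIDES = "ACGT"

# Jukes-Cantor style rows (columns in A, C, G, T order): every change is
# equally likely and the diagonal (no change) has weight 0.0.
JC = {
    "A": [0.0, 0.25, 0.25, 0.25],
    "C": [0.25, 0.0, 0.25, 0.25],
    "G": [0.25, 0.25, 0.0, 0.25],
    "T": [0.25, 0.25, 0.25, 0.0],
}


def substitute_once(sequence, matrix, rng):
    """Return a copy of `sequence` with one randomly chosen site substituted.

    Hypothetical helper: the site is chosen uniformly, and the new base is
    drawn using the current base's matrix row as sampling weights.
    """
    site = rng.randrange(len(sequence))
    weights = matrix[sequence[site]]
    new_base = rng.choices(NUCLEOTIDES, weights=weights, k=1)[0]
    return sequence[:site] + new_base + sequence[site + 1:]


rng = random.Random(42)  # fixed seed so the run is repeatable
mutated = substitute_once("ACGTACGT", JC, rng)
```

Because each site is handled on its own and only the current base decides the weights, the step does not depend on neighbouring sites or on earlier history.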

The models are provided in the matrix form:

General Reversible

     A     C     G     T
A    0.0   a     b     c
C    a     0.0   d     e
G    b     d     0.0   f
T    c     e     f     0.0

Jukes-Cantor

     A     C     G     T
A    0.0   0.25  0.25  0.25
C    0.25  0.0   0.25  0.25
G    0.25  0.25  0.0   0.25
T    0.25  0.25  0.25  0.0

Kimura 2 Parameters

     A     C     G     T
A    0.0   y     x     y
C    y     0.0   y     x
G    x     y     0.0   y
T    y     x     y     0.0

Here each variable is the rate of substitution between the nucleotide in the row and the nucleotide in the column. All rows must sum to 1.0.
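The row-sum rule can be checked mechanically. The following is a minimal sketch, assuming the matrix is symmetric (as a reversible model implies) and that the six rates are given in the order AC, AG, AT, CG, CT, GT; the function names `rev_matrix` and `rows_sum_to_one` are made up for this example.

```python
import math


def rev_matrix(ac, ag, at, cg, ct, gt):
    """Build a symmetric 4x4 matrix with rows and columns in A, C, G, T order."""
    return [
        [0.0, ac, ag, at],
        [ac, 0.0, cg, ct],
        [ag, cg, 0.0, gt],
        [at, ct, gt, 0.0],
    ]


def rows_sum_to_one(matrix, tol=1e-9):
    """Return True if every row of `matrix` sums to 1.0 within `tol`."""
    return all(math.isclose(sum(row), 1.0, abs_tol=tol) for row in matrix)


# Example values chosen by hand so that each row sums to 1.0.
example = rev_matrix(0.2, 0.5, 0.3, 0.3, 0.5, 0.2)
is_valid = rows_sum_to_one(example)
```

Running such a check before a simulation is a cheap way to catch a mistyped rate.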

The General Reversible model has six variables (a to f) that work this way, and thus can be manipulated to represent any given reversible model. The Kimura 2 Parameters model, by contrast, has only two variables, x and y, which represent transitions_rate and transversions_rate respectively. These values must be assigned according to the following formula so that the rule that all rows sum to 1.0 holds:

transitions_rate + 2*transversions_rate = 1.0
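Since the formula leaves only one free value, the transversions rate can be derived from the transitions rate. The sketch below does that and formats the result as arguments for the -em option in the documented order (K2P transitions_rate transversions_rate); the helper names `k2p_rates` and `k2p_em_args` are assumptions for illustration.

```python
def k2p_rates(transitions_rate):
    """Return (transitions_rate, transversions_rate) satisfying
    transitions_rate + 2 * transversions_rate = 1.0."""
    if not 0.0 <= transitions_rate <= 1.0:
        raise ValueError("transitions_rate must be between 0 and 1")
    transversions_rate = (1.0 - transitions_rate) / 2.0
    return transitions_rate, transversions_rate


def k2p_em_args(transitions_rate):
    """Return the -em arguments for a K2P model as a list of strings."""
    ts, tv = k2p_rates(transitions_rate)
    return ["-em", "K2P", str(ts), str(tv)]


# With a transitions rate of 0.5, each transversion gets (1.0 - 0.5) / 2 = 0.25.
args = k2p_em_args(0.5)
```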

Transmission Bottleneck

A transmission bottleneck may be simulated with Clockgen using the "-i" parameter, which lets you provide a sequence as the ancestral sequence for a simulation, and the "-hi" parameter, which lets you model a different host in each simulation.

In short, run a complete simulation, then open the output file and select the sequence whose transmission you want to simulate. Copy and paste it, alone, into a new file, and give that file as the input for a new simulation. You will then have two evolutionary trees that evolved independently in different environments after the transmission. This can be repeated as many times as you need.
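The copy-and-paste step can also be scripted. The following is a minimal sketch that pulls one named sequence out of FASTA text and returns it as a single-record FASTA string suitable for -i; the function names are hypothetical, and the strain names in the example only follow the naming pattern shown under the -s option.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Sequence data may be wrapped over several lines; join them.
            records[name] += line
    return records


def single_record_fasta(text, name, width=60):
    """Return a FASTA string holding only the record called `name`."""
    sequence = read_fasta(text)[name]
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Made-up output from a first simulation.
first_run = ">strain0x1\nACGTAC\n>strain1x2\nACGTAA\n>strain2x2\nACCTAC\n"
founder = single_record_fasta(first_run, "strain1x2")
```

Writing `founder` to a file and passing that file with -i would start the second simulation from the transmitted sequence.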

Command line arguments

Number of Generations (-g number)

Defines the number of generations to be simulated. [* REQUIRED *]

Mutations per Evolution - or clock (-m number)

Number of mutations to occur at each evolutionary step. [* REQUIRED *]

Nucleotides - the size of the sequences (-n number)

Number of nucleotides in the sequence to be generated. This option implies that the program generates a random first sequence. You must use one of the two options -n or -i. [* MUST USE THIS OR -i *]

First Sequence (-i filename)

FASTA file containing a single strain, to be used as the first strain in the simulation. You must use one of the two options -n or -i. [* MUST USE THIS OR -n *]

Host Immunity Factor (-hi number)

Defines the host's native immunity to this species. The number must be between 0 and 1. If no Host Immunity Factor is given, 0.0 is assumed.

Description File Output (-d)

Writes a description file named after the strain, with .desc appended.

Evolutionary Model (-em model [parameters])

Uses the evolutionary model described. If not specified, the JC model is used. The following models are implemented:

Model                 Name   Parameters (all parameters are decimal numbers)
Jukes-Cantor          JC     No parameters
Kimura 2 Parameters   K2P    transitions_rate transversions_rate
General Reversible    REV    AC AG AT CG CT GT

Be aware of the rules of an evolutionary model matrix before using specific parameters.

Strain Name (-s name)

Name of the strains (indexes will be added in the output). Example: -s strain => strain0x1, strain1x2, strain2x2, strain3x3, strain4x3, strain5x3, ...

If no name is provided, a default name of Taxon is used.
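The options above can be assembled programmatically. The sketch below builds an argument list from the documented flags without running anything; the executable name "clockgen", the function name, and the assumption that -n and -i are mutually exclusive are not confirmed by this page.

```python
def build_clockgen_args(generations, mutations, nucleotides=None,
                        first_sequence=None, host_immunity=None,
                        describe=False, model=None, strain_name=None,
                        executable="clockgen"):
    """Return the argument list for one Clockgen run (nothing is executed)."""
    # One of -n (random first sequence) or -i (FASTA file) must be given;
    # this sketch assumes they cannot be combined.
    if (nucleotides is None) == (first_sequence is None):
        raise ValueError("use exactly one of nucleotides (-n) or first_sequence (-i)")
    args = [executable, "-g", str(generations), "-m", str(mutations)]
    if nucleotides is not None:
        args += ["-n", str(nucleotides)]
    else:
        args += ["-i", str(first_sequence)]
    if host_immunity is not None:
        if not 0.0 <= host_immunity <= 1.0:
            raise ValueError("host_immunity must be between 0 and 1")
        args += ["-hi", str(host_immunity)]
    if describe:
        args.append("-d")
    if model is not None:
        # `model` is the model name followed by its parameters, e.g. ["K2P", 0.5, 0.25].
        args += ["-em"] + [str(part) for part in model]
    if strain_name is not None:
        args += ["-s", strain_name]
    return args


example_args = build_clockgen_args(10, 2, nucleotides=100,
                                   model=["K2P", 0.5, 0.25],
                                   strain_name="strain")
```

The resulting list could be passed to `subprocess.run` with `stdout` set to an open file, which mirrors redirecting the program's standard output to a file.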


