Genome Length Evolution in Host-Pathogen Systems

This page gives a simple presentation of the models and the documentation for the Python and IPython modules written for the host-pathogen models detailed on Carlos Lugo's website. It also gives a very brief description of our results, which will be presented in more detail elsewhere, including the effect of the different parameters.

The project was born in the TSL Bioinformatics Group as a means to model the evolution of plant pathogen genomes, in particular those of oomycetes such as Phytophthora infestans.

We developed two different mathematical-computational models to explore changes in genome size and in the number of effector genes and transposable elements (TEs). One of our main interests was to investigate the effect of TEs on genome size, which is observed to be increasing in oomycetes. The other purpose of the study was to model both the population dynamics and the genome changes after host jumps.

Do not hesitate to contact me if you need any more help or information, or if you want to discuss the project.

Cheers!


First Model

In the first model used for the simulations, we consider effector genes and transposable elements.

We considered the following events that can happen during evolution:

  • gain of effector genes or TEs by horizontal gene transfer,
  • loss of effector genes or TEs,
  • duplication of effector genes or TEs,
  • mutation or recombination of effectors,
  • silencing of effectors due to TE insertion.
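One simple way a list of events like this can drive a simulation is to draw one event per step, with probability proportional to a rate. The sketch below is a minimal illustration under that assumption: the event names and the values in `EVENT_RATES` are hypothetical and are not the parameters used by mprostest.py, which may schedule events differently.

```python
import random

# Hypothetical per-event rates (relative weights); the real names and values
# used by mprostest.py are not documented on this page.
EVENT_RATES = {
    "hgt_gain": 0.05,     # gain of an effector or TE by horizontal gene transfer
    "loss": 0.20,         # loss of an effector or TE
    "duplication": 0.30,  # duplication of an effector or TE
    "mutation": 0.40,     # mutation or recombination of an effector
    "silencing": 0.05,    # silencing of an effector by TE insertion
}


def pick_event(rng: random.Random) -> str:
    """Draw one evolutionary event with probability proportional to its rate."""
    events = list(EVENT_RATES)
    weights = [EVENT_RATES[e] for e in events]
    return rng.choices(events, weights=weights, k=1)[0]


rng = random.Random(42)
counts = {e: 0 for e in EVENT_RATES}
for _ in range(10_000):  # each iteration is one evolutionary event
    counts[pick_event(rng)] += 1
print(counts)
```

With these weights, mutation events should come out roughly eight times as often as silencing events.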

Here each effector is characterized by a score (used to calculate the overall fitness) and a length.
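A minimal sketch of this representation, assuming that the overall fitness is the sum of the scores of the non-silenced effectors minus a cost per base of genome (the page mentions a fitness cost for having more bases, but not its exact form). `COST_PER_BASE`, the `silenced` flag and the `fitness` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Effector:
    score: float            # contribution to the overall fitness
    length: int             # length in bases
    silenced: bool = False  # True once a TE has inserted into it (hypothetical flag)


@dataclass
class TransposableElement:
    length: int             # length in bases


# Hypothetical cost per base; the value used in the real code is not given here.
COST_PER_BASE = 1e-4


def fitness(effectors, tes):
    """Hypothetical fitness: scores of active effectors minus a per-base genome cost."""
    active_score = sum(e.score for e in effectors if not e.silenced)
    # Silenced effectors and TEs still add bases, so they still cost fitness.
    genome_length = sum(e.length for e in effectors) + sum(t.length for t in tes)
    return active_score - COST_PER_BASE * genome_length


effectors = [Effector(score=1.0, length=1000),
             Effector(score=0.5, length=2000, silenced=True)]
tes = [TransposableElement(length=3000)]
print(round(fitness(effectors, tes), 6))  # prints 0.4
```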

In this very simple model hosts are not explicitly considered, and a host jump simply results in random changes in the scores of the effectors in the pool.
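As a minimal sketch, a host jump of this kind could add random noise to every effector score. The Gaussian perturbation and the `sigma` parameter below are assumptions made for illustration; the page only says that the scores change randomly.

```python
import random


def host_jump(scores, rng, sigma=0.5):
    """Return new effector scores after a host jump.

    Hypothetical scheme: each score gets independent Gaussian noise
    with standard deviation sigma.
    """
    return [s + rng.gauss(0.0, sigma) for s in scores]


scores = [1.0, 0.8, 0.2]
new = host_jump(scores, random.Random(1))
print(new)
```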

We ran the simulation both with and without jumps (you can find here the code to perform the complete simulation with jumps; to skip the jumps, just set the time gap between jumps equal to the total simulation time and the number of jumps to 1). Please note that time is measured in evolutionary events, not absolute time.
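The hypothetical helper below shows why that setting removes the jumps, assuming that jumps happen every `time_gap` events, at most `n_jumps` times, and only strictly before the end of the run. The real program may count jumps or epochs differently, so check the code before relying on this.

```python
def jump_times(total_time, time_gap, n_jumps):
    """Event counts at which host jumps occur (hypothetical helper).

    A jump happens every time_gap events, up to n_jumps jumps, and only
    strictly before the end of the simulation (time is counted in events).
    """
    return [k * time_gap for k in range(1, n_jumps + 1)
            if k * time_gap < total_time]


print(jump_times(20000, 5000, 3))   # [5000, 10000, 15000]
print(jump_times(20000, 20000, 1))  # [] : gap equal to the run length, no jumps
```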

Discussion

Here is a very short overview of the results of our simulations, as well as some plots we generated to display the results.

The following plots show the results of a simulation with no jumps: both length and number of units increase before reaching a plateau.

We observed the same pattern in all the simulations with different time gaps between jumps.

We also observed a decrease in the fitness parameter following a host jump, as expected. The pathogen then seems to start recovering, but the DT (time gap between jumps) we used in the simulations probably did not allow a full recovery.

Regarding the length distribution of effector genes and TEs, the resulting patterns differ significantly depending on the initial state of the system. The following two plots show the length distribution of transposable elements for the same set of parameters.

The following plots show the length distribution of effector genes for the same set of parameters (follow the rainbow to see how the distribution changes during evolution).

Note that gene size decreases in our model, because of the fitness cost of each additional base. On the other hand, we have to remember that most very short genes have no biological meaning.

Running simulations and getting your own results

To run a simulation of the first model, you can download the code here.

There are many parameters you may want to change; you can get a full list of them by running:

$python mprostest.py -h

Please note that you will need to install NumPy and pygsl before running the simulation.

Also, we suggest seeding the random number generator with the -s or --seed option, as the program otherwise uses a default seed, so unseeded runs all follow the same random sequence.

It is advisable to set the output path to a new folder, both to avoid overwriting previous results and because the program can produce a large number of folders.
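The sketch below illustrates why an explicit seed matters. It parses an option with the same names (-s/--seed) using argparse and seeds a NumPy generator with it; it is not the code of mprostest.py, which may seed NumPy or pygsl differently, and the default seed of 0 is an assumption.

```python
import argparse

import numpy as np


def parse_args(argv=None):
    """Parse a -s/--seed option (hypothetical; only the option names match)."""
    parser = argparse.ArgumentParser(description="Seeding sketch")
    parser.add_argument("-s", "--seed", type=int, default=0,
                        help="seed for the random number generator (default: 0)")
    return parser.parse_args(argv)


def first_draws(argv):
    """Return the first few random numbers a run would use with these options."""
    args = parse_args(argv)
    rng = np.random.default_rng(args.seed)
    return rng.random(3)


# Runs without -s share the fixed default seed, so they draw identical numbers.
same_a = first_draws([])
same_b = first_draws([])
# Giving each run its own seed makes the runs independent replicates.
run_1 = first_draws(["-s", "1"])
run_2 = first_draws(["--seed", "2"])
print(np.array_equal(same_a, same_b), np.array_equal(run_1, run_2))  # True False
```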


Second Model

In the second model we decided not to consider transposable elements explicitly, as their evolutionary pattern was already quite clear from the first model, and no other biological effects of TEs are known today that could be added to the model. In particular, it has not been proved that they affect pathogen fitness other than by contributing to silencing or to general genome plasticity, both of which can be considered implicitly by adjusting the rates of mutation/recombination, silencing, HGT and duplication.

In this model effectors are characterized by a length and a list of target genes, each one with a score.

Moreover, the host is characterized by a fixed list of targets.

The fitness is therefore calculated from the scores of the effectors against the targets that are actually present in the host.
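A minimal sketch of this representation, assuming that the fitness is the plain sum of the scores of all effector-target pairs whose target is present in the host. The page does not give the exact formula, so the summation, the class names and `pathogen_fitness` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Effector:
    length: int
    targets: dict = field(default_factory=dict)  # target gene id -> score


@dataclass
class Host:
    targets: frozenset  # fixed set of target gene ids


def pathogen_fitness(effectors, host):
    """Hypothetical fitness: sum the scores of targets present in the host."""
    return sum(score
               for eff in effectors
               for target, score in eff.targets.items()
               if target in host.targets)


host = Host(targets=frozenset({"t1", "t3"}))
effectors = [
    Effector(length=900, targets={"t1": 0.7, "t2": 0.9}),   # only t1 counts
    Effector(length=1200, targets={"t3": 0.4, "t4": 0.5}),  # only t3 counts
]
print(pathogen_fitness(effectors, host))
```

Targets absent from the host (t2 and t4 here) contribute nothing, however high their scores.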

A host jump is now performed by creating a new host with a different random list of targets. A sample of the pathogen population is moved to the new host, and its new fitness is calculated.
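The steps of a host jump can be sketched as follows. This is a minimal illustration under several assumptions: each individual is summarised as a dictionary {target id: score}, the new host's targets are drawn uniformly from a hypothetical pool `all_targets`, and the migrants are a uniform random sample of the population. None of these names or sampling schemes come from hpmodel2.py.

```python
import random


def fitness(effector_targets, host_targets):
    """Hypothetical fitness: sum of scores for targets present in the host."""
    return sum(s for t, s in effector_targets.items() if t in host_targets)


def host_jump(population, all_targets, n_host_targets, n_migrants, rng):
    """Create a new host with random targets and move a sample of individuals to it.

    Returns the new host's targets and the fitness of each migrant on it.
    """
    new_host = frozenset(rng.sample(all_targets, n_host_targets))
    migrants = rng.sample(population, n_migrants)
    return new_host, [fitness(m, new_host) for m in migrants]


rng = random.Random(7)
all_targets = [f"t{i}" for i in range(20)]
# Each individual: {target id: score} for 5 random targets (hypothetical).
population = [{t: rng.random() for t in rng.sample(all_targets, 5)}
              for _ in range(50)]
new_host, new_fitness = host_jump(population, all_targets,
                                  n_host_targets=6, n_migrants=10, rng=rng)
print(sorted(new_host), new_fitness)
```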

Also, we keep track of the different strains and their population dynamics over time.
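One simple way to keep track of strains over time is to count each strain's abundance at every recorded time step. In the sketch below a strain is just a label (a hypothetical stand-in for however hpmodel2.py identifies strains), and `as_time_series` fills in zeros for the times when a strain is absent, so that each strain can be plotted as its own curve.

```python
from collections import Counter, defaultdict


def record_generation(history, t, strains):
    """Append the abundance of each strain present at time t to history."""
    for strain, n in Counter(strains).items():
        history[strain].append((t, n))


def as_time_series(history, times):
    """Dense abundance series per strain, with 0 when the strain is absent."""
    series = {}
    for strain, points in history.items():
        by_time = dict(points)
        series[strain] = [by_time.get(t, 0) for t in times]
    return series


history = defaultdict(list)
record_generation(history, 0, ["A", "A", "B"])
record_generation(history, 1, ["A", "B", "B", "C"])
print(as_time_series(history, [0, 1]))
# {'A': [2, 1], 'B': [1, 2], 'C': [0, 1]}
```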

Discussion

Here is a very short overview of the results of our simulations, as well as some plots we generated to display the results.

Note that at the beginning of the simulation, when the gene count is low, a host jump has a strong negative effect on fitness, while at the end of the simulation the decrease is much smaller, probably because of the larger effector gene pool present in the pathogen.

In the picture above we can see typical population dynamics, where each signal is a different strain.

Running simulations and getting your own results

To run a simulation with the second model, download the code here. You will need NumPy installed on your computer.

You can get a full list of options by running:

$python hpmodel2.py -h

We suggest seeding the random number generator with the -s or --seed option, as the program otherwise uses a default seed.

For more information, see also the docstring in the main script.

It is advisable to set the output path to a new folder, both to avoid overwriting previous results and because the program can produce several new files.


Additional Scripts

In the repository you will also find some scripts we wrote to plot the results. You can use these scripts or write your own. To use the provided scripts you will need to install the Matplotlib library.

First Model

To plot the results of the simulation, run from the command line:

$python clusterV.py [path] [number of runs] [number of jumps]

This script will plot:

After getting the results (the script saves some plots as well as some data files, which you will need for the next step), you can use the scripts:

$python comparison_plot.py [path 1st dir] [path 2nd dir] [path 3rd dir] [path 4th dir] [path 5th dir]

Again, these will plot length and number of units over time, comparing the results when you have run the simulation several times and want to compare different time gaps or different mutation rates. These two scripts assume you ran a simulation like ours, with DT5000, DT10000, DT15000, DT20000 and no jumps. If you ran different simulations and want to compare their results, you will need to change a number of parameters in the code.

You may also want to run the following scripts to plot the length distribution of effectors and TEs:

$python scatter_plot_jumps.py [path 1st dir] [path 2nd dir] [path 3rd dir] [path 4th dir] [path 5th dir] [number of runs]

$python scatter_plot_jumps_tes.py [path 1st dir] [path 2nd dir] [path 3rd dir] [path 4th dir] [path 5th dir] [number of runs]

Again, these scripts assume you ran the simulation with the same DT values we used; if not, check the code. You do not need to run clusterV.py before these.

You can also plot the average fitness of the pathogen (you can get a full list of options with -h) or the fitness in a single simulation:

$python fitness_first_model.py [path] [number of runs] [number of jumps]

$python one_fitness.py [path] [number of runs] [number of jumps] -s [number of the run to plot]

Second Model

To plot the results of the simulation, you can run:

$python newmodfigs.py [path] [number of runs] [number of jumps] [time gap between jumps]

This script plots the interactome pattern, the population dynamics of the strains that evolved during the simulation, the promiscuity pattern and the evolution of a fitness parameter.

An IPython notebook is also available in the same folder.


Follow Up

There is much more to be tested and implemented (this is a simple model, and many features could be added in the future). It would be interesting to have some experimental results, hard as it is to infer evolutionary patterns. Another question still to be investigated is the survival and evolution of a pathogen that, after some evolution time, jumps back to its original host.