Cover at PLOS Genetics: “Measuring the rate of evolution in hitchhiking Arabidopsis thaliana”

Cross-posted from the PLOS website:

http://journals.plos.org/plosgenetics/article?id=10.1371/image.pgen.v14.i02


 


Despite its modest appearance and small size, Arabidopsis thaliana has proven to be a successful colonizer. Today it is found across much of the continental US, although it first arrived there only a few hundred years ago. The species thrives in wild, rural, and urban settings; the photo shows an A. thaliana plant growing in the cracks of a sidewalk. Our study suggests that about four hundred years ago, A. thaliana seeds from a single plant were unknowingly carried by Europeans to the Eastern US. Who would have guessed that several centuries later, scientists would take advantage of its North American exile, and of pressed plants that botanists have collected over the past couple of centuries, to calculate its genomic mutation rate, that is, the speed at which it evolved in the New World?

Photo credit: Moises Exposito-Alonso

 

Paper in press at PLOS Genetics: “The rate and potential relevance of new mutations in a colonizing plant lineage”

http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007155

Scientists create ‘Evolutionwatch’ for plants


Scientists are giving plant collections from museums a new lease of life with ‘Evolutionwatch’ – a new way to study evolution in action.

Using a hitchhiking weed, scientists from the Max Planck Institute for Developmental Biology reveal for the first time the mutation rate of a plant growing in the wild.

They compared 100 historic and modern genomes of the tiny plant Arabidopsis to measure precisely the rate at which it evolves in nature. The oldest plant, preserved in a herbarium, was from 1863. By that time, the scientists estimate, the species had already spent more than 200 years in the New World. Two different methods gave the same result: Arabidopsis was introduced by Europeans who arrived on the US East Coast around the year 1600. It was almost certainly introduced by chance, perhaps carried on the boots of Europeans or mixed in with the seeds of edible plants.

The team focused on samples from North America, because they knew that one particular genetic family of Arabidopsis was very widespread, presenting an opportunity to observe newly-acquired mutations. The comparison of 100 complete genomes revealed 5000 new mutations, some of which could have given the plant an adaptive advantage as it colonised its new environment. The plant moved inland alongside human settlers, gradually diverging from the European ancestor from which it originated. Samples of the species along the same path today reveal increasingly deep and fast-growing roots, perhaps evidence that it adapted during its hitchhiking trip.

“Collections of invasive populations sampled from different times in history enable us to observe the ‘live’ process of evolution in action,” says Moises Exposito-Alonso, first author of the paper published in PLOS Genetics.

They sequenced the genomes of 100 plants collected by botanists between 1863 and 2006. All samples from before 1990 came from museum collections of dried plants. The oldest dried plants, preserved 150 years ago, show how much the lineage had evolved by that time. The youngest plants had continued to change and evolve. By comparing genomes of plants that had diverged from a common ancestor for different amounts of time, the scientists calculated how many mutations the plant acquires per year.

This in turn enabled the team to deduce that the last common ancestor of the lineage must have lived at the end of the 16th or beginning of the 17th century, coinciding with the time that many people were arriving by boat from Europe, particularly the southern UK, west coast of France and the Netherlands. This was very surprising, since a previous estimate, which had not made use of genetic information from dried herbarium samples, suggested that the colonizing Arabidopsis plants had only arrived in the 19th century.
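The dating logic described above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the authors' actual method or data: if mutations accumulate at a steady rate, regressing the number of mutations in each sample on its collection year gives the yearly rate as the slope, and the year at which the fitted line reaches zero estimates when the common ancestor lived. `ASSUMED_RATE`, `ASSUMED_ORIGIN`, and every collection year other than 1863 and 2006 are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical values used only to generate toy data; not taken from the paper.
ASSUMED_RATE = 2.0      # mutations per genome per year
ASSUMED_ORIGIN = 1600   # year the common ancestor lived

# Collection years: 1863 and 2006 come from the text, the rest are invented.
years = [1863, 1900, 1950, 1990, 2006]
# Toy mutation counts separating each sample from the common ancestor.
mutations = [ASSUMED_RATE * (year - ASSUMED_ORIGIN) for year in years]

rate, intercept = fit_line(years, mutations)
ancestor_year = -intercept / rate  # year where the fitted line reaches zero

print(f"mutations per year: {rate:.2f}")
print(f"estimated year of common ancestor: {ancestor_year:.0f}")
```

Because the toy counts are generated without noise, the fit simply recovers the assumed rate and origin; real herbarium data would scatter around the line, and the uncertainty of the intercept would matter.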

Arabidopsis is not a harmful weed, but the findings help reveal some of the fundamental evolutionary processes behind the ability of invasive species to colonise new environments. In particular, they unlock some of the secrets of the “genetic paradox of invasion”. This occurs when a colonizer with low genetic diversity is nevertheless surprisingly successful in a new environment.

To determine the effect of new mutations, the scientists grew some of the plants in the lab to identify any differences in growth. The fact that such differences were found suggests that some of the mutations that appeared during the past 400 years conferred an advantage during colonisation.

“We were very surprised, since scientific dogma suggests that evolution normally proceeds at a much slower pace,” said Hernán Burbano, one of the supervisors of this study.

“Accurate evolutionary rates for plants and animals will be fundamental to reconstruct their past history and to predict the opportunity of novel advantageous traits to arise. Our results show that herbarium and animal specimens can be the source of a great new branch of genetics in future,” Exposito says.

###

This press release was written by Zoe Dunford on behalf of Max Planck Institute for Developmental Biology.

http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1007155

Citation: Exposito-Alonso M, Becker C, Schuenemann VJ, Reiter E, Setzer C, Slovak R, et al. (2018) The rate and potential relevance of new mutations in a colonizing plant lineage. PLoS Genet 14(2): e1007155. https://doi.org/10.1371/journal.pgen.1007155

Image Credit: Moises Exposito-Alonso, Claude Becker and colleagues

Funding: This study was supported by the President’s Fund of the Max Planck Society (project “Darwin”) to HAB and by an ERC grant (AdG IMMUNEMESIS) and core funds of the Max Planck Society to DW. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

 

Paper in press at Nature Ecology & Evolution: “Genomic basis and evolutionary potential for extreme drought adaptation in Arabidopsis thaliana”

This is a linked post from “behind the paper” section of the Nature Ecology and Evolution community: https://natureecoevocommunity.nature.com/users/77335-moises-exposito-alonso/posts/28790-life-on-the-edge-prepares-plants-for-climate-change

 

By Moises Exposito-Alonso

Our Nature Ecology and Evolution paper is here: dx.doi.org/10.1038/s41559-017-0423-0.

This study is based on the first experiment of my PhD, which aimed at identifying genetic variants (i.e. old mutations) related to survival under climate change scenarios. I looked at many different individuals of thale cress, Arabidopsis thaliana, and discovered some hundred genetic variants that, when present in a plant, increased its survival under drought conditions. We also found that such variants are more common in Mediterranean and Scandinavian populations, which live at the edge of the distribution range of the species and have probably already experienced more extreme environments than those at the center of Europe.

Growing up in a semi-arid area of Spain (Alicante), I was amazed to observe how some plants still miraculously survived months-long droughts; their scientific names became engraved in my brain as part of my undergraduate training in biology. In those undergraduate classes we were taught the general strategies plants use to deal with drought, which ecologists described long ago but whose genetic underpinnings were still mostly a mystery.

Three years ago, I embarked on my PhD at the Max Planck Institute for Developmental Biology with the goal of identifying genes that could help plants survive under future climate change, which will almost certainly bring extreme drought much more often than today. While I had experience with field experiments, I had not performed any drought experiments. However, two postdocs, François Vasseur and George Wang, were very knowledgeable in this area of ecology, and in image acquisition and processing. Their help, along with that of my two supervisors, Detlef Weigel and Hernán Burbano, was invaluable in getting my PhD off to a fast start.

To have a good representation of all known genotypes of A. thaliana, I searched the 1001genomes.org database, which contains genetic information for one thousand A. thaliana strains, and chose a set of over 200 individuals that were broadly distributed across the geographic range where A. thaliana can be found, including ones from extreme environments. After extensively designing the experiment and the image monitoring, I took the seeds of ‘my’ populations and planted them in the greenhouse.

The results quickly became obvious: after more than two weeks without watering, all the soil was completely dry, but some plants looked astonishingly healthy (see Figure 1). “These little things are tougher than people can imagine,” I thought.


Figure 1. Arabidopsis thaliana individuals after several weeks of drought. A few plants were still green, such as those from central Spain (left) and mid to north Sweden (center), but most were completely dry (right; example from south-west Sweden). Notice also how dry and brittle the soil is: it even separated from the walls of the pots. Credit: Moises Exposito-Alonso

Because some Swedish populations were, alongside the Spanish ones, among those that coped best with my drought treatment, I became extremely curious about these Scandinavian Arabidopsis. Thanks to Magnus Nordborg, a long-time collaborator of the Weigel lab, I could visit some of these coastal populations in south-east Sweden (see Figure 2). My jaw almost dropped when I saw them growing in the sand.


Figure 2. Natural populations of A. thaliana along the Swedish coast. Sand beaches are a tough environment for plants, as sand does not retain water. Credit: Moises Exposito-Alonso

Based on all previous observations, the next obvious question was: if climate change will increase droughts, as the IPCC and others strongly predict, what is the consequence of populations apparently being more or less adapted to such conditions? Can we predict their fate? The premise was that if different individuals of a species are genetically adapted to different environments, i.e., they vary in their sensitivity to environmental stresses as we saw, they might respond differently to future climates and might even be partially pre-adapted to future environments. That is, they could escape extinction through evolution by natural selection of advantageous genetic variants.


Figure 3. Map of the presence of important genetic variants related to higher drought survival (left) and predictions of areas where populations might be genetically maladapted by 2070 and thus at risk of local extinction (right). Credit: Figure 3 in our paper dx.doi.org/10.1038/s41559-017-0423-0.

I used a powerful machine learning algorithm (Random Forests) to predict the potential geographic distributions of genetic variants (Fig. 3). These models make use of the current match between the distribution of genetic variants, as inferred from the locations in the 1001genomes.org database, and different climate variables, such as minimum temperature in winter and precipitation in summer. This technique is typically used to make predictions from the presence or absence of a species in different geographic regions, but I adapted it to the presence and absence of multiple genetic variants, to account for the heterogeneity within a species. Using our models, we transformed maps of climate projections for 2070 into predictions of which genetic variants must be present in 2070 for local A. thaliana populations to survive. Doing this, we discovered that, because Europe will get drier, plants in Central Europe will need more of these ‘survival’ variants than they currently have.
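To give a flavour of the idea, here is a deliberately tiny Python sketch, not the code used in the paper. It builds a forest of one-split trees (stumps), each fitted to a bootstrap resample of sites using one randomly chosen climate variable, and lets them vote on whether a variant is expected at a site. This keeps two ingredients of Random Forests, bootstrap aggregation and random feature choice, but is far simpler than a real implementation. All site values, labels, and names such as `train_forest` are invented for illustration, and a real analysis would fit one such model per genetic variant.

```python
import random


def fit_stump(rows, labels, feature):
    """Return (threshold, label_below, label_above) with the fewest errors."""
    values = sorted({row[feature] for row in rows})
    best = None  # (errors, threshold, label_below, label_above)
    # Candidate thresholds are midpoints between consecutive observed values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        for below in (0, 1):
            above = 1 - below
            errors = sum(
                (below if row[feature] <= threshold else above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, below, above)
    if best is None:
        # Only one distinct value was sampled: always predict the majority label.
        majority = 1 if sum(labels) * 2 >= len(labels) else 0
        return values[0], majority, majority
    return best[1:]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Fit stumps on bootstrap resamples, each using one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]
        feature = rng.randrange(len(rows[0]))
        stump = fit_stump(
            [rows[i] for i in picks], [labels[i] for i in picks], feature
        )
        forest.append((feature, *stump))
    return forest


def predict(forest, row):
    """Majority vote: 1 if most stumps predict the variant is present."""
    votes = sum(below if row[f] <= t else above for f, t, below, above in forest)
    return 1 if votes * 2 > len(forest) else 0


# Invented sites: (minimum winter temperature in °C, summer precipitation in mm).
climate = [(4, 40), (5, 60), (6, 80), (7, 100),
           (-6, 180), (-4, 200), (-2, 220), (0, 240)]
# Invented labels: 1 if the hypothetical drought-survival variant is present.
variant_present = [1, 1, 1, 1, 0, 0, 0, 0]

forest = train_forest(climate, variant_present)
cool_wet_cell = (-3.0, 210.0)  # hypothetical grid cell under one climate
warm_dry_cell = (4.0, 95.0)    # hypothetical grid cell under a drier climate
print(predict(forest, cool_wet_cell), predict(forest, warm_dry_cell))
```

Feeding projected climate values for every grid cell of a map into `predict` would give, cell by cell, a map of where the variant is expected, which is the step that turns a climate projection into a genetic one.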

 

What if one day we can use evolutionary theory to reliably tell where to find the genotypes that might save a threatened species? Or what if we could demarcate geographic areas that require immediate action because they are “genetically poor”?

 

Acknowledgements:

This work was funded by the Max Planck Institute and an ERC grant to Detlef Weigel. I also want to thank my supervisors Detlef Weigel and Hernán Burbano for advice, and my coworkers and friends for their support.

 

code mnemonics

Compilation of useful programming tips that I sometimes need but always forget about.


Stage all removed files in git

git rm $(git ls-files --deleted)


Roxygen skeleton

To add the roxygen2 skeleton to document a function:

shift+ctrl+alt+r (Windows/Linux)

shift+opt+cmd+r (Mac)



Correspondence between reshape2::melt and tidyr::gather, reshape2::dcast and tidyr::spread

library(reshape2)
library(tidyr)
library(dplyr)

mini_iris <- iris[c(1, 51, 101), ]

# melt
melted1 <- mini_iris %>% melt(id.vars = "Species",value.name = 'dimension',variable.name='trait')
melted2 <- mini_iris %>% gather(key='trait', value='dimension', -Species)

# cast
melted1 %>% dcast(Species ~ trait, value.var = "dimension")
melted2 %>% spread(key='trait', value='dimension')

        

 


Insert/overwrite mode shortcut

Not really a command, but super annoying on a Linux computer with a Mac keyboard.

Shortcut: fn + return

my reference papers

A list of great papers on the disciplines I am most interested in: evolution, ecology, quantitative & population genetics, statistics, bioinformatics.

 

Is adaptation possible in self-fertilizing species?
http://www.sciencedirect.com/science/article/pii/S0168952517300550

Who will adapt to climate change?
https://www.nature.com/nature/journal/v470/n7335/full/nature09670.html

What is a mutation accumulation line?
http://www.annualreviews.org/doi/10.1146/annurev.ecolsys.39.110707.173437

What is the animal model? (LMM)
http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2656.2009.01639.x/full

Under what circumstances can SNPs not be identified in GWA?
http://www.nature.com/nrg/journal/v13/n2/full/nrg3118.html

What is heritability?
https://www.nature.com/nrg/journal/v9/n4/full/nrg2322.html

What is missing heritability?
https://www.nature.com/ng/journal/v42/n7/full/ng0710-558.html

How to GWA in plants?
http://onlinelibrary.wiley.com/doi/10.1002/cppb.20041/full

What is population stratification correction in GWA?
http://www.nature.com/nrg/journal/v11/n7/full/nrg2813.html

What is a meta-GWA?
https://www.nature.com/nrg/journal/v18/n2/full/nrg.2016.142.html

Population structure? F statistics? 
http://www.genetics.org/content/202/4/1485

What is spatial autocorrelation in the data?
http://onlinelibrary.wiley.com/doi/10.1111/j.2007.0906-7590.05171.x/abstract

What is an environmental niche model?
http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2699.2011.02659.x/abstract

What is (really) a selection coefficient?
http://onlinelibrary.wiley.com/doi/10.1111/mec.13559/abstract

What is phenotypic selection?
http://www.jstor.org/stable/2408842

How do linkage, inheritance, and sex interfere with evolution at multiple loci?
http://www.genetics.org/content/161/4/1727.long

What are the footprints of selection on the genome?
http://www.genetics.org/content/161/4/1727.long

How does multiple testing correction work?
http://www.nature.com/nbt/journal/v27/n12/full/nbt1209-1135.html

What is principal component analysis?
http://www.nature.com/nbt/journal/v26/n3/full/nbt0308-303.html

SNP imputation in association studies
http://www.nature.com/nbt/journal/v27/n4/full/nbt0409-349.html

What is a hidden Markov model?
http://www.nature.com/nbt/journal/v22/n10/full/nbt1004-1315.html

What is a support vector machine?
http://www.nature.com/nbt/journal/v24/n12/full/nbt1206-1565.html

What is the expectation maximization algorithm?
http://www.nature.com/nbt/journal/v26/n8/full/nbt1406.html

What are DNA sequence motifs?
http://www.nature.com/nbt/journal/v24/n4/full/nbt0406-423.html

What are decision trees?
http://www.nature.com/nbt/journal/v26/n9/full/nbt0908-1011.html

What is dynamic programming?
http://www.nature.com/nbt/journal/v22/n7/full/nbt0704-909.html

What are artificial neural networks?
http://www.nature.com/nbt/journal/v26/n2/full/nbt1386.html

How to map billions of short reads onto genomes?
http://www.nature.com/nbt/journal/v27/n5/full/nbt0509-455.html


Some of these papers come from a blog I came across and could not find again. Thanks to that unknown blog.