Gene family evolution - Exercise #1


In this exercise you will practice your newly acquired skills in phylogenetic inference and "tree thinking" by analysing the evolutionary history of a gene family. Part of the exercise is to collect the data needed for the analysis (homologous protein sequences from different species) and to interpret the result (for which you first have to draw a species tree for the taxa included in the analysis).

Suggestions for gene families to analyse in this exercise

  • Toc75
  • Toc159
  • Toc132
  • Toc120
  • Toc34
  • Tic20
  • Tic22
  • POR A
  • SAM50
  • Alb3
  • ... or your own favorite gene family.


  1. Select one of the gene families from the list above for your analysis.
  2. Download a protein reference sequence for the species Arabidopsis thaliana from the NCBI site and save it in a text file using your favorite text editor. From earlier exercises you probably remember how to do a species-specific search using a gene name.
  3. Navigate your web browser to and have a look at the species tree presented there. Select 5-10 species that will represent your ingroup, and one or two that will represent your outgroup. Make the selection in such a way that you include species from different parts of the tree. Draw a tree (on a piece of paper) of the relationship between your selected species and save that for later.
  4. Click on the species tree in order to reach the BLAST page. For each of the species you selected for the analysis, run a BLAST search for homologous sequences using your Arabidopsis thaliana reference sequence as the query. Save the sequences you want to include in the analysis in the same file as the reference sequence [Hint: at this stage it is better to save too many sequences than too few].
  5. Upload your sequences to the ClustalW site and align them. Look for sequences in the alignment that are poorly aligned to the rest, and exclude them if you suspect that they are not homologous to your reference sequence. Keep aligning/analysing/excluding sequences until you are happy with the alignment. Save the resulting sequences to your computer.
  6. Once again redirect your web browser to a new web site. This time to Select their "One Click" function and upload your data and run the analysis using the default settings.
  7. After the analysis has finished you'll be presented with a phylogenetic tree. Manipulate your tree by changing the rooting (try "Mid-point rooting" and "Reroot (outgroup)"). Also try the "Flip" and "Swap" options to make it easier to compare the tree with the species tree you drew earlier. Does your result make sense? Also play around with the other options to make the tree look its best.
  8. When you are happy with the tree, download it in PDF format and make a printout. Try to explain the result in relation to your species tree by identifying gene duplications and speciation events. Does it look like you have found all the homologs you would expect to find? If not, why is that, and what can you do to find them? Once you have identified the steps needed to explain the evolution of your gene family, go back to the relevant task in the list above and start over again.

Good luck!
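
If you want to check your reasoning in step 8, the idea of telling gene duplications from speciation events can be sketched in a few lines of Python. This is only a toy illustration and not part of the web tools used above. It assumes a rooted, binary gene tree written as nested pairs, leaf labels of the hypothetical form "Species|gene", and the simple "species overlap" rule: a node is called a duplication if the same species occurs on both sides of it, otherwise a speciation. The gene names and the tree shape are made up.

```python
# Toy sketch: label gene-tree nodes as duplication or speciation
# using the species-overlap rule. All gene names below are made up.


def species_of(leaf):
    """Leaf labels are assumed to look like 'Species|gene'."""
    return leaf.split("|", 1)[0]


def label_events(tree, events=None):
    """Walk a rooted binary gene tree given as nested 2-tuples (post-order).

    Returns (set of species below this node, list of (event, species list)).
    """
    if events is None:
        events = []
    if isinstance(tree, str):  # a leaf: one gene from one species
        return {species_of(tree)}, events
    left, right = tree
    left_species, _ = label_events(left, events)
    right_species, _ = label_events(right, events)
    # A species found on both sides is taken as evidence of a duplication.
    event = "duplication" if left_species & right_species else "speciation"
    events.append((event, sorted(left_species | right_species)))
    return left_species | right_species, events


# Hypothetical gene tree: two copies in Ath, one gene each in Osa and Ppa.
toy_tree = ((("Ath|copy1", "Ath|copy2"), "Osa|gene1"), "Ppa|gene1")
all_species, events = label_events(toy_tree)
for event, species in events:
    print(event, species)
```

The rule is a heuristic: it depends on how the tree is rooted, and gene losses or homologs you failed to collect can hide a duplication, which is one reason to ask whether you have found all the homologs you expected.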