The factors enabling certain invasive species to adapt and thrive in novel environments remain an unresolved question within the context of invasion biology. An important part of predicting, preventing, or managing biological invasions is understanding which sites of historical introduction were the most successful and why, and which are facilitating ongoing range expansion of these invasive species. Using reduced-representation sequencing data generated from individuals and sourced from five sampling locations across Aotearoa, this study has successfully produced the first high-resolution genomic dataset characterising the population structure and genetic diversity of Common or European starlings (Sturnus vulgaris) in their invasive range of Aotearoa.
The Population Genetics of an Avian Alien Invader in Aotearoa: The European Starling (Sturnus vulgaris)
Population Genetics and Bioinformatics | Bryan J. Thompson
The Common or European Starling, Sturnus vulgaris (hereafter starling), is considered one of the most successful invasive avian species globally, with invasive populations on all human-populated continents [1]. The establishment of invasive starling populations is the result of intentional human-mediated introductions that occurred in the mid to late 19th century [2]. Acclimatisation societies were responsible for the majority of these introductions. These societies were groups specifically formed to introduce plant and animal species across the colonies, aiming to transform these ‘new’ environments into something more reminiscent of ‘home’, or to control agricultural or horticultural pests [3]. Unfortunately, in many cases, they were ignorant of the ecological consequences of their actions. Repeated introductions of starlings to multiple locations around Aotearoa occurred between 1862 and 1883 to control insect grain pests, contributing to the present-day invasive population [4].
Successful invasions like those of the starling may appear paradoxical. Despite the often-small size of the introduced population and hence limited genetic diversity to facilitate adaptation, invasive species do not only become established but can flourish under novel environmental conditions. What enables certain invasive species to adapt and thrive in novel environments remains an unresolved question within the context of invasion biology [5].
Many different competing ecological hypotheses exist concerning successful invasives. Examples include the exploitation of vacant niches associated with human activity or the competitive exclusion of native species [5]. An important component of addressing these questions is to understand which historical introduction sites were the most successful and why, and which of these sites are facilitating ongoing invasive range expansion. Genetic data can help to inform an understanding of this invasion history, but despite biological invasions continuing to escalate at an alarming rate, population genomic data are notably lacking for many of the most pervasive and invasive species around the world [6]. Such studies will be crucial to enable evidence-based decision-making if we are to predict, prevent, or manage biological invasions successfully.
We used genome-wide DNA sequencing data to produce the first high-resolution genomic dataset of starlings in Aotearoa to characterise and examine their genetic diversity and population structure. We found a relatively high level of genetic differentiation and population substructuring for samples taken from across the Auckland region, which is congruent with documented introduction histories [7] while also implying restricted gene flow with other regions. Other locations presented with more variable genetic population structure, suggesting potential connectivity between regions south of Auckland. This first insight into the interplay between the biotic and abiotic factors that are shaping invasive starling populations in Aotearoa could potentially inform proactive control strategies.
Methods
Sample Collection, DNA Extraction and Sequencing
106 starling specimen samples were obtained from various sources within Aotearoa from five geographically distinct regions (Figure 1). These consisted of three locations in the North Island: Auckland (AUK: n=18), Palmerston North (PLM: n=12), Upper Hutt (UHT: n=40), and two in the South Island: Blenheim (BLN: n=15) and Canterbury (CAN: n=21). Muscle tissue was subsampled from each individual, and DNA was extracted from these tissue samples and sent to Diversity Arrays Technology Pty Ltd (DArT P/L) for processing and genome-wide sequencing [8]. The DArT method sequences DNA from a random sample of the genome, a method termed ‘reduced representation sequencing’, to capture a representative snapshot of genome-wide diversity.
Data Processing, Mapping, Variant Calling and Filtering
Once we received the raw DNA sequence outputs from DArT, a series of software programs and pipelines were used to process these raw outputs. We identified a list of variable sites in the genome where there was a single DNA base change within a sequence (single nucleotide polymorphisms, or ‘SNPs’). SNPs capture diversity between individuals and between sampling locations, and SNP variants can be used to cluster individuals by their similarity and determine which sampling sites are genetically dissimilar to each other. In brief, we used Fastp v0.23.2 [9] to trim DNA sequence lengths to 70 bases and discard all sequences shorter than 40 bases. We used the Stacks v2.2 [10] pipeline to discard low-quality sequences, remove sequences with uncalled bases, and also remove the restriction enzyme recognition sites (Pstl & Sphl). Next, we used BWA v0.7.17 [11] to index our reference genome S. vulgaris vAU1.0 [12], align the sequences, and produce output files containing the alignments and their respective base qualities. Alignments were sorted and indexed using SAMtools v1.16.1 [13], and SNPs were called and annotated using BCFtools v1.16 [14]. We then systematically removed replicates and siblings from the dataset using DartR v2.9.7 [15], resulting in a final individual count of 76 (N=76). Finally, we used VCFtools v0.1.15 [16] to filter out poor-quality SNPs and remove SNPs present in less than 50% of the individuals in each sampling location. We ran one final filtering step to ensure low levels of missing data (due to technical errors during sequencing or issues sequencing particular segments of DNA), resulting in a final dataset containing 29,117 unique SNP sites.
Figure 1: Map showing starling locations sampled across the Aotearoa invasive range. Site abbreviations: AUK - Auckland, PLM – Palmerston North, UHT – Upper Hutt, BLN – Blenheim, CAN – Canterbury.
Figure 2: PCA of the genetic data from five sampled locations from the Aotearoa invasive range. Each dot represents an individual starling and their genetic data. The plot displays PCA axis 1 (3.3% variance explained) and PCA axis 2 (1.9% variance explained) and illustrates that Auckland (AUK) is genetically distinct from other sampling locations.
Figure 3: Heatmap of pairwise analysis of the genetic differences (measured as pairwise FST) between each of the five sampled Aotearoa invasive range locations. Brighter green indicates a higher FST, which indicates more genetic differentiation. The highest genetic differences are between Auckland and all other locations.
Figure 4: Admixture ancestry profile of Aotearoa sampling locations assuming two distinct ancestral source populations. Each vertical bar represents an individual, and the proportion of the two ancestries (green versus purple) are shown. Most individuals are overwhelmingly one colour, indicating they are of a single ancestry with little mixture between the source populations.
Table 1: Consolidated table showing the sample sizes (n) for the five sample sites, and the number of private alleles for each location. Large numbers of private alleles indicate genetic distinctiveness compared to other locations.
Analysis
The SNP dataset was analysed to extrapolate Aotearoa population structuring and spatial partitioning. R v4.2.1 [17] was used with the Bioconductor and DartR suite of packages to generate principal components analysis (PCA), which can be used to visualise how genetically similar populations are based on how much they overlap or how closely they cluster together on the plot. DartR was used to calculate a metric of pairwise population differentiation (pairwise FST) between sampling locations and to report SNP variants that were only found in one site (‘private alleles’) by comparing each sample location against the aggregation of all other locations (method = one2rest). Finally, Admixture v1.3.0 [18] was used to infer ancestry proportions, with individuals sharing similar ancestry profiles if they are genetically similar.
Results
Population Structure of Aotearoa Starlings
Our PCA of the five Aotearoa sampling locations reveals varying degrees of subpopulation structure within Aotearoa, with the exception of Palmerston North (PLM) and Upper Hutt (UHT), which form a single cluster of genetically similar individuals (Figure 2). Auckland (AUK) is the most divergent of the Aotearoa sample locations on PC1, with the greatest differentiation existing between Blenheim (BLN) and AUK. This level of population differentiation is supported by the corresponding pairwise FST value between these two locations (FST = 0.043; higher values reflect higher levels of genetic difference between populations) (Figure 3). Except for AUK, pairwise FST values for all Aotearoa locations fall in the range of 0.007 to 0.016, while AUK pairwise FST values are notably higher, falling in the range of 0.031 to 0.043 (Figure 3). When we plotted ancestry admixture with the assumption of two distinct starling lineages, our findings supported this pattern. The distinctive clustering of AUK compared to all other sampling locations and the differentiation between AUK and BLN matched our admixture assumptions. (Figure 4).
Of the genetic diversity indices we examined, of note was the relatively high number of private alleles (113 private alleles) for AUK (Table 1). Additionally, both PLM and UHT have a relatively low number of private alleles (PLM = 2, UHT = 0) relative to the other sampling locations.
Discussion
Using reduced-representation sequencing data generated from individuals sourced across five sampling locations in Aotearoa, this study has successfully produced the first high-resolution genomic dataset characterising the population structure and genetic diversity of starlings in the invasive Aotearoa range.
Genetics and Introduction History of Starlings in Aotearoa
As documented in shipping records and records from acclimatisation societies, starlings were introduced to Aotearoa between 1860 and 1873 in multiple shipments to primarily three locations: Auckland, Canterbury, and Otago, with the majority of introductions in other regions reportedly resulting from translocations from the established Otago population [4]. Our data displays a pattern in the genetic population structure congruent with this introduction history regarding Auckland, but less of a match with the introduction history of Otago and Canterbury. Notably, although we have sampled Canterbury directly, we are limited to inferring the Otago population indirectly from the Blenheim, Upper Hutt, and Palmerston North samples. These locations reportedly received the majority of their stock from Otago translocations.
In addition to the observed genetic differentiation and distinctive admixture evident in starling ancestry, Auckland’s introduction history is further supported by the relatively large number of private alleles compared to the other Aotearoa sampling locations (Table 1). Private alleles in other invasive species have been attributed to introductions from different source locations [19]. The retention of these private alleles in Auckland also suggests that gene flow is likely restricted between Auckland and the other Aotearoa sampling locations. This restriction implies that it may be possible to manage starling populations in the Auckland region in order to control the negative impacts that starlings are having, particularly at vineyards in the region.
While a distinct signal of introduction history and restricted gene flow is evident in Auckland, it is not as clear for the remaining sample locations. There may be a degree of connectivity and gene flow between these southern locations. Parts of the South Island of Aotearoa experience winter temperature conditions at the lower end of starlings’ thermal tolerance [20], which could trigger some northward dispersal. Of particular note is the lack of genetic differentiation between Canterbury and Palmerston North (pairwise Fst 0.009), two locations that were reportedly founded from different sources [4]. Individuals may travel between these locations, perhaps via seasonal migration. Except for Auckland, these two sampling locations span the largest geographical distance, with Blenheim and Upper Hutt located between them (Figure 1).
Conclusion
In summary, our study has demonstrated that demographic effects associated with the dynamics of introduction history strongly influence genetic diversity and contemporary population structure for starlings in Aotearoa. Furthermore, we have uncovered variable levels of spatial partitioning across the invasive Aotearoa range suggesting that dispersal, and subsequently gene flow, may be restricted in some but not all sampling locations. This provides some evidence that local species management programs may be successful in reducing the numbers of this invasive pest in Aotearoa.
Acknowledgements
First, I want to acknowledge my supervisor, Dr Kat Stuart. Kat’s unwavering patience and exceptional knowledge of computing and genetics meant I got so much more out of this scholarship than I ever thought possible. I also want to thank Associate Professor Anna Santure for her constant support and encouragement - I wouldn’t have taken up the offer without it. Finally, I want to thank everyone in the Santure Lab. They are an incredibly talented group of computational biologists and great people who make it a fun and relaxed environment to work in.
[1] K. C. Stuart, W. B. Sherwin, J. J. Austin, M. Bateson, M. Eens, M. C. Brandley, and L. A. Rollins, “Historical museum samples enable the examination of divergent and parallel evolution during invasion,” Molecular Ecology, vol. 31, no. 6, pp. 1836-1852, 2022, doi: https://doi.org/10.1111/mec.16353.
[2] K. C. Stuart, N. R. Hofmeister, J. M. Zichello, and L. A. Rollins, “Global invasion history and native decline of the common starling: insights through genetics,” Biological Invasions, vol. 25, no. 5, pp. 1291-1316, 2023/05/01 2023, doi: 10.1007/s10530-022-02982-5.
[3] R. M. McDowall, Gamekeepers for the nation : the story of New Zealand’s acclimatisation societies, 1861-1990. Christchurch, N.Z: Canterbury University Press, 1994.
[4] P. Pipek, T. M. Blackburn, and P. Pyšek, “The ins and outs of acclimatisation: imports versus translocations of skylarks and starlings in 19th century New Zealand,” Biological Invasions, vol. 21, no. 4, pp. 1395-1413, 2019/04/01 2019, doi: 10.1007/s10530-018-1905-y.
[5] D. Sol, I. Bartomeus, and A. S. Griffin, “The paradox of invasion in birds: competitive superiority or ecological opportunism?,” Oecologia, vol. 169, no. 2, pp. 553-564, 2012/06/01 2012, doi: 10.1007/s00442-011-2203-x.
[6] P. Matheson and A. McGaughran, “Genomic data is missing for many highly invasive species, restricting our preparedness for escalating incursion rates,” Scientific Reports, vol. 12, no. 1, p. 13987, 2022/08/17 2022, doi: 10.1038/s41598-022-17937-y.
[7] G. M. Thomson, The naturalisation of animals & plants in New Zealand. Cambridge [Eng.]: The University Press, 1922.
[8] A. Kilian et al., “Diversity Arrays Technology: A Generic Genome Profiling Technology on Open Platforms,” in Data Production and Analysis in Population Genomics: Methods and Protocols, F. Pompanon and A. Bonin Eds. Totowa, NJ: Humana Press, 2012, pp. 67-89.
[9] S. Chen, Y. Zhou, Y. Chen, and J. Gu, “fastp: an ultra-fast all-in-one FASTQ preprocessor,” Bioinformatics, vol. 34, no. 17, pp. i884-i890, 2018, doi: 10.1093/bioinformatics/bty560.
[10] J. Catchen, P. A. Hohenlohe, S. Bassham, A. Amores, and W. A. Cresko,“Stacks: an analysis tool set for population genomics,” (in eng), Mol Ecol, vol. 22, no. 11, pp. 3124-40, Jun 2013, doi: 10.1111/mec.12354.
[11] H. Li and R. Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform,” (in eng), Bioinformatics, vol. 25, no. 14, pp.1754-60, Jul 15 2009, doi: 10.1093/bioinformatics/btp324.
[12] K. C. Stuart et al., “Transcript- and annotation-guided genome assembly of the European starling,” Molecular Ecology Resources, vol. 22, no. 8, pp.3141-3160, 2022, doi: https://doi.org/10.1111/1755-0998.13679.
[13] H. Li et al., “The Sequence Alignment/Map format and SAMtools,”Bioinformatics, vol. 25, no. 16, pp. 2078-2079, 2009, doi: 10.1093/bioinformatics/btp352.
[14] H. Li, “A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data,” Bioinformatics, vol. 27, no. 21, pp. 2987-2993, 2011, doi: 10.1093/bioinformatics/btr509.
[15] J. L. Mijangos, B. Gruber, O. Berry, C. Pacioni, and A. Georges, “dartR v2: An accessible genetic analysis platform for conservation, ecology and agriculture,” Methods in Ecology and Evolution, vol. 13, no. 10, pp. 2150-2158, 2022, doi: https://doi.org/10.1111/2041-210X.13918.
[16] P. Danecek et al., “The variant call format and VCFtools,” Bioinformatics, vol. 27, no. 15, pp. 2156-2158, 2011, doi: 10.1093/bioinformatics/btr330.
[17] R: A Language and Environment for Statistical Computing. (2022). R Foundation for Statistical Computing. [Online]. Available: https://www.R-project.org/
[18] D. H. Alexander, J. Novembre, and K. Lange, “Fast model-based estimation of ancestry in unrelated individuals,” (in eng), Genome Res, vol. 19, no. 9, pp. 1655-64, Sep 2009, doi: 10.1101/gr.094052.109.
[19] A. G. Da Silva, J. R. Eberhard, T. F. Wright, M.L. Avery, and M. A. Russello, “Genetic evidence for high propagule pressure and long-distance dispersal in monk parakeet (Myiopsitta monachus) invasive populations,” Molecular Ecology, vol. 19, no. 16, pp. 3336-3350, 2010/08/01 2010, doi: https://doi.org/10.1111/j.1365-294X.2010.04749.x.
[20] S. R. Johnson and I. M. Cowan, “The energy cycle and thermal tolerance of the starlings (Aves, Sturnidae) in North America,” Canadian Journal of Zoology, vol. 53, no. 1, pp. 55-68,1975, doi: 10.1139/z75-007 %M 1116064.
Bryan is in his final year of his BSc, double majoring in Biology and Anthropological Science. He is interested in the interplay between proximate and ultimate mechanisms of evolution and how they together shape species over time. He plans to start his MSc in 2025.