Abstract:
Concerns about insect declines and population-level responses to climate change have highlighted the importance of estimating trends in abundance and phenology from existing monitoring data. As the taxon with the most systematic monitoring data, butterflies are a frequent focus for understanding trends in insects. Even so, ecologists often have only sparse monitoring data for at-risk butterfly populations. Because existing statistical techniques are typically poorly suited to such data, these at-risk populations are frequently excluded from analyses of butterfly trends. Here we present guidelines for estimating population trends from sparse butterfly monitoring data using generalized additive models (GAMs), based on extensive simulations and our experience fitting models to monitoring data for hundreds of butterfly species. These recommendations include pre-processing steps, model structure choices, and post-hoc analysis decisions that reduce bias and prevent or mitigate biologically implausible model fits. We also present the butterflyGamSims package for the programming language R, available on GitHub (cbedwards/butterflyGamSims). This open-source software provides tools for ecologists and applied statisticians to simulate realistic butterfly monitoring data and test the efficacy of different GAM model choices or monitoring schemes.
A key task in both population ecology and conservation biology is to infer population trends – both growth rates and, increasingly, trends in phenology – from monitoring data. These trends are the foundation for studies in basic ecology, studies of species of conservation concern (e.g., Bonoan et al. 2021), and comparative studies identifying overarching patterns in abundance and phenology across species (Diamond et al. 2011, Forister et al. 2021). Many species, such as butterflies, have short activity periods during which they are conspicuous, with the timing of peak activity varying from year to year. Monitoring programs for such species are often structured around repeated surveys within a year, in turn repeated across years and sometimes across sites (e.g., the Pollard walk design) (Pollard 1994, Shapiro 2020, PollardBase). Ecologists thus need practical tools for estimating yearly quantities of abundance and phenology, which can then be scaled up to identify variation and trends across years.
Many tools have been developed to translate repeated yearly surveys into yearly estimates of population characteristics (Edwards and Crone 2021), particularly for butterflies. Fundamental to most of these tools are underlying assumptions about the shape of activity: some methods assume survey counts can be approximated with a unimodal, Gaussian shape across a single year (Lindén and Mäntyniemi 2011, Dennis et al. 2015, Stewart et al. 2020, Edwards and Crone 2021); the Zonneveld model and the epsilon-skew-normal models fit unimodal shapes with varying degrees of skew and kurtosis (Zonneveld 1991, Clark and Thompson 2011); Gaussian mixture models can represent multimodal activity, as might be expected for multivoltine species, presuming activity can be decomposed into Gaussian curves (Proïa et al. 2016). In contrast to most methods, generalized additive models (GAMs) in the form of smoothing splines make very few assumptions about the shape of activity. This makes them a strong choice (a) for analyses in which aspects of yearly activity are not known ahead of time and must be inferred from the data (e.g., uncertain or shifting voltinism), or (b) to provide a consistent framework for comparative analyses that include species with diverse patterns of activity (Rothery and Roy 2001, Hodgson et al. 2011, Stemkovski et al. 2020).
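The unimodal Gaussian assumption described above can be made concrete with a minimal sketch. It is written with Python's standard library rather than the R tools the article uses, and it is not the butterflyGamSims simulator. The function names, the parameter values (peak day, spread, seasonal abundance index), the survey days, and the Poisson observation model are all illustrative assumptions.

```python
import math
import random

def gaussian_activity(day, peak_day, spread, abundance):
    """Expected count on a day of year under a unimodal Gaussian activity curve.

    `abundance` is the area under the curve (a seasonal abundance index)."""
    z = (day - peak_day) / spread
    return abundance * math.exp(-0.5 * z * z) / (spread * math.sqrt(2 * math.pi))

def poisson_draw(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def simulate_season(rng, survey_days, peak_day=180, spread=12, abundance=300):
    """Return (day, count) pairs for one year of surveys (assumed parameter values)."""
    return [(d, poisson_draw(rng, gaussian_activity(d, peak_day, spread, abundance)))
            for d in survey_days]

rng = random.Random(42)
# Sparse monitoring: only six survey visits in the year (hypothetical days).
survey_days = [150, 162, 171, 183, 196, 210]
for day, count in simulate_season(rng, survey_days):
    print(day, count)
```

With only six visits, the simulated counts constrain the peak day and the width of the curve only loosely, which is the situation the guidelines are aimed at.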
The key benefit GAMs provide is the ability to specify a predictor (e.g., day of year) as having some unknown, potentially nonlinear relationship with the response (e.g., butterflies seen). Following Pedersen et al. (2019), we will refer to the terms representing these relationships as “smoothers”. Because these smoothers are flexible, GAMs allow the data to identify the relationship (including any nonlinearity) between a predictor such as day of year and butterfly count, rather than having that relationship be dictated by the modeler. However, the flexibility of smoothing splines leaves them prone to overfitting when working with sparse data, as is common for some types of population monitoring data. The simple solution of dropping years or populations with limited data will bias inferences made from estimated trends (Didham et al. 2020), particularly since populations with more limited data have been found to be disproportionately at-risk species (Forister et al. 2023). Here we offer specific approaches – pre-processing steps, model structures, and post-hoc analysis steps – that perform well for sparse monitoring data. For those looking to understand smoothing splines and how to use them, we highly recommend Noam Ross’s free interactive course “GAMs in R” (https://noamross.github.io/gams-in-r-course/).
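One way to picture a smoother is as a weighted sum of simple basis functions, with the weights estimated from the data. The sketch below, in plain Python, uses piecewise-linear "tent" basis functions on the log scale purely for illustration. It is not how mgcv is implemented: mgcv uses other spline bases by default and also estimates a penalty on wiggliness, neither of which is shown here. The knots and coefficients are made-up values, not fitted estimates.

```python
import math

def tent_basis(day, knots):
    """Evaluate piecewise-linear 'tent' basis functions at one day.

    Each basis function is 1 at its own knot and falls linearly to 0
    at the neighbouring knots."""
    values = []
    for j, knot in enumerate(knots):
        left = knots[j - 1] if j > 0 else None
        right = knots[j + 1] if j < len(knots) - 1 else None
        if day == knot:
            values.append(1.0)
        elif left is not None and left < day < knot:
            values.append((day - left) / (knot - left))
        elif right is not None and knot < day < right:
            values.append((right - day) / (right - knot))
        else:
            values.append(0.0)
    return values

def expected_count(day, knots, coefs):
    """Log-link smoother: expected count = exp(sum_j coef_j * basis_j(day)).

    Only meaningful for days between the first and last knot."""
    basis = tent_basis(day, knots)
    return math.exp(sum(c * b for c, b in zip(coefs, basis)))

knots = [120, 150, 180, 210, 240]          # hypothetical knot days
one_peak = [-3.0, 0.5, 2.5, 0.5, -3.0]     # made-up coefficients: one flight period
two_peaks = [-3.0, 2.0, -1.0, 2.0, -3.0]   # made-up coefficients: two flight periods

for day in range(120, 241, 15):
    print(day, round(expected_count(day, knots, one_peak), 2),
          round(expected_count(day, knots, two_peaks), 2))
```

The same basis yields a single-peaked or a double-peaked curve depending only on the coefficients. That flexibility is why a smoother can follow unknown voltinism, and also why it can overfit when only a few surveys are available to pin the coefficients down.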
To our knowledge, there are no consistent guidelines for using smoothing splines to estimate population trends from monitoring data, particularly for sparse monitoring data. We present guidelines and approaches we have developed in the process of fitting monitoring data for hundreds of butterfly species, informed by extensive simulation using our ‘butterflyGamSims’ R package. These guidelines are targeted at sparse butterfly monitoring data from temperate regions (with seasonal rather than year-round activity). However, many of our considerations are relevant for other systems (e.g., choosing an appropriate model structure, post-hoc analysis for populations that appear to go extinct). We provide implementation examples specifically for the package ‘mgcv’ (Wood 2011) in the R programming language (R Development Core Team 2023), but our guidelines are relevant for other software implementations of smoothing splines.
We begin with an overview of using smoothing splines, describe our new butterflyGamSims package, and then provide our recommendations. Simulation results are summarized in the supplements; all simulation results will be made available upon publication.
Link to full article:
Estimating butterfly population trends from sparse monitoring data using Generalized Additive Models | bioRxiv
Key Points
- Butterfly Population Trends: The article discusses the importance of estimating butterfly population trends and phenology from monitoring data, emphasizing the challenges posed by sparse data for at-risk species.
- GAMs for Sparse Data: It introduces guidelines for using Generalized Additive Models (GAMs) to analyze sparse butterfly monitoring data, detailing pre-processing steps, model structure choices, and post-hoc analysis to improve accuracy.
- butterflyGamSims Package: The authors present the butterflyGamSims package for R, which simulates realistic butterfly monitoring data, helping ecologists test GAM model choices and monitoring schemes.
- Recommendations and Cautions: The article provides specific recommendations for implementing GAMs in ecological studies and cautions about interpreting phenology trends, especially for multivoltine species with complex activity patterns.
The article aims to facilitate analyses of previously unusable data to enhance understanding of insect population dynamics and phenology.
References: biorxiv.org