Hybridization among diverging lineages is common in nature. Genomic data provide a special opportunity to characterize the history of hybridization and the genetic basis of speciation. We review existing methods and empirical studies to identify recent advances in the genomics of hybridization, as well as issues that need to be addressed. Notable progress has been made in the development of methods for detecting hybridization and inferring individual ancestries. However, few approaches reconstruct the magnitude and timing of gene flow, estimate the fitness of hybrids or incorporate knowledge of recombination rate. Empirical studies indicate that the genomic consequences of hybridization are complex, including a highly heterogeneous landscape of differentiation. Inferred characteristics of hybridization differ substantially among species groups. Loci showing unusual patterns – which may contribute to reproductive barriers – are usually scattered throughout the genome, with potential enrichment in sex chromosomes and regions of reduced recombination. We caution against the growing trend of interpreting genomic variation in summary statistics across genomes as evidence of differential gene flow. We argue that converting genomic patterns into useful inferences about hybridization will ultimately require models and methods that directly incorporate key ingredients of speciation, including the dynamic nature of gene flow, selection acting in hybrid populations and recombination rate variation.
Keywords: evolutionary genomics, gene flow, hybridization, introgression, reproductive barriers, speciation
The scope of hybridization plays a special role in understanding how new species are born. The capacity of lineages to hybridize is often used as a basis for species identification (Dobzhansky 1937; Mayr 1963). More generally, the dynamics of gene flow between diverging populations describe key elements of the speciation process.
Hybridization leaves detectable footprints in genomes, raising the prospect that it can be characterized from genomic patterns of variation. In particular, genomic data provide avenues to answer two basic questions about speciation: (i) what is the history of gene flow between nascent species and (ii) what genetic barriers maintain their integrity? Although these questions were addressed before the genomic era, the ability to rapidly and affordably survey genomic variation among individuals and populations has ushered in new hope of answering them (Sousa & Hey 2013; Seehausen et al. 2014). Because of recombination and independent assortment during meiosis, diversity patterns at the very large number of loci sampled by genomic data sets can be viewed (ideally) as replicated outcomes of hybridization history, enabling increasingly accurate reconstruction of that history. At the same time, inspection of the genomic distribution of variation can pinpoint specific regions with unusual patterns that might contribute to reproductive isolation or adaptive introgression. Genomic studies of hybridization generally follow a recipe with two steps. First, genomewide patterns of variation are catalogued. If each of the sampled populations is suspected or determined to belong to one diverging lineage (e.g. species) or another, differentiation between these populations is quantified. If the sampled populations likely reflect recent hybridization (e.g. in hybrid zones), other measures of admixture – such as geographic clines in allele frequency across transects – might be used. The second step is to compare observed genomic distributions to those expected under one or more models to draw inferences about hybridization and speciation. Genomewide portraits of differentiation and admixture are shaped by a suite of evolutionary processes. 
Parameters of particular interest from the speciation perspective include the form, frequency, magnitude and timing of hybridization, as well as the fitness of hybrids. But several other factors complicate interpretations. Natural selection acting before or after hybridization affects differentiation at targeted mutations and at nearby neutral variants (Wright 1969; Charlesworth et al. 1997; Nordborg 1997), thereby inflating heterogeneity in differentiation among loci, even in the absence of gene flow (Noor & Bennett 2009; Nachman & Payseur 2012; Cruickshank & Hahn 2014; Burri et al. 2015). Recombination rate – a parameter that varies both across the genome and among species (Smukowski & Noor 2011) – determines the genomic scale over which patterns of differentiation and admixture vary. Shared variation among populations may reflect unsorted ancestral polymorphism (incomplete lineage sorting) rather than hybridization (Clark 1997; Wakeley & Hey 1997). In this paper, we synthesize emerging trends in the genomics of hybridization between diverging lineages. After reviewing analytical methods that can be used to characterize hybridization, we summarize findings from empirical studies conducted on the genomic scale. Based on this survey, we prioritize future research to maximize insights into the process of speciation. Translating genomic data sets into useful inferences about hybridization and speciation is a formidable task. Recognizing the increasing practicality of collecting high-quality genomic data, we focus here on analytical prospects and challenges. We refer the reader to recent reviews for guidance on important issues related to sequencing, genotyping and variant calling (Nielsen et al. 2011; Altmann et al. 2012; Arnold et al. 2013; Buerkle & Gompert 2013; Mardis 2013; Narum et al. 2013; Schlötterer et al. 2014; Olson et al. 2015). 
We surveyed the literature for analytical methods that have either been applied in genomic studies of hybridization between diverging lineages or have the potential to be applied in such studies. Both methods developed specifically to understand speciation and methods developed for other purposes were included. A wide range of methods is available (Table 1). The collection of approaches samples major descriptors of genetic variation: levels of diversity and divergence, site frequency spectra, haplotype variation and phylogenetic relationships. Methods vary in their characterization of hybridization. Defining gene flow as the movement of alleles between populations, different methods provide access to contrasting levels of gene flow. For example, coalescent approaches usually assume low rates, whereas analyses of clines consider rates high enough to interfere with selection. Some strategies aim to simply detect gene flow, whereas others strive to reconstruct its magnitude and/or timing. Depending on the approach, estimated rates of gene flow are genomewide averages or locus-specific values. Hybridization is modelled as an instantaneous event or as a continuous phenomenon. For several parametric approaches, there exists a trade-off in computational cost between increasing the number of individuals and increasing the number of loci sampled. For example, methods that fit the isolation with migration model using Markov chain Monte Carlo handle many individuals but are slow with a large number of loci (Hey & Nielsen 2004, 2007; Hey 2010; Sethuraman & Hey 2015), whereas analytical likelihood methods applied to this model can use entire genomes but are restricted to a few individuals (Lohse et al. 2011, 2015, 2016). Principal components analysis (PCA) (a nonparametric approach) can be efficiently applied to whole-genome data sets from large numbers of individuals (Patterson et al. 2006; Price et al. 
2006), but it provides no direct insights into the causes of resulting patterns. Methods differ in the frequency of usage by researchers studying the genomics of speciation (Table 2). The most popular strategy interprets heterogeneity in summary statistics of between-population differentiation – usually FST – as evidence of differential gene flow across the genome. Another commonly used class of approaches searches for signs of gene flow in genomic patterns of shared-derived variants between populations or species. In genomic studies of hybridization, the ‘ABBA-BABA test’ (Green et al. 2010; Durand et al. 2011) is a widespread application of such phylogenetic thinking. Interest in this test and its derivatives (Green et al. 2010; Durand et al. 2011; Patterson et al. 2012; Martin et al. 2015b; Pease & Hahn 2015), referred to as ‘D-statistics’, was stimulated by its original application (Green et al. 2010), which revealed that ancestors of modern humans hybridized with Neanderthals. Several approaches commonly used to infer demographic history within species (especially humans) – including diffusion-based analysis of the site frequency spectrum (Gutenkunst et al. 2009) and a host of methods based on individual haplotypes (Pool & Nielsen 2009; Gravel 2012; Mailund et al. 2012; Harris & Nielsen 2013) – have seen limited application in the genomic characterization of gene flow between diverging lineages.
Inferences about hybridization and speciation from genomic data
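As a rough illustration of the ABBA-BABA logic described above, the sketch below computes a D-statistic from derived-allele frequencies in three ingroup populations (P1, P2, P3). It assumes that the outgroup is fixed for the ancestral allele at every site, and the function name and toy frequencies are hypothetical. An excess of ABBA over BABA patterns (D > 0) would be consistent with gene flow between P2 and P3, whereas incomplete lineage sorting alone predicts D near zero.

```python
def patterson_d(p1, p2, p3):
    """D-statistic from per-site derived-allele frequencies.

    p1, p2, p3: sequences of derived-allele frequencies in populations
    P1, P2 and P3 (same sites, same order). Assumption of this sketch:
    the outgroup carries only the ancestral allele at every site.
    """
    abba = 0.0  # expected count of sites where P2 and P3 share the derived allele
    baba = 0.0  # expected count of sites where P1 and P3 share the derived allele
    for f1, f2, f3 in zip(p1, p2, p3):
        abba += (1 - f1) * f2 * f3
        baba += f1 * (1 - f2) * f3
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Toy data (hypothetical): two ABBA sites, one BABA site, one uninformative site.
d_excess = patterson_d([0, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0])
# Symmetric sharing, as expected without gene flow: D is zero.
d_null = patterson_d([0, 1], [1, 0], [1, 1])
```

In practice, significance is usually assessed by resampling genomic blocks (for example, a block jackknife), which is not shown here.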
Overall, available methods provide useful tools for reconstructing hybridization. The diversity of focal genomic patterns (Table 1) suggests that collectively, methods can access gene flow occurring over a range of timescales. For example, phylogenetic patterns among species can reveal older hybridization events, whereas the distribution of shared haplotype lengths is shaped by recent gene flow. Although no methods incorporate all aspects of genomic variation, many approaches adopt powerful likelihood or Bayesian frameworks that avoid drastic summaries of the data. For example, several strategies are available that analyse the complete (unsummarized) site frequency spectrum across multiple populations. The ancestries of individuals provide special insights into hybridization history. Importantly, diverse strategies now exist for probabilistically assigning individual ancestry, assuming that appropriate reference populations can be surveyed (Gompert & Buerkle 2013). Since the Bayesian clustering approach Structure revolutionized the inference of ancestry proportions (Pritchard et al. 2000), methods have been developed that use the same likelihood (genotype data conditional on ancestry) but different algorithms to achieve substantial increases in computational speed (Tang et al. 2005; Alexander et al. 2009; Raj et al. 2014). With other methods, including HapMix (Price et al. 2009) and RASPberry (Wegmann et al. 2011), the locus-specific histories of individuals can be reconstructed, enabling changes in ancestry to be detected over short chromosomal distances. Computational and statistical advances have significantly expanded the hybridization scenarios that may be considered. With approximate Bayesian computation, genomic data can be fit to any model that can be rapidly simulated, allowing (in principle) arbitrarily complex histories of gene flow to be compared. Existing methods also enable testing of specific hypotheses about reproductive barriers. 
For example, the genomic clines framework (Gompert & Buerkle 2009, 2011) statistically evaluates evidence for selection against gene flow at specific genomic locations through comparison to the genomewide hybrid index. Despite their potential, available methods for characterizing hybridization suffer from several challenges when applied to genomic studies of speciation. First, options for measuring gene flow often exclude scenarios of interest. Among the subset of methods that estimate gene flow, most assume it happens continuously at an invariant rate. Few approaches reliably reconstruct the timing of hybridization. For example, popular methods based on the isolation with migration framework may produce wide confidence intervals on the timing of gene flow and falsely infer the presence of gene flow under some conditions (Sousa et al. 2011; Strasburg & Rieseberg 2011; Cruickshank & Hahn 2014; Hey et al. 2015). These restrictions do not match the reality of the speciation process, in which the opportunity for hybridization varies over time due to accumulating reproductive isolation, fluctuating geographic range sizes and other factors. Indeed, temporal changes in gene flow are key ingredients for distinguishing major models of speciation. For example, sympatric and parapatric speciation feature gene flow during the earliest stages of divergence, whereas speciation by reinforcement is thought to be triggered by secondary contact and gene flow after a period of divergence in allopatry. A second challenge with existing methods is the limited treatment of natural selection. Approaches that estimate the rate of gene flow usually assume that population history and neutral mutation account for genomic patterns. Therefore, selection has the potential to bias gene flow estimates. This problem could be severe for lineages with a history of reproductive isolation, especially if many loci contribute to isolation. 
A more fundamental methodological issue in the context of speciation is the assumption that one rate of gene flow characterizes the entire genome (Sousa et al. 2013). Because selection acting on hybrids targets specific loci, rates of introgression could vary significantly across the genome (Barton & Hewitt 1985; Harrison 1990; Payseur 2010); indeed, documenting this variation is a major goal of genomic studies of speciation. Alternatively, genome scan approaches search for locus-specific distortions in summary statistics, but usually leave unestimated rates of gene flow and associated selection coefficients. These shortcomings are not surprising as most methods were designed to analyse population structure within species rather than gene flow among diverging lineages. Although strategies specifically developed to analyse the genomic consequences of mixing in hybrid zones (e.g. geographic and genomic clines) enable the detection of selection and differential introgression in hybrids, they downplay the effects of demographic history and do not estimate locus-specific rates of gene flow. The rate of meiotic recombination is another key parameter to consider in genomic studies of hybridization (Sousa & Hey 2013). Differential gene flow across a chromosome requires recombination. Recombination rate determines the chromosomal scale over which selection targeting reproductive isolation mutations reduces gene flow at neighbouring loci (Bazykin 1969; Barton 1979; Bengtsson 1985; Barton & Bengtsson 1986; Gavrilets 1997), the focal signature of genomic scan methods. Furthermore, several models of speciation propose that loci involved in reproductive isolation will preferentially accumulate in rearranged (Noor et al. 2001; Rieseberg 2001; Navarro & Barton 2003) or collinear (Butlin 2005) chromosomal regions with little recombination, allowing divergence to continue in the face of hybridization. 
Unfortunately, genomic methods for studying speciation ignore the reality that recombination rates vary across chromosomes on broad and fine scales (McVean et al. 2004; Kong et al. 2010; Comeron et al. 2012; Liu et al. 2014b). In addition, some methods assume that recombination is absent over short distances (i.e. within loci). As hybridization studies transition to using whole-genome sequences, another practical challenge emerges. How should the genome be partitioned into loci before gene flow analyses are conducted? For example, the typical genomic scan strategy of comparing windows of the same physical size (e.g. 10 kb) implicitly assumes that the rate of recombination is invariant. Most methods do not yet account for the expectation that switches in ancestry associated with hybridization will occur over finer scales in genomic regions with higher recombination rates. Several approaches assume free recombination among polymorphisms, even when they are closely linked. A more general challenge arises from the treatment of unlinked loci as independent, a feature of all available methods. Loci on different chromosomes can behave nonindependently in scenarios of interest. For example, recent hybridization events involving only a few individuals can generate introgression at many loci (Goodman et al. 1999). The role of geography in speciation has long been and continues to be a source of controversy (Mayr 1963; Coyne & Orr 2004; Seehausen et al. 2014). Knowledge of the geographic locations of past hybridization events is vital for understanding the connection between gene flow and speciation. With few exceptions (Yang et al. 2014), genomic methods for analysing hybridization do not consider geographic information nor do they infer the geographic context of gene flow. Outcomes of hybridization may vary depending on whether the hybridizing populations diverged in the presence of gene flow or came into contact secondarily after a period of allopatry. 
But reliable methods for distinguishing between these scenarios are not readily available. Theoretical expectations for patterns of genome divergence following secondary contact or divergence with gene flow appear to be similar under equilibrium conditions (e.g. Barton 1979; Barton & Hewitt 1985; Charlesworth et al. 1997; Feder & Nosil 2010). However, given the extent of temporal variation in geographic ranges due to major climate oscillations, nonequilibrium conditions are likely to be common and could impact inferences about hybridization, as well as its consequences. For example, hybrid zones in the Arctic and some temperate regions are thought to have formed repeatedly following interglacial range expansions of hybridizing lineages, in some instances leading to introgression or hybrid speciation (Hewitt 2011). In summary, researchers may choose from a variety of analytical approaches to draw inferences about hybridization from genomic data. Nevertheless, existing strategies are missing some components that are important in the context of speciation, suggesting that investment in further method development is warranted. To identify emerging empirical patterns in the genomics of hybridization and speciation, we surveyed the literature. We focused on studies that: (i) articulated a specific interest in characterizing hybridization between diverging lineages (rather than simply scanning genomes for evidence of natural selection); (ii) measured variation across the genome at thousands to millions of loci; and (iii) examined species, subspecies or populations with independent evidence for reproductive isolation (rather than focusing on ecotypic differentiation). Both studies that sampled geographically separate populations and those that examined currently hybridizing populations were considered. Inspection of the studies (listed in Table 2) reveals several themes. 
Genomic data confirm what is now conventional wisdom: hybridization between diverging lineages is common (Arnold 1997; Rieseberg 1997; Mallet 2005). Many instances of hybridization were previously unreported and unanticipated (e.g. genetic exchange between humans and Neanderthals), an indication of the potentially high power of genomic data to detect gene flow, even when it involves extinct populations. The history of hybridization among diverging lineages is sometimes complex, including multiple periods of gene flow as well as asymmetries in its direction. In some cases, genomic data raise the possibility that hybridization facilitated speciation (e.g. butterflies, cichlids). Nevertheless, the inference that most of the hybridizing lineages in Table 2 – including a ring species (Alcaide et al. 2014) – evolved in isolation for some period of time suggests that geographic separation contributes to reproductive isolation in many organismal lineages (Mayr 1963; Coyne & Orr 2004). A consistent observation across surveyed taxa is rampant heterogeneity in patterns of differentiation and admixture within the genome, sometimes over short chromosomal distances. As discussed above, this patchwork reflects a combination of evolutionary factors, including incomplete lineage sorting, effects of selection before and after hybridization, and recombination rate variation, as well as gene flow and selection targeting hybrids. That heterogeneity occurs over a fine genomic scale suggests that some aspects of hybridization can only be captured by examining the entire genome. In those studies that sample hybrid zones, this heterogeneity raises the prospect that boundaries between nascent species are semipermeable, with a subset of loci maintaining reproductive barriers (Harrison 1986, 1990; Wu 2001). 
In those studies that sample geographically separate populations, this conclusion is more difficult to reach: the null model of a low rate of gene flow that is constant across the genome could still produce strong heterogeneity if only a few regions introgress. From a phylogenetic perspective, the genomes of recently diverged species are complex mosaics of alternating histories, highlighting the difficulty of reconstructing species relationships, even with genomic data. Indeed, extensive hybridization challenges basic assumptions of phylogenetic methods, suggesting that some species histories are better represented as phylogenetic networks (Yu et al. 2014). Genomic regions that display high differentiation (in allopatric populations) or narrow clines (in hybrid zones) could contain genetic changes that confer reproductive isolation. Under this assumption, the studies in Table 2 collectively raise a few genomic themes that could characterize reproductive barriers between species. First, the number of loci that appear to maintain differentiation in the face of gene flow is usually high, suggesting a complex genetic basis to reproductive isolation. An alternative explanation for this pattern is that many loci are falsely labelled as ‘genomic outliers’ because significance thresholds are based on null models that ignore or mischaracterize demographic history. A second observation is that although outlying loci are scattered throughout the genome, there is evidence for a bias towards regions of low recombination. In several species, this pattern takes the form of higher differentiation near centromeres, which are known or suspected to recombine less. In others, the relationship between outlier location and recombination rate is more obvious. Cline width and recombination rate are positively correlated across the genome in a hybrid zone between two subspecies of house mice (Janoušek et al. 2015). 
Selection to remove long genomic blocks from one species in hybrids was postulated to explain a negative correlation between absolute divergence and recombination rate in monkey flowers (Brandvain et al. 2014). Rearranged chromosomal regions exhibit higher differentiation than collinear regions in several species pairs. Perhaps these results indicate a role for suppressed recombination in the evolution of reproductive isolation. Alternatively, they could simply reflect increased power to detect isolation loci using linked markers in low-recombination regions. In some cases, the pattern appears to be caused by selection at linked sites within lineages rather than barriers to gene flow (Cruickshank & Hahn 2014; Burri et al. 2015). Regardless, these results reveal the importance of considering variation in local recombination rate when interpreting genomic patterns of differentiation and admixture. A third trend among surveyed species is the over-representation of highly differentiated loci on the sex chromosomes relative to the autosomes, seen in both X-Y and Z-W systems. This pattern agrees with results from controlled crosses in the laboratory, which often associate isolation phenotypes – especially hybrid sterility and hybrid inviability – with the sex chromosomes (Coyne & Orr 2004). Whether this genomic tendency indicates that the sex chromosomes harbour a higher density of loci involved in reproductive isolation (Masly & Presgraves 2007), loci with stronger phenotypic effects on isolation, or both, remains to be seen. Interpretations of this pattern must also recognize the complications inherent in setting neutral expectations for the sex chromosomes, which can experience different rates of genetic drift and migration than the autosomes. The collection of empirical studies (Table 2) reveals considerable interest in using genomic data to understand the role of natural selection in speciation. 
The degree to which selection can increase divergence in the face of gene flow remains controversial and difficult to determine (Kirkpatrick & Barton 2006; Via 2012; Andrew & Rieseberg 2013; Flaxman et al. 2013, 2014; Yeaman 2013; Cruickshank & Hahn 2014; Feder et al. 2014; Seehausen et al. 2014). In contrast, the idea that reproductive isolation is a by-product of local adaptation in allopatry is widely accepted, but genetic tests of this hypothesis remain rare. Those genomic studies that jointly consider currently allopatric populations and currently hybridizing populations provide a special opportunity to evaluate this influential prediction. If the genetic changes that block gene flow are initially driven to fixation by positive selection, genomic regions with restricted introgression in hybrid zones could display higher differentiation between nascent species. Genomic cline width and differentiation are negatively correlated in lycaenid butterflies and in manakins, suggesting that adaptive evolution contributes to reproductive isolation (Gompert et al. 2012; Parchman et al. 2013). Nevertheless, the weakness of these correlations leaves open the possibility that genetic drift plays an important role in the establishment of reproductive barriers. Despite the trends described above, the genomic landscape of divergence appears to differ substantially among species pairs. Estimated hybridization rates vary widely. Whereas some species show evidence for extensive gene flow with a few genomic regions maintaining divergence (e.g. rabbits, mosquitoes), others show widespread differentiation with limited pockets of introgression (e.g. Drosophila simulans vs. island endemics). The genomic scale of heterogeneity also varies. Although these comparisons are complicated by contrasting study designs, some observed disparities probably reflect biological differences among species pairs in the timing and mode of speciation. 
This empirical survey also reveals challenges that face genomic studies of hybridization. From the perspective of speciation, a primary motivation for analysing genomic data is its potential for characterizing the magnitude and timing of gene flow among diverging lineages. Only a handful of studies (Table 2) reported estimates of these important quantities. The most widely used measure of locus-specific differentiation was FST, a summary statistic that compares variation between populations to variation within populations (Wright 1951). Although FST enables comparisons of loci that differ in within-population diversity, it suffers from interpretive challenges when used to measure gene flow across the genome. Reduced gene flow, which maintains between-population divergence, is confounded with selection at linked sites, which decreases within-population diversity (Charlesworth 1998; Noor & Bennett 2009; Nachman & Payseur 2012; Cruickshank & Hahn 2014). Furthermore, most studies identified unusually differentiated loci as those with values that simply fell in the tail of the genomic distribution, a strategy complicated by the fact that completely neutral genomic distributions also have tails. Other studies used neutral simulations to construct distributions of summary statistics expected in the absence of selection. This approach ignores the likely possibility that selection against gene flow (and other forms of selection) occurs at many genomic locations in hybrids. Furthermore, Table 2 reveals biases in sampling strategy. The majority of studies examined populations that are currently allopatric, with no a priori evidence of hybridization. Compared to hybrid zones, allopatric populations are usually easier to find and investigators may be more likely to sample them for purposes other than studying gene flow. Nevertheless, connecting genomic patterns with hybridization is much simpler in populations that are currently hybridizing in visible ways (e.g. hybrid zones). 
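A small numerical sketch may help show why FST is hard to interpret as a measure of gene flow across the genome. It uses a diversity-based form of FST (one minus the ratio of mean within-population diversity to between-population divergence), which is an assumption of this example rather than the estimator used in any particular study in Table 2. The function name and the diversity values are hypothetical.

```python
def fst_from_diversity(pi_1, pi_2, dxy):
    """Relative differentiation from diversity statistics.

    pi_1, pi_2: within-population nucleotide diversity in each population.
    dxy: mean pairwise divergence between the two populations.
    Uses FST = 1 - (mean within-population diversity) / dxy.
    """
    if dxy <= 0:
        raise ValueError("dxy must be positive")
    pi_within = (pi_1 + pi_2) / 2
    return 1 - pi_within / dxy


# Two hypothetical windows with identical absolute divergence (dxy = 0.010).
# Window B has lower within-population diversity, as might follow from
# selection at linked sites, with no difference in between-population divergence.
fst_window_a = fst_from_diversity(0.008, 0.008, 0.010)
fst_window_b = fst_from_diversity(0.002, 0.002, 0.010)
```

Window B would appear as the stronger 'outlier' in an FST scan even though absolute divergence is the same in both windows, which is the confounding between reduced gene flow and reduced within-population diversity noted above.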
Other biases are taxonomic. Animal species – especially vertebrates – were the most popular subjects. It is too early to identify differences between plants and animals from genomic studies of hybridization. But the presence of heteromorphic sex chromosomes seems to alter the genetic architecture of reproductive barriers and to increase the speed with which they accumulate, and most plants lack heteromorphic sex chromosomes (Rieseberg 2001; Coyne & Orr 2004; Levin 2012; Phillips & Edmands 2012; Lima 2014). Likewise, it has long been known that the frequency of hybridization tends to be higher in species with external fertilization, such as plants and fishes, than in taxa with internal fertilization (Stebbins 1959). These and other related observations should motivate genomic studies of hybridization in organismal groups that differ in the genetics of sex determination and/or other key features of reproductive biology. Our overview of methods and empirical studies suggests fruitful directions for research on the genomics of hybridization and speciation. First, model-based estimation of the magnitude and timing of hybridization should be prioritized. Reconstructing the history of gene flow between diverging lineages is a necessary step towards understanding the causes and consequences of speciation, and it may provide key insights. For example, the alternative scenarios of speciation with gene flow and allopatric speciation followed by hybridization during secondary contact could be distinguished by quantifying the dynamics of gene flow. Accurate estimates of the magnitude and timing of hybridization are also needed to interpret the results from scans that seek to identify genomic regions with restricted gene flow, which are growing in popularity. 
Because the expected genomic distribution of differentiation depends on the level and timing of hybridization, the determination that certain loci experienced reduced gene flow ultimately requires a demographic model including these parameters. At present, some authors seem to assume that evidence of hybridization elsewhere in the genome is sufficient to demonstrate that a differentiated genomic region is resistant to gene flow. But a wide spectrum of differentiation among loci is expected, even in the absence of reproductive barriers. Second, natural selection should be directly incorporated into methods for characterizing hybridization between diverging lineages. Even partial reproductive isolation can generate strong selection against hybrids that shapes variation throughout the genome (Barton 1983; Barton & Bengtsson 1986). Accounting for the effects of selection is therefore a necessary step towards accurately reconstructing the history of hybridization. Furthermore, model-based characterization of selection could provide insights into genetic mechanisms of speciation. By estimating the contributions of individual loci to hybrid fitness, the genetic architecture of reproductive barriers in nature, as well as the identity of genomic regions and candidate genes that underlie adaptive introgression or heterosis, could be revealed. It may be possible to compare the fit of observed genomic patterns to models featuring different architectures. For example, selection against heterozygotes at single loci is predicted to impede gene flow at linked variants more strongly than epistatic selection against two-locus, hetero-specific genotypes (‘hybrid incompatibilities’) (Gavrilets 2004; Gompert & Buerkle 2011; Lindtke & Buerkle 2015). The genomic clines approach (Gompert & Buerkle 2009, 2011) provides a good foundation for incorporating selection into genomic studies of hybridization in the context of hybrid zones. 
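To make the genomewide hybrid index that serves as the baseline in the genomic clines framework more concrete, the sketch below estimates it for one individual by maximum likelihood on a grid. It is a minimal illustration under a simple binomial model of our own choosing (each allele copy drawn independently, with allele frequency a mixture of the two parental frequencies), not the implementation of Gompert & Buerkle; all names and example values are hypothetical.

```python
import math


def log_likelihood(h, genotypes, p0, p1):
    """Log-likelihood of hybrid index h for one diploid individual.

    genotypes: counts (0, 1 or 2) of a focal allele at each locus.
    p0, p1: frequency of that allele in parental populations 0 and 1.
    Assumed model: genotype ~ Binomial(2, (1 - h) * p0 + h * p1).
    """
    ll = 0.0
    for g, a, b in zip(genotypes, p0, p1):
        q = (1 - h) * a + h * b
        q = min(max(q, 1e-9), 1 - 1e-9)  # keep log() finite at the boundaries
        ll += math.log(math.comb(2, g)) + g * math.log(q) + (2 - g) * math.log(1 - q)
    return ll


def hybrid_index(genotypes, p0, p1, steps=1000):
    """Grid-search estimate of h, the ancestry proportion from population 1."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda h: log_likelihood(h, genotypes, p0, p1))


# Four diagnostic loci (hypothetical): allele absent in parent 0, fixed in parent 1.
p0 = [0.0, 0.0, 0.0, 0.0]
p1 = [1.0, 1.0, 1.0, 1.0]
h_f1_like = hybrid_index([1, 1, 1, 1], p0, p1)    # heterozygous at every locus
h_parent_1 = hybrid_index([2, 2, 2, 2], p0, p1)   # matches parent 1 at every locus
```

A locus-specific analysis would then ask whether ancestry at a given locus departs from what this genomewide value predicts; that step, and any treatment of linkage among loci, is omitted here.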
For currently allopatric populations that hybridized in the past, model-based assignment of loci to one of a few classes with different rates of gene flow (due to selection) (Sousa et al. 2013) may offer a promising start towards the construction of more realistic methods.

Building geographic information into genomic methods for analysing hybridization is another worthwhile goal. This strategy would recognize that the prior probabilities associated with past hybridization are functions of current geographic ranges. It would also enable tests of speciation models that depend on geography. For example, reinforcement is expected to generate ‘inverse clines’ in which divergence for the trait/loci under reinforcing selection is greater between populations at the centre than at the ends of the cline (Caisse & Antonovics 1978; Andrew & Rieseberg 2013). Finally, incorporating geographic information could help identify the selective forces responsible for initial divergence and the formation of reproductive isolation, an important goal (Shaw & Mullen 2011). In this regard, clinal models of adaptation with gene flow (Endler 1977; Novembre & Di Rienzo 2009) could provide useful guidelines.

Even without these methodological advances, authors of empirical studies should be explicit about the models they are considering and the quantitative patterns they expect to find. In general, it is important to bear in mind that hybrid genomes are outcomes of a complex mixture of processes. As a result, intuition about expected patterns will sometimes mislead. Studies that embrace the complications inherent in the speciation process – rather than automatically attributing observed genomic patterns to differential gene flow generated by hybridization – should be prioritized.

Our survey suggests other guidelines for genomic studies of hybridization and speciation. Reproductive isolation should be directly measured whenever possible. 
Genomic comparisons alone are unlikely to identify genes responsible for reproductive barriers. Understanding how genetic variants restrict gene flow will ultimately require characterizing their functional effects on specific isolation phenotypes. More broadly, the interpretation of genomic patterns will be enriched by comparison to knowledge of reproductive barriers. For example, the over-representation of sex chromosome loci in genetic dissections of isolation traits provides a potential explanation for higher differentiation of sex-linked loci in genomic studies of hybridization. Likewise, a recent critique of the evidence for homoploid hybrid speciation showed that while admixture was well-documented in most putative hybrid species, only a handful of studies connected hybridization to reproductive barrier formation – a key criterion for hybrid speciation (Schumer et al. 2014a).

Genomic studies of hybridization should stimulate the development of new organismal models for speciation. Most information about mechanisms of speciation still comes from a small subset of species, but Table 2 suggests that the dynamics and outcomes of hybridization could differ significantly across taxa. Examining those systems featuring multiple species pairs that collectively sample a range of divergence times will enable the effects of hybridization to be evaluated at different stages of speciation. Furthermore, species that currently hybridize offer the most direct insights into the genomic consequences of hybridization, indicating that these species should be prioritized in speciation studies.

Finally, the limits to inferring hybridization from genomic data should be examined and embraced. The space of potential hybridization scenarios is expansive, and the subset of historical information that is recorded in genomes is highly stochastic. 
Determining what we can realistically hope to learn about speciation from genomic data – rather than treating it as a panacea – is an important next step.

We thank Nick Barton, Jeff Good and Richard Abbott for organizing this special issue of Molecular Ecology and for inviting us to contribute. We thank Nick Barton and three anonymous reviewers for insightful comments that improved the manuscript. BAP's work on hybridization is supported by NSF grant DEB 1353737. LHR's work on hybridization is supported by an NSERC Discovery grant.

BAP developed the outline of the paper, conducted most of the literature survey, and completed the bulk of the writing. LHR critiqued the outline, contributed to the writing, and helped with the literature survey, especially with respect to plant hybridization.