Bomb dating, age validation and quality control of age determinations of monodontids and other marine mammals

Methods for confirming the accuracy of age determination methods are reasonably well established in fishes, but the millions of routine age determinations which take place every year require their own quality control protocols. In contrast, methods for ensuring accuracy in age determination of monodontids and other marine mammals are still being developed. Here we review the basis and application of bomb radiocarbon to marine mammal age validation, highlighting its value for providing unambiguous estimates of age for belugas and other long-lived animals which form growth bands. Bomb radiocarbon is particularly useful for marine mammals, given that the age of an individual animal can be determined to within ±1-3 years, as long as it was alive during the 1960s. However, ongoing age determinations require careful monitoring to ensure that age interpretations remain consistent across ages and through time. Quality control protocols using reference collections of ageing material, in conjunction with age bias plots and measures of precision, are capable of detecting virtually all of the systematic ageing errors that often occur once age determinations of an animal become routine.

marine mammals in general, and monodontids in particular, has tended to lag well behind that of teleosts. Ages of monodontids such as the beluga (Delphinapterus leucas) have typically been estimated by counting growth layers or bands (often referred to as growth layer groups, or GLGs) in longitudinal sections of the teeth. The formation of annual growth bands is now known to be ubiquitous among vertebrates (Campana 2001). However, the deposition rate of growth bands in beluga teeth was previously interpreted as semi-annual, with two GLGs representing one year of growth (Sergeant 1959). Subsequent study of beluga growth sometimes suggested annual GLG formation in the teeth, but was more often inconclusive, whether based on teeth from wild-born beluga held in captivity (Brodie 1982, Heide-Jørgensen et al. 1994, Hohn and Lockyer 2001) or the use of tetracycline marks as a dated chemical marker (Johnston et al. 1987, Brodie et al. 1990, Hohn and Lockyer 2001). It wasn't until the development of bomb radiocarbon as a dating tool that conclusive demonstration of annual growth band formation in beluga teeth became possible (Stewart et al. 2006).
In this paper, we begin by briefly reviewing the most plausible options for confirming (or validating) the age of belugas and other marine mammals. We then describe the use of bomb radiocarbon as a powerful tool for validating the age of wild, long-lived marine mammals, including belugas. We conclude by summarizing the steps required to ensure continued accuracy and precision in the age determination of monodontids and other marine mammals, as drawn from the extensive experience and history of fish ageing laboratories.

AGE VALIDATION METHODS FOR ENSURING ACCURACY
There are many high-profile examples in the scientific literature of age readers with years of experience, and the ability to provide extremely consistent replicate age readings, that were subsequently shown to be incorrect (Campana 2001). Consistency implies excellent precision, and although precision is admirable and is often the sign of a good age reader, it is all too possible to be consistently inaccurate (Svedang et al. 1998). Consistency does not imply accuracy. Accuracy indicates the reader is providing the correct age (on average), even if the readings are not particularly precise. Assessing accuracy usually requires some objective and independent means of determining the age of an organism. The process of confirming that accuracy is called age validation (Beamish and McFarlane 1983).
A variety of methods exist through which age interpretations can be validated. Although the distinction has often been blurred in the literature, methods can be classified as validating absolute age, validating the periodicity of growth increment formation, or corroborating (but not validating) an existing set of age estimates. Campana (2001) provides a complete review of age validation methods, some of which are further discussed here as applicable to marine mammals and specifically monodontids.
Age validation methods suitable for marine mammals can be ranked in descending order of rigor as: 1) release of known-age and marked animals into the wild; 2) bomb radiocarbon dating; 3) mark-recapture of chemically tagged wild animals (typically older adults of unknown age); 4) ageing of discrete length modes; 5) marginal increment analysis; and 6) rearing in captivity. A complete discussion of each of these methods is presented elsewhere in this volume; however, only the first three approaches provide rigorous confirmation of age interpretation from marine mammalian teeth. As an example of the efficacy of recapturing known-age mammals, harp seals (Pagophilus groenlandicus) that were tagged as pups provided robust age determinations because the age of the animal was known without error (Frie et al. 2012). Bomb radiocarbon assays have also provided age estimates for individual belugas, but with an error on the order of 2 years (Stewart et al. 2006). Bomb radiocarbon dating in general, and with beluga in particular, is discussed in more detail below. In contrast, recapture of chemically tagged animals can accurately validate the formation of annual growth bands formed after chemical tagging (e.g. Lieberman 1993), but cannot validate the full age of the animal; for example, recapture after two years would allow the validation of only the two outermost growth bands, and only when those bands can be readily distinguished from the margin. Annual growth bands prior to the first capture mark must be assumed. The remaining age validation methods listed above (4-6) are theoretically applicable to marine mammals, but have never been applied effectively. In particular, examination of teeth from belugas reared in captivity is of questionable value, given that captive animals lack annual migratory movement and would likely exhibit different growth patterns relative to those in the wild.

Bomb radiocarbon for age validation
Bomb radiocarbon derived from atmospheric nuclear testing provides one of the best age validation approaches available for virtually any long-lived organism that forms growth bands, whether it be in trees (Worbes and Junk 1989), fish (Campana 1997), sharks (Francis et al. 2007), bivalves (Weidman and Jones 1993), corals (Druffel and Linick 1978), humans (Spalding et al. 2005) or marine mammals (Tauber 1979, Bada et al. 1987, Stewart et al. 2006). The onset of nuclear testing in the mid-1950s resulted in an abrupt increase in atmospheric 14C, which was soon incorporated into all organisms that were growing at the time. Thus the period is analogous to a large-scale chemical tagging experiment, wherein all growth bands deposited before 1956 contain only natural, low-level 14C levels and all those formed after about 1968 contain elevated levels (up to 2 times natural 14C levels). This measure of change in 14C levels is typically presented as delta 14C (Δ14C; Stuiver and Polach 1977). Growth bands formed in the transition period (typically 1956 to 1968) contain intermediate and increasing Δ14C levels. As a result, the interpretation of the annual Δ14C time sequence (chronology) in growth bands isolated from tooth sections is relatively simple; the growth band Δ14C chronology spanning the period of about 1956-1968 should match other regional Δ14C reference chronologies, as long as the age assignments based on growth band counts (= years of growth band formation) are correct. Any under-ageing would phase shift the growth band 14C chronology towards more recent years, while over-ageing would phase shift it towards earlier years (producing an apparent bomb 14C signal prior to the start of any nuclear testing).
In the case of beluga teeth and bomb radiocarbon, the assumption that two growth bands were formed each year led to a 30-year offset of the Δ14C chronology relative to the regional Δ14C reference chronology. In contrast, the interpretation of one growth band per year and an age of 60 years led to an excellent correspondence between the tooth chronology and the reference chronology, thus validating the age of 60 years (Stewart et al. 2006). Such a shift was easily detected. Sample contamination with material of more recent origin can only increase the Δ14C value, not decrease it. Thus the growth band Δ14C value sets a minimum age to the sample, and the years 1956-1968 become the most sensitive years for Δ14C-based ageing. Hence, for marine mammals born during this period, bomb radiocarbon dating can be used to confirm the accuracy of more traditional ageing approaches with an accuracy of approximately ±1-3 years; the discriminatory power of samples from organisms born before or after this period is more than an order of magnitude lower because of the loss of time specificity (there is little change in natural radiocarbon levels prior to nuclear testing, while post-bomb values tend to remain high for decades) (Campana 2001).
An additional consideration for using bomb radiocarbon dating to validate age is the use of an appropriate environmental Δ14C record as a temporal reference. Because the 14C signal recorded in deep-sea and freshwater environments can be different from that of surface marine waters (deep-sea = delayed and attenuated; freshwater = advanced and enhanced; Broecker and Peng 1982, Campana and Jones 1998), reference 14C chronologies appropriate to the organism's environment during the period of growth band formation must be used. In addition, growth bands from non-carbonate materials (such as teeth and vertebrae) tend to be lagged by 1-2 years relative to carbonate chronologies (such as those from otoliths), because carbonate structures are usually formed from carbon taken directly from the water (in the form of dissolved inorganic carbon), whereas teeth and vertebrae obtain their carbon from food eaten by the host animal. However, the radiocarbon chronologies are usually not species-specific, which means that reference chronologies from one area and species can often be applied to another area and species. Indeed, there is relatively little variation in the timing of the initial increase in known-age (i.e. reference) 14C chronologies throughout the world (Fig. 1). The global variation in 14C chronologies that does exist is largely confined to the period after the Δ14C rise, and can be attributed to different water mixing rates. That is, water masses in which surface waters (exposed to the atmospheric bomb 14C signal) mix rapidly with deep, 14C-depleted water masses (not exposed to the bomb signal) tend to show a post-bomb peak that is considerably lower and more variable (e.g. Andrews et al. 2013) than water masses where no such dilution occurs (e.g. Andrews et al. 2011). However, the year in which the bomb signal first becomes apparent (e.g. >10% above pre-bomb levels (Campana et al. 2008) or the first sample to exceed 2 SD of the pre-bomb mean (Kerr et al. 2006)) is relatively consistent around the world in stratified waters, and thus serves as a very stable dated marker (Fig. 1). For this reason, the year of initial increase is usually considered to be the most important signal in a radiocarbon chronology; peak 14C values tend to be far less informative and typically reflect atmospheric diffusion and water-mixing rates after the bomb signal first appeared in the marine environment.
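The "first sample to exceed 2 SD of the pre-bomb mean" criterion mentioned above can be illustrated with a minimal Python sketch. The function name, the choice of 1955 as the last pre-bomb year, and the Δ14C values are all hypothetical and serve only to show the logic; the ">10% above pre-bomb levels" criterion is not implemented because its exact definition is not given here.

```python
from statistics import mean, stdev

def year_of_initial_rise(years, d14c, prebomb_end=1955, n_sd=2.0):
    """Return the first post-pre-bomb year whose value exceeds the
    pre-bomb mean by more than n_sd standard deviations (None if none does)."""
    pairs = sorted(zip(years, d14c))
    prebomb = [v for y, v in pairs if y <= prebomb_end]
    if len(prebomb) < 2:
        raise ValueError("need at least two pre-bomb samples to estimate the SD")
    threshold = mean(prebomb) + n_sd * stdev(prebomb)
    for y, v in pairs:
        if y > prebomb_end and v > threshold:
            return y
    return None

# Hypothetical chronology (per mil), invented for illustration only.
years = list(range(1950, 1963))
values = [-62, -58, -60, -61, -59, -60, -50, -35, -10, 20, 50, 80, 100]
rise_year = year_of_initial_rise(years, values)
print(rise_year)
```

Applying the same function to both the reference chronology and a tooth chronology would give two years of initial rise that can be compared directly; any judgement about how close they need to be remains with the analyst.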
In general, the bomb radiocarbon method for age validation is not well suited to studies of short-lived (< 5 yr) species (but see Melvin and Campana 2010), or when the presumed birth dates do not span the period prior to the 1960s, or in environments where appropriate reference chronologies are not available (such as the south polar region). On the other hand, the low radioactive decay rate of 14C (half-life of 5730 yr) indicates that both archived and recent collections are appropriate for a Δ14C assay. For example, a sample from the innermost growth band of a 50-yr old beluga tooth collected in 2010 would be just as ideal for age validation as that of the innermost growth band of an archived 10-yr old beluga tooth collected in 1970. Bomb radiocarbon assays of marine mammal teeth have an advantage over those of teleost otoliths, in that individual growth bands are often large, and thus can be sampled through micromilling in sufficient weight from any location in the growth sequence, and need not be restricted to the innermost growth band. In addition, radiocarbon chronologies can be developed from assays of multiple growth bands from a single individual (e.g. Stewart et al. 2006).
Fig. 1. A common characteristic of bomb radiocarbon chronologies from all over the world is the nearly synchronous increase from background levels around 1956 (vertical line). Thus the year of initial increase in radiocarbon serves as a dated marker in all growth band sequences. In contrast, the asymptotic (post-bomb) radiocarbon level can vary widely among water bodies due to differences in water mixing (= dilution) rate, and thus is not particularly useful as a marker. Fitted line represents LOESS curve.
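The statement that archived and recent collections are equally suitable rests on how little 14C decays over a few decades. A short Python sketch of that arithmetic follows, using only the 5730-yr half-life quoted above; the function name is hypothetical and the 40-yr interval is simply the gap between the 1970 and 2010 collection dates in the example.

```python
HALF_LIFE_14C_YR = 5730.0  # half-life quoted in the text

def fraction_remaining(elapsed_years, half_life=HALF_LIFE_14C_YR):
    """Fraction of the original 14C left after elapsed_years of decay."""
    return 0.5 ** (elapsed_years / half_life)

# Decay over the 40 years separating a 1970 and a 2010 collection date.
loss_40yr = 1.0 - fraction_remaining(40)
print(f"14C lost to decay over 40 yr: {loss_40yr:.2%}")
```

The loss comes out at roughly half a percent, which is small next to the up-to-twofold increase in 14C associated with the bomb signal.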
Age validation studies based on bomb radiocarbon dating are often more rapid and equally cost-effective compared to alternative methods. Despite the fact that individual assays are expensive ($500-$1000 per sample, not including the cost of growth band extraction), relatively few samples are required for age validation and processing time is measured in weeks rather than years. In contrast, validation studies using, for example, chemical tag-recapture often require substantial tagging logistics and expenses, followed by one to several years of recapture effort and rewards to fishermen.
To elaborate on how bomb radiocarbon was used to validate the age of an individual beluga whale tooth, a tooth that was sectioned and aged using conventional methods is provided to contrast age estimation methods (Fig. 2). The initial assumption was that two GLGs were formed each year (Sergeant 1959). The alternate assumption, and one that is more consistent with modern science, is that one GLG forms per year (Stewart et al. 2006). By counting the GLGs back from the growing edge of the tooth (corresponding to the year of death), a presumed year of formation can be assigned to each GLG (or pair of GLGs). The radiocarbon assay from any single GLG can then be plotted using the year of formation (as determined from the GLG count) on the X axis. Normally, this would be done with several assays from the same or multiple teeth, so as to span a multi-year period. If the resulting time series matches that of the known-age (reference) chronology from the same region, the age and year assignments based on the GLGs must have been correct. More specifically, if the initial year of increase in the tooth chronology is similar to that of the reference chronology, the GLG interpretation must have been correct. In this example (Fig. 2B), the tooth chronology starts to increase around 1958, very soon after it increased in the fish- and coral-based reference chronology, indicating that the single GLG interpretation was correct. In contrast, GLG interpretations assuming two bands per year would produce an age half that of the original GLG interpretation, and a corresponding shift in the presumed year of formation for each pair of GLGs. When the radiocarbon assays from each GLG pair are then plotted with their new presumed year of formation, there is a huge shift in the tooth chronology, such that it appears to begin to increase in the mid-1970s (Fig. 2C). Clearly, the correspondence between the tooth chronology based on 2 GLGs per year and the reference chronology was unacceptably poor, thus rejecting the hypothesis that two GLGs form each year. Although there are statistical tests available to confirm the superior fit of one 14C chronology over another (Francis et al. 2010), visual comparisons of fits such as those in Figure 1 are usually fairly obvious.
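The counting-back step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used by Stewart et al. (2006): the function name, the 60-GLG count, and the 1992 year of death are hypothetical, and the outermost GLG is assumed to correspond to the year of death.

```python
def years_of_formation(n_glgs, year_of_death, glgs_per_year=1):
    """Presumed year of formation of each GLG, innermost (oldest) first.

    GLGs are counted in from the growing edge: k = 0 is the outermost band,
    assumed to have formed in the year of death.
    """
    return [year_of_death - (k // glgs_per_year) for k in reversed(range(n_glgs))]

# Hypothetical tooth: 60 GLGs, animal died in 1992.
one_per_year = years_of_formation(60, 1992, glgs_per_year=1)
two_per_year = years_of_formation(60, 1992, glgs_per_year=2)

# The innermost band is dated 30 years later under the two-GLG assumption.
print(one_per_year[0], two_per_year[0])

# The band dated to 1958 under one GLG per year moves to the mid-1970s.
idx = one_per_year.index(1958)
print(one_per_year[idx], two_per_year[idx])
```

Plotting the same assay values against each list of years gives the two candidate tooth chronologies that are compared against the reference chronology.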

REFERENCE COLLECTIONS AND QUALITY CONTROL OF AGE READINGS
There are four steps leading to the development and continued success of an ageing program: 1) development of an ageing method; 2) age validation; 3) preparation of a reference collection; and 4) quality control (QC) monitoring. In the case of beluga ageing, the use of tooth longitudinal sections is the ageing method, while bomb radiocarbon dating was the method used to validate the accuracy of the method. The latter two items (the reference collection and the quality control monitoring) are the steps required to ensure that subsequent ageing of belugas remains accurate, and that tooth interpretations do not change over time, perhaps as a result of a change in personnel who age them. More specifically, quality control monitoring can track ageing consistency through time, under the previously tested and confirmed assumption that the method is accurate. As noted by Campana (2001), the monitoring process ensures 1) that the age interpretations of individual age readers do not 'drift' through time, introducing bias relative to earlier determinations; and 2) that the age interpretations by different readers are comparable. Such a protocol monitors both relative accuracy and precision at regular intervals, and is completely analogous to quality control protocols in a manufacturing process. Integral to the quality control process is the reference collection and two statistical monitoring tools discussed later: age bias plots and the CV.
Reference collections of ageing structures are important elements of an ongoing ageing program. Ideally, a reference collection is a group of prepared and aged structures of known or consensus-derived ages and representative of all factors that might reasonably be expected to influence the appearance or relative size of the growth bands. A list of such factors might include all combinations of age, sex, season, and source of collection, spanning the entire organism length range, a representative sample of the geographic range, and several collection years. The primary role of the reference collection is to monitor ageing consistency over both the short and long term, as well as among age readers. The collection is particularly important for tests that may reveal long-term drift in age interpretation, something that cannot be detected through simple re-ageing of samples from the previous year, or through use of a secondary age reader. A second role of the reference collection is for training purposes; a representative subsample of the collection can be imaged and annotated, thus simplifying the training of new age readers and ensuring consistency in the type of structures which are interpreted as growth bands. It is important to note that ideal reference collections are rare. It is far more important to have something, anything, than to have nothing. A collection of 200 teeth or other ageing structures is good, and can be added to through time, but again, a few dozen tooth section images are better than nothing.
Once assembled, the reference collection can be sent out for ageing as part of an exchange program, either physically or in the form of digital images. The preparation of digital images ensures long-term availability, facilitates exchanges with other laboratories, and simplifies the training of new age readers (see NAMMCO 2012 for caveats and recommendations). The use of annotated digital image 'layers' (sensu Photoshop), which can be toggled off and on, allows the image to be interpreted with or without the annotation and facilitates blind age comparisons in training exercises.
For quality control monitoring, a subsample of the reference collection is intermixed with a subsample of recently aged samples (a production ageing subsample) and is then aged without the age reader knowing which samples come from the reference collection. An age bias graph comparing test versus reference ages for the reference structures would confirm long-term ageing consistency, while a separate age bias graph comparing test versus original ages for the production subsample would ensure consistency between the most recent production run and the QC test. If both tests indicate lack of bias, the same ageing criteria must have been used for both reference and production samples. The CV of original and new ages provides the measure of ageing precision. The combination of the age bias graphs and CV is sufficient to detect almost all sources of ageing error.
The age bias plot is the primary tool for assessing bias, which is defined as a systematic difference between two age readers or ageing methods. It is the ideal tool for detecting under- or over-ageing of one age reader relative to another, even if the ageing error is restricted to the youngest or oldest animals (Fig. 3). Ideally, age bias plots are prepared when the age reader compares current readings against a known-age reference collection. When this is done, the age bias plot becomes a check on the accuracy of the age reader. When known ages are not available, the most reliable set of ages is used as the reference age on the X axis. However, only relative accuracy is being assessed at this point, since neither set of age readings is known to be correct. When two age readers are being compared, or when the comparison is between two ageing structures or methods, the age bias plot can only reveal a systematic difference. For example, Reader 2 in Figure 3B is under-ageing specimens relative to the reference age, but it is possible that Reader 2 is correct and that the reference age is too high.
It is important to note that the interpretation of the age bias plot is in terms of overall patterns, not at an individual age. The intent is to detect consistent deviations from the 1:1 line, not to examine the deviation of any one age from the line. For example, in Figure 3A there are some points that lie above the 1:1 line and some points that lie below it; some mean values are statistically different from the 1:1 line (as indicated by the extent of the error bar), and others are not. But there is no overall pattern, and therefore no bias. Conversely, there may be a trend. In the example of Figure 3B, several contiguous ages (ages 4-6) are about 1 yr above the line, and ages 8 and older are all increasingly below the line. This is a more serious type of bias, since it indicates that one of the two age readers has changed their age interpretation criteria relative to the other age reader. The easiest type of age bias to deal with occurs when a reader (Reader 3 in Fig. 3C) consistently counts 1-2 extra growth bands relative to the reference age. This type of bias usually occurs when one reader is counting the edge (or a first annulus) and the other reader is not. A brief comparison of annotated images is usually sufficient to remove this type of bias.
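The numbers behind an age bias plot (sample size, mean test age, and a 95% confidence interval at each reference age) can be tabulated with a short Python sketch. The function name and the paired ages are hypothetical, and the interval uses a normal approximation (1.96 standard errors) as a simplifying assumption; a t-based interval would be more appropriate for the small samples typical of each age class.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def age_bias_table(reference_ages, test_ages):
    """Rows of (reference age, n, mean test age, 95% CI half-width)."""
    groups = defaultdict(list)
    for ref, test in zip(reference_ages, test_ages):
        groups[ref].append(test)
    rows = []
    for ref in sorted(groups):
        vals = groups[ref]
        n = len(vals)
        # Normal-approximation interval (an assumption); undefined for n = 1.
        half_width = 1.96 * stdev(vals) / sqrt(n) if n > 1 else float("nan")
        rows.append((ref, n, mean(vals), half_width))
    return rows

# Hypothetical paired readings in which the test reader tends to count extra bands.
reference = [5, 5, 5, 6, 6, 6, 7, 7, 7]
test = [6, 6, 7, 7, 7, 8, 8, 9, 8]
table = age_bias_table(reference, test)
for ref, n, mean_age, hw in table:
    print(f"ref={ref}  n={n}  mean={mean_age:.2f}  +/-{hw:.2f}  deviation={mean_age - ref:+.2f}")
```

As the text emphasizes, the deviations should be read as a pattern across ages rather than one age at a time; here every reference age shows a positive deviation, the kind of consistent offset described for Reader 3.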
Ageing precision refers to the reproducibility or consistency of repeated age determinations on a given structure, whether or not those age readings are accurate. It is not unusual for inaccurate age readings to be highly reproducible (in other words, precisely wrong). Therefore, precision cannot be used as a proxy for accuracy. Nevertheless, a measure of precision is a valuable means of assessing the relative ease of estimating the age of a particular structure, of assessing the reproducibility of an individual's age determinations, or of comparing the skill level of one age reader relative to that of others.
There are two widely used and statistically robust measures of ageing precision: 1) average percent error (APE), and 2) coefficient of variation (CV). Although percent agreement is the traditional index of ageing precision, it varies widely both among species and among ages within a species. For example, 90% agreement to within one year between two age readers would represent poor precision if there were only 3 year classes in the population. In contrast, 90% agreement to within one year would represent excellent precision for beluga, given its 60-yr longevity. Therefore, there is little reason to recommend the use of percent agreement when more robust and easily calculated measures of precision are readily available.
The coefficient of variation (CV), expressed as the ratio of the standard deviation over the mean, is the most widely used measure of precision, and can be written as:
$$CV_j = 100\% \times \frac{\sqrt{\sum_{i=1}^{R} \frac{(X_{ij} - X_j)^2}{R - 1}}}{X_j}$$
where CV_j is the age precision estimate for the jth animal. The CV is calculated across all age readings for each animal, and is usually averaged across animals to produce a mean CV.
The average percent error (APE) is defined as:
$$APE_j = 100\% \times \frac{1}{R} \sum_{i=1}^{R} \frac{|X_{ij} - X_j|}{X_j}$$
where X_ij is the ith age determination of the jth animal, X_j is the mean age estimate of the jth animal, and R is the number of times each animal is aged. When averaged across many animals, it becomes an index of average percent error.
CV and APE are mathematically related, with CV being about 40% higher than APE for any given set of ageing data (Campana 2001). All measures of precision will be artificially inflated by any bias which exists among readers, implying that bias should be dealt with before calculating precision. There is no single value of precision that can be used as a target level for ageing studies, but a CV of 5% is often used for otolith studies. CV values of more than 10% are common in studies reporting shark ages based on vertebrae (Goldman et al. 2012).
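Both precision indices follow directly from their definitions (standard deviation over the mean for CV, mean absolute deviation over the mean for APE) and can be computed with a minimal Python sketch. The function names and the replicate readings are hypothetical and are included only to show the calculation.

```python
from statistics import mean, stdev

def cv_percent(readings):
    """CV for one animal: sample standard deviation over the mean, in percent."""
    return 100.0 * stdev(readings) / mean(readings)

def ape_percent(readings):
    """APE for one animal: mean absolute deviation over the mean, in percent."""
    m = mean(readings)
    return 100.0 * mean([abs(x - m) for x in readings]) / m

# Hypothetical replicate age readings (two readings per animal).
replicate_ages = {"A": [10, 12], "B": [20, 20], "C": [31, 29]}

mean_cv = mean([cv_percent(r) for r in replicate_ages.values()])
mean_ape = mean([ape_percent(r) for r in replicate_ages.values()])
print(f"mean CV = {mean_cv:.2f}%, mean APE = {mean_ape:.2f}%")
```

With exactly two readings per animal, the CV works out to √2 (about 1.41) times the APE, which is consistent with the roughly 40% difference noted above. Any bias between readers should be examined before these values are interpreted, as the text explains.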

CONCLUSION
Methods for confirming the accuracy of age determination methods for monodontids and other marine mammals are more limited than those for fishes, in part because of the markedly lower numbers of animals that are available for study. However, the use of bomb radiocarbon is particularly well suited for monodontids, given their extended lifespan. In addition, the relatively large size of the growth bands in sectioned teeth or narwhal tusks makes them amenable for assaying individual growth bands, allowing individual animals to be aged with great accuracy, as long as they were alive during the 1960s. In light of the ongoing development of ageing methods for marine mammals, age validation using methods such as bomb radiocarbon dating will be required before the accuracy of the ages can be broadly accepted.
Once monodontid age determinations become commonplace, careful monitoring will be required to ensure that age interpretations remain consistent across readers and through time. Quality control protocols using reference collections of ageing material, in conjunction with age bias plots and measures of precision, are capable of detecting virtually all of the systematic ageing errors that often occur once age determinations of an animal become routine. Use of age quality control protocols helps ensure that any observed changes in monodontid population age structure or size at age are due to real changes in the population, as opposed to artefacts resulting from changes in age interpretation.

ACKNOWLEDGEMENTS
We thank Allen Andrews for his helpful comments on the MS.

LITERATURE CITED
Andrews AH, Kalish JM, Newman SJ, and Johnston JM (2011) Bomb radiocarbon dating of three important reef-fish species using Indo-Pacific ∆

Fig. 2. (A) Longitudinal section of a beluga tooth showing presumed annual growth bands. Dates of formation can be determined by counting the growth bands in from the growing edge (right or pulp edge in this image), which corresponds to the year of collection. (B) Bomb radiocarbon assays of individual growth bands in the tooth shown begin to increase around 1958 (vertical dashed line) in synchrony with the marine reference chronology (solid line) if the growth bands have been aged and counted correctly (i.e. one GLG per year), but (C) are greatly offset from the marine reference chronology if the growth bands have been interpreted as forming twice per year (2 GLG per year). Years shown in this example are hypothetical.

Fig. 3. Examples of age bias plots where no bias is present (A) and where bias is present (B and C). In (B), Reader 2 has over-aged ages 4-6 but under-aged ages 8-10. In (C), Reader 3 has consistently counted 1-2 extra growth bands compared to the Reference age. Each error bar represents the 95% confidence interval about the mean age assigned to all samples of a given age by a second age reader (Reference age, known or assumed to be correct). The 1:1 equivalence (solid line) is also indicated. Numbers plotted below symbols are the sample size at each age.