Density surface fitting to estimate the abundance of humpback whales based on the NASS-95 and NASS-2001 aerial and shipboard surveys

Generalized additive models (GAMs) with spatially referenced covariates were fitted to data collected during the 1995 and 2001 Icelandic (shipboard and aerial) and Faroese (shipboard only) components of the North Atlantic Sightings Surveys (NASS-95 and NASS-2001). The shipboard surveys extended from the east coast of Greenland, around Iceland, down to an area along the west coast of Ireland (in 1995) and to the north of the United Kingdom (in 2001). In contrast, the aerial surveys were limited to Icelandic coastal waters only. The aim of the analysis was to predict density, and hence abundance, of humpback whales throughout the survey regions and also to establish if there was any evidence that humpback whale density was related to sea surface temperature or depth. Fitting GAMs to the 1995 data proved problematic and so various subsets of the data were used in an attempt to improve the model fitting. Such difficulties did not occur with the 2001 data. Confidence intervals (CIs) for the abundance estimates were estimated using bootstrap sampling methods. The estimated humpback whale abundance for the region covered by the aerial and shipboard surveys in 1995 was 10,521 (95% CI: 3,716–24,636) using all available data and 7,625 (3,641– 22,424) if survey blocks with 0 sightings around the Faroes and south of 60 ̊ N where no humpback whales were detected were excluded from the analysis. The estimate for the total survey region in 2001 was 14,662 (9,441–29,879). The high upper bounds of the confidence intervals were thought to be caused by a paucity of effort over wide areas of the survey leading to interpolation. Overall, the uncertainty associated with these abundance estimates was approximately equal to, or greater than, that associated with a stratified distance analysis. Given these wide CIs the evidence for a substantial difference in abundance between years was equivocal. However there was evidence to suggest that humpback whales congregated in shallower waters between 6–8 ̊C. Paxton, C.G.M., Burt, M.L., Hedley, S.L., Víkingsson, G.A., Gunnlaugsson, Th. and Desportes, G. 2009. Density surface fitting to estimate the abundance of humpback whales based on the NASS-95 and NASS-2001 aerial and shipboard surveys. NAMMCO Sci. Publ. 7:143-159.


INTRODUCTION
Humpback whales (Megaptera novaeangliae) gained initial protection in 1944 (Jefferson et al. 1993) and full protection in the North Atlantic and worldwide after the 1956 and the 1965/1966 whaling seasons respectively (Tøn-nessen and Johnsen 1982).There has been some debate on their rate of recovery since then (Lubick 2003) much of which revolves around estimates of the size of the pre exploitation population (e.g.Roman andPalumbi 2003, Mitchell andReeves 1983).Estimates for the size of the population in the North Atlantic include 3,500-10,700 (estimated from photo identification: Balcomb andBreiwick 1984, Balcomb et al. 1986) and between 9,400-16,400 adults (from photo identification and biopsy, Smith et al. 1999).The International Whaling Commission's current estimate of western North Atlantic humpback whale abundance is 11,570 (http://www.iwcoffice.org/conservation/estimate.htm#table).Estimates for the population in the late summer and autumn feeding areas (i.e.Norway, Iceland, Greenland, New England and Canada) include Pike et al. (2009Pike et al. ( : 4,928 in 2001, coastal waters about Iceland) and Smith et al. (1999: between 7,100-8,100 adults).All of the estimates of the last decade are considerably higher than those of Balcomb et al. (1986) for 1978 and 1979 (3,503 and 5,005 respectively) albeit not always significantly so.However, no previous work has attempted to combine shipboard and aerial data into 1 analysis.
Humpback whale sightings during both the aerial survey and the shipboard survey of the North Atlantic Sightings Survey in 1995 (NASS-95) occurred in distinct aggregations (Figs 1a and 2a), and were particularly concentrated off north-east Iceland.Broadly similar distribution patterns were encountered during NASS 2001 (Figs 1b and 2b).Other analyses of these data Pike et al. (2009: aerial 1995& 2001, and Pike et al. MS 2002: shipboard 1995) estimated abundance using standard distance sampling methods (Buckland et al. 2001).These stand-ard methods resulted in abundance estimates with high variance.Fitting a density surface to geographic covariates has the potential to increase the precision of the abundance estimate, and furthermore, allow elucidation of an ecological interpretation of their distribution (NAMMCO 2003).Data from earlier NASS were not used because of a paucity of data.
Estimates from aerial surveys tend to be negatively biased because of diving whales being missed: temporarily submerged whales are more likely to be seen from a ship because a ship travels more slowly than a plane, thus allowing more time for the animal to return to the surface.This is known as availability bias.One potential way to correct for this bias is to compare the densities from the shipboard survey with the aerial survey.Thus, estimates obtained from the shipboard survey data can be used to correct estimates made from the aerial survey.Burt et al. (MS 2003) fitted generalised additive models (GAMs) to the aerial and shipboard data (from 1995 and 2001) separately, and they then compared estimates over the survey regions covered by both modes of survey (i.e.shipboard or aerial) to obtain an approximate correction factor for the aerial survey.However, Burt et al. (MS 2003) did not combine the data from both survey modes in a single analysis where survey mode could be considered as a discrete factor.Here we combine data from both survey modes in a single analysis and treat survey mode as an additional explanatory variable in the model.In this analysis, the 1995 and 2001 data (initially analysed assuming g(0)=1, using separate single platform multiple covariate distance sampling (MCDS, Marques MS 2001) for each survey mode) were generated into spatially referenced density data using the 'count' variant of methods developed by Hedley et al. (1999).
The aerial and shipboard density data were then combined to form a single generalised additive model (GAM) for each year to estimate humpback whale density.This analytical framework allowed the influence of 2 environmental explanatory variables (sea surface temperature and depth) on humpback whale density to be considered as well as spatial reference to latitude and longitude.Finally, all data were combined in order to look at the relative effect of survey year in the GAM as well as depth and sea surface temperature in the absence of latitude and longitude.

Aerial surveys
The aerial surveys covered Icelandic coastal waters and the same survey design was used in 1995 and 2001, except that in 2001 the survey area was extended from 11° W to 10° W (Fig. 1).The same aeroplane and pilot were used in both years and 2 primary observers searched through bubble windows.The cruise leader and pilot acted as secondary observers and the cruise leader recorded environmental conditions.The survey was mainly conducted in passing mode but some sightings were closed on to confirm species.For more details, see Pike et al. (2009).

Shipboard surveys
The shipboard surveys, conducted in the eastern and central North Atlantic, used 3 vessels in 1995 and 4 vessels in 2001 (Table 2, Víkingsson et al. 2009).In 1995, humpback whales were only sighted from 2 of the ships; the Strákur and the Árni Friđriksson (Pike et al. MS 2002).The survey boundaries for the 2 years were substantially different: in 1995, the survey boundaries extended eastwards from 42.25° W to 5° W and northwards from 52° N to 72.75° N (Fig. 2a); and in 2001, the survey area extended south-westerly and north-easterly, but excluding the area extending westwards off the Irish coast (Fig. 2b).Unlike 1995, when all vessels were dedicated cetacean sightings survey ships, the 2001 survey included effort from 2 vessels primarily conducting a redfish survey in addition to 2 dedicated cetacean survey vessels.
The data from the 1995 and 2001 shipboard surveys were collected with a view to conducting a conventional design based line transect analysis (e.g.Buckland et al. 2001).Although the transects from the redfish survey (from the 2 named vessels above) were not designed for the purpose of estimating whale density, they do achieve a spatial coverage that would be expected to be adequate for estimating whale density from a model based approach.For more details for 2001, see Víkingsson et al. (MS 2002).

Detection function estimation
In conventional distance sampling the probability of detection depends only on the perpendicular distance of the sighting to the transect and at 0 perpendicular distance is assumed to be 1 (denoted by g(0) = 1).In this analysis, the effects of covariates, other than perpendicular distance, were incorporated into the detection function model.This was achieved by setting the scale parameter in the model to be an exponential function of the covariates (Marques MS 2001).Thus the probability of detection be-comes a multivariate function, g(y,ν), representing the probability of detection at perpendicular distance y and covariates ν (ν = ν 1 ,..,ν Q where Q is the number of covariates).Using either a hazard rate or half normal detection function, the covariates were incorporated via the scale term, σ, where for sighting k, σ has the form: (1) where β 0 and β q (q=1,…,Q) are parameters to be estimated.With this formulation, it is assumed that the covariates may affect the rate at which detection probability decreases as a function of distance, but not the shape of the detection function.
A stepwise forward selection procedure was used (starting with a model containing per-  2002), then the final selected models were re fitted using a set of customized functions in R (Ihaka & Gentleman 1996).This facilitated estimation of variance within R-(see below).

Modelling whale density
There was no evidence of a trend in pod size over the survey area in 1995 (using a GAM with logarithm of pod size as the dependent variable and longitude and latitude as explanatory variables, P=0.89.Note that such a model cannot be used for prediction.In 2001, there was a significant trend (P<0.01) in pod size with longitude (larger pods occurring to the east of the survey region) but longitude only explained 3% of the deviance (the difference in log likelihood between the model under consideration and a full model consisting of as many parameters as data (Dobson 1990)) and this may have been an artefact of the scarcity of sightings to the far west and so was not considered further.
As there was no consistent evidence of a trend in pod size over the survey area (Burt et al. MS 2003), a modified version of the 'count model' of Hedley et al. (1999) was used to model the trend in spatial distribution of humpback whale pods.Transects covered during the survey were divided into small sections, or segments, such that the sighting and geographic conditions did not change considerably within a segment.The response variable for the model was calculated from the estimated number of individuals in each segment, , where the subscript denotes the i th segment.This was calculated using an estimator similar to the Horvitz Thompson estimator (Horvitz and Thompson 1952), as follows: (2) where, for segment i, is the estimated probability of detection of the j th detected pod, n i is the number of detected pods in the segment and s j is the size of the j th pod.The total number of transect segments is denoted by T. By assumption, p(y), the probability density function of actual perpendicular distances is uniform up to the truncation distance.This is satisfied provided transects are randomly located with respect to the distribution of whales.
Having obtained the estimated number of individuals in each segment, the density, , was simply given by where a i is the area of segment i. Segment area was calculated as the length of the segment multiplied by twice the truncation distance used to model the detection function.There is no objective way of choosing segment length and so a variety of segment lengths were tried in the range of 5-13 km.Eventually 9 km was selected as an appropriate compromise be- tween maximising the ratio of non zero to zero segments, maintaining environmental resolution and giving some measure of spatial independence (see results).Density was then modelled as a function of K spatially referenced covariates, z, using a GAM with a square root link function: (3) where θ 0 is the intercept parameter and the f k are one dimensional smooth functions (thin plate regression splines) of the K spatial covariates.The parameter M was a factor variable which indicated whether the survey mode was a ship or a plane.The degrees of freedom of the smooth functions control the 'wiggliness' of the smooth and were estimated from the data.Two way interactions between the spatial covariates were also considered for inclusion in the model via two dimensional smooths (Wood 2003).In the context of this analysis a two dimensional (2D) smooth of latitude and longitude may be better at explaining spatial variation than the sum of the one dimensional (1D) effects.
A variety of statistical families were considered to model the distribution of .The estimated number of pods in each segment did not follow a typical exponential family distribution, such as the Poisson distribution, and quasi-likelihood methods were more appropriate to fit the model.Quasi-likelihood methods do not require a particular error distribution to be assumed, but the general form of mean variance relationship of the observations must be established.In this case it was assumed the data were over dispersed.
The covariates considered in the analysis were longitude (Lon) and latitude (Lat), sea surface temperature (SST) and depth (Depth).Sea surface temperatures were obtained from the National Oceanic and Atmospheric Administration (NOAA) and were an updated set based on the analysis of Reynolds et al. (2002) at 1 degree and weekly resolution.Depths were obtained from the ETOPO5 5 minute resolution relief data available from NOAA (http://www.ngdc.noaa.gov/mgg/global/seltopo.html).Temperatures and depths were associated to effort segments by finding the closest point in the temperature and bathymetry data to the midpoint of the effort segments using great circle distances (and additionally, time for temperature).Maps of depths and temperatures over the 2 survey periods are shown in Fig. 3. Generalised cross validation (GCV) implemented in the mgcv package (v.1.1 7, Wood 2001) in R (v. 1.9.1) was used for covariate selection, augmented with diagnostic plots, using the principles described in Wood ( 2001) with the additional criterion that a term must explain an additional 4% of the deviance given other variables in the model.All covariates were considered for inclusion in the model as 1D smooths of untransformed covariate values and also as the logarithm of the covariate.In addition, 2D smooths of Lat and Lon and SST and Depth were considered for inclusion into the GAM.Initially a maximum of 6 degrees of freedom (7 knots) were allowed in the selection of 1D smooths and up to 13 degrees of freedom (14 knots) were allowed in the case of 2D smooths, thus allowing moderate flexibility but reducing the likelihood of spurious overfitting (fitting unnecessarily complicated functions) especially as GCV has been shown to cause overfitting (Mackenzie and Walker, submitted).Survey mode (Mode) was included in all models.Due to gaps in search effort along transects, effort could not always be split into segments of the desired length (see later).Therefore, the size of each segment varied and so the model was weighted by segment area.The final model was used to predict density of humpback whales throughout the survey regions.Whale abundance was estimated by numerically integrating under this predicted whale density surface.As survey mode was included in each model, abundance was predicted assuming shipboard survey mode.In the case of the 1995 data, the large number of segments with 0 sightings made fitting the GAM problematic and so several subsets of the data were considered.There were no sightings of humpbacks in the blocks surrounding the Faroe Islands, to the far south and southwest of Iceland (blocks Faroes, 7 and 4, see Fig. 2a), and so these were excluded from some of the analyses.
The subsets of data that were considered were 1. data were used in entirety, 2. blocks Faroes and 7 omitted, 3. blocks Faroes and 7 were omitted but some transects (close to the included blocks) from the omitted blocks were included to stabilise the GAM at the edge of the survey region, 4. blocks Faroes, 7 and 4 omitted, 5. blocks Faroes and 4 omitted, and 6. blocks Faroes and 4 omitted but some transects (close to included blocks) from the omitted blocks were included to stabilise the GAM.Density, and hence abundance, was not estimated in the omitted blocks; there were no sightings in these blocks and so were assumed to have 0 density.The number of segments and search effort are given in Table 1.In none of the cases were the distributional characteristics ideal and a small minority of the model fits to the bootstrap re-samples did not converge to a definitive set of parameters, potentially leading to an inflated estimate of variance (see below).
In addition to the analyses performed to estimate abundance, a further analysis was conducted which considered only survey mode, SST and Depth; latitude and longitude were explicitly excluded from the model as they may obscure any relationship between density and the environmental variables.The aim here was to produce models that explained rather than described the density surface.The most complete data set from the 1995 analysis (subset 1) was used in this analysis.Finally, all the data from both years were combined in order to look at the effect of year (Year), Mode, SST and Depth (the latter 2 covariates being considered both as 1D smooths and a single 2D smooth).

Variance estimation
The variance of the predicted abundance was estimated using a non-parametric bootstrap (Hedley and Buckland 2004, Efron andTibshirani 1993) which combined both components of the analysis; detection function estimation and modelling of whale density.Model selection (in terms of variable selection) was not part of the bootstrapping process but smooths with different degrees of smoothing could be selected up to the same maximum number of knots as used in the original model.The bootstrap samples were generated by repeatedly sampling 'transect legs', sampling with replacement.This assumed that the transect legs were independently and identically distributed.Legs were sampled within blocks so there was good coverage of the data within each bootstrap sample and the bootstrap sample contained the same number of transect legs in a block as the original data.
One thousand bootstrap samples were generated and for each, a density surface and total abundance was predicted.These abundance estimates were ordered according to size and the confidence interval (CI) was given by the 2.5% and 97.5% quantiles.
A disadvantage of using the non-parametric bootstrap in this way is that the bootstrap samples do not reproduce the spatial coverage of the original survey data.However, since there were a large number of transect legs (195 in 1995 and 206 in 2001) and transects were selected within blocks, bias from using this procedure was expected to be small.An alternative method of estimating variance would have been to use a parametric bootstrap (as proposed in Hedley et al. 1999), but as  currently implemented, their approach does not adequately capture the correlation between adjacent, or close, segments of transect, and thus appears to under estimate the true variance.
Many of the models fitted to the bootstrap samples did not converge because of the large number of segments with 0 sightings.Removing these abundance estimates may have biased the CI, thus they were retained providing a robust estimate of variation.Non-convergence was avoided to an extent by reducing the maximum degrees of freedom allowed for the smooth function as described above.However, some non-converging models fitted so badly that the estimated fitted deviance was stated as negative.Such replicates (1-3 per bootstrap sample) were typically associated with extremely high nonsensical estimates of abundance.The abundance estimates associated with these models were replaced by randomly choosing abundance estimates associated with models which had a positive explained deviance.

Aerial survey detection functions NASS-95
There were 89 sightings of humpback whales in total.For robust modelling of the detection function, a general rule of thumb is to discard approximately 10% of the sightings with furthest perpendicular distance (Buckland et al. 2001).However, because of a long tail in the distribution of perpendicular distances, sightings were truncated at a distance of 1,700 m (leaving 76 sightings) to ensure robustness during bootstrapping.For the rest of the 1995 analysis, aerial sightings beyond this distance were discarded.Often in aerial surveys, visibility directly below the aircraft is limited so that detection probability for whales surfacing on the transect is not certain.This did not appear to be the case for these data probably because the aircraft was installed with bubble windows.Covariates considered for inclusion in the model for the detection function were: Beaufort (as a factor with 2 levels; Beaufort 1 and 2); percentage of cloud cover; sightability (a 3 level factor with levels ranging from perfect; perfect to good; good to worse).Initially, glare was also considered but due to ambiguous interpretations of its effect, it was excluded.Both half-normal and hazard rate functional forms were considered, and forward stepwise model selection was by AIC (Akaike's Information Criterion), starting with a model which included perpendicular distance only.No covariate was found to have a strongly significant effect, and thus the final model selected included only perpendicular distance, with half-normal form (Fig. 4a).See Pike et al. (2009) for a further treatment of this data.

NASS 2001
Despite slightly less search effort in 2001 compared to 1995 there were 161 sightings of humpback whale pods in 2001, nearly twice as many as in 1995.The distribution of their perpendicular distances is shown in Fig. 4b using a truncation distance of 2,000 m (discarding 11% of sightings).Covariates considered for inclusion in the detection function model were: Beaufort Sea State (as a factor with 3 levels, 1, 2 and 3 or more); sightability (good and moderate/ poor); percentage cloud cover; and pod size (as a factor with 3 levels: 1, 2 and 3 or more).Ac-cording to AIC, the best models were those with either Beaufort or sightability included as covariates.However, in both cases, the estimated probability of detection increased as Beaufort increased or sightability decreased-contrary to expectation but possibly due to small sample sizes under the worst conditions.Therefore it was decided to fit a detection function model based on perpendicular distance only (Fig. 4b, red line)-the best model being half normal.

Shipboard survey detection functions NASS-95
There were 215 sightings of humpback pods within a perpendicular distance of 3,600 m.The histogram of perpendicular distances of these sightings is shown in Fig. 5a.Covariates considered for inclusion in the model for the detection function were: survey block; platform; cloud cover; pod size (as a factor with levels 1, 2 and 3 or more); vessel and Beaufort Sea State (considering it both as a continuous variable and as a factor with 2 levels-less than 3.5 and greater than 3.5).The model chosen was a half-normal with cloud cover and vessel included as covariates: the probability of detection increased in cloudier conditions, and was higher for the observation ship Strákur compared to the Árni Friđriksson.
The average fitted detection function across all the covariates is shown in Fig. 5a (red line).

NASS 2001
With a truncation distance of 3,600 m, 273 sightings of humpback pods remained, and their histogram of perpendicular distances is shown in Fig. 5b.Covariates considered for inclusion in the model for the detection function were as for 1995 except for survey block, plus 3 additional covariates: sightability, swell and visibility.No improvement in model fit was gained by the inclusion of covariates in the model.Thus, a hazard rate model including perpendicular distance only was selected.

Modelling of estimated density NASS-95
The large number of transects (and hence segments) with no sightings meant that the density data had awkward distributional properties.A modal segment length of 9 km was chosen as a compromise between a length that would represent the environmental characteristics of the raw data and a length that would lead to nominal spatial independence.Due to changes in survey effort and environmental conditions, not all segments had the same area the range in segment areas was 0.27 -97 km² with a median of 30 km².
For most of the 1995 subsets, a 2-dimensional smooth of latitude and longitude was selected along with survey mode (aerial or ship) (Table 2) and the total explained variance for all the models was between 30 and 34%.The point estimates were similar as were the 2.5% CI estimates.The largest differences between the subsets occurred in the 97.5% CI estimates.The best model, in terms of smallest CI, (Table 3a) were those analyses which omitted 2 of the blocks with no sightings but included a few transects to ensure the predictions were 'tied down' at the periphery of the prediction region (subsets 3 and 6).The abundance estimate based on the entire survey region was 10,521 (bootstrap CI: 3,716-24,636). Figure 6 gives the density surface.
Abundance based on an inner core region (i.e.assuming 0 density in 2 of the 4 blocks with no sightings) was 7,780 (3,503-21,521) or 8,002 (3,356-18,999) depending whether additional transects were used to tie down the GAMs.
To allow a direct comparison of humpback whale abundances across years, the 1995 data (subset 1) was used to estimate abundance over a common region which was surveyed by both the 1995 and 2001 survey.In addition, to approximate a design-based estimate, abundance in this common region was estimated using the mean density from the 1995 data-this is equivalent to fitting a flat density surface where the density is equal to the mean density throughout the survey.
The abundance in the common region for 1995 was 9,138 (4,562-17,511) animals based on a constant mean density and 9,810 (3,457) based on the spatially varying density model.
The explanatory models (excluding Lat and Lon as potential candidate covariates) fitted to the 1995 subsets all included Mode as well as SST and Depth as a 2D smooth.In some cases these models explained more of the deviance than the descriptive model (Table 2).However, they also had higher GCV scores associated with them and therefore the predictive models (i.e. the ones with Lat and Lon) were used to predict abundance.The explanatory models were not selected in the model selection procedure for the predictive model because of the huge contribution of Lat (approx.19% deviance depending on data used) to the explained deviance.Unsurprisingly Lat was closely correlated with SST (r = 0.93) therefore SST was not selected when Lat was a candidate variable.

NASS 2001
Again a modal segment length of 9 km was used.Some segments could not be amalgamated if environmental conditions changed and so again there was a wide range in segment areas.
The range in segment areas was 0.001 to 97 km², with a median of 42 km².The final descriptive model consisted of Mode and 1D smooths for Lon and Lat.Again the shipboard densities were typically twice that reported from the aerial survey.The total explained deviance was ca.32%.There was no evidence for any spatial autocorrelation in the residuals of the model.The predicted densities of humpback whales for 2001 are shown in Fig. 7.The densities were predicted using the survey region for 2001.The estimated abundance of humpback whales in the survey region was 14,662 (9,441-29,879, Table 3b).
Two additional abundance estimates were made using the model fitted to the 2001 data for the survey region common to 1995 and 2001 to allow comparison between the 2 surveys; using the mean density throughout the survey region, and by predicting density for the common region using the model fitted to the 2001 data.The predicted values for 2001 were higher than 1995 but there was no significant difference between them (Table 4).
The explanatory model for 2001 again consisted of Mode and a 2D smooth of Depth and SST (Table 2) explaining ca 27% of the deviance.Predicted abundances for whales using ship as the Mode, were about 1.5 times that when using aerial as a Mode.

Explanatory model for 1995 and 2001 data combined
To investigate the underlying factors influencing humpback distribution between the 2 years, the data for 1995 and 2001 were combined into a single model and all data, including data from blocks where no whales were seen, were included.The explanatory variables under consideration were 1D smooths of Depth and SST (also considered as a 2D smooth), with Mode and Year as factors.The model selected included Mode and a 2D smooth of Depth and SST.This model explained 23.6% of the deviance.The more complex model containing Year as a factor was associated with a higher GCV score but Year although significant explained only an additional 0.4% of the deviance.
The pattern of the responses, like the 2001 data alone, again suggested an optimal sea surface temperature of around 6-8˚ C.

Comparison with earlier studies
Table 5 provides a comparison of the estimates obtained in this paper with those of other analyses of the same data.The estimates were broadly similar but CIs were wider using the methods outlined in this paper.

DISCUSSION
The NASS aerial and shipboard surveys of 1995 and 2001 all recorded distinct clumped aggregations of humpback whales (Figs 1 and 2) with the greatest numbers to the east and, to a lesser extent, west of Iceland.This pattern has been reflected in the predicted densities (Figs 6 and  7).The analysis methods used in this paper have provided an alternative approach to estimating abundance, yielding estimates which are comparable to those previously obtained using design based estimation methods but with higher variances (see   et al. (2009) where a significant upward trend was shown.Whether this increase reflects migration into the survey region or an overall increase in numbers is unclear.However there have been consistent increases in abundance in this area since at least 1979 (Sigurjónsson and Gunnlaugsson 1990) which does imply a real increase has taken place, as presumably there is a limit on the number of individuals in a constant population that could move into the area.
Whilst the bootstrap procedure performed adequately to obtain 95% CI of the abundance estimates there is still a possibility of future improvement.Firstly, while latitude and longitude are a useful coordinate system for humans, they may not be an adequate proxy for unknown variables to explain whale density.However, the explanatory models based only on Mode, SST and Depth did not produce narrower confidence intervals.Secondly, extrapolation with highly flexible models such as GAMs, (see below) could have lead to the spurious results in the bootstrap estimates as well as, to a more limited extent, spurious interpolations.This was suggested by the minor narrower CIs of the models based on data which contained additional transects used to tie down the GAMs (Table 3).Thirdly, the whales may be too patch-ily distributed to be adequately modelled within a GAM framework more suited to smoothly changing surfaces.Using a 2-stage procedure where presence/absence is modelled first, and then density, if present may be an appropriate response but this can cause problems in that when densities are low, estimated segment numbers will often be falsely 0 leading to bias.
A number of unresolved technical issues remain when fitting the density surfaces, not least of which involves 'taming' the smooth functions at the edges of the survey region (to avoid extrapolation difficulties) and (to a lesser extent) in regions where there are gaps in coverage (when interpolating).The evidence here suggested that adding additional transects outside the prediction region can help.
Predictions made from GAMs can also be sensitive to the maximum number of degrees of freedom allowed for estimating the smooth function.Spatial correlation of errors was not considered in this analysis and dealing with such correlation in future may make the data more tractable.To date, little attention has been given to investigating how change in choice of segment length affects the estimation, and if so, how to select the optimal segment length.It is preferable to have segments that are sufficiently small such that expected density is unlikely to vary much within the segment (Hedley et al. 2004).However, because of the way in which the segments are defined (with the same environmental and detectability conditions) some variation in this is inevitable.In this analysis, the segment areas were used as weights in the model to account for this variation.Amalgamation of segments can make modelling more tractable and possibly increase spatial independence.
As currently implemented, GAMs produce variance estimates that compare approximately with those produced by stratified samples (compare Table 5).Nonetheless, estimating density in this framework does provide 1 substantial advantage over conventional line transect analysis in elucidating the geographic factors that may influence abundance.They also provide an objective assessment of whether there are differences over time by including year as a factor in the analysis even when the 95% CIs do not suggest any differences.The selected models allowed some elucidation of the habitat preferences of the humpback whales.The data suggested that humpback whales prefer shallower waters with an optimum of approximately 6-8˚C but both these variables may be surrogates for other variables not considered in the model.With a greater availability of potential explanatory variables GAMs (perhaps with spatially correlated errors) could allow elucidation of the habitat preferences of Megaptera.
That humpback whales are associated with shallower waters during feeding is well established (Jefferson et al. 1993).Again this may reflect optimal feeding locations rather than depth preference per se.The mild interaction with sea surface temperature is interesting but it remains to be established if this is an actual temperature/ depth preference or a correlation of temperature/depth with some other variable.Humpback whales' association with shallower water might suggest that sea surface temperature could directly influence their position unlike other deeper and longer diving species.However, there is no evidence that the ca 6˚C "preference" found here is universal; for example Calamboukis et al. (2004) found that feeding humpback whales off northern Washington were associated with an average water temperature of 14˚C (no formal statistical correlation was made however).
Humpback whales certainly shift their distribution in response to food availability (e.g.sand eels to herring in the Gulf of Maine, Weinrich et al. 1997) and habitat preferences may vary between individuals; for example Ernst and Rosenbaum (2003) found mother calf pairs preferred shallower water when over wintering off Madagascar, although patterns may be different on the feeding grounds.No strong evidence of an association of particular pod sizes was found here although the association with longitude found in 2001 was intriguing.
In conclusion GAMs, as currently implemented for large data sets, provide a useful way to investigate the distribution patterns of humpback whales in relation to environmental variables and can produce smaller confidence intervals than those found in conventional line transect analysis.

Fig. 1 .
Fig. 1.The realised survey effort (grey lines) and location of sightings (circles) for the aerial surveys in a) 1995 and b) 2001.Observed pod sizes are proportional to the area of the circles.

Fig. 2 .
Fig. 2. The realised shipboard survey effort (grey lines) and location of sightings (circles) for a) 1995 and b) 2001.Observed pod sizes are proportional to the area of the circles.Black lines indicate the ship survey blocks and the blue lines indicate the aerial survey region.

Fig. 5 .
Fig. 5. Overall perpendicular distance (km) distribution and detection function for a) NASS-95 shipboard survey and b) NASS-2001 shipboard survey.The red line is the average fitted detection function.In the case of figure a) this is averaged across cloud cover and vessel.Dashed black lines in a) indicate the detection functions for each vessel.

Fig. 6 .
Fig. 6.Estimated whale density for NASS-95 based on the combined aerial and ship surveys.Density estimates are on a scale 0 to 0.05 whales/km² (see bar).

Table 1a
and 1b.For each survey block, area in km², number of sightings after truncation (N), number of segments and realised search effort (L) in km for a) 1995 and b) 2001.Search effort is realised effort after discarding effort conducted in Beaufort Sea States greater than 6 and anomalous records.

Table 2 .
Selected terms in the GAMs.s(.) denotes inclusion of the covariate as a smooth function in the model.Initially, smooth functions were allowed 7 knots for 1D smooths and 14 knots for 2D smooths.'Abundance model' refers to the model used to estimate abundance whereas 'explanatory model' refers to the model used to examine the relationship between density and environmental variables.

Table 4 .
Comparison of the models, predicted mean densities (whales /km²) and 95% bootstrap CI for the survey region common to 1995 and 2001 surveys (size of common region is 1,725,204 km²) using models fitted to either the 1995 or 2001 survey data.s(.) denotes inclusion of the covariate as a smooth function in the model.

Table 5 .
Comparison of abundance estimates from this paper with estimates obtained using alternative methods.