THE SECOND LANGUAGE DEVELOPMENT OF PAST PERFECTIVE FORMS IN SPANISH

effects across course levels to better ascertain the areas of development in learner selections. Findings indicate that time of action and sequencing demonstrated significantly different effects across course levels, but not presence of an adverb. Thus, it may be that separate regression analyses across learner levels could lead to an overestimation of differences among these levels. Most change across course levels occurred in a year ago contexts, in which the use of the preterit increased. Implications are discussed relating to the developmental trajectory of preterit and PP forms in L2 Spanish, L2 statistical methods, and the teaching of past perfective forms in Spanish.


Introduction
Decades of research demonstrate that native speakers (NSs) display probabilistic variation during language use that is governed by linguistic, social, and cognitive constraints (e.g., Labov 1972; Wagner & Sankoff 2011). This variation is known as sociolinguistic variation, and structures that demonstrate two or more ways of saying the same thing are often called variable structures. Research into second languages (L2s) has found that L2 learners also use many of the variable structures found in native speech and that their use is affected by many of the same constraints as that of native speakers (e.g., Adamson & Regan 1991; Geeslin & Gudmestad 2008; Kanwit 2014). The accumulated research shows that as learners' proficiency or experience with the L2 increases, they employ these structures in ways that are increasingly similar to NSs. They use forms at rates that are more like those of NSs, and their variation between forms is progressively constrained by more of the same linguistic, social and cognitive constraints that govern native speaker variation, often in the same way. While this development has been extensively studied for structures such as subject form variation (e.g., Geeslin, Linford & Fafulas 2015; Geeslin, Linford, Fafulas, Long & Díaz-Campos 2013; Long 2016; Zahler 2018) and future form variation (e.g., Gudmestad and Geeslin 2013; Kanwit 2014, 2017; Solon and Kanwit 2014), less research exists that examines L2 development of Spanish perfective forms (i.e., preterit and present perfect [PP]) across multiple course levels (e.g., Terán 2019; Zahler & Whatley 2023). Additionally, most studies on the L2 acquisition of variable structures that examine development across course or learner levels compare across multiple separate statistical analyses in ascertaining learner development (e.g., Zahler & Whatley 2023; Geeslin, Linford & Fafulas 2013; Kanwit 2014, 2017; Terán 2020).
These studies compare each learner group to each other in terms of the constraints that are significant, as well as their direction and magnitude of effect. However, comparing across separate regression analyses does not allow for the determination of whether these differences between learner groups are, in fact, large enough to be statistically significant. Thus, the goal of the current study was to reanalyze the data from Zahler and Whatley (2023), who examined L2 Spanish past perfective forms in a controlled task completed by learners from multiple course levels, comparing across mixed effect regression models conducted for each level. In the controlled task, known as a written contextualized task (WCT), students were asked to complete 32 sentences in a narrative with either a preterit or PP form. Each of the 32 contexts was manipulated for four separate discursive constraints known to affect preterit-PP variation in native Spanish. The regression analyses conducted in Zahler and Whatley (2023) determined whether one of these four discursive features of the surrounding context significantly impacted learner selection in a given level. The authors then compared the significant constraints across levels, as well as their coefficients to determine whether the course levels differed from each other. In this reanalysis, the data from all course levels were combined and one statistical model was used that included an interaction between course level and each of the linguistic constraints manipulated on the task. This type of model allowed for confirmation of whether the effect of the linguistic constraints was significantly different between course levels, which can verify differences between learner groups in L2 research.

Perfective form variation in native Spanish
In native Spanish, there are three primary verb forms used to express the past. The imperfect denotes past actions with imperfective or continuative aspect. These are events that are viewed as ongoing or habitual from the perspective of the speaker (e.g., comía 'I would eat/was eating'). The preterit is used for perfective past actions, those that take place completely in the past and are viewed as discrete events that are separate from the present from the viewpoint of the speaker (e.g., comí 'I ate'). The PP, on the other hand, is used for past actions with perfect aspect, those in which the past action has a continuing or persistent effect until speech time or is viewed as connected or relevant to speech time by the speaker (e.g., he comido 'I have eaten'). However, in many varieties of Spanish, the preterit and PP overlap since the PP can express some perfective functions (Schwenter & Torres Cacoullos 2008). The use of the PP differs among varieties, and in certain dialects, the PP has developed specialized epistemic or evidential uses (e.g., Howe 2006; Rodríguez Louro & Howe 2010).
Since these forms at least partially overlap in the perfective context in many varieties of Spanish, researchers have quantitatively analyzed their use (e.g., Burgo 2010; Howe 2006; Howe & Schwenter 2003; Schwenter & Torres Cacoullos 2008; among many others). Using primarily variationist methodologies, they code each use of the forms in samples of data for features of the surrounding discourse. These features of the surrounding discourse are those aspects that are typically indicative of or tend to be associated with perfect aspect (typically expressed via the PP) versus perfective aspect (typically expressed via the preterit but for which the PP is used in certain contexts). Researchers, then, often run multivariate regression analyses to determine whether specific features of the discourse predict or more frequently co-occur with the use of either form. These associations often clarify the similarities and differences in functional meaning between preterit and PP forms. In research on preterit versus PP variation, frequently studied variables include: the temporal reference of the past event, presence and type of temporal adverbs, verb lexical aspect, grammatical person, polarity, object plurality, clause type, and foregrounding and/or sequencing of the event (e.g., Burgo 2010; Hernández 2013; Howe 2006; Howe & Schwenter 2003; Rodríguez Louro 2009; Rodríguez Louro & Yupanqui 2011; Schwenter & Torres Cacoullos 2008; Terán & Kanwit 2018).
Findings have shown that many of these variables demonstrate similar effects across varieties if significant. Given the number of varieties and variables examined across studies in native Spanish, this review will be limited to the discourse features manipulated on the task used in this study, described in Section 3.2. For the first variable, time of action of the event, defined as whether the event occurred on the same day as speech time or a year ago in the task for this study, several Peninsular varieties differ from most Latin American varieties in that the PP is used more frequently with perfective events occurring on the same day as speech time in these Peninsular varieties (e.g., Zahler & Whatley 2023; Burgo 2010; Howe 2006; Schwenter & Torres Cacoullos 2008; Soto 2014). In many Latin American varieties, on the other hand, the PP is often disfavored in the today context compared to the preterit (Schwenter & Torres Cacoullos 2008; Soto 2014; Terán & Kanwit 2018), although even in Latin American Spanish, the PP was used more frequently in today contexts than before today contexts (e.g., Zahler & Whatley 2023). Regarding the second constraint examined in the study, verb lexical aspect, most research has found that the preterit is more common with telic verbs (accomplishments and achievements) than nontelic ones (activities and states) (e.g., Zahler & Whatley 2023; Burgo 2010; Schwenter & Torres Cacoullos 2008 [Mexican Spanish]; Terán & Kanwit 2018). However, some studies have found that this constraint had no effect on variation (Howe 2006; Rodríguez Louro 2009; Schwenter & Torres Cacoullos 2008 [Peninsular Spanish]). For the third variable, presence or absence of a temporal adverb, when there is a significant effect, the presence of an adverb favored the preterit while its absence disfavored it (e.g., Terán & Kanwit 2018); however, many studies have found no effect of this constraint (Zahler & Whatley 2023; Howe & Schwenter 2008; Rodríguez Louro 2009).
Lastly, for the fourth variable, sequencing of the event, previous research has found that in most varieties, the preterit was used in foregrounded sequenced narratives, while the PP was used for background and nonsequenced events, although the PP can also be used with sequenced, foregrounded events that have today temporal reference in several Peninsular varieties (e.g., Zahler & Whatley 2023; Howe & Schwenter 2003). The remaining variables found to be significant in prior research on native speaker varieties (clause type, polarity, grammatical person, and priming) were controlled in the current study, except for object plurality, as explained in Section 3.2.

Past perfective forms in second language Spanish
Most prior research on the L2 development of past tense forms in Spanish focuses on the acquisition of the preterit and imperfect (e.g., Andersen 1991;Camps 2002;Liskin-Gasparro 2000;Lubbers Quesada 2006;Montrul & Slabakova 2002;Salaberry 2000). Although this research does not address the variation between preterit and PP forms, it does indicate that the preterit first emerges with prototypical meanings of perfectivity, and is preferred with telic verbs, foregrounded events in narrative sequences and, in particular, with adverbials marking telic aspect.
Research that includes the PP in its analysis of the L2 Spanish past time verb forms is of two types. Several studies examined the variation between preterit and PP forms among English-speaking U.S. students studying abroad and in contact with varieties with region-specific uses of these forms (Zahler & Whatley 2023; Geeslin, García-Amaya, Hasler-Barker, Henriksen & Killam 2012; Geeslin, Fafulas & Kanwit 2013; Kanwit, Geeslin & Fafulas 2015; Linford 2016; preterit, PP and imperfect in the case of Whatley [2013]). In all these studies, learners were asked to complete a WCT at least twice over the course of study abroad, selecting between past tense forms in contexts on the task that were manipulated for several discursive features, often based on the regional varieties with which the study abroad students came into contact. Findings diverged between studies and even between learner groups in the same study: some learners moved toward the patterns evidenced in the local variety while others moved away from them. For example, Geeslin, Fafulas and Kanwit (2013) studied the selection between preterit and PP on a WCT by learners studying abroad in San Luis Potosí, Mexico and Valencia, Spain, two regions that differ in their use of preterit and PP. They manipulated the contexts on a WCT for the time of action (today, yesterday, before yesterday, undetermined), telicity of the verb (telic, not telic) and repetition (whether there was a frequentative/repetitive adverb or not). The authors found that learners in each region moved toward their respective regional norms for the effect of the constraint "time of action". The learners in Spain increased their selection of the PP in today temporal contexts, while the learners in Mexico decreased its selection in this context, following the choices of NSs from their respective regions who also completed the task.
On the other hand, learners in Mexico moved away from the native speaker norm for the effect of repetition, unlike the Spain learners who showed no change in this constraint over study abroad. Zahler and Whatley (2023) argued that the only consistent finding across studies and learner groups from previous study abroad research on preterit-PP variation is that learners moved toward regional norms regarding the effect of the factor of time of action. However, for the other constraints considered, given the diversity of coding schemes, task types, and learner proficiency levels, cross-study comparisons are difficult to make. These studies, although focused on learners studying abroad, do provide information about the acquisition of the preterit and PP in L2 learners outside of study abroad as well, particularly regarding the PP, since there were some consistencies for the constraint time of action in learner development even across different regions. As mentioned in Zahler and Whatley (2023), learners were sensitive to the effect of time of action and showed development with regard to the effect of this constraint, generally in the direction of their local study abroad regional pattern. This finding suggests that this constraint on past perfective form variation is one in which learners may demonstrate a lot of development in general, or is one to which they are particularly sensitive, potentially including outside the study abroad context. Additionally, in most studies, at the start of study abroad and often even after their time abroad, learners overselected the PP in contexts that are typically strongly favorable for preterit selection, such as with "a year ago" temporal reference (Zahler & Whatley 2023;Geeslin et al. 2012), suggesting that this overselection is a typical developmental stage for the PP.
The remaining studies that examine preterit and PP variation in L2 Spanish have focused on mostly English-speaking learners in the U.S. university context (Zahler & Whatley 2023; McAlister 2019; Terán 2020). Overall, these studies corroborate findings from study abroad that L2 Spanish learners overallow or overuse the PP in traditionally perfective contexts, such as those bounded in the past and with before-today temporal reference. These studies also indicate that this overallowance and use decreases as learner experience with the target language increases. For example, the first study, McAlister (2019), investigated L2 Spanish learners' acceptance of the PP in a judgment task with items that differed by whether the PP or preterit forms are generally accepted in the discursive contexts in which the item was embedded. The participants who completed this task were 21 American English and 4 Brazilian Portuguese learners of Spanish in 3rd year, 4th year, and graduate Spanish courses. McAlister found that American English learners allowed the PP in contexts that are typically reserved for the preterit, that are bounded in the past and do not include speech time. He also observed that learners who had studied abroad were more likely to reject the PP in traditionally preterit contexts.
Terán (2020), the second study, examined both production of past tense forms in an oral task and their selection on a WCT similar to the task used in the current study, except that learners could opt between additional past tense forms as well as the preterit and PP. Her participants were English-speaking L2 learners of Spanish in 1st semester, 2nd semester, 4th semester, 5th/6th semester, 7th/8th semester, and graduate Spanish classes. She found that in production, the PP emerged after the preterit and imperfect, and was rarely produced even at the most advanced proficiency levels. When produced in her highest level of learners, it was used 95% of the time in prototypically preterit contexts rather than perfect ones, mirroring McAlister's (2019) findings on his acceptability task. She also found that learners at the lowest level (1st semester Spanish courses) overselected the preterit on the WCT in typically perfect contexts, where one would expect the PP form, such as with background verbs and atelic verbs. This overselection did not occur in higher levels. Her third observation was that learners selected the PP in prototypically preterit contexts, such as with hesternal (yesterday) and/or pre-hesternal (before yesterday) temporal reference in 1st through 3rd semester classes. This result was similar to her findings on production and McAlister's (2019) findings on his judgment task, providing additional support for a developmental stage in L2 Spanish in which the PP is overused or overselected in traditionally preterit contexts.
Lastly, Zahler and Whatley (2023) analyzed the selection of preterit and PP forms on a WCT manipulated for four linguistic variables (time of action, verb lexical aspect, sequencing, and presence of temporal adverb) by 105 learners of Spanish across six course levels. They observed that learner selection of the PP on the task decreased as course level increased. Additionally, and like Terán (2020), they found that the linguistic constraints that affected form choice on the task differed across course level. The findings of their regression models are presented in Table 1 below (cf. Zahler & Whatley 2023: 340). Learners' selections between PP and preterit were significantly constrained by time of action at all course levels, with the same direction of effect: participants chose more PP with today temporal reference than a year ago. However, verb lexical aspect (telic vs. atelic), sequencing (part of a sequence of events or not) and presence of a temporal adverb differed across levels. Verb lexical aspect only significantly affected learners' choices at the 3rd semester and 4th year levels, sequencing at the 3rd semester, 4th year and graduate levels, and presence of an adverb at the 4th semester level.
Zahler and Whatley (2023) interpreted these differences in significant constraints and coefficients across levels as indicative of learner development. For example, when discussing their findings for time of action, they stated that "time of action was significant for all learner class levels, and its effect, as indicated by the coefficient value, generally increased as class level increased" (p. 340). However, although the coefficient for this constraint increased as course level increased, since a separate regression analysis was conducted for each participant group, it cannot be confirmed whether these differences are significant between learner groups. The same observation applies to the other constraints.
For example, although verb lexical aspect (telic vs. atelic) was only significant at the 3rd semester and 4th year levels, the remaining levels demonstrated the same direction of effect of this factor, but the effect did not reach statistical significance. Thus, it may not be that the difference in the effect of this factor is statistically significant between course levels. Additionally, given that their study was focused on study abroad and the U.S. university students served as a comparison, the authors did not provide the distribution of learner choices among categories of these variables, such as the percent of items with today reference for which PP was selected by learners compared to items with a year ago reference. Thus, even if there is a difference between levels in terms of the magnitude of effect of the constraint time of action, it is unclear in which contexts this change is specifically taking place: today, a year ago, or both.
Together, these lines of research offer some generalizations. First, more past tense forms emerge in L2 Spanish as learner proficiency or language experience increases. Regarding the PP and preterit, the latter emerges before the former in production. However, when the PP does emerge in production, it is overused in primarily preterit contexts. In selection and judgment tasks the PP is overallowed in preterit contexts compared to NSs at earlier levels, but appears to lessen in these contexts as learner experience with Spanish increases. Additionally, learners select the PP more on WCT or interpretation tasks than they produce it. The final consistency that can be found across studies is that learners are sensitive to the effect of time of action, more than other constraints, and that this constraint typically shows differences across course levels in research in the university context or over time in study abroad research. However, to determine whether these differences are significant across levels, we need to combine learner levels in one statistical model and include an interaction between course-level and linguistic constraints.

Comparison across statistical models in second language studies
In general, most studies of variation in L2 Spanish have followed a cross-sectional design similar to that of Zahler and Whatley (2023) in order to examine the development of variable structures, conducting separate statistical analyses for each learner level and comparing across them (e.g., Geeslin et al. 2015; Gudmestad 2012; Gudmestad & Geeslin 2013; Kanwit 2014, 2017; Long 2016; Zahler 2018). In these analyses, differences in the significant constraints and their direction of effect across levels are interpreted as indicative of learner development. For example, Zahler (2018) examined the selection of null and overt subject pronouns by 4th year university and graduate learners of Spanish on a WCT manipulated for four constraints: switch reference, priming, person and switch in TMA. She found that learners' choices in the 4th year level were not significantly constrained by the factor switch in TMA, while graduate learners' selections were. She interpreted these findings as indicative of a developmental change between the two levels. However, although not significant, the 4th year learners demonstrated the same pattern of effect as the graduate learners, that is, they selected more overt pronouns in contexts of switch in TMA compared to no switch in TMA. Thus, it is unclear whether the effect of the factor switch in TMA was indeed significantly different and represents change between levels. Additionally, including all data in one model and adding an interaction between the cross-sectional participant group (whether it is defined by course level, proficiency level, time, or another participant characteristic) and the constraints of interest would follow best practices in linear and logistic regression analysis (e.g., Dunteman & Ho 2006; Osborne 2015).
Moreover, this practice would put research on L2 variation in line with L2 research more broadly (e.g., Cunnings …).
Thus, the current study aims to add to the growing literature on preterit and PP in L2 Spanish, as well as to research on L2 morphosyntactic variation more generally, in two ways. First, we reanalyze the data in Zahler and Whatley (2023) using one statistical model. This will allow us to confirm whether any changes in effects of independent variables in higher course levels (4th semester, 5th semester, 3rd year, 4th year, and graduate) are, indeed, statistically significantly different from the lowest level (3rd semester). Second, we will provide the distribution of learner choices between preterit and PP across the categories of the independent variables that differ across course levels in order to visualize in which contexts more learner development is evidenced. To that end, two research questions guide the current study: (1) What are the significant constraints that affect learners' choices between preterit and PP forms on a WCT? Are these constraints statistically significantly different across course levels? (2) In which categories of the linguistic independent variables is development found?
For the first research question, it is hypothesized that the reanalysis of the data from Zahler and Whatley (2023) will indicate fewer differences across course levels than a comparison of separate regressions suggests. Specifically, we expect that time of action will demonstrate change across levels, given previous research on study abroad and university Spanish learners that has shown learners to overselect or overallow the PP in temporally remote contexts at earlier levels or with less language experience compared to later levels (Zahler & Whatley 2023; McAlister 2019). However, we do not anticipate significant change in the other constraints analyzed in the current study (verb lexical aspect, sequencing, and presence of temporal adverb), given that these constraints show variable findings in previous research both abroad and in the university context (e.g., Zahler & Whatley 2023; Geeslin et al. 2012; Geeslin et al. 2015; Linford 2016; McAlister 2019; Terán 2020; Whatley 2013). Specifically, in Zahler and Whatley (2023), these constraints generally demonstrated the same direction of effect for learners and varied only in whether they were significant. Only a direct comparison between levels with one statistical model will clarify whether these differences between levels are significant. For the second research question, we expect that learners will show the most change across levels in temporally bounded past events that are not tied to speech time, given previous research findings that learners overuse and overselect the PP in these contexts. In the task used for the current study, this would be in contexts in which the time of action was a year ago. We hypothesize that learners will overselect the PP in this context at earlier levels, and that this overselection will diminish as course level increases.

Participants
The participants for this study were 105 English-speaking learners of Spanish who served as an at-home comparison group for the study abroad research in Zahler and Whatley (2023). The learners belonged to six course levels: 3rd semester (N = 20), 4th semester (N = 20), and 5th semester (N = 17) Spanish language courses, 3rd year content courses (N = 20), 4th year content courses (N = 14), and graduate Spanish courses (N = 14). All learners indicated that English was their sole first language. Some had acquired other L2s and/or studied abroad. These participant characteristics were not controlled since the differences in individual characteristics among participants across levels were considered typical of language students as course level increases. Students who had studied abroad had done so in a variety of countries, including Spain, Argentina, Colombia, Peru, Ecuador, Costa Rica, Mexico, Chile, and Guatemala. The profiles of the participants at each level are provided in Table 2. The grammar test is described in Section 3.2.

Procedures
All participants completed three tasks one time: a written contextualized task (WCT), a 25-item grammar test and a background questionnaire. The WCT asked students to select between the preterit and the PP forms to complete sentences embedded in a narrative. These sentences were created to manipulate four features of the surrounding discourse known to affect preterit-PP variation in native Spanish. Three of the manipulated variables had two categories, while one had four. Each category of each variable co-occurred in combination with all other categories of the other variables. This led to 32 unique combinations of the four variables and their categories (2 x 2 x 2 x 4 = 32), yielding the 32 items on the task. The first variable manipulated was time of action, which occurred either today or one year ago. The second was verb lexical aspect, of which states, activities, achievements and accomplishments were equally distributed across the task. The third variable was sequencing: each context could be part of an overt sequence of events introduced with an infinitive after a preposition (e.g., después de comer 'after eating') or not part of an overt sequence of events. Lastly, the fourth variable was the presence of temporal adverbs (present/absent). Even when the temporal adverbial was absent, the time of action (today or a year ago) was always inferable from the context or previously established in the prior sentence. When present, it was always a specific temporal adverbial anchoring the event to today or a year ago reference. In example (3), the sentence occurred with a year ago temporal reference, established in the prior sentence, with an accomplishment (telic) verb, in a sequence (después de comer 'after eating'), and with a temporal adverb (esa mañana 'that morning') present.

(3)
Hace un año […]. Esa mañana, después de comer, Jorge (3) hizo/ha hecho la maleta - ¡Una tarea que siempre es bien difícil!
'One year ago … That morning, after eating, Jorge packed/has packed his suitcase - A task that is always tough!'

These independent variables were selected given prior research on native Spanish that has shown them to be important constraints on preterit and PP use. Additionally, although presence of a temporal adverb was only significant in one prior study and not significant in several others examining native Spanish, we included it in the task since learners, even at advanced levels, overly depend on adverbial modifiers to interpret aspect in order to select verb morphology, while native speakers interpret aspect from the verbal morphology (Baker & Quesada 2011). Other variables known to affect past perfective form variation were controlled in the task: the preceding verb was either imperfect or pluperfect, ya 'already' was not employed, the subject was 3rd-person singular, all sentences were affirmative, and the verb always occurred in a main clause. Object plurality was not controlled for. In the current task, 14 items had no direct object, 14 had a singular one and 4 had a plural one. Thus, we also included it as an independent variable in our regression analyses, described in Section 3.3.
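The fully crossed design of the WCT described above can be sketched in R as follows. This is only an illustrative sketch: the column names and category labels (time, aspect, sequence, adverb) are assumptions chosen for readability and are not taken from the original task materials.

```r
# Hypothetical labels for the four manipulated variables (names are assumptions).
design <- expand.grid(
  time     = c("today", "year_ago"),
  aspect   = c("state", "activity", "achievement", "accomplishment"),
  sequence = c("no_sequence", "sequence"),
  adverb   = c("absent", "present"),
  stringsAsFactors = FALSE
)

# Fully crossing the categories gives 2 x 4 x 2 x 2 = 32 unique contexts.
n_items <- nrow(design)

# Group the four aspect classes into telic vs. atelic predicates.
design$telicity <- ifelse(
  design$aspect %in% c("achievement", "accomplishment"), "telic", "atelic"
)
telic_counts <- table(design$telicity)
```

Each row of `design` corresponds to one item type on the task, and the telic/atelic grouping splits the 32 items evenly.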
Participants also completed a 25-item grammar test as a measure of grammatical knowledge in Spanish. This was a multiple-choice task in which participants had to select the appropriate grammatical form to complete sentences in a narrative. The items tested participants' knowledge of subjunctive, copula contrast, object pronoun use, prepositions, and preterit/imperfect use, among other grammatical features. This test has been used in several studies by Geeslin and colleagues (e.g., Geeslin & Gudmestad 2010), and has been shown to distinguish among levels of L2 learners. Lastly, participants completed a background questionnaire asking for their linguistic history and demographic information.

Analyses
The data were analyzed via a binomial generalized linear mixed model in R (R Core Team 2020), using the glmer function of the lme4 package (Bates, Maechler, Bolker & Walker 2015). All participants were included in one model. The dependent variable was the learners' selection of either preterit or PP on the WCT, resulting in 32 tokens per participant. We subsequently coded each response for the four linguistic variables manipulated on the task (time of action, verb lexical aspect, sequencing, and presence of a temporal adverb) as well as for object plurality. Verb class was combined into telic predicates (achievement and accomplishment verbs) and atelic predicates (activities and states) in order to compare with previous research (e.g., Zahler & Whatley 2023; Geeslin et al. 2012; Geeslin, Fafulas and Kanwit 2013; Terán 2020). We also coded each token for two participant characteristics: grammar test score (continuous) and course level. These linguistic and individual variables were included as fixed effects, while individual participant was a random effect. The categorical independent variables, which included all linguistic variables as well as course level, are automatically dummy coded within a glmer analysis, where one category is selected as the baseline or reference level. The baseline for time of action was today temporal reference, for verb lexical aspect it was atelic, for sequencing it was no sequence, for presence of a temporal adverb it was no temporal adverb present, and for object plurality it was no object. The reference level for course level was 3rd semester. We also considered four additional interaction variables in the analysis: time of action X course level; verb lexical aspect X course level; sequencing X course level; and presence of a temporal adverb X course level. 1
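The dummy coding and reference levels described above can be illustrated with base R alone. The sketch below is an assumption-laden toy: the factor names (Time, Level) and level labels are hypothetical, only one linguistic variable is shown, and no model is fitted. It simply displays which coefficient columns R's default treatment contrasts would create for a main effect plus its interaction with course level.

```r
# Toy grid with hypothetical factor names and labels; not the study's data set.
toy <- expand.grid(
  Time  = c("today", "year_ago"),
  Level = c("3rd_sem", "4th_sem", "5th_sem", "3rd_year", "4th_year", "grad"),
  stringsAsFactors = TRUE
)

# Set the reference (baseline) level of each factor explicitly.
toy$Time  <- relevel(toy$Time,  ref = "today")
toy$Level <- relevel(toy$Level, ref = "3rd_sem")

# Treatment (dummy) coding: each non-reference category gets a 0/1 column,
# and Time:Level adds one product column per non-reference course level.
X <- model.matrix(~ Time + Level + Time:Level, data = toy)
coef_names <- colnames(X)
```

Under these assumptions, `coef_names` contains no column for today or for 3rd semester, since both are absorbed into the intercept, and it includes one interaction column per higher course level (for example `Timeyear_ago:Level4th_year`).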

Results
In this section, the findings of the study are presented. Figure 1 displays the rates of PP selection on the WCT by course level. The lowest level of learners, those in 3rd semester Spanish, selected the PP at the highest rate, on average 36.8% of the time. Other than in 5th semester Spanish, as course level increased, selection of the PP decreased (4th semester = 30.7%; 5th semester = 33.5%; 3rd year = 29.7%; 4th year = 24.6%; graduate = 15.9%). This selection of PP for the graduate learners is at a rate nearly identical to the U.S. native speaker group rate of 15.1% reported for the same task in Zahler and Whatley (2023). The native speakers in their study resided in the United States but were from a range of Spanish-speaking regions across the world. The code that was used in R to run the glmer model was the following:

PP_Full.mod <- glmer(factor(Response) ~ 1 + Aspect + Time + Sequence + Adverb + Object + Proficiency + Level + Aspect:Level + Time:Level + Sequence:Level + Adverb:Level + (1|Individual), family = binomial, data = data)

Table 3 presents the results of the regression analysis. In the table, all significant constraints are marked with asterisks, which indicate the p-value. For each constraint, the coefficient is included as well as the standard error. For the dependent variable, selection of preterit or PP, the PP form was the reference category. For each categorical constraint, one category served as the reference. For example, with time of action, today temporal reference was selected as the reference category. For the remaining categorical variables, the reference categories were the following. For verb lexical aspect, it was atelic, for sequencing, it was no sequence, and for presence of temporal adverb, it was no adverb present. For object plurality, the reference category was no object present, while for course level and the interaction variables, it was 3rd semester, the lowest learner level in the current study.
Grammar test score was included as a continuous variable and did not have a reference category. Since an interaction between course level and the four linguistic independent variables was included in the model, any significant effect of these latter four variables only applies for the reference level of the factor course level, which is 3rd semester. Thus, the coefficient for time of action indicates the likelihood of the selection of the preterit form in the non-reference category, a year ago, compared to the reference category, today, for the 3rd semester group. A positive regression coefficient for a year ago contexts therefore indicates that the preterit is more likely to be selected than the PP in a year ago contexts compared to today contexts for 3rd semester participants. For the interaction variables, the coefficient indicates how much the effect of the non-reference category of a linguistic independent variable, relative to its reference category, differs at a non-reference course level from the same effect at the reference course level, in this case 3rd semester. Thus, for example, there is a positive coefficient of 2.076 for the effect of a year ago temporal reference compared to today temporal reference in the 4th year course level compared to 3rd semester. This positive coefficient indicates that the slope of the effect of a year ago temporal reference compared to today is steeper in the 4th year than in the 3rd semester. Table 3 includes the coefficient estimates (β), standard errors (SE), z values, and p-values, with significance levels indicated by asterisks, for all variables in the model. The results of the model indicate that time of action, sequence and object plurality had direct effects on learners' selections on the WCT for the 3rd semester participant group.
The preterit was more likely to be selected in a year ago contexts versus today contexts, with a sequence of actions versus no sequence, and with a singular object versus no object. On the other hand, presence of temporal adverb compared to no temporal adverb was not a significant constraint on variation, nor was the effect of a plural object compared to no object. Grammar test score also had a direct effect on variation. Learners in the 3rd semester with a higher grammar test score selected more preterit than PP on the task. Lastly, two of the interaction constraints were significant: time of action X course level and sequencing X course level. For time of action X course level, the effect of a year ago contexts versus today contexts was stronger for learners in the 4th year and graduate levels compared to 3rd semester learners. For sequencing X course level, the tendency to select more preterit in sequenced contexts than in non-sequenced contexts was significantly weaker, or trended in the opposite direction, for 3rd year learners compared to 3rd semester learners.
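The reading of the interaction coefficient given above can be illustrated with a short worked equation. The notation is our own assumption: β_T denotes the simple effect of a year ago versus today for the 3rd semester reference group, whose value is reported in Table 3 and is left symbolic here, and γ denotes the time of action X 4th year interaction coefficient of 2.076 reported in the text.

```latex
\begin{align*}
\text{3rd semester:}\quad
  & \Delta_{\text{3rd sem}} = \beta_T \\
\text{4th year:}\quad
  & \Delta_{\text{4th yr}} = \beta_T + \gamma_{T,\,\text{4th yr}} = \beta_T + 2.076 \\
\text{ratio of odds ratios:}\quad
  & \frac{\exp(\Delta_{\text{4th yr}})}{\exp(\Delta_{\text{3rd sem}})}
    = \exp(2.076) \approx 7.97
\end{align*}
```

Here Δ is the difference in the log-odds of selecting the preterit between a year ago and today contexts, with the other predictors and the participant's random intercept held constant. On this reading, the odds ratio for a year ago versus today in the 4th year group is roughly eight times the corresponding odds ratio in the 3rd semester group.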
In the following two figures, the cross-sectional (i.e. across learner level) distribution of the selection of PP on the task is provided for the factors time of action and sequence, since the effects of these constraints were significantly different across levels. Figure 2 illustrates that the direction of effect of time of action is the same across course levels. Learners select less PP than preterit in a year ago contexts compared to today contexts. However, in the 4th year and graduate levels, this effect is larger. That is, the difference in PP selection between the two temporal contexts is bigger, as confirmed by the regression analysis, in which these are the only two levels that are significantly different from the 3rd semester for the effect of a year ago temporal reference compared to today. In particular, PP selection decreases in the year ago context, in which the PP is rarely selected (<10%) in the 4th year and graduate groups. In Figure 3, the distribution of sequencing across course levels is illustrated. The direction of effect is the same in five of the six learner groups: the PP is more frequent with non-sequenced events than sequenced events. However, the effect of sequenced events compared to non-sequenced events is significantly smaller in the 3rd year level compared to the 3rd semester. In fact, by comparing across sequenced and non-sequenced events in the 3rd year group, it appears that there is little difference in rate of PP selection between these two contexts.

Discussion
In this section, we discuss our findings, first in terms of the research questions, then with regard to how they connect to prior studies. The first question asked about the significant constraints that affect learners' choices between preterit and PP forms on our WCT and whether these constraints were different across course levels. The regression model indicated that of the linguistic factors, time of action, sequencing and object presence were significant. Since time of action and sequencing were included in an interaction with course level, they were simple effects and their significant result indicates that they constrained variation in the 3rd semester group. This group was more likely to select the preterit with a year ago reference and sequenced events. Object presence had a direct effect on past form selection in this group as well: 3rd semester learners selected significantly more preterit with singular objects than with no object. Time of action and sequencing were moderated by course level. The effect of time of action was stronger with 4th year and graduate Spanish learners compared to 3rd semester. The effect of sequencing was weaker with 3rd year learners compared to 3rd semester. Course level did not have a significant effect on variation. Interestingly, the 4th year and graduate learners selected even more preterit on the task than any of the other groups but there was not a direct effect of course level on the selection of preterit for these levels. This is because of the interaction between time of action and course level for the 4th year and graduate learners. The increase in preterit and decrease of PP was mostly found in the year ago contexts in these groups, as exemplified in Figure 2.
Thus, the results suggest that as course level increases until the 3rd year, learners select more preterit on the task overall, but that this continued increase in preterit selection and decrease in PP selection occurs more in a year ago contexts in the 4th year and graduate groups rather than across the whole task. Lastly, grammar test score had a direct effect for 3rd semester students: preterit selection was higher when learners' grammar test scores were higher. However, no further interactions between grammar test score and course level were explored.
The findings for research question one are important with regard to how linguistic development is analyzed in L2 research for variable structures. In Zahler and Whatley (2023), using the same data, a separate regression analysis was conducted for each course level. The findings from their study suggest that more development occurred than the current model in this study indicates. For example, Zahler and Whatley (2023) found that verb lexical aspect significantly constrained variation in the 3rd semester and 4th year groups, while presence of a temporal adverb was significant for the 4th semester group but no others. However, by including all the data in one model and adding an interaction between each linguistic constraint and course level, we see that verb lexical aspect and presence of a temporal adverb were not significant as simple effects, nor was there an interaction with course level. Thus, these constraints, for which differences across learner groups were evidenced in previous research, did not, in fact, demonstrate significant development across course levels in the current study. Additionally, by using the single model, we were able to show that the effect size of time of action was indeed significantly larger in the 4th year and graduate learner groups, as suggested in Zahler and Whatley (2023). Therefore, the results indicate that using a single model with interactions between the linguistic constraints and course level can help to confirm development (i.e. statistical differences) between course levels. At the same time, conducting separate regression analyses may lead to an overestimation of differences between course levels, while a single statistical model can avoid such interpretation. These results indicate that the choice of statistical model leads to distinct findings.
The second research question asked whether development was found more in specific categories of the linguistic independent variables considered in the current study. Only two of the constraints had effects that changed over course levels, and Figures 2 and 3 helped in visualizing where these differences occurred. The effect of sequencing was generally the same over course levels, except in the 3rd year level, where a visual analysis in Figure 3 suggested that it had no effect. For time of action, as already described, this constraint was significantly stronger in the 4th year and graduate levels, specifically in a year ago contexts. Indeed, 3rd semester students selected the preterit 58.4% of the time in today contexts and graduate learners did so 70.5% of the time, for an increase of 12.1 percentage points for the preterit and the same decrease for the PP in this context. However, in a year ago contexts, the difference between 3rd semester (68.0%) and graduate learners (97.8%) in preterit selection was 29.8 percentage points, meaning that the selection of the PP was reduced by the same amount in this context. Thus, taken together, the single model regression analysis coupled with Figures 2 and 3 showing the distribution of preterit selection among categories of the independent variables indicated that most learner progress in the use of preterit and PP forms occurred in a year ago contexts, and specifically not until the 4th year and graduate level. This context is temporally bound in the past and distant from speech time, and thus is one in which learners should select very few or no PP forms. However, in the earlier levels, they overselected the PP with a year ago reference, and decreased this selection of the PP as course level increased, increasing the preterit in this context. These findings, that most progress occurred in a year ago temporal contexts, are specific to the WCT used in the current study.
However, they do coincide with previous research on preterit and PP variation that indicates that L2 Spanish learners overuse or overselect the PP form in typically preterit contexts, that is, those that are temporally bound in the past and separate from the present (e.g., yesterday, a week ago, a year ago). This overuse and overselection of the PP in preterit contexts occurs in several previous studies on study abroad, which provide a snapshot of development, as well as in at-home research, some of which is cross-sectional as a proxy for development (Zahler & Whatley 2023; Geeslin et al. 2012; Kanwit et al. 2015; McAlister 2019; Terán 2020). In the U.S. university context specifically, only McAlister (2019) and Terán (2020) have examined preterit and PP form variation. Both have found that overselection of the PP in typically preterit contexts occurs at earlier course levels (in the case of Terán [2020]) or in learners without study abroad experience compared to those with experience abroad (in the case of McAlister [2019]). Thus, it may be part of the typical developmental trajectory for past perfective forms in Spanish for learners to initially overuse the PP form in preterit contexts before acquiring a more native-like distribution of the PP and preterit. However, it is important to highlight that we did not directly compare with native speakers in the current study.
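The comparison between 3rd semester and graduate learners reported above for today and a year ago contexts can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the original analysis (which was run in R). It uses only the four preterit rates given in the text, and the variable and function names are hypothetical. It assumes that each response is either preterit or PP, so that the PP rate is 100 minus the preterit rate.

```python
# Preterit selection rates (%) reported in the text for two course levels.
# Names are illustrative; only the four numbers come from the study.
PRETERIT_RATE = {
    "3rd semester": {"today": 58.4, "a year ago": 68.0},
    "graduate": {"today": 70.5, "a year ago": 97.8},
}


def point_change(context: str) -> float:
    """Percentage-point increase in preterit selection from 3rd semester to graduate."""
    low = PRETERIT_RATE["3rd semester"][context]
    high = PRETERIT_RATE["graduate"][context]
    return round(high - low, 1)


def pp_rate(level: str, context: str) -> float:
    """PP selection rate, assuming a binary choice between preterit and PP."""
    return round(100.0 - PRETERIT_RATE[level][context], 1)


for context in ("today", "a year ago"):
    print(
        f"{context}: preterit +{point_change(context)} points; "
        f"PP {pp_rate('3rd semester', context)}% -> {pp_rate('graduate', context)}%"
    )
```

Expressing the changes in percentage points rather than as relative percentages keeps the two contexts directly comparable: the change in a year ago contexts is more than twice the change in today contexts.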
Coupled with study abroad research that shows that learners tend to move toward native speakers from their region of study abroad with regard to the effect of time of action, the accumulative research indicates that time of action is a constraint with which L2 Spanish learners typically show the most development and to which they are sensitive. Thus, while this study is not focused on and does not assess instruction on L2 Spanish past tense forms, there are potential teaching ramifications for our findings. Specifically, since learners overuse the PP in preterit contexts (Terán 2020) and overallow or select it on interpretation and WCT tasks (the current study; McAlister 2019), it may be that learners have difficulty in understanding the semantic differences between the preterit and PP forms, especially regarding time of action and relevance to speech time. Instructors of Spanish could target temporally-bound contexts in the past and focus on these as a site for improvement in use of the preterit and PP forms. Due to an apparent sensitivity to this constraint on the part of learners in our and other prior research on L2 Spanish past perfective forms, it may be that learners would benefit from targeted teaching on these semantic and temporal uses of the PP and preterit forms.

Conclusion
In this study, we compared the selection of PP and preterit forms on a WCT in learners at course levels ranging from 3rd semester to graduate level Spanish, analyzing the effect of five linguistic constraints, course level and grammar test score in one statistical model. Our first goal was to determine whether the use of one statistical model including all course levels and interactions between course level and four linguistic constraints would demonstrate different findings from separate analyses for each course level, as conducted in prior research. Our results indicated that distinct statistical models per course level do suggest more differences between learner groups than a single model, thus possibly overestimating differences and learner development. Our second goal was to examine in more detail the specific contexts in which learner development occurred across course levels. We found that our participants demonstrated the most change specifically in a year ago contexts on our task. This observation is in line with previous research on L2 Spanish past perfective forms that has also found that learners overuse and overselect the PP form in typically preterit contexts, especially at lower course levels or with less experience with Spanish. This overuse and overselection may be a stage in the development of past perfective forms in L2 Spanish. Overall, this study shows the importance of carefully choosing the type of statistical analysis used in L2 research and sheds additional light on the acquisitional trajectory of PP and preterit forms in Spanish.