RESTRICTIONS ON ORDERING OF ADJECTIVES IN SPANISH

ABSTRACT. Sequences of multiple modifying adjectives are subject to poorly understood lexical ordering restrictions. There are certain commonalities to these restrictions across languages, as well as substantive language variation. Ordering restrictions in Spanish are still under empirical debate: some propose strict notionally-based ordering for direct modifier adjectives; others propose broad ordering restrictions based on the contrast between intersective and non-intersective adjectives; and yet others raise the possibility that adjectival order is fully unrestricted. The goal of the present study is to examine corpus evidence for adjectival sequences. We look at both sequences of two postnominal adjectives (Noun + Adjective + Adjective, NAA sequences) and sequences of one prenominal and one postnominal adjective (Adjective + Noun + Adjective, ANA sequences). The results from the NAA datasets clearly and categorically confirm that relational adjectives are structurally closer to the noun. There is some evidence for an ordering bias along the lines of the intersectivity hypothesis, but little else in terms of hard evidence for restrictions. Additional ordering constraints appear once we incorporate the ANA datasets into the empirical picture. One interpretation is that these restrictions can be subsumed under an approach where evaluative adjectives have to occupy the prenominal position. In sum, the evidence is most compatible with a middle-ground approach, where a limited set of ordering restrictions applies to broad, entailment-based categories. The evidence does not seem to support a fully articulated set of ordering restrictions.

Introduction
Sequences of multiple modifying adjectives are subject to poorly understood restrictions. The unmarked ordering of NP-internal adjectives is affected by prosodic, morphological, lexical, and semantic factors. Some patterns seem to hold crosslinguistically, while others have been proposed to be language-specific. How universal are these ordering hierarchies? One leading proposal (Cinque 2010) holds that attested orders reflect an underlying syntactic cartography within the structure of noun phrases, which in some languages is expressed as a mirror image pattern resulting from cyclic (roll-up) movement. Cinque (2010) characterizes the contrast between Germanic and Romance languages in these terms. We illustrate by comparing Spanish and English. English has near categorical use of a prenominal position for adjectives, whereas the primary position for Spanish adjectives is postnominal. This is true of other Romance languages, but the postnominal order is especially dominant in the case of Spanish (Scarano 2005; Rizzi et al. 2013). The contrast between (1) and (2) illustrates the mirror image pattern.
(1) Faded blue jeans (physical property > color)
(2) Pantalones azules desteñidos (color > physical property)
Nonetheless, there is no consensus in the literature as to whether a Cinque-style strict hierarchy applies in Spanish. Authors such as Demonte (1999) describe a smaller subset of ordering restrictions. Sánchez (1996, 2017) goes a step further by proposing that, in Spanish, adjectives are not directly merged inside the nominal projection, but introduced as modifiers inside covert predicate structures (as predicate phrases or reduced relatives; see also Fábregas 2017). A consequence of this mode of derivation is greater flexibility in word order. As these claims rely primarily on intuitional data, our study aims to empirically evaluate the status of adjective ordering restrictions.
We first provide an overview of the typological literature, then consider the literature on Spanish in more detail. We then present a corpus study of sequences of noun-adjective-adjective (NAA) and adjective-noun-adjective (ANA). We extracted about 6800 tokens of these sequences from the 2012 Google Ngram Corpus and the Genre/Historical section of the Corpus del Español. Our data show evidence for a small set of categorical restrictions alongside a set of ordering trends.

A typological review of ordering restrictions
Restrictions on the relative order of the various semantic classes in adjective sequences (henceforth, Adjective Ordering Restrictions or AORs) seem to recur in multiple languages. However, there is sufficient variation to preclude simplistic generalizations. Description is further complicated by the existence of informationally marked orders: in certain contexts and with certain prosody, speakers allow otherwise marked orders (-You want this one? [passes the interlocutor a big red ball from a set of blue and red balls, only two of which, one of each color, are big] -No, give me the BLUE big ball).
AORs have traditionally been defined in terms of notionally or lexically-based typologies. These typologies vary in their degree of articulation and may be language-specific or universal. For instance, Dixon (1982) claims that there are seven semantic classes of adjectives in English, which form the hierarchy in (3), presumably under conditions of normal stress and intonation:
(3) Dixon (1982)
Sproat and Shih (1991) argue that (4) is universal in terms of head-proximity, on the basis of data from Dutch, Greek, Kannada and Mokilese. Scott (2002) uses examples from a variety of languages and language families including Finnish, Swedish, German, English, Welsh, Serbo-Croatian, Ibibio, and Malayalam to justify the fine-grained differentiation in his hierarchy.
Parallel to this expansion in AOR hierarchies, the literature shows attempts to identify a unified underlying semantic source for AORs. Martin (1969) and Posner (1986) argue that gradability is one such factor, where less gradable adjectives appear closer to the noun compared to more gradable adjectives. Such efforts to explain AORs via more fundamental properties (Alexiadou et al. 2007) reflect the desire to eliminate notional classes in the interest of independent justification (McNally 2016; Truswell 2009). Still other semantic accounts are motivated by a need to explain AORs outside the syntax. A traditional approach to adjective modification such as Sproat & Shih's (1991) assumes adjectives are adjuncts recursively modifying NP. As adjuncts are freely ordered, they argue that AORs have a cognitive-semantic basis. They claim that adjectival categories can be classed into: a) Absolute properties (e.g. colour, shape), posited to be more inherently related to the noun; and b) Relative properties (e.g. quality, size), which are thought to be less related to the noun.
According to these authors, the absolute/relative distinction predicts where rigid ordering should be expected: adjectives of the same level of absoluteness can generally be reordered, while adjectives with different levels of absoluteness have inflexible orders. They argue that these classes also account for the restriction observed in other languages such as Mandarin, which appears to preclude the cooccurrence of multiple adjectives of the same type (Paul 2005). Many alternative cognitive-semantic proposals for AORs have been put forward. An earlier observation suggested that adjective order could be determined via Affective Load (Richards 1977), which is based on the "Pollyanna Hypothesis" (Boucher & Osgood 1969). Speakers seem to prefer adjectives with positive connotations to be placed further away from the noun, relative to adjectives with negative connotations. This predicts that powerful dangerous medication is preferred over dangerous powerful medication. More recent studies have operationalized the absolute-relative distinction in terms of subjectivity, with less subjective adjectives occurring closer to the noun (cf. Scontras et al. 2017). Subjectivity is understood in terms of consistency of judgements; because judgements about blueness are likely to be more consistent than judgements of bigness, blue occurs closest to the noun in a phrase such as the big blue box. In an intuitional study of English, Scontras et al. (2017) find that degree of subjectivity accounts for a great deal of the variance in speaker preferences for adjective ordering.
These approaches contrast with syntactic accounts of AORs, which argue that semantics alone cannot explain the clear limit on the number of non-coordinated adjectival phrases within the DP (since interpretability should be the only limiting factor). The most prominent syntactic analysis of AORs comes from cartographic approaches (cf. Cinque 1994, 2010; Scott 2002), which hold that adjectives are merged in the specifiers of unique functional projections (FPs) between D and N. FPs are said to be driven by semantics; however, few attempts have been made to link the proposed FPs with independently motivated projections within the DP (but see Svenonius 2008). Still, Cinque (2010) argues that the cross-linguistic distribution of adjective orders supports this account. His claim is that the only adjective orders that can be generated cross-linguistically are (6a, c, d). Cinque's typological classes do not fit the ordering proposed in Sproat & Shih (1991), since in (6c) the relative property, size, is closer to the head noun than the absolute property, color. However, the possibility remains that (6c) could be derived via movement.
Other factors play a role in AORs, beyond syntax and semantics. Posner (1986) argues for the importance of morphology. By his account, phrases such as woolen white hat are preferred over white woolen hat as woolen is less noun-like compared to white due to the adjective suffix -en. Word frequency and length may have an additional effect on ordering preferences. Bock (1982) and Lapata et al. (1999) claim that higher frequency adjectives precede lower frequency adjectives. The ordering is hypothesized to be driven by the resting activation of words, where high frequency words have higher resting activation and thus appear earlier. Word length, on the other hand, affects AORs inversely, in that shorter words are thought to precede longer words. Phonotactic preferences are also argued to play a role in adjectival ordering. Vennemann (1988) and Schlüter (2003) claim that preference will be given to utterances with better syllable structure. Similarly, Couper-Kuhlen (1986) and Gries & Wulff (2013) have observed that a preference holds for the ordering which provides the ideal stress pattern. For English, the ideal syllable structure involves consonant-vowel alternation and the ideal stress pattern is stressed-unstressed alternation. As such, a segment alternation constraint would yield a preference for utterances such as lovely bright eyes over bright lovely eyes, as the latter leads to a consonant cluster at the morpheme boundary between bright and lovely as well as a vowel cluster at the boundary of lovely and eyes, while the former follows a strict consonant-vowel alternation. A different constraint, rhythmic alternation, yields a preference for utterances such as Chinese traditional band over traditional Chinese band, as the former follows the stressed-unstressed alternation while the latter has a cluster of three consecutive unstressed syllables. No study so far integrates lexical semantic factors with phonetic and frequency factors.
In sum, despite clear crosslinguistic evidence for trends in AORs, variation is also present. Word order seems to be free when adjectives (a) are realized as separate phonological units (i.e. with comma intonation), or (b) represent indirect modifiers, i.e. modifiers outside the scope of DP and/or those that are visibly derived from a relative clause (see Sproat & Shih 1991 for a detailed account of the direct-indirect modifier distinction).

The case of Spanish
One way of accounting for variation in terms of AORs is to state that languages differ in terms of the inventory of modification strategies they employ. English has been described as a language with strict AORs that can only be broken via comma intonation. Mandarin Chinese also has strict AORs, but these can be broken via the RC de- (Sproat & Shih 1991). For still other languages, the degree to which AORs hold is up for debate. This is the case for Spanish, where little empirical data is available to assess variation. Cinque (2010) proposes an analysis for Italian, which he claims holds for Romance languages more generally. In this analysis, hierarchical order corresponds to the linear order of the DP in Germanic languages. Adjectives that obey AORs (henceforth direct modification adjectives) are merged in the specifiers of dedicated FPs (i.e., value > size > shape > colour > nationality > relational), whereas unordered adjectives originating from a relative clause source (henceforth indirect modification adjectives) are merged higher. Cinque (2010) argues that the ordering of postnominal adjectives in Romance is the mirror image of prenominal adjectives in Germanic. This is true for both direct modification adjectives and sequences of direct and indirect modification adjectives. The default order of the Romance DP is derived via roll-up movement of the N past (at least) lower direct modification adjectives such as relational and nationality adjectives. Movement is optional above higher direct modification adjectives (i.e. color, shape, size, value). However, when such movement occurs, it is also of the roll-up type illustrated in (7a). Movements to derive the mirror image order of direct modification adjectives are followed by movement of the noun plus all direct modification adjectives to a position above indirect modification adjectives (7b).
This analysis has been criticized on two points: first, Spanish only allows for a reduced number of possible prenominal adjectives (Demonte 1999a, b); second, Spanish shows relatively free order of adjectives, both prenominal and postnominal (Sánchez 1996, 2018; Demonte 1999a, b). Demonte (1999a) distinguishes adverbial adjectives (posible 'possible', falso 'fake', frecuente 'frequent'), qualitative adjectives (delgado 'thin', divertida 'fun'), and relational adjectives (presidencial 'presidential', cardíaco 'cardiac'). Qualitative adjectives assign a property to the extension of the N or one of its sub-elements. Adverbial adjectives do not attribute a property to an N, but rather indicate the manner in which a concept or intension of the noun applies to a specific referent (in formal semantic terms they map properties to properties). Finally, relational adjectives are denominal; they denote a relation between the entity denoted by the noun and the entity denoted by the nominal root of the adjective. Demonte (1999b) further subdivides the class of adverbial adjectives that modify deverbal nouns into: modal epistemic (posible 'possible', presumible 'probable'), intensional (completo 'complete', simple 'simple', único 'unique', falso 'false'), and circumstantial (antiguo 'old', frecuente 'frequent'). Generally, only a single adjective can appear in prenominal position. In the limited instances where multiple adjectives occur before the noun, relatively free order obtains: modal epistemic and intensional adjectives are freely ordered with respect to qualitative adjectives (8) and amongst themselves (9). Moreover, despite a tendency to place modal epistemic adjectives higher than intensional (10a) and circumstantial adjectives (10b), clear counterexamples exist (11).
Please note that in various examples glosses are given with the adjective-adjective sequence in the linear order of the Spanish case under discussion, rather than in the order in which they would be grammatical in English. Consequently, all ungrammaticality marks reflect intuitions about the ordering of the Spanish phrase. In what follows, we put linear glossing of adjectival sequences in small caps to alert the reader.
The first observation pertains to intersective adjectives, defined as those where the denotation of the complex NP is the intersection of the set of entities that are A (e.g., color, shape) with the set that are N. Intersective adjectives can be merged in any order (12).
(12) Un vaso rojo oval de terracota / un vaso oval rojo de terracota
'A RED OVAL vase / OVAL RED vase of terracotta'
The second observation pertains to relative or non-intersective adjectives, that is, adjectives whose denotation either a) is context dependent (a small elephant ≠ a small animal; but see Kamp & Partee 1994 for a reanalysis of these adjectives as intersective) or b) modifies a subpart of the noun as opposed to the entire referent (a good lawyer = good in their role as a lawyer ≠ a good person). These include size, age, speed, value, and behavioral property adjectives. Relative adjectives can only be combined in coordinated structures or following an intersective adjective (13).
(13) El libro viejo *(y) amarillo / el libro amarillo viejo
'The old *(and) yellow book / the YELLOW OLD book'
The third observation is about extreme property adjectives; these always appear at the end of an adjective sequence (14).
(14) El libro grande maravilloso / *El libro maravilloso grande
'The BIG WONDERFUL book / *The WONDERFUL BIG book'
The fourth observation is about relational adjectives (i.e., denominal adjectives), which are known to maintain strict adjacency with the noun (15-16).
These adjectives are compatible with further modification by qualitative adjectives (15).
These properties integrated together give us the ordering in (17):
(17) N > Rel > other classes > extreme property
Sánchez (2018) argues that even these rules are too strict; Spanish allows for stacking without ordering. As shown by the grammaticality of the various alternations in (18), multiple permutations of color, size and nationality adjectives are available in the language. These are all acceptable but possibly not equally preferred across speakers (see Fábregas 2017).
We summarize the AORs proposed by these authors in (19).
(19) Sánchez (1996, 2018): Color~Size~Nationality (no significant ordering predicted)
To put the issue simply: What is the evidence available for adjective ordering in Spanish? Do we find evidence for a privileged position for relational adjectives, a point of general consensus? Do we find evidence for a broad level distinction between intersective and non-intersective adjectives, a point of agreement between Cinque (2010) and Demonte (1999)? Furthermore, do we find evidence for narrow ordering among intersective classes (nationality/color/shape) (Cinque 2010), or not (Sánchez 2018; Demonte 1999a,b)? Finally, is there an ordering within the non-intersective classes? Cinque predicts size to precede value, whereas Demonte suggests these two classes cannot co-occur unless linked by coordination. Empirical evidence is needed to compare these various proposals for Spanish. We extracted a set of pair-wise predictions about relative ordering in Spanish, given the main lexical classes discussed. Our approach is to extract frequencies from a natural language corpus, in order to test the statistical validity of each ordering pair independently.
(20) Predictions
H1: Proximity of relational adjectives hypothesis
Relational > Everything else (Demonte 1999a,b)
We also propose to expand the empirical scope of the existing discussion in Spanish AORs. Points of variation in the literature above concern post-nominal sequences of adjectives (henceforth noun-adjective-adjective or NAA). This is reasonable given that postnominal adjectives are considered the unmarked case in Spanish. However, Spanish also offers an alternative source of evidence: adjective-noun-adjective (ANA) sequences. To exploit this data, we must adopt the assumption that the two orderings are related derivationally. In a Cinque-type derivation, if the proposed roll-up movement that gives rise to mirror-image order stops at one adjective, one predicts that the prenominal adjective in ANA sequences, which is further from the nominal head in terms of scope, corresponds to the second adjective in NAA sequences. Let us call these A1 and A2. If this is correct, an A2NA1 sequence would be a structural alternative to NA1A2. In what follows, to unify the ordering across structure types, we will refer to adjective types as an ordered pair, where (A, B) denotes a structure in which type A is the hierarchically closest adjective (A1) to the noun, and B occupies the structurally further position (A2). The derivation is represented in (21).
(21) Roll-up movement derivation of two-adjective sequences
If this assumption is correct, both types of sequences can serve as evidence about ordering restrictions. ANA sequences are quite common in general, but they are not explicitly considered in the literature. Our intuition is that certain combinations of modifier adjectives work better in the configuration where the noun is interpolated between the two modifiers. Some speakers prefer (22b) over (22a). In the next section, we assess the proposals for adjective order in Spanish on the basis of corpus data. Three separate analyses were conducted, two for NAA orders and one for ANA.
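The ordered-pair convention introduced above, in which NA1A2 and A2NA1 are treated as structural alternatives, can be illustrated with a minimal Python sketch. The function name ordered_pair and the class labels used here are hypothetical and serve only to make the mapping explicit; they are not part of the study's materials.

```python
def ordered_pair(pattern, first_adj_class, second_adj_class):
    """Map the classes of two adjectives, given in linear order, to (A1, A2).

    A1 is the adjective hierarchically closest to the noun and A2 the
    structurally further one, following the assumption that A2-N-A1 is a
    structural alternative to N-A1-A2.
    """
    if pattern == "NAA":   # linear order: N A1 A2
        return (first_adj_class, second_adj_class)
    if pattern == "ANA":   # linear order: A2 N A1
        return (second_adj_class, first_adj_class)
    raise ValueError(f"unknown pattern: {pattern!r}")


# Illustrative class labels only (hypothetical coding):
naa = ordered_pair("NAA", "origin", "time")        # N + origin + time
ana = ordered_pair("ANA", "size", "relational")    # size + N + relational
print(naa, ana)
```

Under this mapping, both an NAA token and an ANA token contribute to the count of the same pairing (A, B), which is what allows the two configurations to be compared.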

Data
To examine semantic restrictions that explain preferred adjective ordering in Spanish, we initially extracted 816 NAA constructions of oral origin from the Genre/Historical Corpus del Español. A second round of searches extracted 2599 noun-adjective-adjective (NAA) sequences and 3499 adjective-noun-adjective (ANA) sequences found in the 2012 Google Ngram Corpus. The Genre/Historical Corpus del Español compiles over 100 million words from over 20,000 Spanish texts from the 1200s to the 1900s. Data from the 1900s is divided amongst spoken, fiction, newspaper and academic genres. The Google Ngram Corpus uses the contents of books digitized on the platform from 1800 to present (divided by language) and outputs clusters of words and phrases (i.e., ngrams) and their usage frequency over time. The part-of-speech coding of these two corpora allowed for automated extraction of NAA and ANA sequences. To extract data from the Google Ngram Corpus we imported the readline_google_store function from the Python library 'google ngram downloader 4.0.1' (https://pypi.org/project/google-ngram-downloader/). The function allowed us to analyze 5-gram files individually and download the relevant sequences. Unlike the Corpus del Español, the Google Ngram Corpus is limited in the contextual information available: ngrams are obtained via collocation (i.e. they do not reference the semantics), and only 3-gram or 5-gram databases are available. We used the 5-gram database to better assess the validity of the string and optimize search results. For the Corpus del Español, we extracted data using the Keyword in Context (KWIC) search function to identify NAA sequences of oral origin.
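The automated extraction step can be sketched as follows. This is a minimal illustration and not the script used in the study: it assumes that each token in an n-gram string carries a part-of-speech suffix of the form word_TAG (for example iglesia_NOUN), and the names TAGGED, PATTERNS, parse_ngram and find_sequences are hypothetical. In practice the n-gram strings would come from the records yielded by readline_google_store; here they are passed in as plain strings so that the sketch runs offline.

```python
import re

# Assumption: tokens look like "iglesia_NOUN"; untagged tokens are left as-is.
TAGGED = re.compile(r"^(?P<word>.+)_(?P<tag>[A-Z]+)$")

# The two target configurations, expressed as part-of-speech tag triples.
PATTERNS = {
    "NAA": ("NOUN", "ADJ", "ADJ"),
    "ANA": ("ADJ", "NOUN", "ADJ"),
}


def parse_ngram(ngram):
    """Split an n-gram string into (word, tag) pairs; tag is None if absent."""
    pairs = []
    for token in ngram.split():
        match = TAGGED.match(token)
        if match:
            pairs.append((match["word"], match["tag"]))
        else:
            pairs.append((token, None))
    return pairs


def find_sequences(ngram):
    """Return (label, words) for every three-token NAA or ANA window."""
    pairs = parse_ngram(ngram)
    hits = []
    for start in range(len(pairs) - 2):
        window = pairs[start:start + 3]
        tags = tuple(tag for _, tag in window)
        for label, pattern in PATTERNS.items():
            if tags == pattern:
                hits.append((label, tuple(word for word, _ in window)))
    return hits


# Tags below are added by hand for illustration.
print(find_sequences("la_DET iglesia_NOUN española_ADJ contemporánea_ADJ"))
print(find_sequences("pequeño_ADJ cuerpo_NOUN azorado_ADJ"))
```

A tag-based filter of this kind only proposes candidates; misclassified tokens and longer adjective sequences still have to be removed by hand, as described in the Methods section.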

Methods
For the Corpus del Español, we filtered out sequences not in Spanish (most notably in English), as well as sequences in Spanish containing grammatical categories misclassified as adjectives in the corpus. These included names (Rosa), prepositions (fracciones partidistas junto 'partisan fractions together', a sangre fría 'in cold blood') and adverbial phrases (cabeza mejor aireada 'better aired head', más crédito directo 'more direct credit'). We similarly excluded secondary predicates where the adjective did not refer to the preceding noun (te levantas al día siguiente fuerte 'You wake up strong the next day') and cases that constituted clear instances of reduced relative clauses (horarios respectivos fijados 'respective set schedules', buen resultado obtenido 'good results obtained'). We also removed compounds, including noun-adjective (sistema nervioso central 'central nervous system', primer ministro británico 'British prime minister') and adjective-adjective (matrimonios mal avenidos 'ill-suited marriages') compounds. Additionally, we filtered out sequences containing repeating adjectives (ejemplo típico típico 'a typical typical example'), sequences that were proper names (Fondo Monetario Internacional 'International Monetary Fund', Baja Edad Media 'Lower Middle Age'), and sequences that contained NAA constructions as a proper subset of longer sequences of multiple modifying adjectives (distintos pasos superiores diferentes 'distinct superior different steps', gran población andaluz compuesta 'great composite Andalusian population'). After this selection process, 663 NAA sequences remained from the Corpus del Español.
For the Google Ngram Corpus, we also filtered out sequences not in Spanish (again, most notably tokens containing words in English) and sequences in Spanish containing grammatical categories misclassified as adjectives. In addition to the misclassified categories found in the Corpus del Español, we found misclassified disjunctions (fondos públicos o 'public funds or…', o delito cometido 'or crime committed') and verbs (ojos verdes brillan 'green eyes shine', izar bandera blanca 'raise (a) white flag'). These were filtered out, along with sequences containing nonsense syllables (c co co co co). The filtering process left us with 1008 NAA and 1214 ANA constructions from the Google Ngram Corpus.
The remaining sequences were analyzed as follows. Each adjective was classed according to the notionally based semantic typology presented in Table 1. We relied on the semantic classes in Cinque's (2010) hierarchy as a point of departure for our typology. We expanded this basic typology to include the categories physical property and human propensity from Dixon (1982). We further subdivided the category human propensity into internal state, behavioral property and physical state following Blackwell (2005) and Tribushinina et al. (2014). In our typology, we also included a number of additional categories proposed by these authors, such as age, physical property, time and ordinals (cf. Blackwell 2005; Tribushinina et al. 2014). Finally, we added a number of categories that proved to be frequent in our data: manner, place, possessive, and modal adjectives. We also counted quantifiers such as varios 'several', which have an adjectival distribution. Some adjectives corresponded to more than one semantic class in our hierarchy. Therefore, lexical classification involved a certain degree of context dependence, as many adjectives are polysemous, and their polysemy cuts across classes. Physical properties or states often shift into evaluative senses (grande can mean 'large' or 'possessing greatness'), and the boundary between shape and size is blurry (gordo 'fat' can refer to either roundedness or heftiness). Often the classification depends on the modified noun (cf. paso rápido 'fast pace' and carro rápido 'fast car', where the adjective corresponds to manner and physical property, respectively). It is not known whether such shifts affect relative ordering in adjectival sequences, but there is extensive literature on adjective polysemy in the context of prenominal/postnominal adjectives (Fábregas 2017). Adjectives whose ambiguity was evident at the level of lexical distribution were assigned a semantic class according to the noun they modified.
For example, the adjective mayor may refer to a person's age, i.e., 'elderly', or it may be quantificational, as in 'greater'. The adjective mayor was coded as an age adjective in cases such as (23), but as a quantifier in (24). Some adjectives can be classified as value only in relation to the noun they modify (e.g., sana 'healthy', in finanzas públicas sanas 'healthy public finances'). Yet others are classified as value independently of the noun in question (e.g., círculo central favorable). A lexical approach meant only the latter were classified as value. For this reason, adjectives whose polysemy could only be established on the basis of the noun they modified were coded according to their most basic lexical classification. Two native speaker authors completed the classification. The coding procedure included identification of doubtful cases, which were subsequently resolved by agreement between the two coders. For the Corpus del Español, about a fourth of the data was coded jointly by two coders. For the second stage of the project, with the Google Ngram data, we recoded 5% of the data to calculate interrater reliability. Reliability was relatively low (78% for NAA and 83% for ANA). This underscores how the high degree of contextual dependency affects the classification procedure, and casts doubt on the current feasibility of full automation of adjective classification. For instance, in (25a) informativo could either be a relational adjective if we interpret it as 'educational' or a value adjective with the reading 'informative'. Similarly, in (25b) azorado could either be interpreted as a behavioral state in the sense of 'embarrassed' or a behavioral property as in 'flustered'.
(25) a. material informativo específico 'specific informational material'
b. pequeño cuerpo azorado 'small flustered body'
Beyond the problem of ambiguity, a more complex issue arises: the indeterminacy in what counts as intersective vs. non-intersective among these lexically defined subtypes. We implement the distinction as below, while recognizing many potential problems; for instance, is sticky intersective or not? A sticky glue is certainly not the same as a sticky

Results
Relational adjectives were by far the most common type. They are overwhelmingly the type attested in first position. Because they are so frequent, the most common pairing consisted of a relational adjective followed by a contiguous relational adjective (280 tokens, 27% of the total data in the Ngram corpus; and 148 tokens, 22% of the Corpus del Español data). Beyond these, other pairings emerge. Tables 2 and 3 show the frequencies of different pairings of NAA sequences attested in the Corpus del Español and the Google Ngram dataset, respectively. Pairings in these tables are listed from the most to the least frequent, and the relative frequency of the dominant ordering is reported.
To evaluate the robustness of the dominant order for the various pairings of different adjective types, we further examined the proportion of orders where Type A precedes Type B. This analysis was limited to pairings with a frequency higher than 6 tokens. Pairings were classified as categorical, biased or in free variation. A pairing was classified as categorical if the frequency of the dominant order was more than 95% of the total frequency of the pairing (A,B), and as biased if it was between 75% and 95% of the total frequency (Table 2). A dominant order accounting for less than 75% of the pairing was considered close to chance and classified as free variation. We then calculated the probability of finding such a distribution (i.e., the relative frequency of the ordering obtained, as compared to chance). Using the binomial test, we estimated the probability of obtaining at least the observed frequency of the ordering AB in a sample of size equal to the total frequency of (A,B), under the null hypothesis that the orderings AB and BA are equally likely (p=0.5). We used a 5% significance level.
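The classification thresholds and the one-sided binomial test described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the analysis script used in the study: the function names upper_tail, classify and evaluate are hypothetical, and the counts in the usage line are invented for demonstration and do not come from Tables 2 to 4. The test statistic is the upper-tail probability P(X ≥ k) for X ~ Binomial(n, 0.5), where k is the frequency of the dominant order and n the total frequency of the pairing.

```python
from math import comb

MIN_TOKENS = 6  # pairings had to exceed this frequency to be analyzed


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-sided binomial p-value."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def classify(k, n):
    """Label a pairing from the share k/n taken by its dominant order."""
    share = k / n
    if share > 0.95:
        return "categorical"
    if share >= 0.75:
        return "biased"
    return "free variation"


def evaluate(k, n, alpha=0.05):
    """Return (label, p_value, significant), or None for sparse pairings."""
    if n <= MIN_TOKENS:
        return None
    p_value = upper_tail(k, n)
    return classify(k, n), p_value, p_value < alpha


# Invented counts for illustration only: 18 of 20 tokens show order AB.
print(evaluate(18, 20))
```

Separating the descriptive label from the significance test mirrors the text: a pairing can be labelled biased on the basis of its proportions and still fail to reach significance when the total number of tokens is small.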
Relational adjectives were overwhelmingly closest to the noun. For all corresponding pairings across the two NAA datasets, the position of the relational adjective closest to the noun was either biased or categorical; this holds for relational adjectives paired with adjectives of origin, value, time, manner, place and physical property, and with quantifiers. The only exception to this generalization was the order of relational and physical state adjectives, for which the two NAA corpora differed: free variation was attested in the Corpus del Español, but relationals were categorically in first position in the Google Ngram data.

A number of additional pairings emerged that were only attested in a single dataset. AORs attested in the Google Ngram data include adjectives of nationality/origin, which were biased to appear closer to the noun than adjectives of time, and color adjectives, which were categorically noun-adjacent compared to adjectives expressing physical property. Free variation was attested for manner-value and quantifier-value pairings. For the Corpus del Español, additional pairings included relational adjectives, which were biased to appear before adjectives of size but in free variation with ordinal adjectives, and adjectives of nationality/origin, which were biased to occur before value adjectives. However, this latter contrast did not achieve significance. It is important to note that relational > physical state is categorical in the NAA Google Ngram corpus, despite being in free variation in the Corpus del Español.

Significance codes: p>.05 (n.s.); p<.01 (*); p<.05 (**); p<.001 (***)

In sum, the two NAA analyses found clear support for the status of relational adjectives, which were either categorically adjacent to the noun or strongly biased toward that position. The findings of each analysis are summarized in (27) and (28).

The ANA data from the Google Ngram Corpus was comparatively richer in terms of adjective pairings.
Table 4 reports 26 class combinations that were robust enough for statistical evaluation, compared to the 11 and 12 distinct pairings in the Corpus del Español and Google Ngram NAA datasets, respectively. The pairings that were attested in the NAA data received further support here. Recall that the proposed equivalence between the two configurations is that the second adjective in an NAA sequence corresponds to the prenominal adjective of the ANA sequence.

Significance codes: p>.05 (n.s.); p<.01 (*); p<.05 (**); p<.001 (***)

Pairings that were also attested in the NAA data for both the Google Ngram Corpus and the Corpus del Español are consistent with the observations in the ANA corpus (29). Table 4 provides further support for the claim that relational adjectives are categorically closer to the noun: in the ANA sequences, relational adjectives were categorically closer to the noun (i.e., in postnominal position) than adjectives of value, time, manner, physical property, physical state, size and age, as well as ordinals, possessives and quantifiers. The relative ordering of adjectives of national origin and time is also consistent across ANA and NAA, as shown by (30)-(31).

(31) NAA: iglesia española contemporánea (N + Origin + Time) (Origin > Time)
CHURCH SPANISH CONTEMPORARY

Nationality adjectives were categorically closer to the noun in the ANA data and biased to appear closer to the noun in the NAA data extracted from the Google Ngram Corpus. Similar findings hold for relational and size adjectives: in the ANA data the relational adjectives categorically appeared in first position (closest to the noun), whereas in the NAA data from the Corpus del Español this was the biased ordering, as in (32)-(33).

REFORM FISCAL WIDE
Two pairings that appeared in free variation in the NAA datasets (albeit with low frequency) showed stricter ordering in the ANA dataset. This was so for relational-ordinal pairings, which were categorically ordered in the ANA data but in free variation in the Corpus del Español, and for manner-value pairings, which were biased in the ANA data but showed free variation in the NAA Google Ngram data. A single pairing went in the opposite direction: color adjectives categorically appeared before physical property adjectives in the NAA Google Ngram data, but occurred in free variation in the ANA data.
Possessives also emerged as an additional category restricted to second position with respect to relational adjectives. A number of restrictions involving value adjectives also emerged. Value adjectives appeared closer to the noun than ordinals (biased) and quantifiers (biased), but further from the noun than color (biased) and modal (categorical). These additional biased ordering restrictions for ANA sequences are compatible with an extended hierarchy, but were not actually attested (or represented in sufficiently robust quantities to allow evaluation) in either of the NAA datasets. Figures 1 and 2 visually synthesize these ordering trends.

Discussion
Our data on sequences of two postnominal adjectives show that relational adjectives maintain strict adjacency to the noun. Most pairings that included a relational adjective and were frequent enough for evaluation had either a biased or a categorical distribution. The two exceptions were physical state adjectives and ordinals, which appeared in free variation with relational adjectives in the Corpus del Español dataset. Given how few counterexamples there are, we must consider the possibility that compounding interfered with the classification (Enseñanza primaria 'Elementary education', cuerpo desnudo 'naked body', piso térmico 'thermal floor'). In addition, relational and physical state adjectives show the expected ordering in the NAA Google Ngram dataset. We thus conclude that the present analysis supports the privileged status of relational adjectives.
One other significant ordering restriction supported by the NAA datasets is the ordering of color and physical property adjectives: color adjectives appear to categorically precede physical property adjectives. This observation can thus be subsumed under the intersectivity hypothesis. We also found non-significant evidence of a bias for origin to precede value, which is compatible with the intersectivity hypothesis.
We next consider evidence from the alternative configuration, ANA, in which the head noun appears between two adjectives, under the assumption that the prenominal adjective corresponds to the second adjective in a postnominal sequence of adjectives (i.e., the one structurally and linearly furthest from the noun). Examining this construction enriches our understanding of hierarchical structure within the Spanish DP. This set of findings is compatible with the main result of the NAA data.
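The assumed correspondence between the two configurations amounts to a simple normalization step: every token is reduced to a pair (type closer to the noun, type farther from the noun) before orders are counted. The sketch below illustrates that step only. The function name, the configuration labels and the second token are hypothetical; the first token uses the type labels of iglesia española contemporánea.

```python
from collections import Counter


def closeness_pair(config: str, first_adj: str, second_adj: str) -> tuple:
    """Return (type closer to the noun, type farther from the noun).

    first_adj and second_adj are the adjective types in linear order.
    NAA: noun + adj1 + adj2, so adj1 is the closer one.
    ANA: adj1 + noun + adj2, so the postnominal adj2 is treated as closer
    and the prenominal adj1 as the farther one (the assumption in the text).
    """
    if config == "NAA":
        return (first_adj, second_adj)
    if config == "ANA":
        return (second_adj, first_adj)
    raise ValueError(f"unknown configuration: {config}")


tokens = [
    ("NAA", "origin", "time"),       # iglesia española contemporánea
    ("ANA", "value", "relational"),  # hypothetical token
]

# Tally how often each (closer, farther) ordering of types occurs.
order_counts = Counter(closeness_pair(*t) for t in tokens)
print(order_counts)
```

Once both configurations are expressed in the same (closer, farther) format, counts from ANA and NAA data can be compared pairing by pairing.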
Relational adjectives categorically precede value, size, quantifiers, age, time, manner, physical properties, ordinals, physical state and possessive adjectives. This is unsurprising, given that relational adjectives are restricted to postnominal position. More interesting is the additional support for other ordering restrictions. Several pairings are compatible with the intersectivity hypothesis. First, we observe that nationality categorically precedes value, size, time and age. We also find that color adjectives precede value adjectives. We found one example of color and size in the NAA Google Ngram dataset (34), which is not sufficient to test for the existence of ordering restrictions.
(34) Uvas negras inmensas (color > size)
'Giant black grapes'

Surprisingly, pairings of intersective adjectives, such as color and shape, did not emerge. We did observe some evidence of articulation within the class of non-intersective adjectives: value is biased to precede ordinals, and modals categorically precede quantifiers. Another pair that did not emerge was size and value. This is informative: under hypothesis 4, if size and value are in complementary distribution, we should expect to find no tokens that combine them. Indeed, size and value pairings were not attested in the data.
In sum, the evidence shows strong support for the special position of relational adjectives and some support for the intersectivity hypothesis. To our surprise, the pairings relevant to hypothesis 3 (color, shape) were too rare for evaluation. Unfortunately, these are the types that are relevant to assessing the current questions about ordering restrictions within the Spanish DP. More work, possibly focusing searches on specific sets of terms, is needed to address this gap.

Conclusion
Does our evidence support strict hierarchies, as in a cartographic approach? Or is it more compatible with the view that the Spanish DP is more loosely structured? Here it might be useful to maintain the distinction between ANA and NAA data. After a careful assessment of the evidence, we conclude that the possibility that relational adjectives can be flexibly ordered (relative to ordinal adjectives and physical state adjectives) can be discounted. The relational class seems clearly privileged for hierarchical proximity to the noun. Other than that, we found categorical ordering where color precedes other physical properties, but not much else. Value adjectives are shown to be freely ordered with respect to quantifiers and manner adjectives. In the ANA dataset, we noted that value adjectives participate in several significant ordering pairings. One additional observation is that value adjectives seem to appear in free variation with adjectives of time. We also observed that value was biased (albeit non-significantly) towards a more distant position than physical property. This distribution is what would be expected under the premise that the prenominal position is linked to evaluativeness (Bouchard 2002, Pettibone in preparation). Allowing for the possibility of focus movement within the NP (Demonte 1999, 2008), we might have to rule out (35) as evidence for strong ordering. This leaves us with not much more articulation of ordering rules based on narrowly specified lexical classes beyond what is already captured under the intersectivity hypothesis.
Before concluding, we wish to acknowledge that our findings are subject to limitations inherent to our approach. The first is the volume of data captured from these two corpora, which yielded interesting core observations but failed to adjudicate on some more refined hypotheses. A much larger (and more diverse) sample would be needed to determine whether we actually have evidence of absence, rather than absence of evidence. Our sampling was constrained in part by the need for human involvement in our classification procedure. As these corpora have been tagged in ways that can confuse Rosa, the person, and rosa, the color, we are not confident that there is current hope for mechanical implementation of the semantically sophisticated typologies. The other problem is that human classification was vulnerable to reliability issues, because of the complex lexical and contextual demands of implementing the semantic typologies. We remain committed to the view that the theoretical intuitions behind the abstract typologies are well founded; this is merely a comment on the challenges of implementation: extending the empirical scope of theoretical work to natural language corpora is hard but important work. These limitations aside, we observe that the evidence does not lend support to a fully articulated cartography of adjectival sequences in Spanish. Instead, it is most compatible with a middle ground approach, where ordering arises from a few ordering restrictions operating over entailment-based broad classes of adjectives.