
Stop Anthropomorphizing Literary Periods, or Why the Most Frequent Words Don’t Matter.

Looking for a method to trace the evolution of Gutiérrez Nájera’s poetry from one period to another, I came across an article published by David Hoover earlier this year. In the essay, Hoover contrasts the word frequencies in three of Henry James’s novels, each written at a different stage in the writer’s career, to analyze their stylistic changes. The article, “A Conversation Among Himselves: Change and the Styles of Henry James” (Chapter 5 in Hoover, Culpeper and O’Halloran, APPROACHES TO CORPUS STYLISTICS, Routledge, 2014), employs an interesting system for comparing the frequencies of the three periods. Hoover assigns a pattern to each word depending on whether its frequency is different in the three periods, the same in two of the three periods, or the same in all periods. For example, a pattern like LMH (“L” = low, “M” = medium, “H” = high) indicates that a word increased in frequency from the first period to the last. The number of possible patterns is thirteen:

HML, HLM, MLH, MHL, LHM, LMH if the frequencies are all different; HLL, LLH, MLL, LLM, HMM, MHH if the word has the same frequency in two of the periods; and LLL (or HHH) if the frequency is the same in all three periods.

Loosely employing the periodization presented in my previous post, I divided Gutiérrez Nájera’s poetry into three major periods: 1875 to 1879, 1880 to 1887, and 1888 to 1895. I combined all the poems from each period to create a “text” and then, following Hoover’s example, I reduced two of the samples to the size of the smallest one by simply eliminating the part of the “text” exceeding that size. (It occurs to me now that selecting a random sample of the text is possibly a better approach.) The “texts” are then tokenized and, according to their frequency in each period, the tokens are classified into one of the thirteen possible patterns. Hoover is, of course, known for his use of Excel spreadsheets to perform his text analyses, but the idea behind this technique is simple enough that a few lines of code in R can easily assign a pattern to each token.
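As a rough illustration of the classification step, here is a minimal sketch in Python (not the R code used for this post). The function names are hypothetical, and the tie-labeling rule is my own assumption: the lowest distinct count is labeled L and the highest H, with M used only when all three counts differ. This yields thirteen patterns, but Hoover's own labels for ties (e.g. MLL or HMM) may follow a different convention.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of letters (accented characters included).
    return re.findall(r"[^\W\d_]+", text.lower())

def truncate_to_smallest(token_lists):
    # Cut every sample down to the length of the shortest one,
    # discarding whatever exceeds that size.
    n = min(len(tokens) for tokens in token_lists)
    return [tokens[:n] for tokens in token_lists]

def pattern(counts):
    # Assumed rule: rank the distinct counts from low to high.
    distinct = sorted(set(counts))
    if len(distinct) == 1:
        return "LLL"
    labels = "LMH" if len(distinct) == 3 else "LH"
    rank = {value: labels[i] for i, value in enumerate(distinct)}
    return "".join(rank[c] for c in counts)

def classify(texts):
    # texts: the three period "texts", in chronological order.
    samples = truncate_to_smallest([tokenize(t) for t in texts])
    freqs = [Counter(sample) for sample in samples]
    vocab = set().union(*freqs)
    return {word: pattern([f[word] for f in freqs]) for word in vocab}
```

Under this rule a word counted 0, 0 and 5 times in the three periods gets the pattern LLH, and so does a word counted 11, 11 and 13 times.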

Here is an RStudio image of part of the matrix resulting from applying this classification technique.


Employing these patterns to study the changes in a writer’s style works quite well and produces interesting insights, as Hoover himself shows in his article. In Henry James’s case, Hoover is interested not only in words that “show substantial change” across the periods, but also in the MFWs within each pattern. (He develops an interesting alternative for determining frequencies, which I will not address here, but it is thoroughly explained in his article.) The uniqueness of Gutiérrez Nájera’s poetic corpus, however, led me in a different direction.

The dates of Gutiérrez Nájera’s “late” stylistic period, 1888-1895, coincide with the beginning of the modernista movement. This of course means that the changes in his style towards the end of his life are not only “personal” changes; they could also be signs of the advent of a new literary period in Spanish American letters. Patterns such as LLH, LMH and even MLH, which identify words with higher frequencies in the late period, may also be pointing to modernista words that became influential in the late 19th century.

I suppose that until now I have been guilty of anthropomorphizing literary periods. I have assumed that strategies for analyzing a writer’s style can be used to understand the “style” of a literary period. Assembling MFW lists of modernista words has so far led me to frustrating results, and perhaps I should be focusing on words that either emerge or increase in frequency in a literary period in relation to the previous one. Ideally, for an analysis of this kind, I would need really “big data,” which I do not have at the moment.

Unlike Hoover, I am not interested in the top words belonging to a specific pattern. Any word that appears overrepresented in the late period in relation to the other periods is interesting because it might indicate the emergence of a new language. Thus, a modernista word like “ninfas,” which follows the pattern LLH, would not appear in the MFW list because its frequency in the text is not high enough. But if we consider that “ninfas” went from not appearing in the first two periods to appearing five times in the third one (0-0-5), one must acknowledge this change as a significant one (especially in poetry). In contrast, a token that goes from 11 and 11 to 13 is less relevant for determining the style of a period, but it would probably appear among the MFWs because of its high frequency.
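To make the contrast between the two cases concrete, here is a small Python sketch. It is not part of my analysis or of Hoover's method: the ratio, the function name, and the add-one smoothing constant are all assumptions chosen only to show why 0-0-5 counts as a larger change than 11-11-13.

```python
def late_overrepresentation(counts, smoothing=1.0):
    """Ratio of the late-period count to the mean of the two earlier counts.

    `smoothing` (an assumed add-one constant) keeps the ratio finite when a
    word is absent from the earlier periods.
    """
    early, middle, late = counts
    baseline = (early + middle) / 2
    return (late + smoothing) / (baseline + smoothing)

# "ninfas" uses the 0-0-5 counts from the text; "palabra_x" is a hypothetical
# stand-in for a token that goes from 11 and 11 to 13.
examples = {"ninfas": (0, 0, 5), "palabra_x": (11, 11, 13)}

# Rank words by how strongly they are overrepresented in the late period.
ranked = sorted(
    examples,
    key=lambda word: late_overrepresentation(examples[word]),
    reverse=True,
)
```

With these assumed settings “ninfas” scores 6.0 and the 11-11-13 token about 1.17, even though the second is more than twice as frequent in the late period.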

To see the difference between the two methods for obtaining the most significant words in a literary period (MFWs vs. pattern analysis), let’s take a look at the top 150 MFWs for Gutiérrez Nájera’s 1888-1895 period:

 [1] "la"        "de"        "y"         "el"        "en"       
  [6] "que"       "a"         "las"       "los"       "no"       
 [11] "se"        "con"       "qué"       "es"        "mi"       
 [16] "al"        "del"       "su"        "por"       "tu"       
 [21] "ya"        "como"      "me"        "un"        "para"     
 [26] "lo"        "te"        "si"        "sus"       "muy"      
 [31] "mis"       "tus"       "todo"      "pero"      "alma"     
 [36] "amor"      "más"       "ni"        "una"       "oh"       
 [41] "yo"        "cuando"    "vida"      "dios"      "son"      
 [46] "tan"       "tú"        "flores"    "está"      "sin"      
 [51] "le"        "mar"       "noche"     "luz"       "entre"    
 [56] "esa"       "sombra"    "blanca"    "ha"        "porque"   
 [61] "hay"       "o"         "ojos"      "triste"    "mañana"   
 [66] "nos"       "ser"       "así"       "casa"      "cielo"    
 [71] "quién"     "rosas"     "va"        "cual"      "alas"     
 [76] "hasta"     "poeta"     "brazos"    "siempre"   "también"  
 [81] "versos"    "azul"      "cómo"      "ella"      "fin"      
 [86] "fué"       "labios"    "madre"     "amores"    "blancas"  
 [91] "mas"       "sólo"      "amante"    "bien"      "dos"      
 [96] "era"       "hermosa"   "primavera" "sé"        "sobre"    
[101] "sueño"     "tal"       "tiene"     "tristes"   "día"      
[106] "dolor"     "ese"       "espera"    "esperanza" "muerte"   
[111] "nada"      "pues"      "quien"     "rosa"      "señor"    
[116] "aquí"      "ay"        "blanco"    "bueno"     "mientras" 
[121] "musa"      "nadie"     "nunca"     "ondas"     "parece"   
[126] "queda"     "ti"        "tierra"    "todas"     "todos"    
[131] "vez"       "viene"     "aire"      "ama"       "beso"     
[136] "buena"     "coro"      "él"        "eres"      "hoy"      
[141] "luego"     "poco"      "voz"       "acaso"     "almas"    
[146] "altar"     "belleza"   "busca"     "busco"     "cuán" 

The following are some of the words with the patterns LLH, LMH, MLH for the same period, listed alphabetically, without taking into consideration the frequencies.


  [1] "acero"      "acude"      "acuerdo"    "afrodita"   "algunos"   
  [6] "alto"       "amada"      "ambiente"   "ancha"      "apaga"     
 [11] "apagados"   "aparece"    "arena"      "arte"       "azahares"  
 [16] "bajar"      "bonito"     "bosque"     "bote"       "botones"   
 [21] "brillan"    "brillante"  "brillantes" "buenos"     "calla"     
 [26] "callada"    "calles"     "cauda"      "cerca"      "cisnes"    
 [31] "copa"       "copas"      "correr"     "cristo"     "cuanto"    
 [36] "daré"       "déjame"     "dejan"      "dejemos"    "dí"        
 [41] "día"        "dichoso"    "digno"      "dió"        "dioses"    
 [46] "dura"       "edad"       "encaje"     "encanto"    "enciende"  
 [51] "entreabre"  "envuelto"   "escalera"   "esposo"     "estatua"   
 [56] "fronda"     "fue"        "fuerza"     "gardenia"   "gracia"    
 [61] "grecia"     "griega"     "guerrero"   "haber"      "hadas"     
 [66] "heladas"    "hizo"       "hombros"    "id"         "ideas"     
 [71] "iras"       "licor"      "lirios"     "mala"       "mayor"     
 [76] "mire"       "modo"       "muñeca"     "naranjos"   "naturaleza"
 [81] "ninfas"     "nota"       "nuestra"    "obscuras"   "olvides"   
 [86] "pasar"      "perezoso"   "peso"       "pide"       "piensa"    
 [91] "pierde"     "plantas"    "plumaje"    "prometida"  "puerto"    
 [96] "puñal"      "querido"    "quita"      "raudos"     "regatas"   
[101] "riqueza"    "roban"      "roca"       "rocas"      "saben"     
[106] "secas"      "señores"    "senos"      "sentí"      "sentir"    
[111] "sigue"      "subir"      "tener"      "tengo"      "tiembla"   
[116] "tocar"      "toda"       "toma"       "última"     "venid"     
[121] "verde"      "vestidos"   "ví"         "viendo"     "vivo"      
[126] "volcán"     "vuelva"


  [1] "abandona"    "abrir"       "acaso"       "ah"          "alameda"
  [6] "alas"        "álbum"       "alguno"      "alondra"     "altar"
 [11] "amado"       "amantes"     "amiga"       "amigos"      "apacible"
 [16] "aprisa"      "arco"        "áureo"       "baja"        "barca"
 [21] "barranco"    "blanco"      "bocas"       "breves"      "buena"
 [26] "buenas"      "bueno"       "busca"       "busco"       "cae"
 [31] "caja"        "calle"       "campo"       "cantan"      "cantando"
 [36] "cariños"     "casa"        "cautiva"     "cirios"      "ciudad"
 [41] "claridad"    "compasión"   "conchas"     "corales"     "coro"
 [46] "cosas"       "cristal"     "cuánta"      "cuántas"     "cuánto"
 [51] "cuántos"     "cuentos"     "da"          "débil"       "decir"
 [56] "descansa"    "desnuda"     "desnudo"     "dicen"       "dije"
 [61] "dijo"        "dónde"       "é"           "en"          "enlutada"
 [66] "entré"       "esas"        "escuela"     "espera"      "esta"
 [71] "están"       "estás"       "estremece"   "fiesta"      "fin"
 [76] "follaje"     "frescas"     "gallardo"    "gran"        "guantes"
 [81] "guardó"      "hablan"      "hablar"      "hace"        "hada"
 [86] "hago"        "hay"         "hermana"     "hermosa"     "hondo"
 [91] "húmedas"     "huyeron"     "i"           "impacientes" "inmensa"
 [96] "iracundo"    "juega"       "jugando"     "juguetona"   "laurel"
[101] "ligera"      "lo"          "luego"       "malo"        "mamá"       
[106] "mañana"      "mar"         "marfil"      "margarita"   "mariposas"  
[111] "metal"       "mías"        "misa"        "muda"        "mudas"      
[116] "mueren"      "musa"        "muy"         "nadie"       "negras"     
[121] "ni"          "nieve"       "niños"       "no"          "noches"     
[126] "novia"       "nuestro"     "nuevas"      "oh"          "oís"        
[131] "otra"        "padres"      "paje"        "papá"        "para"       
[136] "pasa"        "pedestal"    "pensando"    "pero"        "piernas"    
[141] "plata"       "plumas"      "pobres"      "poco"        "poesía"     
[146] "poeta"       "primavera"   "primero"     "príncipe"    "pues"       
[151] "qué"         "queda"       "quedan"      "quieren"     "quiso"      
[156] "recuerdos"   "risa"        "risas"       "rojas"       "rojos"      
[161] "rosa"        "rosas"       "rubias"      "rumor"       "sabe"       
[166] "salid"       "sangre"      "sí"          "silencio"    "sino"       
[171] "solas"       "sollozando"  "solo"        "sombra"      "son"        
[176] "soñadora"    "sonriendo"   "suelto"      "tenue"       "tienden"    
[181] "tienen"      "tímida"      "todo"        "trémulas"    "trenzas"    
[186] "tristezas"   "túnica"      "tuve"        "unos"        "vamos"      
[191] "vela"        "versos"      "viaje"       "visto"       "vivos"      
[196] "volar"       "vuelan"      "ya" 


  [1] "a"           "allá"        "alzarse"     "ama"         "amante"
  [6] "amorosa"     "años"        "ay"          "azul"        "belleza"
 [11] "beso"        "besos"       "blancas"     "brazos"      "brotó"
 [16] "brumas"      "caer"        "canción"     "cayó"        "cierto"
 [21] "conoce"      "cosa"        "creo"        "cuadro"      "cuán"
 [26] "dan"         "dar"         "dicho"       "dios"        "donde"
 [31] "es"          "esa"         "ese"         "esposa"      "espumas"
 [36] "eternamente" "existe"      "fresca"      "fué"         "fueron"
 [41] "gentil"      "gigantes"    "gracias"     "grana"       "grito"
 [46] "ha"          "hacer"       "haré"        "haz"         "herido"
 [51] "hermosas"    "hermoso"     "hermosura"   "hijas"       "hijo"
 [56] "huerto"      "huyen"       "infeliz"     "infinita"    "instante"
 [61] "joven"       "las"         "le"          "leve"        "llama"
 [66] "llena"       "manto"       "meses"       "mío"         "muchas"
 [71] "muerto"      "muñecas"     "muros"       "océano"      "olor"
 [76] "ondas"       "orillas"     "otras"       "pálida"      "palidez"
 [81] "palomas"     "parecen"     "patria"      "pecado"      "pechos"
 [86] "piadosa"     "pido"        "pluma"       "porque"      "prosa"
 [91] "quién"       "quien"       "quieras"     "rabia"       "razón"
 [96] "responde"    "retozan"     "risueño"     "rompe"       "salud"

A quick look at the modernista vocabulary present in the MFWs reveals that many of the typical modernista words appear in it (azul, flores, rosas, primavera), as well as in the groups selected by patterns. In the MFWs, as is to be expected of any list for stylistic analysis, too many of the top words are not useful for detecting a modernista vocabulary (la, de, y, el, en). In contrast, employing the patterns, one finds interesting words that do not appear among the top MFWs: grecia, griega, pálidas, venus, marfil, and many others.


Periodizing Modernista Poetry

I. Intro

Gutiérrez Nájera never published his poems in book form. They appeared in the numerous newspapers for which he worked and, after he passed away (1895), his friends collected them in a single volume, along with a preface written by Justo Sierra. The book included 158 poems, to which modernista scholars such as Mapes, Boyd G. Carter, González Guerrero and others have continued to add new texts over the years. At the moment, according to Angel Muñoz Fernández, there are 235 poems attributed to the Mexican poet (13). It is usually assumed that Nájera, as a poet, had a “youthful” and a “mature” artistic period, but there is no clear consensus about when one period ends and the other begins.

The closest thing we have to a periodization of his poetry is the grouping of the poems introduced by González Guerrero in his 1953 edition of Poesías completas. Even though he divides Nájera’s poetic work into several chronological periods, González Guerrero also groups the poems according to themes and poetic forms. The critic’s seemingly chaotic periodization goes as follows: under the general heading of “Primeras Poesías,” he adds two subdivisions, “La fe de mi infancia” (1875-1881) and “Trovas de amor” (1875-1880). The rest of the poems are placed in the following sections: “Otros poemas juveniles” (1877-1881), “Caminos del viento” (1880-1883), “Ala y abismo” (1884-1887), “Elegías” (1887-1890), “Nuevas canciones” (1888-1895), “Odas breves” (no dates given), “Poesías varias” (1876-1891), and “Versiones” (1880-1884). The last group contains Nájera’s translations of French poems, some of which were at one point mistaken for original creations. One could argue that González Guerrero divides Nájera’s poetic trajectory into a youthful period from 1875 to 1881, a transitional period from 1880 to 1883, a middle period from 1884 to 1887, and a mature period from 1888 to 1895.

My objective was to apply a stylometric analysis to Nájera’s poetry with the purpose of creating a new periodization. In the next two sections of this post, I will summarize the problems I had with preparing the data and with some of the technical aspects of the analytical process. If you prefer, you can jump to the last section of the post, in which I contrast my results with González Guerrero’s and propose a new periodization of Nájera’s poetic work.

II. The 1896 edition and its afterlife

Although a total of 235 poems are recognized as forming Nájera’s poetic corpus, that number also includes poems translated from French literature and at least one poem written entirely in French. I excluded those from my analysis, bringing the total down to 220 poems. The biggest problem in classifying the poems, however, had to do with the dates of composition and/or publication. The 1896 posthumous edition was supposed to be organized chronologically, but many of the texts do not follow that order, and many others have no date assigned to them. None of the scholars in charge of the later editions of Nájera’s poems fixed the problem, and they often simply reproduced the composition/publication dates found in the 1896 edition. Angel Muñoz Fernández’s comments in his preface to the 2000 edition of Nájera’s poetry (which contains a facsimile of the 1896 edition, of course) describe the complexity of the problem: “Revisando algunos diarios de la época, encontré que el célebre ‘Francia y México’, con fecha 1882 en la edición de 1896, fue publicado en El Nacional el 5 de mayo de 1881, apareciendo junto al título la fecha 1879, que pudiera corresponder al año en que el poema fue escrito” (17).

I was unable to determine the date of a total of 34 poems, bringing down the number of poems I could use for my analysis to 186.

III. Length, etc

The technical side of the project created additional problems. Initially, I envisioned grouping Nájera’s poems by year and treating each year as if it were a single text. I would then tokenize the poems and get the word counts and frequencies in relation to that year alone. However, Nájera’s poetic production was very uneven, and some periods were more productive than others. In some years he wrote so few poems that it became impossible to get an accurate author signal, because there were not enough tokens per year of production. In his paper, “Does Size Matter? Authorship Attribution, Small Samples, Big Problem,” Maciej Eder argues that the current methods for doing stylometric analysis do not allow the study of very short texts: “using 2,000-word samples will hardly provide a reliable result, to say nothing of shorter texts.” The number of words needed to get an accurate authorship signal in a text varies. With regard to poetry, Eder explains that in his experiment “the results for the three poetic corpora (Greek, Latin, English) proved ambiguous, suggesting that some 3,000 words or so would be usually enough, but significant misclassification would also occur occasionally.” To analyze Nájera’s poetic corpus, I combined the texts from adjacent years in order to create two-year periods with around 4000 words. Only a few of the years surpassed the 4000-token mark, and I left those by themselves. I was forced to create a multi-year period for the last years of Nájera’s life because of his extremely low production during that time.

DATE                    SIZE OF THE “TEXT” (TOKENS)

1875-1876 ------------------ 5461

1877-1878 ------------------ 8594

1879 ----------------------- 8611

1880 ----------------------- 5896

1881 ----------------------- 4352

1882-1883 ------------------ 3995

1884-1885 ------------------ 6073

1886-1887 ------------------ 8648

1888-1889 ------------------ 7335

1890-1895 ------------------ 9541
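The grouping itself was done by hand, but the general idea can be sketched in Python. This is only an approximation under stated assumptions: the function name, the strict 4000-token threshold, and the per-year counts in the usage example are hypothetical, and a strict greedy rule like this one would not reproduce my table exactly (it would not accept 1882-1883 at 3995 tokens, for instance).

```python
def group_years(tokens_per_year, min_tokens=4000):
    """Greedily merge adjacent years until each group reaches min_tokens.

    tokens_per_year maps a year to its token count. Returns a list of
    (first_year, last_year, total_tokens) tuples in chronological order.
    """
    groups = []
    current_years, current_total = [], 0
    for year in sorted(tokens_per_year):
        current_years.append(year)
        current_total += tokens_per_year[year]
        if current_total >= min_tokens:
            groups.append((current_years[0], current_years[-1], current_total))
            current_years, current_total = [], 0
    # Leftover low-production years at the end are folded into the last group.
    if current_years:
        if groups:
            start, _, total = groups.pop()
            groups.append((start, current_years[-1], total + current_total))
        else:
            groups.append((current_years[0], current_years[-1], current_total))
    return groups

# Hypothetical per-year counts, for illustration only.
example = {1875: 2000, 1876: 3000, 1877: 5000, 1878: 1000, 1879: 500}
periods = group_years(example)
```

In the hypothetical example, 1875 and 1876 are merged into one period, and the sparse final years are attached to 1877, which mirrors the multi-year period I had to create for the end of Nájera’s life.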

After combining the years to obtain a higher token count, I compared the style of each time period, employing Burrows’s Delta with z-scores as my classification method. The following images show the results, employing 150 of the most frequent words


and with 300 MFWs


I did not eliminate any pronouns or overrepresented words. I have yet to apply other methods (such as SVM and PCA) to this data.
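For readers unfamiliar with the measure, here is a minimal Python sketch of Burrows’s Delta computed over z-scored relative frequencies. It is not the tool that produced the images above; the function name is hypothetical, and choices such as taking the MFWs from the combined corpus and dropping words with zero variance are my own assumptions.

```python
from collections import Counter

import numpy as np

def burrows_delta(token_lists, n_mfw=150):
    """Pairwise Burrows's Delta between tokenized texts (needs >= 2 texts)."""
    counts = [Counter(tokens) for tokens in token_lists]

    # Most frequent words of the whole corpus (assumed selection rule).
    total = Counter()
    for c in counts:
        total.update(c)
    mfw = [word for word, _ in total.most_common(n_mfw)]

    # Relative frequency of each MFW in each text.
    rel = np.array([
        [c[word] / len(tokens) for word in mfw]
        for c, tokens in zip(counts, token_lists)
    ])

    # z-score each word across texts; words with no variance are dropped.
    std = rel.std(axis=0, ddof=1)
    keep = std > 0
    z = (rel[:, keep] - rel[:, keep].mean(axis=0)) / std[keep]

    # Delta = mean absolute difference of z-scores for each pair of texts.
    return np.abs(z[:, None, :] - z[None, :, :]).mean(axis=2)
```

The function returns a square distance matrix (one row and column per period), which could then be passed to a clustering routine to obtain groupings of periods.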

IV. Periodization.

In spite of all the problems related to dating the poems and the length of the samples, the stylometric analysis I performed makes it possible to propose a new periodization of Nájera’s poetic work (however provisional it might be). Looking at the following visualization of the classification resulting from the Burrows’s Delta method, the first thing one notices is that a cluster forms with the poetry from 1875 to around 1878/1879 (1879 often appears completely disconnected from the periods coming before and after). In González Guerrero’s view, Nájera’s youthful period lasted until 1881, but in the stylometric analysis the years from 1880 to around 1887 show strong similarities among themselves and are almost always grouped together.

I was obviously concerned about having influenced the periodization by creating two-year periods to obtain a higher number of tokens. Addressing this problem was especially important for determining when the transition from the middle to the mature period took place. González Guerrero employed 1888 as the year marking the beginning of Nájera’s last poetic period. When I tried combining 88-89 and 90-95, these two groups tended to move closer to each other than to the other 1880s groups. I then left 1888 by itself (there were enough tokens in that year to do so, over 5000) and created two more groups, 89-90 and 91-95. On this occasion 1888 moved toward 89-90, but not as close to 91-95 as I expected. The higher the number of MFWs used, the more 91-95 distanced itself from the late 1880s. In other words, Nájera’s style definitely underwent a change around 1888 (possibly marking the beginning of a transitional period that lasts until 1890?), but it is not clear that the last period of his poetry began as early as 1888.


To Do:

  • Use other classification methods, such as SVM or PCA
  • Analyze the change of vocabulary from the 1870s to the 1890s (topic modeling needed?)
  • Add prose documents to the corpora. Establishing the publication date of those appears to be easier (should I assume that the difference between Nájera’s poetic style and his prose style is not significant?)