Stop Anthropomorphizing Literary Periods, or Why the Most Frequent Words Don’t Matter.

Looking for a method to trace the evolution of Gutiérrez Nájera’s poetry from one period to another, I came across an article published by David Hoover earlier this year. In the essay, Hoover contrasts the word frequencies in three of Henry James’s novels, each one written at a different stage in the writer’s career, to analyze their stylistic changes. The article, “A Conversation Among Himselves: Change and the Styles of Henry James” (Chapter 5, in Hoover, Culpeper and O’Halloran. APPROACHES TO CORPUS STYLISTICS, Routledge, 2014), employs an interesting system for comparing the frequencies of the three periods. Hoover assigns a pattern to each word depending on whether its frequency is different in the three periods, the same in two of the three periods, or the same in all periods. For example, a pattern like LMH (“L” = low, “M” = medium, “H” = high) indicates that a word increased in frequency from the first period to the last. The number of possible patterns is thirteen:

HML, HLM, MLH, MHL, LHM, LMH if the frequencies are all different; HLL, LLH, MLL, LLM, HMM, MHH if the word has the same frequency in two of the periods; and LLL (or HHH) if the frequency is the same in all three periods.

Loosely employing the periodization presented in my previous post, I divided Gutiérrez Nájera’s poetry into three major periods: 1875 to 1879, 1880 to 1887, and 1888 to 1895. I combined all the poems from each period to create a “text” and then, following Hoover’s example, I reduced two of the samples to the size of the smallest one by simply eliminating the part of the “text” exceeding that size. (It occurs to me now that selecting a random sample of the text is possibly a better approach.) The “texts” are then tokenized and, according to their frequency in each period, the tokens are classified into one of the thirteen possible patterns. Hoover is, of course, known for his use of Excel spreadsheets to perform his text analyses, but the idea behind this technique is simple enough that a few lines of code in R can assign a pattern to each token.

Here is an RStudio image of part of the matrix resulting from applying this classification technique.
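
The classification step itself can be sketched in a few lines. The version below is in Python rather than R and is only an illustration, not the code used for this post. It rests on several assumptions: the function names (tokenize, pattern, classify) are hypothetical, tokens are runs of letters in lowercase, the longer “texts” are cut to the token count of the shortest one, and ties are handled in the simplest way (when two periods share a count, only the labels L and H are used, so this sketch produces labels such as LHL or HHL rather than the M-containing ones listed above).

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase the text and keep runs of letters (accented ones included).
    return re.findall(r"[^\W\d_]+", text.lower())

def pattern(counts):
    # Map the counts for the three periods to a label such as "LMH".
    # Assumption: equal counts share a label; with only two distinct
    # values the labels are L and H, and with one value the label is L.
    distinct = sorted(set(counts))
    letters = "LMH" if len(distinct) == 3 else "LH"
    labels = dict(zip(distinct, letters))
    return "".join(labels[c] for c in counts)

def classify(texts):
    # texts: the three period "texts", in chronological order.
    token_lists = [tokenize(t) for t in texts]
    # Cut every sample to the size of the smallest one (by token count).
    size = min(len(tokens) for tokens in token_lists)
    counters = [Counter(tokens[:size]) for tokens in token_lists]
    vocabulary = set().union(*counters)
    # A Counter returns 0 for a word that is absent from a period.
    return {w: pattern(tuple(c[w] for c in counters)) for w in sorted(vocabulary)}
```

With this rule, a word counted 0, 0 and 5 times in the three periods gets the label LLH, and so does a word counted 11, 11 and 13 times.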


Employing these patterns to study the changes in a writer’s style works quite well and produces interesting insights, as Hoover himself shows in his article. In Henry James’s case, Hoover is interested not only in words that “show substantial change” across the periods, but also in the most frequent words (MFWs) within each pattern. (He develops an interesting alternative for determining frequencies, which I will not address here but which is thoroughly explained in his article.) The uniqueness of Gutiérrez Nájera’s poetic corpus, however, led me in a different direction.

The dates of Gutiérrez Nájera’s “late” stylistic period, 1888-1895, coincide with the beginning of the modernista movement. This of course means that the changes in his style towards the end of his life are not only “personal” changes; they could also be signs of the advent of a new literary period in Spanish American letters. Patterns such as LLH, LMH and even MLH, which identify words with higher frequencies in the late period, may also be pointing to modernista words that became influential in the late 19th century.

I suppose that until now I have been guilty of anthropomorphizing literary periods. I have assumed that strategies for analyzing a writer’s style can be used to understand the “style” of a literary period. Assembling MFW lists of modernista words has so far produced frustrating results, and perhaps I should be focusing instead on words that either emerge or increase in frequency in a literary period in relation to the previous one. Ideally, for an analysis of this kind, I would need really “big data,” which I do not have at the moment.

Unlike Hoover, I am not interested in the top words belonging to a specific pattern. Any word that appears overrepresented in the late period in relation to the other periods is interesting because it might indicate the emergence of a new language. Thus, a modernista word like “ninfas,” which follows the pattern LLH, would not appear in the MFW list because its frequency in the text is not high enough. But if we consider that “ninfas” went from not appearing in the first two periods to appearing five times in the third one (0-0-5), one must acknowledge this change as a significant one (especially in poetry). In contrast, a token that goes from 11 and 11 to 13 is less relevant for determining the style of a period, but it would probably appear among the MFWs because of its high frequency.
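
The contrast between the two selections can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the code behind the lists below: the function names are hypothetical, the tie rule is the simple one (two equal counts are labelled with L and H only), and of the three sample rows only the counts for “ninfas” (0-0-5) come from the post. The 11-11-13 row is given a placeholder name, and the counts for “la” are invented.

```python
LATE_RISE = {"LLH", "LMH", "MLH"}  # patterns with the highest count in the late period

def pattern(counts):
    # Label the three period counts; equal counts share a label (assumption).
    distinct = sorted(set(counts))
    letters = "LMH" if len(distinct) == 3 else "LH"
    labels = dict(zip(distinct, letters))
    return "".join(labels[c] for c in counts)

def late_rise_words(freqs):
    # Every word whose pattern rises in the late period, whatever its frequency.
    return sorted(w for w, counts in freqs.items() if pattern(counts) in LATE_RISE)

def top_mfw(freqs, n):
    # The n most frequent words of the late period (ties broken alphabetically).
    return sorted(freqs, key=lambda w: (-freqs[w][2], w))[:n]

# Counts per period (early, middle, late).
freqs = {
    "ninfas": (0, 0, 5),         # from the post
    "palabra_x": (11, 11, 13),   # placeholder name for the 11-11-13 token
    "la": (300, 310, 305),       # invented counts for a function word
}

print(top_mfw(freqs, 2))         # ['la', 'palabra_x']
print(late_rise_words(freqs))    # ['ninfas', 'palabra_x']
```

The frequency cutoff keeps the function word and drops “ninfas,” while the pattern filter keeps “ninfas” and drops the function word. Note that the pattern alone does not separate 0-0-5 from 11-11-13; both are labelled LLH.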

To see the difference between the two methods for obtaining the most significant words in a literary period (MFWs vs. pattern analysis), let’s take a look at the top 150 MFWs for Gutiérrez Nájera’s 1888-1895 period:

 [1] "la"        "de"        "y"         "el"        "en"       
  [6] "que"       "a"         "las"       "los"       "no"       
 [11] "se"        "con"       "qué"       "es"        "mi"       
 [16] "al"        "del"       "su"        "por"       "tu"       
 [21] "ya"        "como"      "me"        "un"        "para"     
 [26] "lo"        "te"        "si"        "sus"       "muy"      
 [31] "mis"       "tus"       "todo"      "pero"      "alma"     
 [36] "amor"      "más"       "ni"        "una"       "oh"       
 [41] "yo"        "cuando"    "vida"      "dios"      "son"      
 [46] "tan"       "tú"        "flores"    "está"      "sin"      
 [51] "le"        "mar"       "noche"     "luz"       "entre"    
 [56] "esa"       "sombra"    "blanca"    "ha"        "porque"   
 [61] "hay"       "o"         "ojos"      "triste"    "mañana"   
 [66] "nos"       "ser"       "así"       "casa"      "cielo"    
 [71] "quién"     "rosas"     "va"        "cual"      "alas"     
 [76] "hasta"     "poeta"     "brazos"    "siempre"   "también"  
 [81] "versos"    "azul"      "cómo"      "ella"      "fin"      
 [86] "fué"       "labios"    "madre"     "amores"    "blancas"  
 [91] "mas"       "sólo"      "amante"    "bien"      "dos"      
 [96] "era"       "hermosa"   "primavera" "sé"        "sobre"    
[101] "sueño"     "tal"       "tiene"     "tristes"   "día"      
[106] "dolor"     "ese"       "espera"    "esperanza" "muerte"   
[111] "nada"      "pues"      "quien"     "rosa"      "señor"    
[116] "aquí"      "ay"        "blanco"    "bueno"     "mientras" 
[121] "musa"      "nadie"     "nunca"     "ondas"     "parece"   
[126] "queda"     "ti"        "tierra"    "todas"     "todos"    
[131] "vez"       "viene"     "aire"      "ama"       "beso"     
[136] "buena"     "coro"      "él"        "eres"      "hoy"      
[141] "luego"     "poco"      "voz"       "acaso"     "almas"    
[146] "altar"     "belleza"   "busca"     "busco"     "cuán" 

The following are some of the words with the patterns LLH, LMH, MLH for the same period, listed alphabetically, without taking into consideration the frequencies.


  [1] "acero"      "acude"      "acuerdo"    "afrodita"   "algunos"   
  [6] "alto"       "amada"      "ambiente"   "ancha"      "apaga"     
 [11] "apagados"   "aparece"    "arena"      "arte"       "azahares"  
 [16] "bajar"      "bonito"     "bosque"     "bote"       "botones"   
 [21] "brillan"    "brillante"  "brillantes" "buenos"     "calla"     
 [26] "callada"    "calles"     "cauda"      "cerca"      "cisnes"    
 [31] "copa"       "copas"      "correr"     "cristo"     "cuanto"    
 [36] "daré"       "déjame"     "dejan"      "dejemos"    "dí"        
 [41] "día"        "dichoso"    "digno"      "dió"        "dioses"    
 [46] "dura"       "edad"       "encaje"     "encanto"    "enciende"  
 [51] "entreabre"  "envuelto"   "escalera"   "esposo"     "estatua"   
 [56] "fronda"     "fue"        "fuerza"     "gardenia"   "gracia"    
 [61] "grecia"     "griega"     "guerrero"   "haber"      "hadas"     
 [66] "heladas"    "hizo"       "hombros"    "id"         "ideas"     
 [71] "iras"       "licor"      "lirios"     "mala"       "mayor"     
 [76] "mire"       "modo"       "muñeca"     "naranjos"   "naturaleza"
 [81] "ninfas"     "nota"       "nuestra"    "obscuras"   "olvides"   
 [86] "pasar"      "perezoso"   "peso"       "pide"       "piensa"    
 [91] "pierde"     "plantas"    "plumaje"    "prometida"  "puerto"    
 [96] "puñal"      "querido"    "quita"      "raudos"     "regatas"   
[101] "riqueza"    "roban"      "roca"       "rocas"      "saben"     
[106] "secas"      "señores"    "senos"      "sentí"      "sentir"    
[111] "sigue"      "subir"      "tener"      "tengo"      "tiembla"   
[116] "tocar"      "toda"       "toma"       "última"     "venid"     
[121] "verde"      "vestidos"   "ví"         "viendo"     "vivo"      
[126] "volcán"     "vuelva"


  [1] "abandona"    "abrir"       "acaso"       "ah"          "alameda"
  [6] "alas"        "álbum"       "alguno"      "alondra"     "altar"
 [11] "amado"       "amantes"     "amiga"       "amigos"      "apacible"
 [16] "aprisa"      "arco"        "áureo"       "baja"        "barca"
 [21] "barranco"    "blanco"      "bocas"       "breves"      "buena"
 [26] "buenas"      "bueno"       "busca"       "busco"       "cae"
 [31] "caja"        "calle"       "campo"       "cantan"      "cantando"
 [36] "cariños"     "casa"        "cautiva"     "cirios"      "ciudad"
 [41] "claridad"    "compasión"   "conchas"     "corales"     "coro"
 [46] "cosas"       "cristal"     "cuánta"      "cuántas"     "cuánto"
 [51] "cuántos"     "cuentos"     "da"          "débil"       "decir"
 [56] "descansa"    "desnuda"     "desnudo"     "dicen"       "dije"
 [61] "dijo"        "dónde"       "é"           "en"          "enlutada"
 [66] "entré"       "esas"        "escuela"     "espera"      "esta"
 [71] "están"       "estás"       "estremece"   "fiesta"      "fin"
 [76] "follaje"     "frescas"     "gallardo"    "gran"        "guantes"
 [81] "guardó"      "hablan"      "hablar"      "hace"        "hada"
 [86] "hago"        "hay"         "hermana"     "hermosa"     "hondo"
 [91] "húmedas"     "huyeron"     "i"           "impacientes" "inmensa"
 [96] "iracundo"    "juega"       "jugando"     "juguetona"   "laurel"
[101] "ligera"      "lo"          "luego"       "malo"        "mamá"       
[106] "mañana"      "mar"         "marfil"      "margarita"   "mariposas"  
[111] "metal"       "mías"        "misa"        "muda"        "mudas"      
[116] "mueren"      "musa"        "muy"         "nadie"       "negras"     
[121] "ni"          "nieve"       "niños"       "no"          "noches"     
[126] "novia"       "nuestro"     "nuevas"      "oh"          "oís"        
[131] "otra"        "padres"      "paje"        "papá"        "para"       
[136] "pasa"        "pedestal"    "pensando"    "pero"        "piernas"    
[141] "plata"       "plumas"      "pobres"      "poco"        "poesía"     
[146] "poeta"       "primavera"   "primero"     "príncipe"    "pues"       
[151] "qué"         "queda"       "quedan"      "quieren"     "quiso"      
[156] "recuerdos"   "risa"        "risas"       "rojas"       "rojos"      
[161] "rosa"        "rosas"       "rubias"      "rumor"       "sabe"       
[166] "salid"       "sangre"      "sí"          "silencio"    "sino"       
[171] "solas"       "sollozando"  "solo"        "sombra"      "son"        
[176] "soñadora"    "sonriendo"   "suelto"      "tenue"       "tienden"    
[181] "tienen"      "tímida"      "todo"        "trémulas"    "trenzas"    
[186] "tristezas"   "túnica"      "tuve"        "unos"        "vamos"      
[191] "vela"        "versos"      "viaje"       "visto"       "vivos"      
[196] "volar"       "vuelan"      "ya" 


  [1] "a"           "allá"        "alzarse"     "ama"         "amante"
  [6] "amorosa"     "años"        "ay"          "azul"        "belleza"
 [11] "beso"        "besos"       "blancas"     "brazos"      "brotó"
 [16] "brumas"      "caer"        "canción"     "cayó"        "cierto"
 [21] "conoce"      "cosa"        "creo"        "cuadro"      "cuán"
 [26] "dan"         "dar"         "dicho"       "dios"        "donde"
 [31] "es"          "esa"         "ese"         "esposa"      "espumas"
 [36] "eternamente" "existe"      "fresca"      "fué"         "fueron"
 [41] "gentil"      "gigantes"    "gracias"     "grana"       "grito"
 [46] "ha"          "hacer"       "haré"        "haz"         "herido"
 [51] "hermosas"    "hermoso"     "hermosura"   "hijas"       "hijo"
 [56] "huerto"      "huyen"       "infeliz"     "infinita"    "instante"
 [61] "joven"       "las"         "le"          "leve"        "llama"
 [66] "llena"       "manto"       "meses"       "mío"         "muchas"
 [71] "muerto"      "muñecas"     "muros"       "océano"      "olor"
 [76] "ondas"       "orillas"     "otras"       "pálida"      "palidez"
 [81] "palomas"     "parecen"     "patria"      "pecado"      "pechos"
 [86] "piadosa"     "pido"        "pluma"       "porque"      "prosa"
 [91] "quién"       "quien"       "quieras"     "rabia"       "razón"
 [96] "responde"    "retozan"     "risueño"     "rompe"       "salud"

A quick look at the modernista vocabulary present in the MFWs reveals that many of the typical modernista words appear there (azul, flores, rosas, primavera), as well as in the groups selected by patterns. In the MFWs, as is to be expected of any such list used for stylistic analysis, too many of the top words are not useful for detecting a modernista vocabulary (la, de, y, el, en). In contrast, employing the patterns one finds interesting words that do not appear among the top MFWs: grecia, griega, pálidas, venus, marfil, and many others.


2 thoughts on “Stop Anthropomorphizing Literary Periods, or Why the Most Frequent Words Don’t Matter.”

  1. silviaegt

    I would love to see why you think this happens, and also whether this phenomenon happened only because one poem was about, say, “ninfas,” and not because the whole “period” was full of them.
    Anyhow, it is an interesting approach, thank you for sharing!

    1. joseeduardogonzalez Post author

      Thanks for the comment. The possibility of having a poem about “ninfas” is why the MFWs method alone does not work for determining which words have emerged in a literary period. In the method I am exploring in this post, the number of tokens is not that important. What is important is that a word was employed more frequently, or even for the first time, in relation to the past periods. And if this is because the poet decided to write a poem about “ninfas,” which he had not done before, that is fine. Darío does not have that many poems about “cisnes.” Obviously, it would help to be able to contrast Nájera’s “modernista words” with those of other poets. I am working on that, and also on other methods to generate a list of modernista vocabulary.

