Gutiérrez Nájera never published his poems in book form. They appeared in the numerous newspapers for which he worked and, after his death in 1895, his friends collected them in a single volume with a preface by Justo Sierra. The book (a copy is available on archive.org) included 158 poems, to which modernista scholars such as Mapes, Boyd G. Carter, González Guerrero and others have continued to add new texts over the years. At present, according to Angel Muñoz Fernández, 235 poems are attributed to the Mexican poet (13).
It is usually assumed that Nájera, as a poet, had a “youthful” and a “mature” period, but there is no clear consensus about when one ends and the other begins. The closest thing we have to a periodization of his poetry is the grouping introduced by González Guerrero in his 1953 edition of Poesías completas. Although he divides Nájera’s poetic work into chronological periods, González Guerrero also groups the poems by theme and poetic form, so the date ranges of his sections overlap. The critic’s seemingly chaotic periodization goes as follows: under the general heading of “Primeras Poesías,” he adds two subdivisions, “La fe de mi infancia” (1875-1881) and “Trovas de amor” (1875-1880). The rest of the poems are placed in the following sections: “Otros poemas juveniles” (1877-1881), “Caminos del viento” (1880-1883), “Ala y abismo” (1884-1887), “Elegías” (1887-1890), “Nuevas canciones” (1888-1895), “Odas breves” (no dates given), “Poesías varias” (1876-1891), “Versiones” (1880-1884). The last group contains Nájera’s translations of French poems, some of which were at one point mistaken for original creations. One could argue that González Guerrero divides Nájera’s poetic trajectory into a youthful period from 1875 to 1881, a transitional period from 1880 to 1883, a middle period from 1884 to 1887, and a mature period from 1888 to 1895.
My objective was to apply stylometric analysis to Nájera’s poetry in order to propose a new periodization. In the next two sections of this post, I summarize the problems I had in preparing the data and with some of the technical aspects of the analysis. If you prefer, you can jump to the last section of the post, in which I contrast my results with González Guerrero’s and propose a new periodization of Nájera’s poetic work.
II. The 1896 edition and its afterlife
Although a total of 235 poems are recognized as forming Nájera’s poetic corpus, that number also includes poems translated from French literature and at least one poem written entirely in French. I excluded those from my analysis, bringing the total down to 220 poems. The biggest problem in classifying the poems, however, had to do with the dates of composition and/or publication. The 1896 posthumous edition was supposed to be organized chronologically, but many of the texts do not follow that order, and many others have no date assigned to them. None of the scholars in charge of later editions of Nájera’s poems fixed the problem; they often simply reproduced the composition/publication dates found in the 1896 edition. Angel Muñoz Fernández’s comment in his preface to the 2000 edition of Nájera’s poetry (which contains a facsimile of the 1896 edition, of course) describes the complexity of the problem: “Revisando algunos diarios de la época, encontré que el célebre ‘Francia y México’, con fecha 1882 en la edición de 1896, fue publicado en El Nacional el 5 de mayo de 1881, apareciendo junto al título la fecha 1879, que pudiera corresponder al año en que el poema fue escrito” (17). In English: “Going through some newspapers of the period, I found that the celebrated ‘Francia y México’, dated 1882 in the 1896 edition, was published in El Nacional on May 5, 1881, with the date 1879 appearing next to the title, which may correspond to the year in which the poem was written.”
I was unable to determine the date of a total of 34 poems, bringing down the number of poems I could use for my analysis to 186.
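The two filtering steps described above (235 poems, then 220 originals in Spanish, then 186 dated poems) can be sketched in Python. This is only an illustration of the bookkeeping: the `Poem` record, its field names, and the `filter_corpus` function are hypothetical and are not taken from any edition or from the scripts used for this post.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Poem:
    # Hypothetical catalogue record; field names are assumptions.
    title: str
    year: Optional[int]           # None when no date could be established
    language: str = "es"          # "fr" for a poem written in French
    is_translation: bool = False  # True for versions of French poems


def filter_corpus(poems: List[Poem]) -> Tuple[List[Poem], List[Poem]]:
    """Return (original Spanish poems, the subset of those with a known year)."""
    originals = [p for p in poems if p.language == "es" and not p.is_translation]
    dated = [p for p in originals if p.year is not None]
    return originals, dated


if __name__ == "__main__":
    # Toy catalogue with made-up entries, only to show the two steps.
    catalogue = [
        Poem("poem A", 1880),
        Poem("poem B", None),
        Poem("poem C", 1884, is_translation=True),
        Poem("poem D", 1890, language="fr"),
    ]
    originals, dated = filter_corpus(catalogue)
    print(len(catalogue), len(originals), len(dated))
```

Keeping the undated poems in a separate list, rather than deleting them, makes it easy to bring them back if a date is later found in the newspapers.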
III. Length, etc
The technical side of the project created additional problems. Initially, I envisioned grouping Nájera’s poems by year and treating each year as if it were a single text. I would then tokenize the poems and compute word counts and frequencies for each year separately. However, Nájera’s poetic production was very uneven, and some periods were more productive than others. In some years he wrote so few poems that there were not enough tokens to get an accurate authorial signal. In his paper “Does Size Matter? Authorship Attribution, Small Samples, Big Problem,” Maciej Eder argues that current methods of stylometric analysis do not allow the study of very short texts: “using 2,000-word samples will hardly provide a reliable result, to say nothing of shorter texts.” The number of words needed to get an accurate authorship signal in a text varies. With regard to poetry, Eder explains that in his experiment “the results for the three poetic corpora (Greek, Latin, English) proved ambiguous, suggesting that some 3,000 words or so would be usually enough, but significant misclassification would also occur occasionally.” To analyze Nájera’s poetic corpus, I combined the texts from adjacent years to create two-year periods of roughly 4000 words or more. Only a few years surpassed the 4000-token mark on their own, and I left those by themselves. I was forced to create a multi-year period for the last years of Nájera’s life because of his extremely low production during that time.
DATE                         SIZE OF THE “TEXT” (tokens)
1879 ----------------------- 8611
1880 ----------------------- 5896
1881 ----------------------- 4352
1875-1876 ------------------ 5461
1877-1878 ------------------ 8594
1882-1883 ------------------ 3995
1884-1885 ------------------ 6073
1886-1887 ------------------ 8648
1888-1889 ------------------ 7335
1890-1895 ------------------ 9541
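The idea of merging adjacent years until a sample is long enough can be sketched as a simple greedy pass. This is not the procedure I followed (I grouped the years by hand, mostly in pairs); it is a minimal illustration under stated assumptions: the function name `group_years`, the 4000-token threshold as a hard cutoff, and the per-year counts in the example are all hypothetical.

```python
from typing import Dict, List, Tuple


def group_years(tokens_per_year: Dict[int, int],
                min_tokens: int = 4000) -> List[Tuple[int, int, int]]:
    """Greedily merge consecutive years into periods of at least min_tokens.

    Returns a list of (first_year, last_year, total_tokens) tuples.
    A final run of years that stays under the threshold is folded into
    the preceding period, giving one longer multi-year period at the end.
    """
    periods: List[Tuple[int, int, int]] = []
    years: List[int] = []
    total = 0
    for year in sorted(tokens_per_year):
        years.append(year)
        total += tokens_per_year[year]
        if total >= min_tokens:
            periods.append((years[0], years[-1], total))
            years, total = [], 0
    if years:  # leftover years that never reached the threshold
        if periods:
            start, _, previous_total = periods.pop()
            periods.append((start, years[-1], previous_total + total))
        else:
            periods.append((years[0], years[-1], total))
    return periods


if __name__ == "__main__":
    # Made-up counts, not Nájera's real yearly totals.
    counts = {1875: 2500, 1876: 2900, 1877: 4100,
              1878: 1000, 1879: 3500, 1880: 800}
    for start, end, size in group_years(counts):
        label = str(start) if start == end else f"{start}-{end}"
        print(label, size)
```

A strict cutoff like this would not accept a period of 3995 tokens, which is one reason the grouping in the table was decided case by case rather than automatically.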
After combining the years to obtain a higher token count, I compared the style of each time period using Burrows’s Delta with z-scores as my classification method. The following images show the results using the 150 most frequent words (MFWs)
and using the 300 MFWs.
I did not eliminate any pronouns or overrepresented words. I have yet to apply other methods (such as SVM and PCA) to this data.
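For readers unfamiliar with the measure, Burrows’s Delta between two samples is the mean absolute difference between their z-scored relative frequencies over the n most frequent words. The following is a minimal pure-Python sketch of that calculation, not the code behind the figures in this post. The function names, the regular-expression tokenizer, the choice of MFWs by pooled raw counts, and the use of the population standard deviation are all assumptions made for illustration. As in my analysis, no pronouns or other words are removed.

```python
import re
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens (accents are kept)."""
    return re.findall(r"\w+", text.lower())


def burrows_delta(samples: Dict[str, List[str]],
                  n_mfw: int = 150) -> Tuple[List[str], Dict[Tuple[str, str], float]]:
    """Return the MFW list and the Delta distance for every pair of samples."""
    # 1. Most frequent words over the pooled samples (a simplifying assumption).
    pooled = Counter()
    for tokens in samples.values():
        pooled.update(tokens)
    mfw = [word for word, _ in pooled.most_common(n_mfw)]

    # 2. Relative frequency of each MFW in each sample.
    labels = list(samples)
    freqs = {}
    for label in labels:
        counts = Counter(samples[label])
        size = len(samples[label])
        freqs[label] = [counts[word] / size for word in mfw]

    # 3. z-score each word across the samples.
    z: Dict[str, List[float]] = {label: [] for label in labels}
    for i in range(len(mfw)):
        column = [freqs[label][i] for label in labels]
        mu, sigma = mean(column), pstdev(column)
        for label in labels:
            z[label].append((freqs[label][i] - mu) / sigma if sigma > 0 else 0.0)

    # 4. Delta = mean absolute difference between two z-score profiles.
    delta = {}
    for a in labels:
        for b in labels:
            delta[(a, b)] = mean([abs(x - y) for x, y in zip(z[a], z[b])])
    return mfw, delta


if __name__ == "__main__":
    # Toy "periods" built from invented phrases, only to exercise the function.
    toy = {
        "period 1": tokenize("el amor y la noche el amor"),
        "period 2": tokenize("la muerte y la sombra la muerte y la noche"),
        "period 3": tokenize("el mar y el cielo y la noche"),
    }
    _, distances = burrows_delta(toy, n_mfw=5)
    for pair, value in sorted(distances.items()):
        print(pair, round(value, 3))
```

With the real corpus, each value in `samples` would be the token list of one period from the table above, and `n_mfw` would be set to 150 or 300. The resulting distance table is what a clustering step would then turn into a tree.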
Despite all the problems with dating the poems and with sample length, the stylometric analysis I performed makes it possible to propose a new periodization of Nájera’s poetic work (however provisional it might be). In the following visualization of the classification produced by Burrows’s Delta, the first thing one notices is the cluster formed by the poetry from 1875 to around 1878/1879 (1879 often appears completely disconnected from the periods before and after it). In González Guerrero’s view, Nájera’s youthful period lasts until 1881, but in the stylometric analysis the years from 1880 to around 1887 show strong similarities and are almost always grouped together.
I was obviously concerned that I had influenced the periodization by creating two-year periods to obtain a higher number of tokens. Addressing this problem was especially important for determining when the transition from the middle to the mature period took place. González Guerrero takes 1888 as the year that marks the beginning of Nájera’s last poetic period. When I tried combining 88-89 and 90-95, these two groups tended to move closer to each other than to the other 1880s groups. I then left 1888 by itself (that year alone has enough tokens, over 5000) and created two more groups, 89-90 and 91-95. On this occasion 1888 moved toward 89-90, but not as close to 91-95 as I expected. The higher the number of MFWs used, the more 91-95 distanced itself from the late 1880s. In other words, Nájera’s style definitely underwent a change toward 1888 (possibly marking the beginning of a transitional period that lasts until 1890?), but it is not clear that the last period of his poetry began as early as 1888.
- use of other classification methods such as SVM or PCA
- analysis of the change of vocabulary from the 1870s to the 1890s (topic modeling needed?)
- addition of prose documents to the corpus; establishing the publication date of those appears to be easier (should I assume that the difference between Nájera’s poetic style and his prose style is not significant?)