I was adding José Martí’s texts to my modernista database when it occurred to me that it would be interesting to create a bookworm based on his writings. Bookworm, an online tool created by Benjamin M. Schmidt, is rapidly becoming a practical (and fun) way to allow internet users to interact with a project’s data. The data for my bookworm comes from the Edición Crítica de las Obras Completas de José Martí, which is freely available from the website of the Centro de Estudios Martianos (CEM). For the Martí bookworm, I used a total of 984 texts. I chose to include only documents written between 1875 and 1885: CEM’s project of compiling Martí’s complete works is still, no pun intended, incomplete, and they have edited very little material from after 1885. In addition to his creative work (poetry, drama, and fiction), I have of course included his crónicas and other newspaper articles, as well as many of his private letters.
Martí and modernismo.
Needless to say, my main interest in Martí lies in his relationship to the modernista style. Given his commitment to the Cuban independence movement, I expected to find in his writings a specific “political” vocabulary not shared by other modernista writers. That assumption proved correct, but I was still surprised by how strong the presence of that vocabulary was. Words such as “america,” “patria,” “guerra,” “vida,” “muerte” and, of course, “españa,” “cuba” and “isla” have a consistently high frequency.
Is it important that Martí’s use of the word “america” increases after 1881? Probably. Especially when one considers that “america latina,” “hispanoamerica” and “norteamerica” begin to appear more often in his writings from the same period.
But I am not really interested in questions about the transformation, if there is any, of Martí’s discourse after he moves to New York. For my project, it is more significant that words that appear frequently in Darío’s poetry (azul, flores, rosa, for example) are not as relevant for Martí. The dramatic contrast can be appreciated when they are placed side by side, as in the following image.
Here one could argue that it is unfair to compare Darío’s poetic style with the entirety of Martí’s texts, including non-literary documents. I agree, but I also believe that this unfair contrast generates valid questions: Would the most frequent words (MFWs) in Martí’s poetic production show signs of being closer to the modernista vocabulary than to the MFWs in Martí’s complete works? Should one study the MFWs of a genre, instead of the MFWs of a specific time period? Would an analysis of Darío’s complete works reveal a similar situation? (And if it doesn’t, what does that mean?)
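For readers who want to try the genre question on their own files, here is a minimal sketch of how an MFW list could be computed in Python. It is an illustration only, not the method Bookworm uses: the function name `most_frequent_words`, the sample sentences and the tiny stopword set are all hypothetical, and a real comparison would need a proper Spanish stopword list.

```python
import re
from collections import Counter


def most_frequent_words(texts, n=10, stopwords=frozenset()):
    """Return the n most frequent words across an iterable of strings.

    Hypothetical helper for illustration; tokenization is deliberately crude.
    """
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        tokens = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Made-up sample sentences standing in for two subcorpora.
poetry = ["la rosa y la flor", "la rosa azul"]
prose = ["la patria y la guerra", "la patria libre"]
stop = frozenset({"la", "y"})

print(most_frequent_words(poetry, n=3, stopwords=stop))
print(most_frequent_words(prose, n=3, stopwords=stop))
```

Running the same function over a poetry-only subcorpus and over the complete works would give two lists that can be compared directly.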
In case you missed it, here is the link to the Martí bookworm.
Bookworm and Spanish Texts (Tech Note).
Bookworms with complex interfaces require installing the software on a machine that fulfills all the requirements (MySQL, Python 2.7 or 3 with the modules nltk, numpy, regex and pandas, GNU parallel, and web server software such as Apache). As this is a simple one, I opted to use Culturomics’ bookworm creator. Even if one chooses to go the Culturomics route, I would still recommend installing a copy of the software on a personal computer, as it is the best way to test whether a bookworm is working properly.
The main issue I had using Bookworm was its inability to accept Spanish accented characters. The Bookworm site says that their “system does pretty decent job of encoding ugly characters, but after too many of them it starts to get upset and may cause your Bookworm to fail when building.” Well, it turns out that accented letters and other characters that are part of the Spanish language are treated as “ugly characters” by Bookworm. The fact that all the texts used for a bookworm are UTF-8 encoded makes the situation even more mysterious. Though I could think of a couple of ways of circumventing the problem, in order to use the Culturomics site I needed to strip all the accented characters from my texts. In other words, it is useless to use accents when searching the José Martí bookworm.
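One way to strip the accents, sketched here in Python with the standard `unicodedata` module, is to decompose each character and drop the combining marks. This is an assumption about a workable approach, not necessarily the script used for the Martí bookworm, and the function name `strip_accents` is hypothetical. Note that it also turns “ñ” into “n”, and that it leaves characters such as “¿” and “¡” untouched, so those would need separate handling.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining diacritical marks (á -> a, ñ -> n, ü -> u)."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


print(strip_accents("José Martí escribió sobre España y Nuestra América"))
```

The same function can be applied to a search term before querying, so that typing “españa” behaves like typing “espana”.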