The Inevitable José Martí Bookworm

I was adding José Martí’s texts to my modernista database when it occurred to me that it would be interesting to create a bookworm based on his writings. Bookworm, an online tool created by Benjamin M. Schmidt, is rapidly becoming a practical (and fun) way to let internet users interact with a project’s data. The data for my bookworm comes from the Edición Crítica de las Obras Completas de José Martí, which is freely available from the website of the Centro de Estudios Martianos (CEM). For the Martí bookworm, I used a total of 984 texts. I chose to include only documents written between 1875 and 1885: CEM’s project of compiling Martí’s complete works is still (no pun intended) incomplete, and they have edited very little material from after 1885. In addition to his creative work (poetry, drama, and fiction), I have of course included his crónicas and other newspaper articles, as well as many of his private letters.

Martí and modernismo.

Needless to say, my main interest in Martí lies in his relationship to the modernista style. My expectation, given his commitment to the Cuban independence movement, was that I would find in his writings a specific “political” vocabulary not shared by other modernista writers. That assumption proved correct, but I was still surprised by how strong the presence of that vocabulary was. Words such as “america,” “patria,” “guerra,” “vida,” “muerte” and, of course, “españa,” “cuba” and “isla” have a consistently high frequency.


Is it important that Martí’s use of the word “america” increases after 1881? Probably. Especially when one considers that “america latina,” “hispanoamerica” and “norteamerica” begin to appear more often in his writings from the same period.


But I am not really interested in questions about the transformation of Martí’s discourse after he moves to New York, if there is one. For my project, it is more significant that words that appear frequently in Darío’s poetry (azul, flores, rosa, for example) are not as relevant for Martí. The dramatic contrast can be appreciated when they are placed side by side, as in the following image.


Here one could argue that it is unfair to compare Darío’s poetic style with the entirety of Martí’s texts, including non-literary documents. I agree, but I also believe that this unfair contrast generates valid questions: Would the most frequent words (MFWs) in Martí’s poetic production be closer to the modernista vocabulary than the MFWs in Martí’s complete works are? Should one study the MFWs of a genre, instead of the MFWs of a specific time period? Would an analysis of Darío’s complete works reveal a similar situation? (And if it doesn’t, what does that mean?)
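For readers who want to try the genre question on their own texts, here is a minimal Python sketch of an MFW count grouped by genre. It is not how Bookworm computes its frequencies. The function name `most_frequent_words`, the genre labels, and the sample strings are all hypothetical placeholders (the strings are invented, not quotations from Martí), and in practice one would also pass a list of Spanish stopwords.

```python
import re
from collections import Counter

def most_frequent_words(texts, n=10, stopwords=frozenset()):
    """Return the n most frequent words across an iterable of texts."""
    counts = Counter()
    for text in texts:
        # [^\W\d_]+ matches runs of letters, including accented ones and ñ.
        words = re.findall(r"[^\W\d_]+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Hypothetical corpus: genre label -> list of texts (invented placeholders).
corpus = {
    "poetry": ["la rosa azul y la rosa blanca"],
    "political prose": ["la patria y la guerra, la patria ante todo"],
}

# Hypothetical, very short stopword list; a real one would be much longer.
stopwords = frozenset({"la", "y", "ante", "todo"})

for genre, texts in corpus.items():
    print(genre, most_frequent_words(texts, n=3, stopwords=stopwords))
```

Running the same function on each genre separately, and then on the whole corpus, gives the two lists the question above asks to compare.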

In case you missed it, here is the link to the Martí bookworm.

Bookworm and Spanish Texts (Tech Note).

Bookworms with complex interfaces require installing the software on a machine that fulfills all the requirements (MySQL; Python 2.7 or 3 with the modules nltk, numpy, regex and pandas; GNU parallel; and web server software such as Apache). As this is a simple one, I opted to use Culturomics’ bookworm creator. Even if one chooses to go the Culturomics route, I would still recommend installing a copy of the software on a personal computer, as it is the best way to test whether a bookworm is working properly.

The main issue I had using Bookworm was its inability to accept Spanish accented characters. The Bookworm site says that their “system does pretty decent job of encoding ugly characters, but after too many of them it starts to get upset and may cause your Bookworm to fail when building.” Well, it turns out that accented letters and other characters that are part of the Spanish language are treated as “ugly characters” by Bookworm. The fact that all the texts used for a bookworm are UTF-8 encoded makes the situation even more mysterious. Though I could think of a couple of ways of circumventing the problem, in order to use the Culturomics site I needed to strip all the accented characters from my texts. In other words, it is useless to use accents when searching the José Martí bookworm.
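One way to do the stripping is sketched below in Python, using only the standard `unicodedata` module. This is an illustration and not necessarily the procedure used for the Martí bookworm. The function name `strip_accents` and its `keep` parameter are hypothetical, and keeping “ñ” by default is an assumption (pass `keep=""` to reduce it to “n” as well).

```python
import unicodedata

def strip_accents(text, keep="ñÑ"):
    """Remove diacritics from text, leaving characters in `keep` untouched."""
    out = []
    # Compose first so that "n" + combining tilde is seen as a single "ñ".
    for ch in unicodedata.normalize("NFC", text):
        if ch in keep:
            out.append(ch)
            continue
        # Decompose the character and drop its combining marks (category Mn).
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed
                           if unicodedata.category(c) != "Mn"))
    return "".join(out)

print(strip_accents("América, España, Hispanoamérica"))
print(strip_accents("América, España, Hispanoamérica", keep=""))
```

The same function can be applied to a search term before it is typed into the bookworm, so that the query matches the stripped texts.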


2 thoughts on “The Inevitable José Martí Bookworm”

  1. Jennifer Isasi

    ¡Muy interesante!
    You know I’m not familiar with Martí’s work. However, I think putting all his work in a database to study the “evolution” in the use of some words, such as the ones you looked for, could be really interesting in terms of modernista styles and the politics of the period.

    I believe though, like you said, that contrasting Darío’s poetic work with Martí’s almost complete work is unfair in that you don’t get a real contrast of styles. I mean, if we compare style in terms of the use of words such as ‘azul’, ‘flor’, ‘cisne’, etc., we have to keep in mind that Martí is not going to use those words when talking about politics; those texts, too, would be longer. Looking at the MFWs of his poetry and comparing them to Darío’s could be a better approach. Then again, they might still differ greatly in terms of their MFWs. Studying Darío’s complete work could be interesting too, of course.

    Also, to your question: “Should one study the MFWs of a genre, instead of the MFWs of a specific time period?” Genre and time period are not incompatible, so I ask: Would studying the MFWs of a genre throughout a writer’s production show changes in specific time periods?

    P.S. I’ve done something similar (sort of, in R) with ‘nación, patria, país’ in Galdós; I was not able to draw any conclusions (Opz only required a draft of a methodology to study a particular aspect of a literary work), but it could be something. And now I want to use this bookworm 🙂

    1. joseeduardogonzalez Post author

      Thank you for your comment, Jennifer. Yes, I know it was silly to compare Darío’s modernista language with Martí’s MFWs in his Obras completas, but knowing Martí’s will-to-style, I was sort of expecting that even in his “political” writings there would be a strong presence of unusual nouns and adjectives (one has to think of the style found in essays such as “Nuestra América”). But, as I say in my post, what if this is true for Darío? What if the MFWs in Darío’s poetry coincide with those of his complete works? Does that mean that Darío’s style is so powerful that the content did not matter?

      Working with Galdós must be great because you have so many texts available. Perhaps you shouldn’t be looking at words like “nación” or “patria” but at words that might have been associated with the notion of the nation at different times during his career.

