
The Complexity of Text Mining and Digital History and the Struggle to Find a Resolve

  • jeremiasr4
  • Oct 1, 2023
  • 3 min read

I would like to preface this blog post by acknowledging my confusion about, and possible ignorance of, text mining. I spent hours reading The Dangerous Art of Text Mining by Jo Guldi. I spent another two hours trying to work out both how to understand what I [attempted] to read and how to actually begin writing, which led to disappointment and further confusion. When I asked a classmate what exactly I was reading, Ross told me that “text mining is the fancy term for having AI read several different documents (books, letters, whatever) and then ‘interpret’ them all by identifying shared terms and phrases. So, it doesn’t actually ‘read’ them, but it quantifies them, if that makes sense.” (Ross) To my dismay, and probably his as well, I still did not fully grasp what exactly this “text mining” was. Guldi argues in this book that “a smarter data science can arise from pursuit of the kinds of ‘hybrid knowledge’ that are documented here, where concerns about the bias of data and algorithms from the humanities and questions from historical theory meet mathematical modeling.” (419)
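For readers who, like me, find a concrete case easier to follow than a definition, Ross's description of "identifying shared terms and phrases" can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not Guldi's method or anything taken from the book: the three "letters", the stopword list, and the function names `tokenize` and `shared_terms` are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical toy documents, invented purely for illustration.
documents = {
    "letter_a": "The harvest failed and the rent was due.",
    "letter_b": "The landlord raised the rent after the harvest.",
    "letter_c": "Rent and harvest dominate every letter this year.",
}

# A tiny, hand-picked list of common words to ignore (an assumption, not a standard list).
STOPWORDS = {"the", "and", "was", "after", "this", "every", "a", "of"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def shared_terms(docs):
    """Count how many documents each term appears in; keep terms found in more than one."""
    doc_counts = Counter()
    for text in docs.values():
        # set() so a word repeated inside one document is counted once for that document
        doc_counts.update(set(tokenize(text)))
    return {term: n for term, n in doc_counts.items() if n > 1}


print(sorted(shared_terms(documents).items()))
```

The program never "reads" the letters in any human sense. It only counts which words recur across them, which is the quantifying step Ross was describing. Real text mining works on far larger collections and with far more careful methods than this.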

What does this thesis mean to someone who lacks an understanding of data science? My interpretation, or better put, my understanding, of Guldi’s argument is that historians need more sophisticated mathematical algorithms to help them find less biased sources, something current AI and search engines are incapable of doing. If my understanding is correct, then the resolution to this problem unquestionably requires an interdisciplinary effort. Quantitative methodology, which has been burgeoning since the 1980s and 1990s, is all but required for social and economic historians. Quantitative research does not necessarily call for interdisciplinary work, although it would be helpful. The resolution to what Guldi is concerned with, however, does require this type of interdisciplinary assistance.

For example, Guldi writes that “with an algorithm, we can use baseline measurements to identify the exact books and years in which new attitudes appeared.” (102) But for historians like myself, without knowledge of basic arithmetic or any background in computer science, how am I supposed to approach this topic? Guldi mentions that “the perspectives of powerful people will tend to dominate every analysis, unless the analyst specifically decides to search for the voices of the powerless.” (39) But is it possible to fix this? Historians are trained to address these biases and to understand that there are biases in every primary source; we just need to be aware of them. My lack of knowledge of AI and computer science keeps me from knowing whether it is possible, though perhaps it is, once the AI has been thoroughly trained on what exactly bias is and how to address it. Now, Guldi does address this in chapter two, saying that “this chapter hopes to aim at a cursory introduction, hoping to help data analysts to imagine what sorts of skills, teams, and training might work together to produce a rigorous approach to data about human experience.” (59) But I feel as if this answer is still complex.

Complexity is the next and last topic I will discuss in this blog post. I find this book to be somewhat ironic. I use the word “ironic” loosely, as this book is extraordinarily well written and has a solid thesis. However, in graduate school we spend a lot of time discussing the dichotomy between academic historical writing and history written for the general public. Historians yearn to attract people to historical studies, but writing in a non-academic way does not advance their careers. The irony I find is that this book shows how introducing the new world of digital history, at least at such a complex level, almost overcomplicates the study of history even for an academic historian, or a graduate historian in this case. Graduate students like myself come across plenty of difficult and complex philosophical texts to interpret, but we are trained to do so. What we are still not trained to do is understand the complexities of the digital tools addressed in this book. Classes like digital history are providing that training, and I think this is extremely important, but it is also extremely difficult without a thorough understanding of these mathematical algorithms. For example, as I mentioned earlier in this blog post, I struggled and am still struggling to fully grasp this book, and yet I want to understand it more. By no means am I bashing digital history or this book. I simply find it ironic that the field is becoming difficult for graduate historians, and likely for Luddite-style historians as well, when we are already frustrated with the aforementioned dichotomy. I am looking forward to this week’s discussion to help me understand this concept.

 
 
 
