What are the advantages and disadvantages of text mining historical texts? What did you learn about your soldier’s life from ngram viewers?
Text mining is the process of pulling high-quality information out of a large collection of text. The results are usually shown in some sort of visual chart or graph.
Text mining has many advantages. As posted by teunderwood, one advantage is that text mining can categorize documents. This means that whatever you’re using, whether it’s a search engine or another program like Voyant, the program will weed out the documents and texts not related to your search. It does this by counting which words are used the most in each text to see if it is a good fit for your search. For historical sources, the pages usually have to be digitized first with OCR (optical character recognition) before the words can be counted.
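To make the word-counting idea concrete, here is a minimal Python sketch. It is not how Voyant or any search engine actually works; the sample documents and the function names word_counts and rank_documents are made up for illustration, and the ranking is just a raw count of how often the search term appears.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def rank_documents(documents, search_term):
    """Return document names sorted by how often the search term appears."""
    term = search_term.lower()
    scores = {name: word_counts(text)[term] for name, text in documents.items()}
    # Weed out documents that never mention the term, then put the
    # documents with the most mentions first.
    hits = [name for name in scores if scores[name] > 0]
    return sorted(hits, key=lambda name: scores[name], reverse=True)

# Made-up sample documents, for illustration only.
documents = {
    "letter": "The cavalry rode at dawn. The cavalry camped by the river.",
    "diary": "Rain all day. The cavalry passed through town.",
    "recipe": "Mix the flour and water, then bake the bread.",
}

print(rank_documents(documents, "cavalry"))  # ['letter', 'diary']
```

The recipe is dropped because it never mentions the search term, which is the "weeding out" step in miniature. Real tools use more careful scoring than a raw count.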
Another advantage mentioned is that text mining can cluster features that tend to be associated in a given corpus of documents. This reverses the logic of the normal clustering of documents: instead of grouping documents, it groups words that appear in the same documents, and the result looks almost like a map.
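One simple way to picture grouping words by the documents they share is to count how often each pair of words shows up in the same document. The Python sketch below does only that; it is an assumption about the general idea, not teunderwood's method, and the documents and the cooccurrence_counts function are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Made-up documents, each already reduced to a set of key words.
documents = [
    {"cavalry", "horse", "saddle"},
    {"cavalry", "horse", "regiment"},
    {"bread", "flour", "oven"},
    {"bread", "flour", "horse"},
]

def cooccurrence_counts(docs):
    """Count how many documents each pair of words appears in together."""
    pair_counts = Counter()
    for words in docs:
        # Sorting gives every pair one consistent order, e.g. ("cavalry", "horse").
        for pair in combinations(sorted(words), 2):
            pair_counts[pair] += 1
    return pair_counts

pairs = cooccurrence_counts(documents)

# Keep only pairs that share at least two documents.
linked = sorted(pair for pair, count in pairs.items() if count >= 2)
print(linked)  # [('bread', 'flour'), ('cavalry', 'horse')]
```

Drawing a line between each linked pair is what produces the map-like picture: words that keep appearing together end up connected.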
One more advantage is that text mining can trace the history of particular features over time. Google’s ngram viewer would be the best example of this. As teunderwood states, this could be viewed as a special category of corpus comparison, where you’re comparing corpora segmented on the time axis. Personally, I didn’t learn much about our soldier from Google’s ngram viewer. I searched for basic terms like “Michigan Cavalry” and “enlistment.” I couldn’t find much about George Karn’s 1st Michigan Cavalry, but there were books about Michigan during the war. For enlistment, all I could really find was material on foreign enlistment during the Civil War. This sort of applies to George Karn, since he was Canadian.
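Going back to the idea of comparing corpora segmented on the time axis, the Python sketch below shows the basic arithmetic: for each year, divide the number of times a word appears by the total number of words from that year. This is only an illustration and not Google's actual code; the texts, the years attached to them, and the frequency_by_year function are all made up.

```python
import re

# Made-up texts grouped by year, for illustration only.
corpus_by_year = {
    1861: "Enlistment offices opened. Enlistment was brisk in Michigan.",
    1863: "The cavalry needed horses. Enlistment slowed this year.",
    1865: "The war ended and the cavalry went home.",
}

def frequency_by_year(corpus, term):
    """Return the share of each year's words that match the term."""
    term = term.lower()
    result = {}
    for year, text in sorted(corpus.items()):
        words = re.findall(r"[a-z']+", text.lower())
        # Relative frequency, so years with more text are not favored.
        result[year] = words.count(term) / len(words) if words else 0.0
    return result

trend = frequency_by_year(corpus_by_year, "enlistment")
for year, share in trend.items():
    print(year, share)
# 1861 0.25
# 1863 0.125
# 1865 0.0
```

Plotting those shares against the years gives the kind of line chart an ngram viewer displays.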
Overall, text mining is a cool process that can really help with research. By weeding out texts and webpages that don’t fit, it condenses your search down to potentially useful information.