by Christopher N. Warren; Daniel Shore; Jessica Otis; Lawrence Wang; Mike Finegold; Cosma Shalizi
When you stumble on a paper with Cosma Shalizi among the co-authors, you know you have to read it. This one is interesting because it touches on several of my own research interests. The abstract’s first sentence immediately caught my eye:
In this paper we present a statistical method for inferring historical social networks from biographical documents as well as the scholarly aims for doing so.
I found Six Degrees of Francis Bacon: A Statistical Method for Reconstructing Large Historical Social Networks interesting because it connects with my own work in several ways:
- They focus on the analysis of historical documents. The authors use automated text extraction to infer networks of personal relations from historical data. This relates to my work on detecting Lisbon’s historical patterns and the unbuilt Lisbon.
Natural Language Processing
- They use many state-of-the-art NLP and topic modelling techniques.
- They use a Poisson Graphical Lasso to infer the network from the co-occurrence matrices. This is more elaborate than the simple co-occurrence matrix I used for network construction in the survey of architecture floor design, and it will probably be useful in a future revision of that survey.
- The code, written in R, is available in a GitHub repo.
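To make the contrast between raw co-occurrence and a penalised Poisson model concrete, here is a minimal Python sketch. It is not the authors' R code. It assumes a toy document-by-person count matrix and uses a neighbourhood-selection variant: one L1-penalised Poisson regression per person, fitted with plain proximal gradient descent. All names (`cooccurrence`, `l1_poisson_regression`, `poisson_neighbourhood_edges`), the toy data, the step size, and the penalty value are hypothetical choices for illustration only.

```python
import math
from itertools import combinations


def cooccurrence(counts, names):
    """Raw co-occurrence: sum over documents of count_a * count_b."""
    return {
        (a, b): sum(row[i] * row[j] for row in counts)
        for (i, a), (j, b) in combinations(enumerate(names), 2)
    }


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_poisson_regression(X, y, lam, step=0.1, iters=500):
    """Minimise mean Poisson negative log-likelihood + lam * ||beta||_1.

    Fixed-step proximal gradient; the intercept is not penalised and is
    initialised at log(mean(y)). Step size and iteration count are
    assumptions that suit small 0/1 counts, not tuned defaults.
    """
    n, p = len(X), len(X[0])
    intercept = math.log(max(sum(y) / n, 1e-8))
    beta = [0.0] * p
    for _ in range(iters):
        mu = [math.exp(intercept + sum(b * x for b, x in zip(beta, row)))
              for row in X]
        resid = [m - t for m, t in zip(mu, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[k] for r, row in zip(resid, X)) / n
                for k in range(p)]
        intercept -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return beta


def poisson_neighbourhood_edges(counts, names, lam):
    """Regress each person's counts on everyone else's; keep an edge only
    when both directions have a nonzero coefficient (the 'AND' rule)."""
    p = len(names)
    nonzero = set()
    for j in range(p):
        others = [k for k in range(p) if k != j]
        X = [[row[k] for k in others] for row in counts]
        y = [row[j] for row in counts]
        beta = l1_poisson_regression(X, y, lam)
        nonzero.update((j, k) for k, b in zip(others, beta) if b != 0.0)
    return {(names[j], names[k])
            for (j, k) in nonzero if j < k and (k, j) in nonzero}


# Toy data (made up): rows are documents, columns are mention counts.
# A and B always appear together; C appears independently of both.
names = ["A", "B", "C"]
counts = [
    [1, 1, 1], [1, 1, 1], [1, 1, 0], [1, 1, 0],
    [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 0, 0],
]

print(cooccurrence(counts, names))
print(poisson_neighbourhood_edges(counts, names, lam=0.1))
```

On this toy data the raw co-occurrence counts link C to both A and B simply because C is mentioned often, while the penalised regressions keep only the A-B edge. The penalty `lam` controls how sparse the inferred network is, so in practice it has to be chosen carefully, for example by cross-validation or a stability criterion.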
Use of experts: Peer assessment
- A curious use of experts to tune the quality of the method. It is worth comparing with the work done on clustering floor plan designs. Using experts to create ground truths is common, but it comes with its own pitfalls.