I think the sociogram of our IEMS5720 blogosphere on the slides is quite cool. It visualizes the social connections between students in our class according to our activities on the blogs. A single graph is enough to show a lot of things that are not easy to tell from the raw data (or by reading all the posts and comments).
We also learnt some basic graph theory and other techniques which can be very useful in analyzing sociograms. I think the best way to learn new things is to practice, so I did some experiments on the data of our IEMS5720 blogosphere.
Get the Data
Since all our blogs are hosted on Blogger and Google is kind enough to provide an API for the Blogger service, I had no difficulty getting access to all the posts and comments of everyone in our class.
Basic Process
The raw data I collected using the Blogger API contains all the information about the posts and comments. In this exercise, I decided to analyze only the sociogram, so I am not interested in the contents of the posts and comments. The only thing I am interested in here is the relationships. A comment has its own author (the sender), and it is either posted directly on a post or posted as a reply to another comment. In both cases the comment also has a recipient: the author of the post or the author of the comment being replied to. So each comment gives us a relationship established between two people. By checking all the comments, I get enough data to draw the sociogram.
Draw the Graph
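The input to the drawing is the list of sender-recipient pairs described above. Here is a minimal sketch of how those pairs could be extracted. The record layout (`id`, `author`, `post_author`, `in_reply_to`) and the names in it are made up for illustration and are not the actual structure returned by the Blogger API.

```python
from collections import Counter

# Hypothetical, simplified comment records. The real Blogger API response
# is structured differently; these field names are assumptions.
comments = [
    {"id": "c1", "author": "alice", "post_author": "bob", "in_reply_to": None},
    {"id": "c2", "author": "bob", "post_author": "bob", "in_reply_to": "c1"},
    {"id": "c3", "author": "carol", "post_author": "bob", "in_reply_to": "c2"},
    {"id": "c4", "author": "bob", "post_author": "bob", "in_reply_to": None},
]


def build_edges(comments):
    """Count (sender, recipient) pairs, one per comment."""
    author_of = {c["id"]: c["author"] for c in comments}
    edges = Counter()
    for c in comments:
        sender = c["author"]
        parent = c["in_reply_to"]
        # A reply is addressed to the author of the parent comment;
        # a top-level comment is addressed to the author of the post.
        if parent is not None:
            recipient = author_of[parent]
        else:
            recipient = c["post_author"]
        if sender != recipient:  # skip comments addressed to oneself
            edges[(sender, recipient)] += 1
    return edges


edges = build_edges(comments)
for (sender, recipient), weight in sorted(edges.items()):
    print(sender, "->", recipient, weight)
```

Each key of `edges` is a directed edge and the count can serve as its weight, which is the kind of input a graph drawing or analysis library expects.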
I do have some experience in drawing graphs. This time I decided to try a new library called d3.js. D3.js is a JavaScript library for manipulating documents based on data. It can be used to draw graphs and also to do other cool things.
Updated 20-Nov-2012:
- Nested comments/replies are counted correctly
- Self-comments are ignored
- Users not enrolled in our class are not shown
- Update delay has been reduced to 30min
Analysis of the Sociogram
We've learned some metrics to evaluate the centrality of nodes in a graph: degree centrality, closeness centrality and betweenness centrality. Degree centrality is the simplest one and can be computed just by counting. Closeness centrality and betweenness centrality are a little more complicated: we have to find the shortest paths first. In the lecture, we used powers of the adjacency matrix to compute them. The number n at position (i,j) in the k-th power of the adjacency matrix means there are n different walks of length k from node i to node j (a walk is a path that may revisit nodes). Therefore, by computing successive powers of the matrix, we can get the shortest path length between any pair of nodes: it is the smallest k for which the entry at (i,j) becomes nonzero. However, in practice, when we implement shortest paths in a computer program, we usually use other methods. There are some classic algorithms for this: Dijkstra's algorithm for single-source shortest paths, and the Floyd–Warshall algorithm for all-pairs shortest paths. These algorithms can handle weighted graphs, while the power-of-matrix method cannot.
In my program, instead of implementing all of these myself, I use a graph library called NetworkX. NetworkX is a Python library for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. The results I got are here. I am still not sure about the betweenness centrality I got, because it is rather small compared to the other two values.
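To make the matrix-power idea from the lecture concrete, here is a small pure-Python sketch. It is only an illustration, not the NetworkX program used for the results above. It assumes an unweighted, undirected, connected graph given as an adjacency matrix, and the 4-node path graph 0-1-2-3 is made-up example data. Degree centrality is normalized by n - 1 and closeness is taken as (n - 1) divided by the sum of distances, which are common conventions but still a choice.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shortest_path_lengths(adj):
    """dist[i][j] is the smallest k with (adj^k)[i][j] > 0.

    Entries stay None for pairs that are not connected.
    """
    n = len(adj)
    dist = [[0 if i == j else None for j in range(n)] for i in range(n)]
    power = [row[:] for row in adj]  # adj^1
    for k in range(1, n):  # a shortest path has at most n - 1 edges
        for i in range(n):
            for j in range(n):
                if dist[i][j] is None and power[i][j] > 0:
                    dist[i][j] = k
        power = mat_mul(power, adj)  # move on to adj^(k+1)
    return dist


def degree_centrality(adj):
    """Number of neighbours of each node, divided by n - 1."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


def closeness_centrality(dist):
    """(n - 1) divided by the sum of distances to all other nodes.

    Assumes the graph is connected, so no distance is None.
    """
    n = len(dist)
    return [(n - 1) / sum(d for d in row if d is not None) for row in dist]


# Made-up example: the path graph 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

dist = shortest_path_lengths(adj)
degree = degree_centrality(adj)
closeness = closeness_centrality(dist)
print(dist[0])
print(degree)
print(closeness)
```

The two middle nodes come out with higher degree and closeness than the two end nodes, as expected for a path. For a real sociogram a library routine is the better choice, since repeated matrix multiplication gets expensive as the graph grows and does not handle edge weights.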

