Thursday, November 8, 2012

Sociogram and SNA

We've learnt a lot about SNA (social network analysis) in the last lecture.
I think the sociogram of our IEMS5720 blogosphere on the slides is quite cool. It visualizes the social connections between students in our class based on our activity on the blogs. A single graph shows a lot of things that are hard to tell from the raw data (or from reading all the posts and comments).

We also learnt some basic graph theory and other techniques that are very useful for analyzing sociograms. I think the best way to learn something new is to practice, so I did some experiments with the data from our IEMS5720 blogosphere.

Get the Data

Since all our blogs are hosted on Blogger, and Google is kind enough to provide an API for the Blogger service, I had no difficulty getting access to all the posts and comments of everyone in our class.
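
For readers who want to try this, here is a minimal sketch of pulling every comment from one blog. It assumes the Blogger API v3 `comments.listByBlog` endpoint (`GET .../blogs/{blogId}/comments`), paging with `pageToken`/`nextPageToken`, and a plain API key passed as `key`. Check these against the current API reference. The function names are hypothetical, and `fetch` is injectable so the paging logic can be tested without network access.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://www.googleapis.com/blogger/v3"


def build_comments_url(blog_id, api_key, page_token=None):
    """URL for one page of comments.listByBlog (all comments on one blog)."""
    params = {"key": api_key}
    if page_token:
        params["pageToken"] = page_token
    query = urllib.parse.urlencode(params)
    return f"{API_ROOT}/blogs/{urllib.parse.quote(blog_id)}/comments?{query}"


def http_get_json(url):
    """Fetch a URL and decode its JSON body (each call is one API request)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all_comments(blog_id, api_key, fetch=http_get_json):
    """Collect every comment on a blog, following nextPageToken between pages."""
    comments, token = [], None
    while True:
        page = fetch(build_comments_url(blog_id, api_key, token))
        comments.extend(page.get("items", []))
        token = page.get("nextPageToken")
        if not token:
            return comments
```

Every page fetched costs one request against the daily quota, so the number of requests per refresh grows with the number of blogs and pages.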

Basic Process

The raw data I collected using the Blogger API contains all the information about the posts and comments. In this exercise, I decided to analyze only the sociogram, so I am not interested in the content of the posts and comments. The only thing I care about here is the relationships. Every comment has an author (the sender). It is either posted directly on a post or posted as a reply to another comment, and in both cases it also has a recipient: the author of that post or of the comment being replied to. So each comment establishes a relationship between two people, and by going through all the comments I get enough data to draw the sociogram.
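
The sender/recipient rule above can be sketched in a few lines. The comment dictionaries are assumed to follow the shape of Blogger v3 comment resources (`author.id`, `post.id`, and `inReplyTo.id` on replies). Verify that shape before relying on it. `posts` is a hypothetical mapping from post id to post author id. A reply whose parent comment is missing falls back to the post author.

```python
from collections import Counter


def comment_edges(posts, comments):
    """Yield one (sender, recipient) pair per comment.

    posts:    {post_id: author_id}
    comments: dicts like {"id": ..., "author": {"id": ...}, "post": {"id": ...},
              "inReplyTo": {"id": ...}}  # "inReplyTo" only present on replies
    """
    comment_author = {c["id"]: c["author"]["id"] for c in comments}
    for c in comments:
        sender = c["author"]["id"]
        parent = c.get("inReplyTo", {}).get("id")
        if parent in comment_author:
            recipient = comment_author[parent]  # reply to another comment
        else:
            recipient = posts[c["post"]["id"]]  # comment directly on the post
        yield sender, recipient


def edge_weights(posts, comments):
    """Count how many comments each sender addressed to each recipient."""
    return Counter(comment_edges(posts, comments))
```

The resulting counts are a weighted, directed edge list, which is all the sociogram needs.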

Draw the Graph

I have some experience drawing graphs, but this time I decided to try a new library called d3.js. D3.js is a JavaScript library for manipulating documents based on data. It can be used to draw graphs and to do many other cool things.
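
D3's force-directed layout is commonly fed a JSON document with a `nodes` array and a `links` array whose `source` and `target` refer to nodes by array index. The exact format this page uses is not shown, so the sketch below is an assumption. It converts the (sender, recipient) counts into that shape, with a hypothetical `value` field holding the comment count.

```python
import json


def to_d3_json(weights):
    """Turn {(sender, recipient): count} into a nodes/links JSON string.

    Links point at nodes by index, the default way d3's force layout
    resolves "source" and "target"; "value" carries the comment count.
    """
    names = sorted({name for pair in weights for name in pair})
    index = {name: i for i, name in enumerate(names)}
    nodes = [{"name": name} for name in names]
    links = [
        {"source": index[s], "target": index[t], "value": n}
        for (s, t), n in sorted(weights.items())
    ]
    return json.dumps({"nodes": nodes, "links": links})
```

On the JavaScript side, `value` could then drive link thickness, and in/out counts could drive node size.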


Updated 20-Nov-2012:
  • Nested comments/replies are counted correctly
  • Self-comments are ignored
  • Users not enrolled in our class are not shown
  • The update delay has been reduced to 30 minutes
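
Two of the update rules can be expressed as a filter over the edge counts. The `enrolled` set is a hypothetical list of class members. The refresh interval is bounded by the Blogger API quota. The figures below come from the author's reply in the comments (about 100 requests per full refresh, 10k requests per day) and are treated here as rough assumptions. With them, a 30-minute interval uses about 4,800 requests a day. A 10-minute interval would need about 14,400, so going that fast would require fewer requests per refresh.

```python
def apply_update_rules(weights, enrolled):
    """Drop self-comments and any pair involving someone outside the class."""
    return {
        (s, t): n
        for (s, t), n in weights.items()
        if s != t and s in enrolled and t in enrolled
    }


def refresh_is_within_quota(interval_minutes, requests_per_refresh=100, daily_quota=10_000):
    """True if refreshing every interval_minutes stays within the daily quota."""
    refreshes_per_day = 24 * 60 / interval_minutes
    return refreshes_per_day * requests_per_refresh <= daily_quota
```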

Analysis of the Sociogram

We've learned some metrics for evaluating the centrality of nodes in a graph: degree centrality, closeness centrality and betweenness centrality. Degree centrality is the simplest one and can be computed just by counting. Closeness centrality and betweenness centrality are a little more complicated, because we first have to find the shortest paths. In the lecture, we used powers of the adjacency matrix to find them. The number n at position (i,j) of the k-th power of the adjacency matrix means there are n different walks of length k from node i to node j. So the smallest k for which position (i,j) becomes non-zero is the length of the shortest path from i to j, and by computing successive powers of the matrix we get the shortest-path lengths between every pair of nodes. In practice, however, computer programs usually use other methods. There are some classic shortest-path algorithms: Dijkstra's algorithm for single-source shortest paths, and the Floyd–Warshall algorithm for all-pairs shortest paths. These algorithms can handle weighted graphs, while the power-of-matrix method cannot.
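
To make the comparison concrete, here is a plain-Python sketch (no libraries) of both approaches. The matrix-power version records the first power at which each entry turns non-zero. Floyd–Warshall relaxes every pair through each intermediate node. On an unweighted graph they agree. With weights, only Floyd–Warshall gives the right answer.

```python
INF = float("inf")


def mat_mult(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def distances_by_matrix_powers(adj):
    """Hop distances for an unweighted graph from powers of its adjacency matrix.

    Entry (i, j) of A^k counts walks of length k from i to j, so the first k
    at which it becomes non-zero is the shortest-path length from i to j.
    """
    n = len(adj)
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    power = [row[:] for row in adj]  # A^1
    for k in range(1, n):  # a shortest path uses at most n-1 edges
        for i in range(n):
            for j in range(n):
                if dist[i][j] == INF and power[i][j] > 0:
                    dist[i][j] = k
        power = mat_mult(power, adj)  # next power of A
    return dist


def floyd_warshall(weight):
    """All-pairs shortest paths; weight[i][j] is an edge weight or INF."""
    n = len(weight)
    dist = [[0 if i == j else weight[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):  # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

For example, take a triangle where the direct edge 0–1 costs 5 but both 0–2 and 2–1 cost 1. Floyd–Warshall reports a distance of 2 between nodes 0 and 1. The matrix-power method reports 1 hop, because it only counts edges.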

In my program, instead of implementing all of these myself, I use a graph library called NetworkX. NetworkX is a Python library for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. The results I got are here. I am still not sure about the betweenness centrality values I got, because they are rather small compared with the other two measures.
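
A minimal sketch of the three NetworkX calls, assuming the comment graph is treated as undirected and unweighted (a simplification; the real graph is directed and weighted). It may also help with the betweenness question. By default NetworkX normalizes betweenness by the number of node pairs, and a node that never lies on another pair's shortest path scores exactly 0. In the small star example used as a test, every leaf gets 0 while the hub gets 1, so many low values are expected in a sparse comment graph.

```python
import networkx as nx


def centralities(pairs):
    """Degree, closeness and betweenness centrality of an undirected comment graph.

    pairs: iterable of (sender, recipient); direction and counts are ignored here.
    """
    G = nx.Graph()
    G.add_edges_from(pairs)
    return {
        "degree": nx.degree_centrality(G),
        "closeness": nx.closeness_centrality(G),
        "betweenness": nx.betweenness_centrality(G),  # normalized=True by default
    }
```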

29 comments:

  1. Amazing graph! Looking forward to the completed article

  2. Wow, that's amazing! The graph is very illustrative of the big picture. To me, drawing a graph is always an annoying task, and it seems d3.js is quite useful for that. Can you tell me how to access my own home.ie? Thank you.

    1. @Monolas, you can find the info about IE Home on https://wiki.ie.cuhk.edu.hk/general/index.php/Create_Your_Personal_Web_Page
      You may need to enter your IE UNIX account info if you're accessing this page from non-CUHK network.

  3. WOW, that's sooooooooooo wonderful! It's a very good way to get the data automatically instead of collecting it by hand. But I think SNA is more than the graph shown here. You could put your data into NodeXL; it will give you more SNA results.

    1. hmm... I've tested this. There seems to be some delay: I posted the last comment, but the graph doesn't show the new state. Please fix it.
      And do you think your product will influence the commenting behavior of our classmates? I mean, first, students may work harder on comments to keep up with the active ones; second, students may comment more on those who get fewer comments, to help them. That would be very interesting.
      Jobs changed the world, and you change the whole class. Great! hahah~

    2. @Sabrina, thank you for your replies. Yes, you're right: the graph does not show the latest stats, and there is a delay. Currently the graph data is updated hourly, at the beginning of each hour (hh:00). Since Google's Blogger API is limited to 10k requests per day and updating the whole graph needs about 100 API requests, we cannot do it in real time. But I also think 1 hour is too long, so maybe I can optimize my code a little to bring it down to 10 minutes.

      I never thought this graph would change our class, but I am really glad that you find this useful :-)

  4. This graph clearly describes the relationships among all the students. It is amazing.

  5. Wow, this tool is so cool for showing the connections between us. With it, we don't need to count the comments one by one. But I want to know whether this tool can compute degree centrality or betweenness centrality, as NodeXL can?

    1. I will implement the centrality functions when I have time. :-)

  6. What great practice, providing an automatic sociogram for the whole class. Good job!
    I also drew a sociogram of my own blog network before I read this post, and now I think it was so stupid to do it manually... Thank you for setting a good example, and looking forward to the completed article.

    1. Hi, thank you for your comment. My article never seems to get completed :-( but I've just updated my sociogram tool: http://blogosphere.id3.cc/

  7. When I click on this sociogram picture, a beautiful thing shows up. I can easily see how many "in" and "out" comments everybody has. You've definitely given us a good example of how to use what we learn in this class to make a good product.

  8. Wow~ it is really cool. You've limited the user list to people in our class, so I can check my comment count easily. Of course, you've also added the analysis function. Good job~ Looking forward to the more complete project.

  9. You focus on the sociogram, that is, the "relationships", but when we read and reply to a blog, the content is the core of our communication. How do you balance that?

  10. It's so wonderful, and I admire your programming ability.
    But I have a question. After I open the picture and choose the sociogram forum, I find that my node is smaller than those of others who have both fewer incoming and fewer outgoing comments than me. So I want to ask: how do you decide the size of each node in your program?

  11. You are a hero... And the updated version is even cooler. LOLLL

    1. Yup~~ Thanks to OR, we could easily collect our comments now.

    2. :-) I am going to add an "export to Excel" function for the comments, because I am too lazy to collect my comments. It will be ready before the comments submission deadline.

    3. Upgraded to superhero!!!... (This comment is simply for admiration. No intention to count it as a qualified entry tho. LOL)

  12. This graph is great; I barely have time to draw a graph like this. You've given me a good example of such a graph. Thank you.
    This graph really can show many parameters of a social network.

  13. The new website is awesome!!!
    However, the recent posts section might not be fair to students who posted their blogs earlier. Maybe adding a recommendation section that shows blogs randomly or using some recommendation algorithm would be better, so that everyone has the chance to show up.

    ReplyDelete
  14. What you have done is amazing. But there is a small mistake in my comment record: the result shows that I have commented on just one blog and haven't posted any blog. Can you solve the problem for me? Thanks.

    1. It seems that two of your comments, as well as your own blog posts, were posted using your Blogger profile, while the rest of your comments were posted using your Google+ profile. In this case, they are counted as two different users. You may find another "Zhao Yifan" in the lower part of the users page, and your posts should be listed correctly on that one. http://blogosphere.id3.cc/user/g106039637770459870405

    2. Can I do something to merge the two users? It is very strange to have two Zhao Yifans.

    3. @Yifan Yes and No. It's possible to implement a "merge" or "alias" function. But I don't have time to do that, sorry :-(

  15. The sociogram illustrates the relationships between our classmates in the blog assignment. It has animation! Very amazing!

  16. Your graph is so exciting. I was always curious about how to draw such a graph. After reading your blog, I really understand the process you used to create it. It's definitely useful in our daily lives. Thank you very much~

  17. What a useful data-collecting platform!! Good job~~ This blogosphere is great at collecting all the students' data and gives a brilliant analysis for blog users. And the dynamic sociogram is also amazing! I believe all of our classmates benefit from this~
