
Wille SNA Toolkit

Diagram representing Wille SNA Toolkit operation principles

The objective of Wille SNA Toolkit is to streamline the data harvesting and processing procedures of visualisation-oriented, data-driven social network analysis done with Wille. The toolkit also enables the development of context-sensitive social network (SN) analysis and visualisation applications. Examples of possible data sources for the analysis include wikis, social networking services and Web APIs. The emphasis of Wille SNA Toolkit is on visual analysis of social networks, but it can also be used to export data, e.g. to numerical analysis tools.

Features

  • Support for collecting SNA data from wikis, social networking engines, Web APIs and other sources.
  • Context-sensitive SNA data processing and visualisation.
  • Instant SN visualisation with Vizster and other visualisation tools.
  • Exporting data to popular SNA tools including NodeXL, Pajek and others.
  • Includes an example app, sna-innovationecosystems, providing useful examples of accessing and processing large sets of linked data.
  • Includes a set of ready-made components to be reused, extended and tailored.
  • Runs on a local machine, which enables the use of sensitive data sources for SNA.

Requirements

  • Wille SNA Toolkit uses the following libraries to process the data:
    • lxml for processing SNA data represented in XML.
    • numpy for matrix algebra and scientific computing in general.

The libraries can be installed e.g. with EasyInstall.
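Before starting the server, it can be handy to confirm that both libraries are importable. The following is a minimal sketch; the helper name missing_modules is hypothetical and not part of Wille.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# lxml and numpy are the two libraries listed in the requirements above.
not_installed = missing_modules(["lxml", "numpy"])
print("Missing libraries: %s" % (", ".join(not_installed) or "none"))
```

An empty result means that both libraries are available to the Python interpreter that will run Wille.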

In addition, to use the context-sensitive features of Wille SNA Toolkit, you need a browser that supports augmented browsing implemented with Greasemonkey scripts. One option is to use Mozilla Firefox together with the Greasemonkey add-on.

Introduction

Some or all of the following steps are involved in data-driven SNA processes:
  1. Selecting a set of resources to be used as the sample (source of SNA data). A set of wiki pages, for example.
  2. Crawling the resources to collect the source data. This may require e.g. logging in to the source system.
  3. Creating a logical sociograph on basis of the source data.
  4. Calculating different SNA metrics for the different nodes in the sociograph (node degree, centrality, prestige) and for the sociograph in general.
  5. Visualising the sociograph and the data related to it.
  6. Transforming the data into various formats supported by the different SNA tools and exporting data for further analysis.
Wille SNA Toolkit enables the development of information visualisation applications that implement the different steps of the analysis process. The toolkit comes with a set of reusable, extendable and tailorable components for SNA data processing, visualisation and export. Further, through the component-based architecture of Wille, Wille SNA Toolkit can be extended either by integrating existing SNA software to be used as Wille components or by developing entirely new components as needed.
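To make steps 3 and 4 of the process concrete, the sketch below builds a tiny sociograph and computes node degree and normalised degree centrality (degree divided by n - 1). It is an illustration only, not the toolkit's implementation: the function name is hypothetical, the graph is treated as undirected, and plain Python is used to keep the sketch dependency-free. The nodes and edges are taken from the Spotify example used later on this page.

```python
def degree_centrality(nodes, edges):
    """Return {node: (degree, degree / (n - 1))} for an undirected graph."""
    degree = dict((node, 0) for node in nodes)
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    # Normalise by the largest possible degree, n - 1 (guard against n == 1).
    denominator = float(max(len(nodes) - 1, 1))
    return dict((node, (d, d / denominator)) for node, d in degree.items())


nodes = ["alex-kazim", "daniel-ek", "klaus-hommels", "spotify", "skype"]
edges = [("daniel-ek", "spotify"),
         ("klaus-hommels", "spotify"),
         ("klaus-hommels", "skype")]

metrics = degree_centrality(nodes, edges)
for node in nodes:
    print("%s: degree %d, centrality %.2f" % ((node,) + metrics[node]))
```

The same degrees can be obtained as the row sums of an adjacency matrix, which is where a matrix library such as numpy becomes useful for larger graphs.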

In addition to supporting SNA data processing, Wille SNA Toolkit introduces means to deliver SNA tools to the Web contexts from which the SNA data originates or in which the actors within the data appear. The Visualise app collects context information from the pages that the analyst browses on the Web and notifies the analyst when SN visualisations and usable SN data are available. See Salonen and Huhtamäki (2010) for the background on context-sensitive visualisation launching.

One of the main advantages of Wille is that it runs on a local machine. This gives the analyst the possibility to use sensitive data as a source of analysis data, too. The analyst can, for example, manually log in to a source data system to collect the data to be processed with Wille or, when possible, use Wille to log in to the source system and crawl the data automatically.

Quite often, the crawling process is very slow, as are some of the SN analysis processes. Thus, a local cache may be used to speed up the process. The SNA example applications included in the toolkit show how to implement a simple cache mechanism with Wille. Together with the possibility to define the data directory when starting the Wille server, this lets the social network analyst manage different versions of SNA data in a straightforward manner: a local version of the data can be used for immediate analysis or, if time allows, the data can be refreshed from the original source. More advanced crawling tools such as Scrapy can, of course, be used to complement the ones implemented in Wille.

The main social network visualisation tool included in the Wille SNA Toolkit is Vizster, a tool for interaction-intensive visualisation of (online social) networks. Vizster was developed by Jeffrey Heer and published in 2005. See the Vizster homepage for details. Thanks to its open licensing policy, its support for importing SNA data in a dialect of GraphML and its capability to manage thousands of nodes, Vizster is still a viable tool for visualising social networks in the Web context. Further, we look forward to the progress of JSViz (http://www.jsviz.org/) and other JavaScript-based graph visualisation tools that may make the analysis process even more straightforward than with Vizster.

In addition to enabling the immediate visualisation of social networks in the context from which the data originates, SNA data can be exported in different formats to support the use of state-of-the-art SN analysis and visualisation tools. For example, because the legacy Pajek format is still very popular and supported by tools such as Orange, Wille SNA Toolkit provides a component for serialising sociographs in Pajek format.

To demonstrate how Wille SNA Toolkit can be used to facilitate data-driven visual SN analysis and visualisation, we include an additional example service, sna-innovationecosystems, in the delivery package. The example was prepared in close co-operation with Martha Russell (Media X, Stanford University), Neil Rubens (University of Electro-Communications, Tokyo) and Kaisa Still of the Innovation Ecosystems Consortium, "[a]n international benkyo-kai (study group) of scientists and practitioners engaged in discovery and developing insights about technology-based economic development". Please see below for detailed instructions on setting up and running the example, which visualises the Innovation Ecosystems Database. For demonstration purposes, live data from Crunchbase is used in this tutorial. For further information on the Innovation Ecosystems Database, please refer to the Innovation Ecosystems homepage.

Testdrive Wille SNA Toolkit

In order to test Wille SNA Toolkit, you need to:

  • Download a compliant version of Wille 2 Core (see Download)
  • Download the SNA toolkit (see the end of this page)

Start Wille Server with the SNA toolkit from the command line by entering the SNA toolkit folder and invoking:

python wille-server.py -d -p 8080 -f sna

For more information on configuration and command-line options, see the tutorial on running Wille server.

Load Wille frontpage at http://localhost:8080. If you run Wille on a port other than 8080, please adjust the example URIs in this tutorial accordingly.

You should now see a set of apps and services running in your local machine.

As mentioned before, Wille can be used to speed up the crawling process with a simple proxy mechanism. To see a copy of data on Facebook, for example, select http://localhost:8080/apps/sna-innovationecosystems/data/company/facebook.js. When you load the data for the first time, a local copy of the data is created and served when later requested.
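The fetch-once, serve-later behaviour described above can be sketched in a few lines. This is not the code of the toolkit: the function serve_or_fetch, its parameters and the fake fetcher are hypothetical, and a real fetcher would download the data from the original source instead of returning a constant.

```python
import os
import tempfile


def serve_or_fetch(cache_dir, name, fetch):
    """Return the cached bytes for `name`; call fetch(name) only on a cache miss."""
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as cached:
            return cached.read()
    data = fetch(name)
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "wb") as out:
        out.write(data)
    return data


# Demonstration with a fake fetcher that records how often it is called.
calls = []


def fake_fetch(name):
    calls.append(name)
    return b'{"name": "Spotify"}'


cache_dir = tempfile.mkdtemp()
first = serve_or_fetch(cache_dir, "company/spotify.js", fake_fetch)
second = serve_or_fetch(cache_dir, "company/spotify.js", fake_fetch)
print("fetch calls: %d" % len(calls))
```

Deleting a file from the cache directory forces a refresh from the original source on the next request.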

sna-innovationecosystems implements a cache functionality for the produced SNA data as well. To create a document representing a network of companies and people surrounding a particular company, say Spotify for example, load http://localhost:8080/apps/sna-innovationecosystems/sna/data/nodeedge/spotify.xml to your browser.

A local data cache develops as the crawling process moves forward:

Example view to Wille SNA Toolkit data cache

Please note that it may take as long as a couple of hours to collect the data on the social surroundings of a company with a large surrounding social network, such as Facebook. This happens because the current implementation collects all the people and related companies within two hops of the selected company. To reduce the processing time for this tutorial, select a company with a smaller set of connections. Spotify is used in the following examples.
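Collecting everything within two hops amounts to a depth-limited breadth-first traversal. The sketch below illustrates the idea on an in-memory graph; the function name, the neighbours callback and the node someone-else are assumptions made for this example, not part of the toolkit.

```python
from collections import deque


def within_hops(start, neighbours, max_hops=2):
    """Return {node: hop distance} for all nodes at most max_hops from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand nodes that sit on the hop limit
        for other in neighbours(node):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return distance


# Hypothetical link data standing in for the crawled source.
links = {
    "spotify": ["daniel-ek", "klaus-hommels"],
    "daniel-ek": ["spotify"],
    "klaus-hommels": ["spotify", "skype"],
    "skype": ["klaus-hommels", "someone-else"],
}

surroundings = within_hops("spotify", lambda node: links.get(node, []))
print(sorted(surroundings.items()))
```

In a real crawl every call to neighbours would be a request to the source system, so the number of requests grows with the size of the one-hop neighbourhood. This is consistent with the long processing times noted above for well-connected companies.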

Further, a Vizster-compliant GraphML representation of the same network is served under http://localhost:8080/apps/sna-innovationecosystems/sna/data/graphml/facebook.xml. This is also an example of the cache mechanism: to create the GraphML version, a local copy of the previously created graph data is used as the source data, again creating a new level in the data cache.

The sociograph is represented with a simple XML vocabulary designed for internal use in Wille SNA Toolkit. For ease of reference, let's call the vocabulary Wille network XML. The reason for not using e.g. GraphML is that we wanted to keep the vocabulary as simple as possible. In GraphML, for example, the schema of the data has to be expressed at the beginning of the document. This adds to the things that a transformation has to implement when importing data to be used in the toolkit.

We see, however, that GraphML could be introduced as the internal format for graphs in the future. This would add to the interoperability between Wille and other systems. Again, it is very straightforward to transform data between the toolkit's internal graph format and GraphML. Further, we would like to point out that the GraphML dialect that Vizster supports differs from the official GraphML specification (http://graphml.graphdrawing.org). This means that several formats are, in any case, needed to represent graphs.




   
      
Nodes:

   alex-kazim
   contributor
   http://api.crunchbase.com/assets/images/resized/0001/9175/19175v1-max-450x450.jpg
   http://www.crunchbase.com/person/alex-kazim

   daniel-ek
   contributor
   http://api.crunchbase.com/assets/images/resized/0003/5104/35104v1-max-450x450.jpg
   http://www.crunchbase.com/person/daniel-ek

   klaus-hommels
   contributor
   http://api.crunchbase.com/assets/images/resized/0001/6650/16650v4-max-450x450.jpg
   http://www.crunchbase.com/person/klaus-hommels

   company
   http://api.crunchbase.com/assets/images/resized/0001/7768/17768v3-max-450x450.jpg
   http://www.crunchbase.com/company/spotify

   company
   http://api.crunchbase.com/assets/images/resized/0000/1387/1387v1-max-450x450.png
   http://www.crunchbase.com/company/skype

Edges:

   daniel-ek
   spotify

   klaus-hommels
   spotify

   klaus-hommels
   skype

A Vizster-compliant GraphML representation of the network can be created with the Wille service sna.graph2graphml.
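For orientation, the sketch below serialises the example network as GraphML following the official specification, including the key declaration that GraphML requires at the beginning of the document. It does not reproduce the Vizster dialect or the output of sna.graph2graphml; the function name and the single type attribute are assumptions made for this example. The standard library ElementTree module is used here; lxml offers a largely compatible API.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialise nodes ({id: type}) and edges ([(source, target)]) as GraphML."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    # GraphML declares each data attribute up front with a <key> element.
    ET.SubElement(root, "key", {"id": "type", "for": "node",
                                "attr.name": "type", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", {"edgedefault": "undirected"})
    for node_id, node_type in sorted(nodes.items()):
        node = ET.SubElement(graph, "node", {"id": node_id})
        data = ET.SubElement(node, "data", {"key": "type"})
        data.text = node_type
    for source, target in edges:
        ET.SubElement(graph, "edge", {"source": source, "target": target})
    return ET.tostring(root, encoding="unicode")


nodes = {"alex-kazim": "contributor", "daniel-ek": "contributor",
         "klaus-hommels": "contributor", "spotify": "company",
         "skype": "company"}
edges = [("daniel-ek", "spotify"), ("klaus-hommels", "spotify"),
         ("klaus-hommels", "skype")]

graphml_text = to_graphml(nodes, edges)
print(graphml_text)
```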


Further, Wille service sna.graph2pajek can be used to create a Pajek representation of the graph:

*Network From  to 
*Vertices 5
1 contributor_alex-kazim 0.0000 0.0000 0.5000  ic Red bc Black  0.0000 0.0000 0.5000 
2 contributor_daniel-ek 0.0000 0.0000 0.5000  ic Red bc Black  0.0000 0.0000 0.5000 
3 contributor_klaus-hommels 0.0000 0.0000 0.5000  ic Red bc Black  0.0000 0.0000 0.5000 
4 company_spotify 0.0000 0.0000 0.5000  ic Green bc Black  0.0000 0.0000 0.5000 
5 company_skype 0.0000 0.0000 0.5000  ic Green bc Black  0.0000 0.0000 0.5000 
*Arcs
2 4
3 4
3 5
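The structure of the listing above (numbered vertices followed by arcs that refer to those numbers) can be produced with a short routine. The sketch below is not the code of sna.graph2pajek: the function name is hypothetical, labels are written in quotes, and the coordinates and colour attributes seen in the listing are left out.

```python
def to_pajek(nodes, edges):
    """Serialise nodes ([(id, type)]) and edges ([(source, target)]) as Pajek text."""
    number_of = {}
    lines = ["*Vertices %d" % len(nodes)]
    for number, (node_id, node_type) in enumerate(nodes, 1):
        number_of[node_id] = number  # Pajek refers to vertices by 1-based number
        lines.append('%d "%s_%s"' % (number, node_type, node_id))
    lines.append("*Arcs")
    for source, target in edges:
        lines.append("%d %d" % (number_of[source], number_of[target]))
    return "\n".join(lines) + "\n"


nodes = [("alex-kazim", "contributor"), ("daniel-ek", "contributor"),
         ("klaus-hommels", "contributor"), ("spotify", "company"),
         ("skype", "company")]
edges = [("daniel-ek", "spotify"), ("klaus-hommels", "spotify"),
         ("klaus-hommels", "skype")]

pajek_text = to_pajek(nodes, edges)
print(pajek_text)
```

With the nodes in the same order as in the listing, the arcs come out as 2 4, 3 4 and 3 5, matching the export shown above.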

The complete set of URIs may shed light on the flexibility and extensibility of sna-innovationecosystems and other similarly constructed Wille apps. (Note that means for elegant URI design, common e.g. in RESTful applications, are in use in Wille.) Please note the use of the company shortname in URIs: it identifies the data set used in the different views of the data.

Examples of the structure of the data URI space for Spotify (shortname spotify) follow:

A copy of the original company data
http://localhost:8080/apps/sna-innovationecosystems/data/company/spotify.js
The surroundings of Spotify social network (previous and current employees of Spotify and related companies)
http://localhost:8080/apps/sna-innovationecosystems/sna/data/nodeedge/spotify.xml
Spotify sociograph in Vizster-compliant GraphML. Created with the existing sociograph data as the source
http://localhost:8080/apps/sna-innovationecosystems/sna/data/graphml/spotify.xml
A Java Web Start file (in JNLP format) for launching Vizster with Spotify data
http://localhost:8080/apps/sna-innovationecosystems/sna/launch/vizster/spotify.jnlp
An HTML-formatted list of the employees in the Spotify sociograph
http://localhost:8080/apps/sna-innovationecosystems/sna/view/contributorlist/spotify.html
The sociograph in Pajek format
http://localhost:8080/apps/sna-innovationecosystems/sna/export/pajek/spotify.paj
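Because every view is addressed by the company shortname, the whole URI space of a company can be generated from one value. The helper below is a sketch: the function name and the dictionary keys are hypothetical labels, while the path patterns are copied from the list above.

```python
# Path patterns copied from the URI list above; the keys are labels
# chosen for this sketch only.
PATH_PATTERNS = {
    "company_data": "/data/company/%s.js",
    "nodeedge": "/sna/data/nodeedge/%s.xml",
    "graphml": "/sna/data/graphml/%s.xml",
    "vizster": "/sna/launch/vizster/%s.jnlp",
    "contributorlist": "/sna/view/contributorlist/%s.html",
    "pajek": "/sna/export/pajek/%s.paj",
}


def data_uris(shortname, port=8080, host="localhost"):
    """Return {label: URI} for all views of the company `shortname`."""
    base = "http://%s:%d/apps/sna-innovationecosystems" % (host, port)
    return dict((label, base + pattern % shortname)
                for label, pattern in PATH_PATTERNS.items())


spotify_uris = data_uris("spotify")
for label in sorted(spotify_uris):
    print("%s: %s" % (label, spotify_uris[label]))
```

Passing a different port, for example data_uris("spotify", port=9090), adjusts all URIs at once when Wille is not running on port 8080.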

To see the social surroundings of Facebook, you may have to wait for a couple of hours: http://localhost:8080/apps/sna-innovationecosystems/sna/launch/vizster/facebook.jnlp.

Internal to the toolkit, the URI scheme looks like this:

urls = (
     ("/context/analyse/", ContextAnalysis),
     ("/data/(.*)", ServeOrFetchFiles), 
     ("/sna/data/(.*)", ServeOrProduceSNAData),
     ("/sna/launch/vizster/lib/(.*)", ServeVizsterFiles),
     ("/sna/launch/vizster/(.*).jnlp", VizsterLauncher),
     ("/sna/view/contributorlist/(.*).html", Contributorlist),
     ("/sna/export/pajek/(.*).paj", PajekExport),
) 
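To illustrate how such a table of patterns and handlers can route a request, the sketch below matches a path against the patterns in order and returns the first hit together with the captured groups. How Wille dispatches requests internally is not described on this page, so first-match-wins and whole-path matching are assumptions; the handlers are represented by their names as strings to keep the example self-contained.

```python
import re

# Same patterns as in the listing above, with handler names as strings.
urls = (
    ("/context/analyse/", "ContextAnalysis"),
    ("/data/(.*)", "ServeOrFetchFiles"),
    ("/sna/data/(.*)", "ServeOrProduceSNAData"),
    ("/sna/launch/vizster/lib/(.*)", "ServeVizsterFiles"),
    ("/sna/launch/vizster/(.*).jnlp", "VizsterLauncher"),
    ("/sna/view/contributorlist/(.*).html", "Contributorlist"),
    ("/sna/export/pajek/(.*).paj", "PajekExport"),
)


def dispatch(urls, path):
    """Return (handler, captured groups) for the first pattern matching path."""
    for pattern, handler in urls:
        match = re.match("^" + pattern + "$", path)
        if match:
            return handler, match.groups()
    return None, ()


print(dispatch(urls, "/sna/launch/vizster/spotify.jnlp"))
print(dispatch(urls, "/data/company/spotify.js"))
```

Under these assumptions the order of the entries matters: the more specific /sna/launch/vizster/lib/ pattern is listed before the general launcher pattern.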

A similar fetch-or-serve routine can be implemented in other SNA applications, too, to speed up your Wille processes as needed. Together with the possibility to define the data directory used during a Wille session as a command-line parameter, the analyst can manage a local copy of the SNA source data in a straightforward manner: refreshed data can be used when time allows and older copies of the data are served more quickly.

Launching visualisations: the Visualise application

The SNA Toolkit supports Visualise with Wille, a tool that can detect visualisable data while using a web browser. For more information on the tool and installation instructions, see the tools page.

To test Visualise with an SNA visualisation, open the Spotify page at Crunchbase. Once the page is completely loaded, you should see a Wille widget appearing in the upper-right corner of your browser. Select the links to open a visualisation or to export the data.

Creating a new SNA application

Next, a procedure to create a new SNA application is explained.

Start by downloading the Wille SNA Toolkit and use the tutorial above to make sure that everything is working as intended. Create a new application and name it sna-yourapp. (Replace yourapp with a name representing your app.) Use the sna-example app as a template for your app.

To register to receive context information, add visualise to the list of app profiles in file willeapp.properties:

profile=sna,example,visualise
description=My SNA Application.

After this, the Visualise app starts to report context information to your app. Context information includes two variables, uri and representation. Variable uri refers to the address of the page, and representation contains the whole contents of the resource ("web page") currently open in the Web browser, serialised in HTML. (The concept of representation is used here in the spirit of the architecture of the World Wide Web.)

To receive the information, create a context analysis class and map it to URI /apps/sna-yourapp/context/analyse:

urls = ( ("/context/analyse/", ContextAnalysis), ) 

A simple template for the context analysis class:

import simplejson


class ContextAnalysis:
    def POST(self, context, params=''):
        # URI of the context to be visualised
        uri = context.input()['uri'][0]
        # The contents of the context page serialised in HTML format
        representation = context.input()['representation'][0]
        print('Visualising %s' % uri)
        print('Page contents: %s' % representation)
        # Report one available visualisation, identified by its name and URI
        body = simplejson.dumps({
            'status': 'visualisable',
            'visualisationset': {
                'examplevisualisation': {
                    'visualisation_uri': 'http://localhost:8080/apps/sna-example/exampleview'},
            }})
        # Response body, headers and HTTP status code
        return (body, {'content-type': 'application/json'}, 200)

In the example above, the return value of the analysis means that the app is able to visualise the context with one visualisation. The name and URI of the visualisation are included in the return value. When the app is not able to visualise the context, the return value should include a status field with value novisualisation. The Visualise app merges the candidate visualisations from all the different apps and lays out the options in the Wille widget injected into the browser window.
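The two possible outcomes and the merging step can be sketched as follows. The function names are hypothetical, and the merge function only illustrates one way in which candidate visualisations from several apps could be combined; the actual behaviour of the Visualise app is not documented here. The standard library json module is used, which offers the same dumps and loads functions as simplejson.

```python
import json


def analysis_response(visualisations):
    """Build the JSON body of a context analysis result.

    `visualisations` maps a visualisation name to its URI; an empty
    mapping means that the app cannot visualise the context.
    """
    if not visualisations:
        return json.dumps({"status": "novisualisation"})
    visualisation_set = dict((name, {"visualisation_uri": uri})
                             for name, uri in visualisations.items())
    return json.dumps({"status": "visualisable",
                       "visualisationset": visualisation_set})


def merge_candidates(bodies):
    """Collect the visualisation sets of all 'visualisable' responses."""
    merged = {}
    for body in bodies:
        result = json.loads(body)
        if result.get("status") == "visualisable":
            merged.update(result.get("visualisationset", {}))
    return merged


responses = [
    analysis_response({"examplevisualisation":
                       "http://localhost:8080/apps/sna-example/exampleview"}),
    analysis_response({}),
]
candidates = merge_candidates(responses)
print(candidates)
```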

Again, see sna-example app for a detailed example.

Next, create the visualisation that your app is referring to: add a new handler to the urls variable and implement a class for serving the visualisation. For examples of e.g. launching Vizster and exporting Pajek data, please refer to sna-innovationecosystems.

Further app development is relatively straightforward: Use the existing Wille components to connect to different data sources, to transform the data between formats and to export data to formats that can be imported to different SNA applications. Build new Wille components as needed or integrate existing components that implement the needed functionalities.

Obviously, there is no need to map every visualisation to a URI. Further, some SNA applications can be developed as Wille scripts.

Tips for Creating New Apps

As the complexity of the analysis procedures you develop increases, we suggest taking a look into the following libraries:

Topic attachments

Attachment | Size | Date | Who | Comment
wille-cache-example.png | 35.8 K | 29 Apr 2010 - 09:54 | JukkaHuhtamaeki | Contents of Wille Cache During a Data Crawling Session
wille-sna-toolkit-diagram.png | 300.4 K | 28 Apr 2010 - 16:06 | JukkaHuhtamaeki | Wille SNA Toolkit
wille2-sna-2010-05-19.zip | 18434.3 K | 24 May 2010 - 11:54 | AdminUser |
wille2-sna-2010-09-23.zip | 12730.6 K | 23 Sep 2010 - 14:37 | JukkaHuhtamaeki |
Topic revision: r63 - 23 Sep 2010 - 14:37:05 - JukkaHuhtamaeki
 


Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.