processq

process / inspiration / research notes by Alex Dragulescu

Ekisto: motivation, goals, algorithmic and design challenges

Motivation and goals

The motivation for this work came from some of the questions that drove our research in the Sociable Media Group at the MIT Media Lab.

What does an online community look like? How do we design a data portrait of our online persona? What does a group portrait look like? Can we help a newcomer understand an existing online space? Can we help an existing user understand his/her surroundings? [1][2]

My latest answer to these questions is Ekisto,[3] an interactive group portrait of three online communities. This portrait is a hybrid form, a remix of visualization tropes: part explorable, layered map, part photograph, part bar chart, part network visualization.

The existing form resembles a city, a metaphor I embraced for this design, since it describes and resonates with the subject and underlying data of the portrait: online communities are habitats of our digital personas. Metaphors are important tools that help us understand complex abstract concepts and shape our perceptions [4].

Below is an early, alternate design route, showing user reputation/influence as spheres from a top view. In contrast, using rectangular cuboids instead of spheres renders the data in a familiar form (the bar chart) and informs the city metaphor (skyscrapers).

Early sketches of Ekisto

Multiple lenses

Unlike real-life cities, where the layout is rigid and dictated by geography and urban planning, the underlying mapping logic of the online habitat is flexible and mutable. Inundated by data, we must make sense of it by looking at it through the lenses of various algorithms.

The 2D position of users in Ekisto is determined using OpenOrd [5], a graph layout algorithm that highlights clusters of similar nodes. Similarity between a pair of users is computed as the cosine similarity of their vectors, a simple and efficient measure of how alike two entities are.
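As a minimal sketch of the measure, assume each user is described by a sparse vector of counts. The tag names and users below are hypothetical, and the features actually used in Ekisto are not specified here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    # Iterate over the smaller dict; only shared keys add to the dot product.
    if len(a) > len(b):
        a, b = b, a
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here: an empty vector matches nothing
    return dot / (norm_a * norm_b)

# Hypothetical users, described by how often they used each tag.
alice = {"python": 3, "flask": 1}
bob = {"python": 1, "javascript": 2}
carol = {"python": 6, "flask": 2}

print(cosine_similarity(alice, carol))  # same proportions, so close to 1
print(cosine_similarity(alice, bob))    # one shared tag, so noticeably lower
```

Because the measure depends only on the angle between the vectors, a very active user and a casual one with the same mix of interests come out as highly similar.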

In the 3D space, each user’s volume represents reputation points (Stackoverflow), or the network influence of the user computed using the PageRank algorithm (Github, Friendfeed).
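For readers unfamiliar with PageRank, here is a small power-iteration sketch. The damping factor of 0.85, the fixed iteration count, and the tiny follower graph are illustrative assumptions, not the settings or data used for Ekisto.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. `links` maps each node to the nodes it points to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical follower graph: an edge u -> v means "u follows v".
follows = {"ann": ["cat"], "bob": ["cat"], "cat": ["ann"], "dan": ["cat"]}
scores = pagerank(follows)
for user in sorted(scores, key=scores.get, reverse=True):
    print(user, round(scores[user], 3))
```

The scores sum to 1, so they can be scaled directly into a visual quantity such as block volume.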

The combination of cosine similarity, PageRank and OpenOrd is one that I chose among many possible others. For example, there are many similarity measures besides cosine similarity (e.g., Dice’s Coefficient, Jaccard index, SimRank). Moreover, there are multiple ways of calculating network influence, with trade-offs in computational complexity and expressive power. For example, Github has complex relationships of code ownership, contribution and collaboration that are not fully captured by a measure like PageRank.
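To make two of these alternatives concrete, here is a short sketch of the Jaccard index and Dice’s coefficient on sets. Treating a user as a set of tags is an assumption made for illustration, and the tag sets are hypothetical.

```python
def jaccard_index(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(a | b)

def dice_coefficient(a, b):
    """Twice the size of the intersection divided by the sum of the set sizes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets
    return 2 * len(a & b) / (len(a) + len(b))

# Hypothetical tag sets for two users.
alice_tags = {"python", "flask", "sql"}
bob_tags = {"python", "sql", "javascript", "css"}

print(jaccard_index(alice_tags, bob_tags))     # 2 shared out of 5 distinct tags
print(dice_coefficient(alice_tags, bob_tags))  # 2 * 2 shared over 3 + 4 tags
```

Unlike cosine similarity on count vectors, both measures ignore how often a tag is used and look only at whether it appears.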

However, the cosine coefficient and PageRank provide a good, relatively computationally inexpensive, baseline view.

With enough computing power and data, one can add other lenses: compute semantic or topic similarity based on the links and content of user comments (Friendfeed), use a more complex measure of a user’s influence by looking at content flow through the networks, or use other force-directed graph layout algorithms.

Design challenges and limitations

So, why not just use an existing interactive graph visualization tool?

First and foremost, the intention was to create a portrait, packing several dimensions of data in a form legible to a general reader whose first impulse will not be to plot the Eigenvector Centrality Distribution of the graph.

I believe that the current representation offers a newcomer or outsider an immediately understandable view of an unfamiliar space. For a veteran in the community, it offers a mirror reflecting how algorithms describe his/her persona, as well as the opportunity to discover proximity relationships and clustering patterns.

Off-the-shelf graph exploration tools require heavy computational resources and lose interactivity when networks grow beyond 10,000 nodes. The networks visualized here have close to 1 million nodes and hundreds of millions of edges.

In an ideal world, where our browsers could display 12 million triangles [6] at 30 fps, this portrait would be an interactive 3D experience, allowing filtering, layout remapping according to various algorithmic lenses, as well as controlling the camera direction, angle of view or type of perspective.

In order to be usable and responsive in the browser, Ekisto presents just one slice of time and virtual space, and renders only one view of the data.

The Visualization Pipeline

I will follow up with a detailed discussion of the process, including techniques and tools that I used. For now, here is a brief outline:

  1. Data hustlin’
  2. Sanitizing and pre-processing
  3. Similarity vectors - number crunching
  4. OpenOrd, the good, the fast and the beautiful
  5. An unorthodox solution to node overlap
  6. The gigapixel image
  7. Baking the RTree, pickling the vertices
  8. Ekisto.js, bringing everything to the browser
  9. Serving it all with Python, Flask and the Cloud

Future Work

An interesting possible future direction is to create a time-lapse portrait of Stackoverflow, following its growth at a resolution of one month or several months (since regular data snapshots are available).

In the interactive part of the portrait, more data layers could be helpful in the self-exploration process, for example drawing the edges between nodes or highlighting incoming or outgoing links. This is, however, technically challenging for big hubs with lots of edges.

Another useful feature to add to the interactive side is the ability to highlight neighborhoods, either by allowing users to tag them (“C++ island”, “Little Italy”) or by showing them in a separate data layer, as precomputed by cluster-detection algorithms.

References

[1] I was a research member of the Sociable Media Group from 2007 to 2009. Here is a link to our group publications: http://smg.media.mit.edu/papers.html

[2] The Social Machine: Designs for Living Online by Judith Donath, the core of our class readings and discussions: publication list Amazon link

[3] Ekisto comes from ekistics, the science of human settlements.

[4] Metaphors We Live By by George Lakoff and Mark Johnson, excerpt Amazon link

[5] OpenOrd: an open-source toolbox for large graph layout by Shaun Martin et al., SPIE library paper link, Software link

[6] 1 million user blocks x 6 faces x 2 triangles = 12 million triangles. The 2000+ icon textures mapped onto the top users are another large resource that the browser would need to handle.
