* Calculation of the betweenness centrality
- the simplest way is to find all the shortest paths between every pair of
vertices s, t, and then, for the vertex v whose betweenness we want, add up
the fraction of the s-t shortest paths that go through v
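A minimal brute-force sketch of this naive method, assuming an undirected, unweighted graph stored as a dict that maps each vertex to a list of neighbours. For simplicity it runs an ordinary one-source BFS for each pair instead of the two-source search mentioned next. It counts ordered pairs (s, t), and a path counts as passing through v when v is one of its endpoints (including the s = t case), which matches the later step where x_i(s) starts at 1. The function names are hypothetical, not from any library.

```python
from collections import deque
from itertools import product


def bfs_distances(adj, s):
    """Distances from s; vertices not reachable from s are left out."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def all_shortest_paths(adj, s, t):
    """Every shortest s-t path, as a list of vertices from s to t."""
    dist = bfs_distances(adj, s)
    if t not in dist:
        return []
    paths = []

    def walk_back(path):
        u = path[-1]
        if u == s:
            paths.append(path[::-1])
            return
        # Stepping to any neighbour one layer closer to s keeps the path shortest.
        for p in adj[u]:
            if dist.get(p) == dist[u] - 1:
                walk_back(path + [p])

    walk_back([t])
    return paths


def naive_betweenness(adj, v):
    """Sum over ordered pairs (s, t) of the fraction of shortest paths through v."""
    total = 0.0
    for s, t in product(adj, repeat=2):
        paths = all_shortest_paths(adj, s, t)
        if paths:
            through = sum(1 for path in paths if v in path)
            total += through / len(paths)
    return total


path_graph = {0: [1], 1: [0, 2], 2: [1]}
print(naive_betweenness(path_graph, 1))  # 7.0
```

Every pair pays for a full BFS and for enumerating its paths, so this is only useful as a reference answer on tiny graphs.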
- finding the shortest paths between s and t with a two-source
(bidirectional) BFS takes O(m/sqrt(n)), and for all n^2 pairs this gives
O(m n^{3/2}), or O(n^{5/2}) on sparse graphs where m ~ n. This is rather
inefficient
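A sketch of the two-source (bidirectional) BFS, again assuming a dict-of-lists undirected graph. It grows BFS layers from s and from t, always expanding the smaller frontier by one full layer, and stops in the first layer where the two searches touch. It only returns the distance d(s, t), which is enough to show the idea; `two_source_distance` is a hypothetical name. The O(m/sqrt(n)) figure is a typical-case estimate that relies on both frontiers staying small until they meet; in the worst case the two searches can still explore most of the graph.

```python
def two_source_distance(adj, s, t):
    """Length of a shortest s-t path via two-source BFS, or None if disconnected."""
    if s == t:
        return 0
    dist_s, dist_t = {s: 0}, {t: 0}
    frontier_s, frontier_t = [s], [t]
    while frontier_s and frontier_t:
        # Grow whichever search currently has the smaller frontier by one layer.
        from_s = len(frontier_s) <= len(frontier_t)
        if from_s:
            frontier, dist, other = frontier_s, dist_s, dist_t
        else:
            frontier, dist, other = frontier_t, dist_t, dist_s
        best = None
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v in other:
                    # The edge u-v joins the two searches: a candidate s-t path.
                    length = dist[u] + 1 + other[v]
                    if best is None or length < best:
                        best = length
                if v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        if best is not None:
            # Finishing the whole layer before stopping guarantees minimality.
            return best
        if from_s:
            frontier_s = next_frontier
        else:
            frontier_t = next_frontier
    return None


ring = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(two_source_distance(ring, 0, 2))  # 2
```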
- instead, we can run one BFS from each vertex s and construct its
shortest-path tree in time O(m + n). Then we trace the path from every
vertex back to s along the tree. Each such path is at most as long as the
diameter, which is approximately log n in small-world networks, so tracing
all n of them takes O(n log n). The total time per source is therefore
O(m + n + n log n) = O(m + n log n), and repeating this for every vertex as
the source gives O(mn + n^2 log n) in total
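A sketch of the shortest-path-tree approach, assuming the same dict-of-lists representation; `tree_betweenness` is a hypothetical name. Each vertex keeps a single BFS parent, so every pair contributes exactly one path and both endpoints are credited. The result is therefore the true betweenness only when shortest paths are unique, as in the star graph used here; where there are ties, the tree silently keeps just one of the tied paths.

```python
from collections import deque


def tree_betweenness(adj):
    """One BFS tree per source; credit each vertex on the tree path back to s."""
    score = {v: 0 for v in adj}
    for s in adj:
        # BFS from s, keeping one parent per vertex: the shortest-path tree.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        # Walk from every reachable t back to s, crediting every vertex passed.
        for t in parent:
            u = t
            while u is not None:
                score[u] += 1
                u = parent[u]
    return score


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(tree_betweenness(star))  # {0: 13, 1: 7, 2: 7, 3: 7}
```

The inner `while` loop is the walk whose length is bounded by the diameter, which is where the O(n log n) per source comes from.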
- we can do even better by noting that many of the paths in the
shortest-path tree share edges
- we run the following algorithm for each vertex s as the source
- let w_i (strictly, w_i(s)) be the number of distinct shortest paths
between the source s and vertex i
- during the BFS starting at s, we set w_s = 1 (there is exactly one
shortest path from s to itself, of length zero)
- when processing a vertex i at distance d, if a neighbor j has no
distance assigned yet, we assign it distance d+1 and set w_j = w_i
- if the neighbor j has already been assigned distance d+1, we set
w_j <- w_j + w_i (a neighbor at distance d or less is left alone)
- hence, for a predecessor i of j (a neighbor of j at distance d_j - 1),
the fraction of the shortest paths from s to j that pass through i is
w_i/w_j
- let x_j(s) be the contribution to the betweenness of vertex j from all
shortest paths that start at the source s. The x_j(s) are computed
starting from the "leaves" of the shortest-path tree (the vertices farthest
from s) and working back towards s. Since BFS adds vertices to the queue in
order of increasing distance, this is conveniently done by iterating over
the queue array in reverse order
- start by setting x_i(s) = 1 for all i reached by the BFS. This is
because every shortest path between s and i has i as an endpoint, so all
of them pass through i (and hence the fraction for the pair s, i is 1)
- then, for each vertex j in reverse queue order, update the score of
every predecessor i of j as:
x_i(s) <- x_i(s) + x_j(s) * w_i(s) / w_j(s)
the predecessor i gets the fraction w_i(s)/w_j(s) of the total score
x_j(s), because that is exactly the fraction of shortest paths through j
that also pass through i
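- summing x_i(s) over all possible sources s gives the final betweenness
scores, x_i = sum_s x_i(s). Each source costs one BFS plus one reverse pass
over the queue, O(m + n), so the whole calculation takes O(n(m + n)), or
O(n^2) on sparse graphs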
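A sketch of the whole per-source procedure plus the final sum, assuming an undirected, unweighted graph given as a dict of neighbour lists; `betweenness` is a hypothetical name. The forward pass is the BFS that fills in distances d and path counts w, keeping the queue as a plain list so it can be read back in reverse. The backward pass pushes x_j(s) onto each predecessor i of j with weight w_i/w_j. Vertices the BFS never reaches get no x_i(s) at all, which is one reasonable reading of "for all i" on a disconnected graph.

```python
def betweenness(adj):
    """x_i = sum over sources s of x_i(s), for an undirected, unweighted graph."""
    x = {v: 0.0 for v in adj}
    for s in adj:
        # Forward pass: BFS from s filling in distances d and path counts w.
        d, w = {s: 0}, {s: 1}
        queue = [s]  # a plain list, so it can be read back in reverse later
        head = 0
        while head < len(queue):
            i = queue[head]
            head += 1
            for j in adj[i]:
                if j not in d:              # first visit: j is at distance d_i + 1
                    d[j] = d[i] + 1
                    w[j] = w[i]
                    queue.append(j)
                elif d[j] == d[i] + 1:      # another shortest route into j
                    w[j] += w[i]
        # Backward pass: farthest vertices first, i.e. the queue in reverse.
        xs = {i: 1.0 for i in queue}        # x_i(s) = 1 for every reachable i
        for j in reversed(queue):
            for i in adj[j]:
                if d[i] == d[j] - 1:        # i is a predecessor of j
                    xs[i] += xs[j] * w[i] / w[j]
        for i, value in xs.items():
            x[i] += value
    return x


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness(square))  # {0: 8.0, 1: 8.0, 2: 8.0, 3: 8.0}
```

In the 4-cycle each pair of opposite vertices is joined by two shortest paths, so the intermediate scores pick up the 1/2 weights (for example x_1(0) = 1.5), yet the totals come out equal by symmetry.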