* Closeness centrality
- measures the average distance of a vertex from the other vertices
- mean shortest distance from i to others is:
l_i = (1/n)\sum_j d_{ij}
- closeness centrality:
C_i = 1/l_i = n/\sum_j d_{ij}
- closeness centrality and degree centrality are positively correlated
- when there is more than one component, this definition presents
a problem: d_{ij} is infinite whenever i and j lie in different
components, so l_i is infinite for all i, and C_i is 0
- two solutions:
- average over only the vertices in the same component as i
- vertices in small components have small average distances to the
other vertices they can reach. This increases their C_i. This is
counterintuitive: vertices in larger components should be more important
- redefine the closeness centrality as the average of the inverse
distances:
C_i = (1/(n-1)) \sum_{j \neq i} (1/d_{ij})
Now d_{ij} can be taken to be infinite when i and j lie in
different components, so those terms contribute zero to the sum.
This also gives more weight to the vertices that are closer, which
is what naturally should happen
- calculation is straightforward using BFS: one search from i gives all
d_{ij} in time O(m + n), so C_i for every vertex takes time O(n(m + n))
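- a minimal sketch of both definitions above, assuming an unweighted graph
stored as a dict mapping each vertex to a list of its neighbors. The names
bfs_distances, closeness and harmonic_closeness are made up for these notes
and do not come from any library:

```python
from collections import deque


def bfs_distances(adj, source):
    """Return {vertex: hop distance} for every vertex reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, i):
    """C_i = n / sum_j d_ij. Returns 0 when some vertex is unreachable,
    because l_i is then infinite (assumption: this is how we encode 1/inf)."""
    n = len(adj)
    dist = bfs_distances(adj, i)
    if len(dist) < n:                  # at least one d_ij is infinite
        return 0.0
    total = sum(dist.values())         # includes d_ii = 0
    return n / total if total > 0 else 0.0


def harmonic_closeness(adj, i):
    """C_i = (1/(n-1)) * sum_{j != i} 1/d_ij, with 1/inf taken as 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    dist = bfs_distances(adj, i)
    # Unreachable vertices are simply absent from dist, so they add nothing.
    return sum(1.0 / d for j, d in dist.items() if j != i) / (n - 1)


# Path 0 - 1 - 2, with and without an extra isolated vertex 3.
path = {0: [1], 1: [0, 2], 2: [1]}
split = {0: [1], 1: [0, 2], 2: [1], 3: []}
print(closeness(path, 1), closeness(split, 1), harmonic_closeness(split, 1))
```

On the connected path the middle vertex scores higher than the ends under
the first definition. Once the isolated vertex is added, the first
definition drops to 0 for every vertex, while the inverse-distance version
still ranks the middle vertex highest.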
* Betweenness centrality
- measures the number of shortest (geodesic) paths in the network that
pass through the vertex. Thus, it tells us how much the vertex falls
"between" the other vertices.
- if vertices exchange information only via shortest paths, then the
vertices with high betweenness have considerable control over that
information, and can be very influential.
- the high-betweenness vertices are also the ones whose removal will
disrupt communication in the network most.
- in real-world networks, not all communication happens via the shortest
paths. Nevertheless, betweenness centrality is a good guide to the influence
of vertices on the flow of information in a network
- when there is only one shortest path between every pair of vertices:
x_i = \sum_{st} n_{st}^i
- Here, n_{st}^i is the number of shortest paths between vertices
s and t that pass through i. With a single shortest path per pair,
it is 1 if i lies on the path from s to t, and 0 otherwise
- Each path gets counted twice (from s to t and then from t to s)
- shortest path from every vertex to itself is also included
- start and end points for each path are also included
- with these conventions, in a connected network of n vertices the maximum
possible betweenness is n^2 - n + 1 (the central vertex of a star graph)
and the minimum is 2n - 1 (e.g., a leaf of that star)
- when multiple paths exist between a given pair, we give equal weight to
each path: if n_{st} paths exist, each path gets weight 1/n_{st}
- this leads to the following redefinition:
x_i = \sum_{st} n_{st}^i/n_{st}
where n_{st} is the total number of shortest paths between s and t
- the same definition works even for directed networks
- betweenness can be large even for vertices with very low degree or very
low closeness. Vertices with very high betweenness are sometimes called
"brokers" in sociology
- Sometimes the values are divided by n^2 so that vertices in networks of
different sizes can be compared
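- a minimal sketch of the weighted definition x_i = \sum_{st} n_{st}^i/n_{st},
using the counting conventions listed above (ordered pairs, s = t included,
endpoints count as lying on the path). It assumes an unweighted graph stored
as a dict of neighbor lists, and it is a plain cubic-time illustration, not
an efficient algorithm. The names bfs_counts and betweenness are made up for
these notes. It relies on the fact that i lies on a shortest s-t path exactly
when d_{si} + d_{it} = d_{st}, in which case n_{st}^i = n_{si} n_{it}:

```python
from collections import deque


def bfs_counts(adj, s):
    """BFS from s. Returns (dist, sigma): hop distances and the number of
    shortest paths from s to each reachable vertex."""
    dist = {s: 0}
    sigma = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # v discovered for the first time
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:     # u is a predecessor of v
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """x_i = sum over ordered pairs (s, t), s = t included, of n_st^i / n_st.
    Pairs in different components contribute nothing."""
    info = {s: bfs_counts(adj, s) for s in adj}
    x = {i: 0.0 for i in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in dist_s:                   # only targets reachable from s
            for i in dist_s:               # i must be reachable from s
                dist_i, sigma_i = info[i]
                if t in dist_i and dist_s[i] + dist_i[t] == dist_s[t]:
                    x[i] += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return x


# Star graph with n = 5: vertex 0 is the center, 1..4 are leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(betweenness(star))

# 4-cycle: pairs of opposite vertices have two shortest paths, weight 1/2 each.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(betweenness(cycle))
```

For the star with n = 5 the center comes out at n^2 - n + 1 = 21 and each
leaf at 2n - 1 = 9, matching the extreme values quoted above. In the 4-cycle
each vertex gets 7 from paths that start or end at it, plus 1/2 + 1/2 from
the two ordered pairs of its neighbors, for a total of 8. Dividing the
results by n^2 gives the normalized values.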