Algorithms for basic network quantities
* Even for simple quantities, it is worth looking at the running times
- there might be more than one method to calculate the same thing, and
one of them might be significantly faster than the other
- knowing the running time also lets one estimate the actual amount of
time a calculation will need on a computer
* Degrees
- in adjacency list format, the degrees are normally kept in an array
alongside the lists, so looking up a degree from this array takes O(1)
- for an adjacency matrix, calculating a degree takes O(n) time (a whole
row has to be summed), hence it makes sense to keep a degree array here too
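
A minimal sketch of the two cases, assuming a 0-indexed network stored
either as a list of neighbor lists or as a 0/1 matrix. The function names
and the small example network are made up for illustration.

```python
def degree_array(adj):
    """Build the degree array from an adjacency list (list of neighbor lists).

    This is done once in O(n); every later lookup degree[i] is then O(1).
    """
    return [len(neighbors) for neighbors in adj]


def degree_from_matrix(A, i):
    """Degree of vertex i from a 0/1 adjacency matrix: sums one row, O(n)."""
    return sum(A[i])


# Hypothetical example network with edges 0-1, 0-2, 1-2, 2-3
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]

degree = degree_array(adj)           # [2, 2, 3, 1]
k2 = degree_from_matrix(A, 2)        # 3, but costs a full row scan
```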
* Clustering coefficients
- local clustering coefficient: for a given vertex, check whether there is
an edge between each pair of its neighbors u < v (the condition u < v makes
sure that every pair is visited only once).
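- global clustering coefficient: number of triangles can be calculated
by going over pairs u < v of neighbors of each vertex. Factor of 3 is
automatically taken care of, because each triangle is found once from each
of its three corners. The denominator (number of connected triples) needs
only k(k-1)/2 for each vertex of degree k, which takes constant time per
vertex once the degrees are known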
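- For scale-free networks, this calculation might take much longer
than for other networks, because a hub of degree k contributes of order
k^2 neighbor pairs (this holds even when an adjacency matrix is used to
check the presence of an edge in constant time)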
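
The pair-counting idea above can be sketched as follows. This is only an
illustration: it assumes the neighbors of each vertex are stored as Python
sets (so that an edge check is fast), and the function names are made up.

```python
from itertools import combinations


def closed_pairs(adj_sets, v):
    """Number of neighbor pairs u < w of v that are joined by an edge."""
    nbrs = sorted(adj_sets[v])
    # combinations() of a sorted list yields each pair once, with u < w
    return sum(1 for u, w in combinations(nbrs, 2) if w in adj_sets[u])


def local_clustering(adj_sets, v):
    """Fraction of neighbor pairs of v that are connected (0 if degree < 2)."""
    k = len(adj_sets[v])
    if k < 2:
        return 0.0
    return closed_pairs(adj_sets, v) / (k * (k - 1) / 2)


def global_clustering(adj_sets):
    """Closed neighbor pairs divided by all neighbor pairs, summed over vertices.

    Every triangle is counted once at each of its three corners, so the
    numerator already equals 3 * (number of triangles).
    """
    closed = sum(closed_pairs(adj_sets, v) for v in range(len(adj_sets)))
    triples = sum(len(s) * (len(s) - 1) // 2 for s in adj_sets)
    return closed / triples if triples else 0.0


# Hypothetical example network with edges 0-1, 0-2, 1-2, 2-3
adj_sets = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
c0 = local_clustering(adj_sets, 0)   # 1.0: its two neighbors are linked
C = global_clustering(adj_sets)      # 3 closed pairs out of 5 triples
```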
* Assortativity coefficient
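- It is not a good idea to implement the formula for r directly
because of the double summation over all vertex pairs (O(n^2)).
- Define Se, S1, S2, S3 and write r in terms of them: Se is a sum over
edges and takes O(m), while S1, S2, S3 are sums over vertices and take
O(n) each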
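
A sketch of the faster calculation, under the assumption that the sums are
defined in the usual way for the degree assortativity of an undirected
network: Se = 2 * (sum of k_u * k_v over edges), S1 = sum of k_i, S2 = sum
of k_i^2, S3 = sum of k_i^3, with r = (S1*Se - S2^2) / (S1*S3 - S2^2). The
function name and the edge-list input format are choices made here for
illustration.

```python
def assortativity(n, edges):
    """Degree assortativity r of an undirected network in O(m + n).

    n     -- number of vertices, labelled 0..n-1
    edges -- list of (u, v) pairs, each undirected edge listed once
    """
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Sum over edges, O(m). The factor 2 accounts for both directions
    # (u, v) and (v, u) of each undirected edge.
    Se = 2 * sum(degree[u] * degree[v] for u, v in edges)

    # Sums over vertices, O(n) each.
    S1 = sum(degree)
    S2 = sum(k ** 2 for k in degree)
    S3 = sum(k ** 3 for k in degree)

    denominator = S1 * S3 - S2 ** 2
    if denominator == 0:
        # Happens e.g. when all vertices have the same degree: r is undefined.
        return float("nan")
    return (S1 * Se - S2 ** 2) / denominator


# A star with three leaves: the hub only links to degree-1 vertices,
# so the network is perfectly disassortative.
r_star = assortativity(4, [(0, 1), (0, 2), (0, 3)])   # -1.0
```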
* Distances, components and shortest paths
(1) Distances, components and the BFS:
- shortest distances in a network can be computed using an algorithm
called 'breadth-first-search' or BFS for short. It works for undirected
as well as directed networks.
- one run of the algorithm computes distances from a given vertex s, to
all other vertices in the network. Usually s is called a source vertex
- the algorithm works by assigning distances to vertices starting from the
source. Initially, s is assigned distance 0 (clearly s is at distance 0
from itself!), and all other vertices are assigned an unknown distance.
Also, a variable d is set to 0
- at each stage, all the vertices at distance d are located. For each of
those vertices, its neighbors are located, and assigned distance d+1 if
they have not been assigned a distance already. Then d is increased by 1.
- the algorithm stops when, at some stage, no neighbors with unassigned
distance are found. At this point, distances to all the vertices in the
component of s are known. Hence, apart from the distances, the algorithm
also finds the component to which the source vertex belongs. Thus, by
restarting it from a vertex that has not been reached yet, the algorithm
can be used to find all the components and their sizes.
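
The remark about components can be sketched like this: run the search from
every vertex that has not been labelled yet, and label everything it
reaches. This is one possible arrangement, not a prescribed one; the
function name, the label array and the adjacency-list input are
assumptions made for the example.

```python
def find_components(adj):
    """Label the components of an undirected network given as adjacency lists.

    Returns (label, sizes): label[v] is the component index of vertex v,
    sizes[c] is the number of vertices in component c.
    """
    n = len(adj)
    label = [-1] * n          # -1 means "not reached yet"
    sizes = []
    for s in range(n):
        if label[s] != -1:
            continue          # s already belongs to an earlier component
        c = len(sizes)        # index of the new component
        label[s] = c
        size = 1
        frontier = [s]        # vertices at the current distance d from s
        while frontier:
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    if label[v] == -1:
                        label[v] = c
                        size += 1
                        next_frontier.append(v)
            frontier = next_frontier
        sizes.append(size)
    return label, sizes


# Hypothetical example: a path 0-1-2, an edge 3-4, and an isolated vertex 5
label, sizes = find_components([[1], [0, 2], [1], [4], [3], []])
# label == [0, 0, 0, 1, 1, 2], sizes == [3, 2, 1]
```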
(2) Naive implementation of BFS
Steps:
1. create an array of size n to store distances of vertices from the
source s. Assign unknown values to all the elements (for example, by
making all elements -1)
2. create a variable d, and assign value 0 to it. Also, assign value 0 to
the source in the distance array
3. find all the vertices at a distance d from the source vertex using the
distance array. Locate neighbors of all these vertices
4. for each neighbor, look up its distance in the distance array; if it is
unknown (i.e. is -1), assign distance d+1 to it. If no neighbor was
assigned a distance during this round, stop the algorithm
5. increase d by 1, and repeat from step 3
Running time:
- setting up the distance array takes O(n)
- at each stage, locating the vertices at distance d takes O(n). In the
worst case, this has to be done n times (because that is the maximum
possible value of the diameter). Thus, in total, this operation takes
O(n^2) time
- for each vertex, neighbors can be located in time O(m/n) on average
(the mean degree is 2m/n) if the network is stored in the form of an
adjacency list. This has to be done for each vertex in the component once.
In the worst case, the component could be of size n, and so the total time
of locating neighbors goes as O(n * m/n) or O(m)
- hence the total time the algorithm takes is O(m + n^2)
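
The five steps of the naive implementation can be written down almost
literally. The sketch below assumes 0-indexed vertices and an adjacency
list given as a list of neighbor lists; the function name is made up.

```python
def bfs_naive(adj, s):
    """Distances from source s, found by rescanning the distance array.

    Returns a list dist with dist[v] = distance from s, or -1 if v is not
    in the component of s. Worst-case running time O(m + n^2).
    """
    n = len(adj)
    dist = [-1] * n               # step 1: all distances unknown
    d = 0                         # step 2
    dist[s] = 0
    while True:
        assigned = False
        for u in range(n):        # step 3: O(n) scan for vertices at distance d
            if dist[u] != d:
                continue
            for v in adj[u]:      # step 4: give unassigned neighbors d + 1
                if dist[v] == -1:
                    dist[v] = d + 1
                    assigned = True
        if not assigned:          # nothing new was reached: stop
            break
        d += 1                    # step 5
    return dist


# Hypothetical example: edges 0-1, 0-2, 1-3, 2-3, 3-4, isolated vertex 5
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
dist = bfs_naive(adj, 0)          # [0, 1, 1, 2, 3, -1]
```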
(3) A better implementation of BFS
In the naive implementation, most time is spent in locating vertices at a
distance d from the distance array. However, these vertices are simply the
neighbors to which distance was assigned on the previous step! If we could
cleverly save them, we could save all the time required to locate them.
This can be done using the following implementation:
1. create distance array of size n to store the distances, and another
array (called queue array) to store the newly assigned vertices.
Initialize the distance array by making each element -1
2. create two variables called 'read pointer' and 'write pointer'
respectively. Initially, make read pointer 1, and write pointer 2. Store
vertex s in the first element of the queue array, and make its distance 0
in the distance array. Thus, at this stage, read pointer points to the
first element in the queue, while the write pointer points to the next
empty element
3. read the vertex pointed to by the read pointer, and also read its
distance d from the distance array (at the start, the read
pointer points to the first element of the queue, and hence the source
vertex is read).
4. locate all the neighbors of this current vertex, and to each neighbor
assign distance d+1 if the neighbor is still unassigned. Each time a
distance is newly assigned, save that neighbor at the position pointed to
by the write pointer, and increase the write pointer by 1.
5. increase the read pointer by 1. If the read and write pointers now
point to the same location, stop the algorithm. Else, repeat from step 3
Running time:
- setting up distance array and queue array takes O(n)
- for each vertex read using the read pointer, locating neighbors takes
O(m/n) on average if the network is stored in the form of an adjacency
list
- neighbors must be located for each vertex exactly once, and hence the
total time this operation takes is O(n * m/n) = O(m)
- thus, in total, this algorithm takes time O(m + n), much better than
O(m + n^2) of the naive implementation
- to find the distance between all vertex pairs, we use each vertex as
a source in turn. Thus, the distances between all pairs can be computed in
O(n(m + n)) = O(mn + n^2) time, which is O(n^2) on sparse graphs (where m
grows in proportion to n).
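
A sketch of the queue-array implementation with read and write pointers.
It assumes 0-indexed arrays, so the pointers start at 0 and 1 instead of
1 and 2, and it assumes the network is given as a list of neighbor lists.
The function names are made up for the example.

```python
def bfs(adj, s):
    """Distances from source s using a queue array, in O(m + n) time.

    Returns a list dist with dist[v] = distance from s, or -1 if v is not
    in the component of s.
    """
    n = len(adj)
    dist = [-1] * n               # step 1: distance array and queue array
    queue = [0] * n
    queue[0] = s                  # step 2: source goes in first
    dist[s] = 0
    read, write = 0, 1
    while read != write:          # pointers meet: the queue is exhausted
        u = queue[read]           # step 3: next vertex and its distance
        d = dist[u]
        for v in adj[u]:          # step 4: give unassigned neighbors d + 1
            if dist[v] == -1:
                dist[v] = d + 1
                queue[write] = v  # save the newly assigned vertex
                write += 1
        read += 1                 # step 5
    return dist


def all_pairs_distances(adj):
    """Run one search from every vertex: O(n(m + n)) in total."""
    return [bfs(adj, s) for s in range(len(adj))]


# Hypothetical example: edges 0-1, 0-2, 1-3, 2-3, 3-4, isolated vertex 5
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3], []]
dist = bfs(adj, 0)                # [0, 1, 1, 2, 3, -1]
```

Each vertex is written to the queue at most once, so a queue array of size
n is always large enough, and the final value of the write pointer is the
size of the component containing s.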