Network computations
* General comments about algorithms and software
- most network analysis is performed using computers
- many networks are large (sometimes millions of vertices and billions of
edges)
- in the early days computations were performed by hand
- computers were unavailable or expensive or difficult to use
- networks were small
- network algorithms
- several network calculations require some thought for accuracy and
efficiency
- software
- many excellent software packages are available
- however, understanding how the algorithms work is essential, because
one must be able to make sense of the answers a package produces.
Also, what we need is sometimes not available in existing packages.
Finally, research questions should shape the packages, not the other
way around
* Running time and computational complexity
- it is always good to estimate how long a program will take to finish
before actually writing it!
- computational complexity is a measure of how the running time of an
algorithm grows as a function of the input size, in the worst possible case
- finding the maximum number in a list has complexity O(n)
- O(n^k) means that the running time grows, to leading order, as a constant
times n^k
- O(1) denotes constant time
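- For networks, there are two input sizes: the number of vertices n and the
number of edges m
- An algorithm like breadth-first search, for which the running time is
am + bn (with constants a and b), has complexity O(m) + O(n), also written
O(m+n). For sparse graphs, where m grows in proportion to n, this reduces
to O(n)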
- estimating the running time helps in practice: we can estimate the
actual time required for a big network calculation by running the
computation on a much smaller network. (The short run gives the constant
prefactor, which the complexity alone does not)
- this pre-analysis (finding the complexity, generating test networks,
performing short runs, doing scaling calculations) is extra work, but is
worth it
- the memory architecture of the computer also affects the actual running
times
- also, since complexity only gives the worst-case behavior, actual run
times might be much shorter than the estimates
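The scaling estimate described above can be sketched as follows. This is a
minimal illustration, not a definitive recipe: the function names, the ring
test network, and the network sizes are assumptions made for the example. It
times a breadth-first search on a small network, fits the constant in
time = C (m + n), and extrapolates to a larger network.

```python
import time
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path distances from source; -1 marks unreachable vertices."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:  # every adjacency-list entry is visited at most once
            if dist[v] == -1:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def ring_network(n):
    """Hypothetical test network: n >= 3 vertices on a ring, so m = n."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def extrapolate_time(t_small, n_small, m_small, n_big, m_big):
    """Assume time = C * (m + n); fit C on the small run and scale up."""
    c = t_small / (m_small + n_small)
    return c * (m_big + n_big)


if __name__ == "__main__":
    n_small = 10_000  # assumed size of the short test run
    adj = ring_network(n_small)
    start = time.perf_counter()
    bfs_distances(adj, 0)
    t_small = time.perf_counter() - start
    estimate = extrapolate_time(t_small, n_small, n_small, 10**7, 10**7)
    print(f"estimated time for 10^7 vertices: {estimate:.2f} s")
```

The extrapolation is only as good as the assumed scaling law; memory effects
on the large network can make the real run slower than the estimate.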
* Storing network data
- This is usually the first step in network computations: read the network
data from a file, then store it in a data structure to work with it
- Many ways exist for storing network data inside files, e.g. storing data
about every vertex and every edge
- How the data is stored inside the data structures usually has a huge
impact on the performance of the algorithms
- As a first step, we must uniquely label the nodes. If data is already
stored in a computer file, usually this is already done
- Information attached to the vertices can be saved in arrays of length n
- storing edges is a more complex matter
- The adjacency matrix
- easy to implement as a 2d-array (of integers or of floats)
- Mathematical formulas can be directly turned into computer
expressions
- adding or removing an edge takes O(1) time. Checking if an edge
exists also takes O(1) time
- there is a redundancy for undirected networks because adjacency
matrices are symmetric
- not always convenient: finding the neighbors of a vertex takes O(n) time
- uses memory inefficiently for sparse graphs: n^2 entries are stored even
when most of them are zero
- however, for dense graphs or small graphs, it is usually a good
choice
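A minimal sketch of the adjacency matrix for an undirected, unweighted
network, using a plain list of lists. The function names are illustrative
assumptions, and vertices are assumed to be labeled 0 to n-1.

```python
def make_matrix(n):
    """n x n matrix of zeros: no edges yet."""
    return [[0] * n for _ in range(n)]


def add_edge(A, i, j):
    """O(1): set both symmetric entries for an undirected edge."""
    A[i][j] = 1
    A[j][i] = 1


def remove_edge(A, i, j):
    """O(1): clear both symmetric entries."""
    A[i][j] = 0
    A[j][i] = 0


def has_edge(A, i, j):
    """O(1): a single lookup."""
    return A[i][j] != 0


def neighbors(A, i):
    """O(n): the whole row must be scanned, even if the degree is small."""
    return [j for j, a in enumerate(A[i]) if a != 0]


def degree(A, i):
    """The formula k_i = sum_j A_ij translates directly into code."""
    return sum(A[i])
```

For example, after `A = make_matrix(4)`, `add_edge(A, 0, 1)` and
`add_edge(A, 0, 3)`, the call `neighbors(A, 0)` returns `[1, 3]`.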
- The adjacency list
- most widely used method to store network data
- it's a set of lists, one for each vertex i, and each list contains
the labels of neighbors of i
- usually stored as a 2d-array, with an additional array holding the
degree of each vertex
- for undirected networks, each edge appears twice
- for directed networks, it is better to keep two adjacency lists (one
for incoming edges and one for outgoing edges), even though either
list alone contains all the information
- self-loops and multiple edges are represented by using multiple
entries
- adding an edge takes O(1) time
- finding and removing an edge takes around O(m/n) time, which becomes
O(1) on sparse networks where the mean degree stays constant
- once the element has been found, deletion is achieved by overwriting
it with the last element in the list and decreasing the degree by 1.
This step takes O(1)
- listing all the neighbors of a vertex takes O(m/n) time
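The adjacency-list operations above can be sketched as follows for an
undirected network. This is an illustrative sketch with assumed function
names. Python lists track their own length, so the separate degree array is
not needed here; `len(adj[i])` plays that role.

```python
def make_adjacency_list(n):
    """One empty neighbor list per vertex, labeled 0 to n-1."""
    return [[] for _ in range(n)]


def add_edge(adj, i, j):
    """O(1): append each end to the other's list (each edge appears twice)."""
    adj[i].append(j)
    adj[j].append(i)


def has_edge(adj, i, j):
    """Linear scan of i's list: O(m/n) on average."""
    return j in adj[i]


def _remove_entry(neighbor_list, target):
    """Find target by scanning, then delete it in O(1) without shifting."""
    pos = neighbor_list.index(target)       # raises ValueError if absent
    neighbor_list[pos] = neighbor_list[-1]  # overwrite with the last element
    neighbor_list.pop()                     # degree decreases by 1


def remove_edge(adj, i, j):
    """Remove one copy of the edge from both endpoint lists."""
    _remove_entry(adj[i], j)
    _remove_entry(adj[j], i)
```

The swap-with-last deletion does not preserve the order of the neighbors,
which is acceptable because the order carries no meaning.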
- Other representations
- Using both an adjacency list and an adjacency matrix makes sense for
medium-sized networks: adding, deleting and finding edges can be done
in O(1) time and listing the neighbors can be done in O(m/n) time
- An edge list (one pair of vertex labels per edge) is usually used to
store networks inside files
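As an illustration of the first step, going from a file to a data structure,
here is a minimal sketch that reads an edge list into an undirected adjacency
list. The file layout is an assumption made for the example: one edge per
line as two whitespace-separated integer labels starting at 0, with lines
beginning with `#` treated as comments. Real data files may differ.

```python
import io


def read_edge_list(lines):
    """Build an undirected adjacency list from lines of the form 'u v'."""
    edges = []
    n = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        u, v = map(int, line.split()[:2])
        edges.append((u, v))
        n = max(n, u + 1, v + 1)  # labels are assumed to run from 0 to n-1
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


# Hypothetical file contents, held in memory instead of on disk.
sample = io.StringIO("# toy network\n0 1\n0 2\n2 3\n")
adj = read_edge_list(sample)
```

With an actual file, the same function can be called as
`read_edge_list(open(path))`, where `path` is whatever file holds the data.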