File Name: graph theory and networks in biology .zip
Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this article, we discuss the basic graph theory concepts and the various graph types, as well as the available data structures for storing and reading graphs.
In addition, we describe several network properties and we highlight some of the widely used network topological features. We briefly mention the network patterns, motifs and models, and we further comment on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats.
Finally, we discuss a variety of algorithms and metrics for network analyses regarding graph drawing, clustering, visualization, link prediction, perturbation, and network alignment as well as the current state-of-the-art tools. We expect this review to reach a very broad spectrum of readers varying from experts to beginners while encouraging them to enhance the field further.
While most recent review articles focus on biomedical and biological networks and their applications (McGillivray et al.), the aim of this review is to tackle questions raised by today's increasing demands and to aid researchers in understanding the graph theory behind biomedical networks, as well as concepts such as visualization, annotation, management, clustering, integration, etc.
To do this, we start with an introduction about graphs in discrete mathematics and their different types and we further describe the various data structures and file formats for storage and representation. In addition, we discuss several topological features and network properties, as well as concepts such as graph clustering, clustering comparison, network alignment, motif detection, and edge prediction.
We further comment on the various layout and graph drawing techniques as well as on methods regarding network alignment and link predictions and we highlight the state-of-the-art tools for analyzing such networks.
Finally, we try to bring graph theory into a biomedical context by providing a thorough description of the different types of biomedical networks and the sources used for their construction. We hope this review becomes a useful handbook for readers regardless of their scientific background and helps non-experts in handling and interpreting networks more easily.
In general, networks, or graphs (the mathematical way of representing a network), are used to capture relationships between entities or objects.
Examples of networks which we interact with in everyday life include the electricity grid, road maps, the world wide web, the internet, airline connections, citation and language networks, telecommunication channels, social networks, economic networks, and many others. Graph theory has been the established mathematical field for the study and the analysis of such networks and is applicable to a wide variety of disciplines, ranging from mathematics, physics, computer science, engineering, and sociology to biology and medicine (Junker and Schreiber; Pavlopoulos et al.).
In the biomedical field for example, many biological networks consist of molecules such as DNA, RNA, proteins and metabolites, and graphs can be used to capture the interactions between these molecules. Therefore, it is essential to know the various network types which can be used, in order to be able to communicate and visualize such interactions. While one graph can have multiple representations, two different graphs may be isomorphic if they contain the same number of vertices connected in the same way.
Examples are shown in Figures 1A-C.
Figure 1. Network representations and types. (F) Semantic graph. (N) A five-node clique on the right; any node is connected with any other node.
There are various graph categories. The best known are undirected, directed, weighted, bipartite, multi-edge, hypergraphs, and trees. A graph is called undirected if its edges have no direction; in such a case, vertices i and j joined by an edge are called direct neighbors. A graph is called directed if an edge between vertices i and j is represented by an arrow, thus indicating a direction from vertex i to vertex j or vice versa.
Notably, in biology there are a number of directed relationships which can be graphically shown as different arrow types toward a semantic approach. In a weighted graph, the weight w_ij of the edge between nodes i and j most often represents the relevance of the connection. A graph is called multi-edge if it contains multiple edges (otherwise called parallel edges) that are incident to the same two vertices.
A simple graph, for example, has no multiple edges. A hypergraph consists of a set of vertices V and a set of hyperedges E, where an edge can join any number of vertices. A tree is an undirected graph in which any two vertices are connected by exactly one path, or equivalently a connected acyclic undirected graph. Examples of the various graph types are shown in Figures 1D-L.
A graph is connected if there is a path from any vertex to any other vertex in the graph. In a complete graph, every pair of distinct vertices is connected by a unique edge.
A cluster (Figure 1M) is a graph formed from the disjoint union of complete graphs, and a clique (Figure 1N) in an undirected graph is a subset of vertices such that every pair of vertices in the clique is connected.
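The definitions of connectivity and cliques can be checked mechanically. The following is a minimal sketch, assuming an undirected graph stored as a dictionary that maps each vertex to the set of its neighbors; the function names and the toy graph are illustrative and do not come from the review.

```python
from collections import deque
from itertools import combinations


def is_connected(adj):
    """Return True if every vertex is reachable from an arbitrary start vertex."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    # Breadth-first search: visit neighbors level by level.
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


def is_clique(adj, vertices):
    """Return True if every pair of the given vertices is joined by an edge."""
    return all(b in adj[a] for a, b in combinations(vertices, 2))


# Hypothetical undirected graph: a triangle (A, B, C) plus a pendant vertex D.
toy_graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
```

Here the whole graph is connected, the subset A, B, C forms a clique, and any subset containing both A and D does not.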
A network can be stored as (i) an adjacency matrix, (ii) an adjacency list, or (iii) a sparse matrix. In the case of a simple graph, the adjacency matrix is a square V x V matrix whose entry A[i, j] is 1 if vertices i and j are connected and 0 otherwise (Sabidussi; Yue et al.). In both undirected simple and weighted graphs, the adjacency matrix is symmetric (equal to its transpose). In the case of directed graphs, the matrix is in general not symmetric, so its upper triangular part differs from its lower triangular part (A[i, j] is not the same as A[j, i]).
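To make the symmetry argument concrete, here is a small sketch that builds adjacency matrices as nested Python lists. The helper names, the 0-based vertex indices, and the example edge list are assumptions made for illustration only.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix from (i, j, weight) triples (0-based indices)."""
    A = [[0] * n for _ in range(n)]
    for i, j, w in edges:
        A[i][j] = w
        if not directed:
            # An undirected edge is stored in both directions.
            A[j][i] = w
    return A


def is_symmetric(A):
    """Return True if the matrix equals its transpose."""
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(i + 1, n))


# Hypothetical path graph 0 - 1 - 2 - 3 with unit weights.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
undirected = adjacency_matrix(4, edges)
directed = adjacency_matrix(4, edges, directed=True)
```

The undirected matrix passes the symmetry check, whereas the directed one does not, because an arc from 0 to 1 leaves the entry for 1 to 0 at zero.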
An overview of adjacency matrices and their representations is shown in Figures 2A-C.
Figure 2. Adjacency matrices and alternative data structures. (B) A directed graph represented by a non-symmetric adjacency matrix. (C) A simple weighted graph. (D) The bipartite graph and its adjacency matrix. (E) The graph's projections; in the projected network colored green, node V1, for example, is connected to node V2 through node V4. (F) The upper triangular part of the adjacency matrix. (G) The upper triangular part of the adjacency matrix in a linear form. (H) The graph presented as an adjacency list; each vertex is accompanied by a list containing all other vertices adjacent to it. (I) A data structure for efficiently storing sparse matrices with many zeros; the first two rows indicate the coordinates in an adjacency matrix, whereas the third row contains the connection weight.
Bipartite graphs, as opposed to generic networks, have their own characteristics (Pavlopoulos et al.). One major property is that any bipartite graph can be presented as two biadjacency matrices, or otherwise projections.
While in an original bipartite graph vertices which belong to the same set are not connected to each other, in its projected form they are connected through nodes that belong to the other set (indirect connections).
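A projection of a bipartite graph can be sketched in a few lines. The example below assumes the bipartite graph is given as a dictionary mapping each node of one set to its neighbors in the other set; the node names and the `project` helper are hypothetical, chosen so that V1 and V2 become linked through V4.

```python
from itertools import combinations


def project(neighbors):
    """Link two nodes of the same set whenever they share a neighbor in the other set."""
    edges = set()
    for a, b in combinations(sorted(neighbors), 2):
        if neighbors[a] & neighbors[b]:
            edges.add((a, b))
    return edges


# Hypothetical bipartite graph: set {V1, V2, V3} on one side, {V4, V5} on the other.
links = {
    "V1": {"V4"},
    "V2": {"V4", "V5"},
    "V3": {"V5"},
}
projection = project(links)
```

In this toy case V1 and V2 are joined because both touch V4, V2 and V3 are joined through V5, and V1 and V3 stay unconnected since they share no neighbor.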
This concept is described in Figures 2D,E, whereas an extensive review about their biomedical application can be found elsewhere (Pavlopoulos et al.). Adjacency matrices are memory inefficient for storing larger sparse networks, as they require O(V^2) memory. Notably, the O notation is a theoretical measure to classify algorithms according to how their running time or space requirements grow as the input size grows (Knuth). Let's assume that in a gene co-expression network, one wants to store an all-vs.
To partially overcome this barrier, a simple approach would be to take advantage of the adjacency matrix symmetry by only storing the upper triangular part in an array B in a linear form (Figures 2F,G). The linear representation B requires V(V - 1)/2 memory, which is about half the size of the memory needed for a complete adjacency matrix A. For sparse networks, adjacency lists are proposed as an alternative data structure.
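One possible way to implement the linear upper-triangular storage is sketched below. The row-major index formula, the `tri_index` function, and the `TriangularStore` class are assumptions for illustration; the review does not prescribe a particular layout for the array B.

```python
def tri_index(i, j, n):
    """Map an unordered vertex pair (i, j), i != j, to a position in the linear array."""
    if i == j:
        raise ValueError("the diagonal is not stored")
    if i > j:
        i, j = j, i  # symmetry: (i, j) and (j, i) share one slot
    # Number of entries in the rows above row i, plus the offset inside row i.
    return i * n - i * (i + 1) // 2 + (j - i - 1)


class TriangularStore:
    """Store a symmetric n x n matrix with zero diagonal in n(n - 1)/2 slots."""

    def __init__(self, n):
        self.n = n
        self.data = [0] * (n * (n - 1) // 2)

    def set(self, i, j, weight):
        self.data[tri_index(i, j, self.n)] = weight

    def get(self, i, j):
        return self.data[tri_index(i, j, self.n)]


store = TriangularStore(4)
store.set(3, 1, 0.8)
```

For four vertices the store keeps 6 values instead of 16, and reading the pair (1, 3) returns the weight written for (3, 1).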
An adjacency list is an array A of separate lists. Each element of the array, A[i], is a list which contains all the vertices that are adjacent to vertex i. If the graph G is weighted, then each item in the adjacency list is either a two-item array or an object giving the vertex number and the edge weight (Figure 2H).
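A weighted adjacency list of this kind can be sketched as a dictionary of lists holding (neighbor, weight) pairs. The gene names, weights, and the `build_adjacency_list` helper are made up for illustration.

```python
from collections import defaultdict


def build_adjacency_list(edges):
    """Build an undirected weighted adjacency list from (u, v, weight) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        # Each item is a two-item tuple: the adjacent vertex and the edge weight.
        adj[u].append((v, w))
        adj[v].append((u, w))
    return dict(adj)


# Hypothetical co-expression edges.
weighted_edges = [("geneA", "geneB", 0.9), ("geneA", "geneC", 0.4)]
adj_list = build_adjacency_list(weighted_edges)
```

Looking up the neighbors of a vertex then amounts to reading one list, for example `adj_list["geneA"]`.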
Moreover, finding all vertices adjacent to a given vertex in an adjacency matrix representation requires O(V) time, whereas in an adjacency list such an operation only requires reading the corresponding list, which is usually much shorter. An alternative to the adjacency list is the use of a sparse matrix data structure. In such a case only the non-zero elements are kept along with their coordinates, and everything else is discarded as non-informative. An example of such a data structure is shown in Figure 2I, where the first row keeps the i coordinate for each element in A[i, j], the second row the j coordinate in A[i, j], and the third row the weight w_ij.
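The three-row layout described above corresponds to what is commonly called a coordinate format. A minimal sketch follows, assuming the matrix is given as nested lists; the function name and the example matrix are illustrative.

```python
def to_coordinate_form(A):
    """Keep only non-zero entries of A as three parallel rows: i, j and weight."""
    rows, cols, weights = [], [], []
    for i, row in enumerate(A):
        for j, w in enumerate(row):
            if w != 0:
                rows.append(i)
                cols.append(j)
                weights.append(w)
    return rows, cols, weights


# Hypothetical weighted, undirected graph on three vertices.
A = [
    [0, 0.5, 0],
    [0.5, 0, 2.0],
    [0, 2.0, 0],
]
rows, cols, weights = to_coordinate_form(A)
```

Only four of the nine entries are stored here; for large sparse networks the saving grows accordingly.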
In the case of unweighted simple graphs, the third row can be skipped completely, remembering that w_ij is always one for a stored element (the default value of 0, indicating that no link exists, applies to everything else). The degree deg(i) of a vertex is defined as the total number of edges adjacent to it. In directed graphs, the indegree refers to the number of arcs incident to (entering) the vertex, whereas the outdegree refers to the number of arcs incident from (leaving) the vertex.
In a social network, for example, the indegree would represent the followers, whereas the outdegree would represent the people one follows. Looking at all nodes in a network, in order to study the degree distribution p(k), we consider the probability that a randomly selected vertex has degree equal to k.
The same information can also be expressed as the cumulative degree distribution p_c(k), which shows the probability of a randomly selected vertex having degree larger than k. Notably, the degree distribution is one of the most important topological features and is characteristic of different network types. In the simplest case, p(k) can be estimated by a histogram of degrees. An example is shown in Figure 3B. Networks whose degree distribution follows a power law are called scale-free networks.
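Estimating p(k) from a histogram of degrees can be sketched as follows, assuming an undirected graph stored as a dictionary of neighbor sets. The function names and the star-shaped toy graph are assumptions; the cumulative version follows the "degree larger than k" reading used in the text.

```python
from collections import Counter


def degree_distribution(adj):
    """Estimate p(k): the fraction of vertices whose degree equals k."""
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(degrees)
    counts = Counter(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def cumulative_distribution(p):
    """Estimate p_c(k): the fraction of vertices whose degree is larger than k."""
    return {k: sum(v for d, v in p.items() if d > k) for k in p}


# Hypothetical star graph: one hub connected to four leaves.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}
p = degree_distribution(star)
p_c = cumulative_distribution(p)
```

In the star graph four of the five vertices have degree 1 and one has degree 4, so most of the probability mass sits at k = 1.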
Figure 3. Network properties and topological features. Each node's size has been adjusted according to its degree; the network has been visualized with Cytoscape. (B) A scatterplot histogram showing the degree distribution; the Y axis shows how many nodes have the degree value given on the X axis. (C) Clustering coefficient; dotted lines represent the direct connections of node V, whereas solid lines represent the connections between the first neighbors of node V. (E) The closeness centrality in blue, the betweenness centrality in red and the eccentricity centrality in orange. The graph consists of 6 nodes and 5 edges.
Density is the ratio between the number of edges in a graph and the number of possible edges in the same graph. In a fully connected graph, the density equals 1.
The clustering coefficient is a measure which shows whether a network or a node has the tendency to form clusters or tightly connected communities. The clustering coefficient of a node is defined as the number of edges between its neighbors divided by the number of possible connections between these neighbors. An example is shown in Figures 3C,D.
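Both measures follow directly from their definitions. The sketch below assumes a simple undirected graph stored as a dictionary of neighbor sets, so the number of possible edges is V(V - 1)/2; the function names and the toy graph are illustrative, and returning 0 for nodes with fewer than two neighbors is a convention chosen here, not one stated in the text.

```python
from itertools import combinations


def density(adj):
    """Ratio of existing edges to possible edges in a simple undirected graph."""
    n = len(adj)
    if n < 2:
        return 0.0
    # Every undirected edge appears in two neighbor sets.
    m = sum(len(neighbors) for neighbors in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def clustering_coefficient(adj, v):
    """Edges among the neighbors of v divided by the possible edges among them."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than two neighbors
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# Hypothetical graph: a triangle (A, B, C) plus a pendant vertex D attached to C.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
```

In this toy graph node A has a clustering coefficient of 1 because its two neighbors are linked, while node C scores 1/3 because only one of the three possible links among its neighbors exists.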
Understanding complex systems often requires a bottom-up analysis towards a systems biology approach. The need to investigate a system, not only as individual components but as a whole, emerges. This can be done by examining the elementary constituents individually and then how these are connected. The myriad components of a system and their interactions are best characterized as networks, and they are mainly represented as graphs where thousands of nodes are connected with thousands of edges. In this article we demonstrate approaches, models and methods from the graph theory universe and we discuss ways in which they can be used to reveal hidden properties and features of a network.
In the field of microbiology, a graph can express molecular structure, where a cell, gene or protein is denoted as a vertex and a connecting element is regarded as an edge. Many algorithms are used to solve problems that are modeled in the form of graphs. The entities are more formally referred to as vertices or nodes, with the connections themselves referred to as edges. As an effective modeling, analysis and computational tool, graph theory is widely used in biological mathematics to deal with various biology problems.
Section 4 is concerned with the application of graph theoretical measures of centrality or importance to biological networks. In particular, we shall.
Likewise, graph theory is useful in biology and conservation efforts, where a vertex can represent regions where certain species exist or inhabit and the edges represent migration paths or movement between the regions.
Beranek and V. In this paper we will present some basic concepts of network analysis. We will present some key aspects of network analysis as applied to social networks. These methods are used to simulate the properties observed in biological networks as well.
Graph theory deals with the mathematical study and analysis of networks. These networks play a vital role in the environment and public health.
Authors: Oliver Mason, Mark Verwoerd. MN; Quantitative Methods q-bio.
The theory of complex networks plays an important role in a wide variety of disciplines, ranging from communications to molecular and population biology.
Objectives: This tutorial invited biologists, mathematicians and computer scientists to learn more about graph theory. Biologists learned how graph theory can inform their understanding of many common biological patterns that are in and of themselves graphs: pedigrees, fate maps, phylogenetic trees, metabolic pathways, food webs, epidemiological networks, interactomes, etc. Mathematicians and computer scientists learned how graph theoretical concepts such as interval graphs, planar graphs, trees, networks, Delaunay triangulations, Gabriel graphs, minimal spanning trees, etc.