Why Community Finding?
Community finding serves two purposes. First, it allows us to focus our attention on proteins close to known telomerase-associated proteins. This is especially important for computationally expensive algorithms, as we will soon see for diffusion distance. Second, it gives us a group of biologically related proteins that we can associate with a functional group in the cell. In essence, this gives us an idea of the amount of ‘damage’ done to the cell when the telomerase protein (or indeed the target protein) is knocked out; we will revisit this important idea in our target evaluation. Community finding is therefore a natural starting point for our network theory analysis.
Random Walk Philosophy
Our understanding of protein interactions suggests that most important biological functions within a cell are carried out by protein complexes (such as our telomerase complex) and even larger structures of functional groups, rather than by individual proteins. These complexes and groups appear as densely connected clusters of proteins, often resembling network cliques and k-plexes (1). We therefore expect our PPI network to contain many densely connected clusters, with sparser connections between them.
In light of this, and in the hope of representing the probabilistic nature of protein interactions, we used WalkTrap, a community detection algorithm based on random walks. Existing literature suggests that random-walk-based methods are more effective than traditional community detection methods at mitigating the noisy and lossy nature of PPI networks (2). Given the dense clusters that represent protein complexes and functional groups in our network, a random walker is likely to become trapped in these regions, giving a set of communities that might be more reliable than those found by methods based on modularity and partitioning.
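The trapping intuition can be illustrated with a minimal sketch. This is not the WalkTrap algorithm itself and not the code used in our analysis; it only propagates the probability distribution of a short random walk over a hypothetical toy graph (two 4-node cliques joined by a single bridge edge) and measures how much probability mass remains in the starting clique. The graph, the function names and the walk length of 4 steps are all assumptions made for illustration.

```python
def build_toy_graph():
    """Hypothetical toy network: two 4-node cliques joined by one bridge edge (3-4)."""
    adj = {v: set() for v in range(8)}
    for clique in ([0, 1, 2, 3], [4, 5, 6, 7]):
        for i in clique:
            for j in clique:
                if i != j:
                    adj[i].add(j)
    # Single sparse connection between the two dense clusters.
    adj[3].add(4)
    adj[4].add(3)
    return adj


def walk_distribution(adj, start, steps):
    """Probability of finding an unbiased random walker at each node after `steps` moves."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, p in dist.items():
            if p == 0.0:
                continue
            # The walker moves to a uniformly chosen neighbour.
            share = p / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        dist = nxt
    return dist


adj = build_toy_graph()
dist = walk_distribution(adj, start=0, steps=4)  # walk length is an assumed value

# Fraction of probability mass still inside the starting clique {0, 1, 2, 3}.
mass_in_own_clique = sum(dist[v] for v in (0, 1, 2, 3))
print(f"Mass remaining in starting clique after 4 steps: {mass_in_own_clique:.3f}")
```

On this toy graph most of the probability mass stays inside the clique where the walk started, because only one of the clique's nodes has an edge leading out. Random-walk-based community detection exploits this kind of behaviour: nodes whose short walks end up in similar places are likely to belong to the same community.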