import networkx as nx

# Read in the Saccharomyces cerevisiae PPI network
G0 = nx.read_weighted_edgelist("4932.protein.links.v12.0.txt", comments="#", nodetype=str)
# Remove edges at or below the threshold score
threshold_score = 700  # high confidence
# Collect the edges first: removing edges while iterating over G0.edges raises a RuntimeError
low_confidence_edges = [(u, v) for u, v, weight in G0.edges(data="weight") if weight <= threshold_score]
G0.remove_edges_from(low_confidence_edges)
# Take the largest connected component
largest_cc = max(nx.connected_components(G0), key=len)
G = G0.subgraph(largest_cc)
Why Network Theory?
PPIs, Network Theory and Yeast
Protein-protein interaction (PPI) networks allow us to study the behaviour and impact of proteins in cells using the language of network theory. PPI networks are undirected, binary networks: each protein is represented as a node, and an edge between two nodes indicates that the corresponding proteins interact in the cell. While we are interested in human cancer cells and telomerase, the human interactome is at present far from completely understood; Hart et al. suggest that it is only 10% complete (1). Yeast PPI networks, on the other hand, are better studied (perhaps 50% complete (1)).
The string-db database
Here, we use the yeast (Saccharomyces cerevisiae) PPI network from string-db. The edge weights in this network are confidence scores out of 1000, representing the probability that two proteins interact. Since the network is already highly connected, we keep only edges with confidence scores greater than 700 in order to find meaningful protein targets. We also restrict the analysis to the largest connected component.
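To make the effect of the two filtering steps concrete, here is a minimal sketch on a toy network. The protein names and scores are made up for illustration (they are not STRING data), and the helper `filter_and_extract` is a hypothetical name, not part of networkx.

```python
import networkx as nx

def filter_and_extract(graph, threshold):
    """Drop edges scoring at or below `threshold`, then return the largest connected component."""
    filtered = graph.copy()  # leave the input graph untouched
    low = [(u, v) for u, v, w in filtered.edges(data="weight") if w <= threshold]
    filtered.remove_edges_from(low)
    largest = max(nx.connected_components(filtered), key=len)
    return filtered.subgraph(largest).copy()

# Toy network with made-up protein names and scores (assumption, not STRING data)
toy = nx.Graph()
toy.add_weighted_edges_from([
    ("A", "B", 950),
    ("B", "C", 800),
    ("C", "D", 400),  # removed: below the threshold
    ("D", "E", 990),
    ("A", "C", 700),  # removed: equal to the threshold
])

core = filter_and_extract(toy, 700)
print(sorted(core.nodes()))     # ['A', 'B', 'C']
print(core.number_of_edges())   # 2
```

Removing the weak C-D edge splits the toy network into two components, and only the larger one is kept, so D and E drop out of the analysis even though their own interaction is high confidence.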
Of interest to us are the EST1, EST2 and EST3 proteins (which make up the telomerase complex) and the target proteins TEL1 and POP1.
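Nodes in the string-db edge list are labelled with STRING identifiers rather than gene names, so looking up these proteins requires a name-to-identifier mapping. The sketch below shows one possible way to check which proteins of interest survive the filtering and how many neighbours each has. The identifiers here are placeholders, the toy graph stands in for `G`, and `summarise_targets` is a hypothetical helper.

```python
import networkx as nx

def summarise_targets(graph, name_to_id):
    """Map each protein name to its degree in `graph`, or None if its node is absent."""
    summary = {}
    for name, node_id in name_to_id.items():
        if node_id in graph:
            summary[name] = graph.degree(node_id)
        else:
            summary[name] = None  # filtered out, or outside the largest component
    return summary

# Toy stand-in for G with placeholder identifiers (hypothetical, not real STRING IDs)
toy = nx.Graph([
    ("id_EST1", "id_EST2"),
    ("id_EST2", "id_EST3"),
    ("id_EST2", "id_TEL1"),
])
name_to_id = {name: f"id_{name}" for name in ["EST1", "EST2", "EST3", "TEL1", "POP1"]}

print(summarise_targets(toy, name_to_id))
# {'EST1': 1, 'EST2': 3, 'EST3': 1, 'TEL1': 1, 'POP1': None}
```

A `None` entry flags a protein that is missing from the filtered network, which is worth checking before any further analysis of the targets.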