
NET-GE tutorial
NET-GE and Enrichment Analysis
NET-GE is a network-based method for enrichment analysis of human gene sets. More information about enrichment is here.
This server is free and open to all users and there is no login requirement. The home page is shown in Figure 1.

Figure 1. NET-GE: main web-page.

Start an Enrichment Analysis - Input box
To start an enrichment analysis, fill in the input box with the parameters of interest. An example of a filled input box is shown in Figure 1.

Step 1
The server takes as input a set of identifiers: UniProtKB accession numbers (ACC), gene symbols or ENSEMBL gene identifiers (ENSG). Select the identifier format from the drop-down box and paste a list of identifiers into the text box (one identifier per line; min: 1, max: 200). A run takes 1-10 minutes, depending on the size of the input set and on the annotation database.
An example set of UniProtKB accession numbers can be loaded by clicking "Example - UniProt_ACC: OMIM #143465 - ADHD". Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental disease of childhood affecting cognitive and behavioral functions. As reported in OMIM #143465, it has been linked to variations in the dopamine receptors DRD4 (UniProtKB ACC: P21917) and DRD5 (UniProtKB ACC: P21918).
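The input constraints of Step 1 can be checked locally before pasting a list. The sketch below is a hypothetical client-side helper, not part of NET-GE: the name `parse_id_list` and the regular expressions are assumptions (the UniProtKB pattern follows the accession format documented by UniProt, the ENSG pattern assumes the usual 11-digit human Ensembl gene IDs with an optional version suffix, and the gene-symbol pattern is deliberately loose). The server remains the authority on what it accepts.

```python
import re

# Hypothetical client-side check; NET-GE validates input on its own server.
# UniProtKB accession format as documented by UniProt.
UNIPROT_ACC = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Assumed: human Ensembl gene IDs, optionally with a version suffix.
ENSEMBL_GENE = re.compile(r"^ENSG[0-9]{11}(?:\.[0-9]+)?$")
# Deliberately loose, assumed pattern for gene symbols (e.g. DRD4, HLA-A).
GENE_SYMBOL = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.\-]*$")

PATTERNS = {"uniprot": UNIPROT_ACC, "ensembl": ENSEMBL_GENE, "symbol": GENE_SYMBOL}
MIN_IDS, MAX_IDS = 1, 200  # limits stated in Step 1


def parse_id_list(text: str, id_format: str) -> list[str]:
    """Split pasted text (one ID per line) and check count and format."""
    pattern = PATTERNS[id_format]
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    if not MIN_IDS <= len(ids) <= MAX_IDS:
        raise ValueError(f"expected {MIN_IDS}-{MAX_IDS} identifiers, got {len(ids)}")
    invalid = [i for i in ids if not pattern.match(i)]
    if invalid:
        raise ValueError(f"invalid {id_format} identifiers: {invalid}")
    return ids


# The two dopamine receptors from the ADHD example (OMIM #143465).
print(parse_id_list("P21917\nP21918\n", "uniprot"))  # ['P21917', 'P21918']
```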

Step 2
Select a protein-protein interaction network for the enrichment analysis.
The selected network is used to build modules, each containing the proteins directly annotated with a term (such as a GO term or a pathway) together with some of their interacting partners. Details about modules are here and in Di Lena et al., 2015. NET-GE offers a choice between two human protein-protein interaction networks:
- String: this network contains all the links with documented action, irrespective of the STRING score and of the supporting evidence (current version: STRING 10).
- String0.9: this network contains all the links with documented action that have a STRING score >= 0.9, irrespective of the supporting evidence.
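The two networks differ only by a score cutoff. A minimal sketch, assuming a toy edge list of (protein, protein, score) tuples with scores on the 0-1000 integer scale used in STRING's downloadable link files (so 0.9 corresponds to 900). The restriction to links with documented action is assumed to have been applied already, and `build_network` is a hypothetical helper, not NET-GE code.

```python
from collections import defaultdict

# Toy edge list: (protein_a, protein_b, score) with STRING-style scores on a
# 0-1000 integer scale. Names and scores are made up for illustration.
EDGES = [("A", "B", 950), ("B", "C", 400), ("C", "D", 900)]


def build_network(edges, min_score=None):
    """Return an undirected adjacency map.

    If min_score is given (0-1 scale, e.g. 0.9), only edges whose score is
    >= min_score * 1000 are kept, as in the String0.9 network.
    """
    cutoff = None if min_score is None else round(min_score * 1000)
    adjacency = defaultdict(set)
    for a, b, score in edges:
        if cutoff is not None and score < cutoff:
            continue  # below the confidence cutoff
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


full = build_network(EDGES)         # analogous to "String"
strict = build_network(EDGES, 0.9)  # analogous to "String0.9"
print(sorted(full["B"]), sorted(strict["B"]))  # ['A', 'C'] ['A']
```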

Step 3
NET-GE searches for overrepresented terms in pathways or in Gene Ontology categories.
NET-GE offers the annotations of five databases:
- KEGG: a collection of manually drawn pathway maps representing knowledge of molecular interaction and reaction networks
- REACTOME: a curated database of proteins and small molecules participating in pathways involved in cellular events
- Gene Ontology - Biological process: a biological process term describes a series of events accomplished by one or more organized assemblies of molecular functions
- Gene Ontology - Molecular function: a molecular function term describes activities that occur at the molecular level
- Gene Ontology - Cellular component: a cellular component term describes a component of a cell that is part of a larger object, such as an anatomical structure.

Step 4
Choose a method for multiple testing correction. NET-GE implements both the Bonferroni and the Benjamini-Hochberg (false discovery rate, FDR) procedures.
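For reference, both corrections fit in a few lines. This is the textbook formulation (Bonferroni multiplies each p-value by the number of tests m; Benjamini-Hochberg scales the i-th smallest p-value by m/i and enforces monotonicity from the largest down), offered as a sketch; NET-GE's own implementation, including what it counts as m, is not described in this tutorial.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests m, capping at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # toy p-values for four terms
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

Bonferroni controls the chance of any false positive and is the more conservative choice; Benjamini-Hochberg controls the expected fraction of false positives among reported terms and usually reports more terms.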

Step 5
Choose a significance threshold. Only terms with a corrected p-value lower than the selected threshold are reported.
Lower p-values indicate more significant enrichment.
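Filtering and ranking by the threshold can be sketched as follows. `TermResult`, `report` and the 0.05 default are hypothetical illustrations, not NET-GE names or defaults; the term identifiers and values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class TermResult:
    term: str           # e.g. a GO, KEGG or REACTOME identifier
    description: str
    corrected_p: float  # Bonferroni- or BH-corrected p-value


def report(results, threshold=0.05):
    """Keep terms with corrected p-value strictly below threshold, most significant first."""
    kept = [r for r in results if r.corrected_p < threshold]
    return sorted(kept, key=lambda r: r.corrected_p)


# Hypothetical terms and values, for illustration only.
results = [
    TermResult("TERM_A", "first toy term", 0.03),
    TermResult("TERM_B", "second toy term", 0.2),
    TermResult("TERM_C", "third toy term", 0.001),
]
print([r.term for r in report(results)])  # ['TERM_C', 'TERM_A']
```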

Step 6
Enter your e-mail (optional). You will receive an e-mail with a link to access your results.

Step 7
Click "Submit" to run the enrichment analysis. You will be redirected to the Job Summary page, which lists the status of your jobs (Figure 2). Bookmark this page to access the results at a later time.
The page is refreshed automatically every 40 seconds; to check the job status sooner, refresh the page manually (or click on the link). The results link leads to the results page only once the submitted job has finished.
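To wait on a job from a script rather than a browser tab, a polling loop can mirror the page's 40-second refresh. This tutorial does not document a programmatic API, so `get_status` is a hypothetical callback (for example, your own code that reads the bookmarked Job Summary page), and the status strings "finished" and "failed" are assumptions.

```python
import time
from typing import Callable


def wait_for_job(get_status: Callable[[], str], interval: float = 40.0,
                 max_checks: int = 90,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns a terminal state.

    get_status is a hypothetical callback; the status strings are assumptions.
    """
    for _ in range(max_checks):
        status = get_status()
        if status in ("finished", "failed"):
            return status
        sleep(interval)  # same pace as the Job Summary page's auto-refresh
    raise TimeoutError(f"job not finished after {max_checks} checks")


# Usage with a stand-in callback instead of a real server:
fake_states = iter(["running", "running", "finished"])
print(wait_for_job(lambda: next(fake_states), sleep=lambda s: None))  # finished
```

Injecting `sleep` keeps the loop testable without waiting; 90 checks at 40 seconds allow about an hour, well beyond the 1-10 minutes stated in Step 1.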


Figure 2. NET-GE job summary web-page.

Output
The output generated by NET-GE consists of three elements:
1) a color-coded graph of all significantly enriched terms with their information content (Figure 3). The graph can be downloaded by clicking on "Save image (*.svg)".

Figure 3. NET-GE shows a color-coded graph of all significantly enriched terms (KEGG database).

2) the list of enriched terms given as output by the standard method, ranked by their corrected p-value (Figure 4).

Figure 4. Terms enriched by the standard method (KEGG database). Table legend is given below.

3) the list of enriched terms given as output by the network-based method, ranked by their corrected p-value (Figure 5).


Figure 5. Terms enriched by the network-based method (KEGG database). Terms not included in the annotations of the input set are highlighted with the double star symbol (N**).

The tables in Figures 4 and 5 show seven fields:
  1. Enrichment - Whether the term derives from the standard enrichment (S) or from the network-based enrichment (N). For the network-based method, terms not included in the annotations of the input set are highlighted with the double star symbol (N**).
  2. TERM - The identifier of the term, linked to the corresponding database (AmiGO 2, KEGG or REACTOME) for detailed information.
  3. N1 - The number and the list of input IDs associated with the term. Clicking "[+] Show genes" displays the input proteins/genes associated with the term. Genes are given in the format "HGNC_identifier:Gene_name". If the HGNC identifier or the gene symbol is not available, the UniProtKB ACC is also given and the entry is marked with the double star symbol; multiple identifiers are separated by a semicolon.
  4. N2 - The number and the list of genes associated with the term. Clicking "Show genes" displays the genes associated with the term, again in the format "HGNC_identifier:Gene_name". As for N1, if the HGNC identifier or the gene symbol is not available, the UniProtKB ACC is also given and the entry is marked with the double star symbol; multiple identifiers are separated by a semicolon. Clicking "Show protein info" also displays the proteins (UniProtKB ACC) linked to the genes associated with the term. This information is listed in blocks, each collecting all the proteins that refer to the same gene. Each row consists of the block identifier (format: number_gene), the UniProtKB ACC, the HGNC identifier, the gene symbol (if available) and the genomic coordinates (if available; human genome build 37).
  5. Corrected p-value - The Bonferroni- or Benjamini-Hochberg-corrected p-value of Fisher's exact test.
  6. Description - The name of the term.
  7. Graph visualization - A link to the term-specific network (when available), described below.
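The p-value in field 5 comes from Fisher's exact test. A minimal one-sided version (the upper tail of the hypergeometric distribution) is sketched below, assuming k input genes annotated with the term (compare N1), n genes in the input set, K genes annotated with the term overall (compare N2) and M genes in the background. The background NET-GE uses is not stated in this tutorial, so M is an assumption, and the raw p-values would still need the correction chosen in Step 4.

```python
from math import comb


def enrichment_pvalue(k: int, n: int, K: int, M: int) -> float:
    """One-sided Fisher's exact test: P(X >= k), X ~ Hypergeometric(M, K, n).

    k: input genes annotated with the term (cf. N1)
    n: genes in the input set
    K: genes annotated with the term in the background (cf. N2)
    M: genes in the background (assumed; not stated in the tutorial)
    """
    if not (0 <= k <= min(n, K) and n <= M and K <= M):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k or more annotated genes by chance.
    upper = sum(comb(K, x) * comb(M - K, n - x) for x in range(k, min(n, K) + 1))
    return upper / comb(M, n)


# Toy numbers: 2 of 4 input genes hit a term annotated to 5 of 20 genes.
print(enrichment_pvalue(2, 4, 5, 20))  # 1205/4845, about 0.2487
```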

Term-specific network
This page shows a color-coded graph of the term-specific network in which only the first neighbours of the submitted IDs are visualized (Figure 6). Links to a view including the second neighbours of the submitted IDs and to the whole network are also provided.
"Submitted IDs" are the input proteins that map onto the term-specific network; "Seed nodes" are the genes annotated with the term; "Connecting nodes" are the connecting genes (interacting partners not annotated with the term).
The term-specific network can be downloaded as plain-text files describing its nodes and arcs.

Figure 6. Term-specific network (KEGG - hsa04723 pathway); only the first neighbours of the submitted IDs are visualized.
Blue circles: connecting nodes; Orange circles: seed nodes; Purple ring: submitted ID.
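The first- and second-neighbour views can be reproduced offline with a breadth-first search from the submitted IDs. This sketch assumes the downloaded arcs have already been loaded into an undirected adjacency map (the layout of NET-GE's plain-text files is not described here); `neighbourhood` and `node_style` are hypothetical names, and the colors follow the Figure 6 legend.

```python
from collections import deque


def neighbourhood(adjacency, submitted, max_hops=1):
    """Nodes within max_hops of any submitted ID, and the arcs among them."""
    dist = {s: 0 for s in submitted if s in adjacency}
    queue = deque(dist)
    while queue:  # breadth-first search from all submitted IDs at once
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nb in adjacency.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    nodes = set(dist)
    arcs = {tuple(sorted((a, b))) for a in nodes
            for b in adjacency.get(a, ()) if b in nodes}
    return nodes, arcs


def node_style(node, seeds, submitted):
    """(fill, ring) following the Figure 6 legend."""
    fill = "orange" if node in seeds else "blue"    # seed vs connecting node
    ring = "purple" if node in submitted else None  # submitted ID
    return fill, ring


# Toy term-specific network (hypothetical node names).
adj = {"S1": {"A"}, "A": {"S1", "B"}, "B": {"A", "C"}, "C": {"B"}}
print(neighbourhood(adj, ["S1"], max_hops=1))
```

Setting `max_hops=2` gives the second-neighbour view; the arcs returned are those between the selected nodes, which is an assumption about how the views are drawn.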

Complete annotation set of the input set
The complete set of annotations for each submitted ID is available, for both the standard and the network-based method.
For each term, protein information is given in blocks, as in the "N2" field of the tables described above.
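To work with these blocks outside the browser, rows can be grouped by block identifier. This sketch assumes, hypothetically, that the rows have been copied as tab-separated text in the column order described for the N2 field; NET-GE's actual export format is not specified here, and the sample values are placeholders.

```python
import csv
import io
from collections import defaultdict

# Assumed column order, following the N2 "Show protein info" description.
FIELDS = ["block", "uniprot_acc", "hgnc_id", "symbol", "coordinates"]


def group_protein_blocks(text):
    """Group tab-separated protein rows by block identifier (number_gene)."""
    blocks = defaultdict(list)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        row += [""] * (len(FIELDS) - len(row))  # pad optional trailing fields
        blocks[row[0]].append(dict(zip(FIELDS, row)))
    return dict(blocks)


# Placeholder rows, not real NET-GE output.
sample = ("1_GENEA\tACC_1\tHGNC:1\tGENEA\tchr1:100-200\n"
          "1_GENEA\tACC_2\tHGNC:1\tGENEA\n"
          "2_GENEB\tACC_3\n")
for block, proteins in group_protein_blocks(sample).items():
    print(block, [p["uniprot_acc"] for p in proteins])
# 1_GENEA ['ACC_1', 'ACC_2']
# 2_GENEB ['ACC_3']
```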


Case study: Obsessive compulsive disorder
An application of NET-GE is reported in Bovo et al., 2016 (doi: http://dx.doi.org/10.18547/gcb.2017.vol3.iss3.e45).
The list of the 27 UniProtKB IDs used in the case study can be downloaded here