This section provides frequency information about the data (e.g. ambiguity, the most frequent types, etc.).
Average (term) ambiguity = number_of_entries_in_the_db / number_of_different_terms.
Average synonymy (i.e. ID ambiguity) = number_of_entries_in_the_db / number_of_different_ids.
Average synonymy can be understood as the average synset size, where a synset is the collection of terms by which an ID can be expressed. Each ID has its own synset (i.e. the synset can be named after the ID). If there is ambiguity, some synsets overlap.
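
The two averages and the notion of overlapping synsets can be illustrated with a minimal sketch. It assumes that each database entry is available as a (term, ID) pair; the function names and the toy IDs `P1` and `P2` are hypothetical and only serve the example.

```python
from collections import defaultdict


def average_ambiguity_and_synonymy(entries):
    """Return (average term ambiguity, average synonymy).

    Assumption: `entries` holds one (term, id) pair per database entry.
    """
    entries = list(entries)
    terms = {term for term, _ in entries}
    ids = {id_ for _, id_ in entries}
    # Both averages divide the same entry count by a different denominator.
    return len(entries) / len(terms), len(entries) / len(ids)


def synsets(entries):
    """Map each ID to its synset, i.e. the set of terms that express it."""
    by_id = defaultdict(set)
    for term, id_ in entries:
        by_id[id_].add(term)
    return dict(by_id)


# Toy data (hypothetical IDs): "COB" is ambiguous, so two synsets overlap.
toy = [("CYTB", "P1"), ("Cytochrome b", "P1"), ("COB", "P1"), ("COB", "P2")]
ambiguity, synonymy = average_ambiguity_and_synonymy(toy)
overlap = synsets(toy)["P1"] & synsets(toy)["P2"]
print(ambiguity, synonymy, overlap)
```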

Different terms: 2347734
Average ambiguity of terms: 2.00633035940187

Most frequent terms:

Frequency | Term | Description |
---|---|---|
6182 | hypothetical protein | |
2891 | MHC class I antigen | |
2241 | Cytochrome b | |
1973 | CYTB | |
1769 | Cytochrome b-c1 complex subunit 3 | |
1769 | Complex III subunit 3 | |
1769 | Ubiquinol-cytochrome-c reductase complex cytochrome b subunit | |
1769 | Complex III subunit III | |
1706 | HLA-B | |
1623 | COB | |
1620 | MTCYB | |
1602 | MT-CYB | |
1436 | MHC class II antigen | |
1394 | major histocompatibility complex, class I, B | |
1386 | B7.2 | |

Term length distribution (number of tokens per term):

Frequency | Token count | Description |
---|---|---|
1205278 | 1 | |
372704 | 2 | |
258818 | 3 | |
191899 | 4 | |
116389 | 5 | |
76966 | 6 | |
44508 | 7 | |
29174 | 8 | |
18545 | 9 | |
11859 | 10 | |
7734 | 11 | |
4665 | 12 | |
3057 | 13 | |
1714 | 14 | |
737 | 15 | |
476 | 16 | |
474 | 19 | |
434 | 18 | |
339 | 17 | |
275 | 20 | |
178 | 21 | |
162 | 22 | |
132 | 29 | |
130 | 28 | |
126 | 27 | |
121 | 31 | |
119 | 25 | |
112 | 26 | |
110 | 23 | |
109 | 30 | |
109 | 24 | |
85 | 32 | |
75 | 33 | |
56 | 34 | |
26 | 35 | |
18 | 36 | |
12 | 37 | |
3 | 40 | |
3 | 38 | |
1 | 61 | |
1 | 39 | |
1 | 43 |
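
A distribution like the one above can be obtained by counting the tokens of every distinct term. The sketch below assumes plain whitespace tokenization, which the source does not specify, and uses a hypothetical function name. Rows come out ordered by frequency rather than by token count, which matches the ordering of the table.

```python
from collections import Counter


def token_count_distribution(terms):
    """Count how many terms have 1, 2, 3, ... tokens.

    Assumption: tokens are whitespace-separated; the tokenizer actually used
    for the table is not documented.
    """
    return Counter(len(term.split()) for term in terms)


sample = ["hypothetical protein", "CYTB", "MHC class I antigen", "HLA-B", "Cytochrome b"]
dist = token_count_distribution(sample)

# most_common() sorts by frequency, highest first.
for n_tokens, freq in dist.most_common():
    print(f"{freq} | {n_tokens} |")
```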

Most ambiguous terms by number of types:

Ambiguity | Term | Types |
---|---|---|
8 | TPP | molecule:ChEBI, molecule:ChemIDplus, compound:CAS, human:UNIPROT, PROT, molecule:KEGG_COMPOUND, enzyme:EC, GEN |
8 | CTX | disease:UMLS, molecule:KEGG_DRUG, human:UNIPROT, drug:KEGG, drug:DrugBank, PROT, GEN, compound:CAS |
8 | MEA | disease:UMLS, human:UNIPROT, molecule:ChEMBL, molecule:ChemIDplus, PROT, drug:DrugBank, GEN, compound:CAS |
8 | CPZ | molecule:KEGG_DRUG, human:UNIPROT, drug:KEGG, molecule:ChemIDplus, drug:DrugBank, PROT, GEN, compound:CAS |
8 | PCP | compound:CAS, disease:UMLS, drug:KEGG, human:UNIPROT, PROT, molecule:KEGG_COMPOUND, GEN, enzyme:EC |
8 | CP | disease:UMLS, molecule:ChEBI, human:UNIPROT, drug:KEGG, drug:DrugBank, PROT, GEN, compound:CAS |
8 | PAH | molecule:ChEBI, molecule:DrugBank, compound:CAS, human:UNIPROT, PROT, molecule:SUBMITTER, enzyme:EC, GEN |
7 | ETA | molecule:ChEBI, human:UNIPROT, drug:DrugBank, PROT, molecule:KEGG_COMPOUND, GEN, compound:CAS |
7 | ADH | disease:UMLS, human:UNIPROT, PROT, GEN, molecule:KEGG_COMPOUND, enzyme:EC, compound:CAS |
7 | CD | disease:UMLS, molecule:DrugBank, MI, human:UNIPROT, PROT, GEN, compound:CAS |
7 | CDP | molecule:NIST_Chemistry_WebBook, molecule:UniProt, human:UNIPROT, PROT, drug:DrugBank, molecule:KEGG_COMPOUND, compound:CAS |
7 | TTP | disease:UMLS, molecule:ChEBI, human:UNIPROT, PROT, molecule:KEGG_COMPOUND, GEN, compound:CAS |
7 | AMP | molecule:UniProt, molecule:ChEBI, human:UNIPROT, PROT, GEN, molecule:KEGG_COMPOUND, compound:CAS |
7 | ATP | molecule:UniProt, molecule:ChEMBL, drug:KEGG, human:UNIPROT, PROT, molecule:KEGG_COMPOUND, compound:CAS |
7 | PAM | disease:UMLS, molecule:ChemIDplus, human:UNIPROT, PROT, GEN, enzyme:EC, compound:CAS |
7 | PGA | molecule:NIST_Chemistry_WebBook, disease:UMLS, molecule:ChEBI, human:UNIPROT, PROT, GEN, compound:CAS |
7 | HEP | molecule:NIST_Chemistry_WebBook, disease:UMLS, human:UNIPROT, PROT, enzyme:EC, GEN, compound:CAS |
7 | DNA | molecule:UniProt, MI, human:UNIPROT, PROT, molecule:KEGG_COMPOUND, compound:CAS, molecule:IUPAC |
7 | NA | molecule:ChEBI, molecule:ChEMBL, drug:KEGG, human:UNIPROT, PROT, GEN, compound:CAS |
7 | PA | molecule:ChEBI, drug:KEGG, human:UNIPROT, PROT, drug:DrugBank, GEN, compound:CAS |
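
The ambiguity value in this table is the number of distinct types a term occurs with. A minimal sketch of that ranking, assuming entries are available as (term, type) pairs and using a hypothetical function name:

```python
from collections import defaultdict


def rank_by_type_ambiguity(entries):
    """Return (term, types) pairs, the term with the most distinct types first.

    Assumption: `entries` is an iterable of (term, type) pairs.
    """
    types_by_term = defaultdict(set)
    for term, type_ in entries:
        types_by_term[term].add(type_)  # repeated pairs count only once
    # Sort by descending number of types, then alphabetically for stable output.
    return sorted(types_by_term.items(), key=lambda kv: (-len(kv[1]), kv[0]))


toy = [
    ("ATP", "PROT"),
    ("ATP", "compound:CAS"),
    ("ATP", "PROT"),
    ("HLA-B", "GEN"),
]
ranked = rank_by_type_ambiguity(toy)
for term, types in ranked:
    print(f"{len(types)} | {term} | {', '.join(sorted(types))} |")
```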

| | Unchanged | No whitespace | Alphanumeric | Lowercase | Alpha |
|---|---|---|---|---|---|
ID_ORG | 2.43562737417767 | 2.43811966525815 | 2.46853332590569 | 2.55692978139286 | 9.99613695360654 |
ID | 1.05875795915954 | 1.05921690918326 | 1.06319099451482 | 1.06834813308531 | 4.13384102456848 |
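
The column names suggest that ambiguity was recomputed after normalizing the term strings in increasingly aggressive ways. The source does not define the normalizations or the exact measure, so the sketch below rests on assumptions: the steps are taken to be cumulative, and ambiguity is taken to be the average number of distinct IDs per normalized term. The regular expressions and function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed, cumulative normalization steps (not confirmed by the source).
NORMALIZERS = {
    "Unchanged": lambda s: s,
    "No whitespace": lambda s: re.sub(r"\s+", "", s),
    "Alphanumeric": lambda s: re.sub(r"[^A-Za-z0-9]", "", s),
    "Lowercase": lambda s: re.sub(r"[^A-Za-z0-9]", "", s).lower(),
    "Alpha": lambda s: re.sub(r"[^A-Za-z]", "", s).lower(),
}


def average_ids_per_term(entries, normalize):
    """Average number of distinct IDs per normalized term (assumed measure)."""
    ids_by_term = defaultdict(set)
    for term, id_ in entries:
        ids_by_term[normalize(term)].add(id_)
    return sum(len(ids) for ids in ids_by_term.values()) / len(ids_by_term)


# Toy entries with hypothetical IDs; stronger normalization merges more terms.
toy = [("MT-CYB", "id1"), ("MTCYB", "id2"), ("mtcyb", "id3"), ("B7.2", "id4"), ("B7", "id5")]
results = {name: average_ids_per_term(toy, fn) for name, fn in NORMALIZERS.items()}
for name, value in results.items():
    print(name, value)
```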

Different IDs: 1462783
Average ambiguity of IDs (i.e. average synonymy): 3.220115355456

Most frequent IDs:

Frequency | ID | Description |
---|---|---|
5811 | CLKB:HUMAN | |
1703 | CLKB:MOUSE | |
746 | compound:CAS:CAS:95422-24-5 | |
701 | compound:CAS:CAS:83665-54-7 | |
639 | compound:CAS:CAS:499-02-5 | |
634 | compound:CAS:CAS:4836-13-9 | |
549 | compound:CAS:CAS:103-90-2 | |
542 | compound:CAS:CAS:119459-68-6 | |
528 | CLKB:_CLKB | |
517 | compound:CAS:CAS:13243-65-7 | |
482 | compound:CAS:CAS:2847-00-9 | |
459 | compound:CAS:CAS:69408-81-7 | |
456 | compound:CAS:CAS:100676-10-6 | |
438 | compound:CAS:CAS:122864-73-7 | |
381 | compound:CAS:CAS:63712-45-8 |

Different types: 46

Frequency | Type | Description |
---|---|---|
1599655 | PROT | UniProtKB protein name |
1005216 | GEN | UniProtKB gene name |
820721 | human:UNIPROT | |
624331 | compound:CAS | |
424748 | disease:UMLS | |
27430 | enzyme:EC | |
25893 | molecule:IUPAC | |
22866 | symptom:UMLS | |
20900 | drug:DrugBank | |
17829 | molecule:ChEBI | |
17001 | ocs | NCBI common name, species or below |
16341 | molecule:ChemIDplus | |
12414 | drug:KEGG | |
10765 | molecule:ChEMBL | |
10687 | molecule:KEGG_COMPOUND | |
8878 | ogs2 | oss name, genus abbreviated (e.g. 'A. thaliana') |
8877 | oss | NCBI scientific name, species or below |
8727 | CLKB | CLKB cell line name |
4469 | molecule:NIST_Chemistry_WebBook | |
3917 | molecule:UniProt | |
3316 | oca | NCBI common name, above species |
2908 | molecule:PDBeChem | |
2561 | osa | NCBI scientific name, above species |
2270 | molecule:Chemical_Ontology | |
2189 | MI | PSI-MI term |
1061 | molecule:DrugBank | |
1056 | molecule:JCBN | |
1011 | molecule:SUBMITTER | |
551 | molecule:KEGG_DRUG | |
264 | molecule:UM-BBD | |
258 | molecule:MolBase | |
231 | molecule:CBN | |
215 | molecule:LIPID_MAPS | |
186 | molecule:WHO_MedNet | |
146 | molecule:IUBMB | |
121 | molecule:Patent | |
96 | molecule:KEGG_GLYCAN | |
79 | molecule:IUPHAR | |
53 | molecule:RESID | |
41 | molecule:COMe | |
24 | molecule:EMBL | |
14 | molecule:PDB | |
7 | ogs1 | NCBI selected genus name (e.g. 'Arabidopsis') |
4 | molecule:EuroFIR | |
2 | molecule:Beilstein | |
1 | molecule:EBI_Industry_Programme |