The RDF endpoint is reachable at: https://glyconnect.expasy.org/glystreem/sparql
No user interface is yet available for sending SPARQL queries to the endpoint.
Data is licensed under CC BY 4.0.
Example command with an inline query that selects 10 triples and returns the results as JSON:
curl -X POST https://glyconnect.expasy.org/glystreem/sparql \
  --data-urlencode 'query=SELECT * { ?s ?p ?o } LIMIT 10' \
  --data-urlencode 'format=json'
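The same request can also be sent from Python. Below is a minimal sketch, assuming the third-party `requests` package is installed and that the endpoint returns standard SPARQL 1.1 JSON results; the endpoint URL and parameters are taken from the curl command above.

```python
# Minimal sketch of the inline-query request in Python (assumes `requests`
# is installed; not an official client).
import requests

ENDPOINT = "https://glyconnect.expasy.org/glystreem/sparql"
QUERY = "SELECT * { ?s ?p ?o } LIMIT 10"

# Form-encoded POST, mirroring curl's --data-urlencode options.
response = requests.post(ENDPOINT, data={"query": QUERY, "format": "json"})
response.raise_for_status()

# Standard SPARQL 1.1 JSON results: results/bindings holds one row per triple.
for binding in response.json()["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```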
The SPARQL query can be loaded from a file (count_structures_tn_antigen.rq in the example):
curl -X POST https://glyconnect.expasy.org/glystreem/sparql \
  -H 'content-type: application/sparql-query' \
  -H 'accept: application/sparql-results+json' \
  --data-binary @count_structures_tn_antigen.rq
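The file-based variant can likewise be scripted; a minimal sketch, again assuming `requests` is installed, posting the raw query body with the same headers as the curl command above.

```python
# Minimal sketch of the file-based request in Python (assumes `requests`
# is installed).
import requests

ENDPOINT = "https://glyconnect.expasy.org/glystreem/sparql"

with open("count_structures_tn_antigen.rq", encoding="utf-8") as fh:
    query = fh.read()

# Raw POST body with the same content-type/accept headers as the curl call.
response = requests.post(
    ENDPOINT,
    data=query.encode("utf-8"),
    headers={
        "Content-Type": "application/sparql-query",
        "Accept": "application/sparql-results+json",
    },
)
response.raise_for_status()
print(response.json())
```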
To estimate the running time of SPARQL queries, we tested them against three triple stores of different sizes: 191, 2790 and 4808 structures. Real time is wall-clock time; user and system times together give total CPU time. The runs were timed on a DELL Latitude 7140 running Ubuntu 20, GraphDB 9.9.0 and Python 3.9 (used to automate the queries). In total 33 queries were run, of which a subset is presented in the paper (available on this page). As the table below shows, CPU time varies little between the different dataset sizes, while real time increases linearly with size.
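The benchmark script itself is not reproduced here; the sketch below only illustrates one way such timings can be collected in Python, measuring wall-clock (real) time and the user/system CPU time of a curl child process. The `queries/` directory is a placeholder, and this is not the exact script used for the published numbers.

```python
# Hypothetical timing harness: wall-clock time via perf_counter(), child
# user/system CPU time via resource.getrusage(RUSAGE_CHILDREN).
# Unix-only; the queries/ directory is a placeholder.
import glob
import resource
import subprocess
import time

ENDPOINT = "https://glyconnect.expasy.org/glystreem/sparql"

for path in sorted(glob.glob("queries/*.rq")):
    cpu_before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()

    subprocess.run(
        ["curl", "-s", "-X", "POST", ENDPOINT,
         "-H", "content-type: application/sparql-query",
         "-H", "accept: application/sparql-results+json",
         "--data-binary", f"@{path}"],
        check=True,
        stdout=subprocess.DEVNULL,
    )

    real = time.perf_counter() - start
    cpu_after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = cpu_after.ru_utime - cpu_before.ru_utime
    sys_time = cpu_after.ru_stime - cpu_before.ru_stime
    print(f"{path}: real={real:.3f}s user={user:.3f}s sys={sys_time:.3f}s")
```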
Using linear extrapolation, we predict a real time of just under 1.5 minutes to run 30 queries on a dataset of 100,000 structures (see chart). With data point 1 = Dataset 1 = (a, b) and data point 2 = GlySTreeM = (c, d), the value at x = 100,000 structures is extrapolated with the formula
`f(x) = b + (x - a) * (d - b)/(c - a)`
Dataset | Size (structures) | Avg Real Time (s) | Avg User Time (s) | Avg Sys Time (s) | Avg CPU Time (s) |
---|---|---|---|---|---|
Dataset 1 | 191 | 0.71 | 0.272 | 0.025 | 0.297 |
Dataset 2 | 2780 | 2.713 | 0.259 | 0.029 | 0.288 |
GlySTreeM | 4808 | 4.544 | 0.262 | 0.017 | 0.279 |
Extrapolate | 100,000 | 83.592 | 0.056 | -0.148 | -0.092 |
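The extrapolated real time can be reproduced from the Dataset 1 and GlySTreeM rows with the formula above; a short check in Python:

```python
# Linear extrapolation between (a, b) and (c, d), as in the formula above,
# using the size and average real time columns from the table.
def extrapolate(x, a, b, c, d):
    return b + (x - a) * (d - b) / (c - a)

a, b = 191, 0.71     # Dataset 1: size, avg real time (s)
c, d = 4808, 4.544   # GlySTreeM: size, avg real time (s)

print(round(extrapolate(100_000, a, b, c, d), 3))  # 83.592 s, just under 1.5 minutes
```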
The 30 queries were also run against a remote instance of GlySTreeM; this took 9.5 seconds of real time and 0.5 seconds of total CPU time. Using the table above, we estimate that running the queries against a remote instance holding 100,000 structures would take just under 3 minutes. However, this remains to be confirmed.