Commit 1cd55bf

n1mus authored
Merge pull request #103 from kbase/DATAUP-734-test-escape-chars
Special Chars for Fulltext Search
2 parents 51078b0 + fd93db4 commit 1cd55bf

4 files changed

Lines changed: 190 additions & 3 deletions

README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@ These specifications are used by the [Relation Engine API](relation_engine_serve
 The relation engine server (`relation_engine_server/`) is a simple API that allows KBase community developers to interact with the Relation Engine graph database. You can run stored queries or do bulk updates on documents.
 
 ## Relation Engine Startup
-* Docker image is built with environment variable `SPEC_RELEASE_PATH=/opt/spec.tar.gz'. This contains the specs from the repo itself.
+* Docker image is built with environment variable `SPEC_RELEASE_PATH=/opt/spec.tar.gz`. This contains the specs from the repo itself.
 * Wait for response from auth, workspace, and arangodb services, as they are set up
 * Specs are set up. Either the repo specs or remote specs are loaded into the specs root path
 * Collections, views, and analyzers from the specs are added to the ArangoDB server. If the collection, view, or analyzer already exists, but in a different configuration, it will _not_ be overwritten.
```
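
The last README startup step only creates what is missing. Below is a minimal sketch of that "create but never overwrite" rule. It assumes an in-memory dict stands in for the ArangoDB server and that configurations can be compared as plain dicts. `ensure_from_specs` and the `example_edges` name are hypothetical illustrations, not the server's actual code.

```python
from typing import Dict, List


def ensure_from_specs(existing: Dict[str, dict], specs: Dict[str, dict]) -> List[str]:
    """Hypothetical sketch: add every spec'd collection/view/analyzer that is
    missing from `existing`, but never overwrite one that already exists.

    Returns the names whose existing configuration differs from the spec,
    so a caller could log them.
    """
    mismatched = []
    for name, config in specs.items():
        if name not in existing:
            # Missing on the server: create it from the spec.
            existing[name] = dict(config)
        elif existing[name] != config:
            # Present with a different configuration: leave it untouched.
            mismatched.append(name)
    return mismatched


# Stand-in "server" state and specs (all values are made up for illustration).
server = {"ncbi_taxon": {"type": "document", "schema": "old"}}
specs = {
    "ncbi_taxon": {"type": "document", "schema": "new"},
    "example_edges": {"type": "edge"},  # hypothetical collection name
}
print(ensure_from_specs(server, specs))  # ['ncbi_taxon']
print(server["ncbi_taxon"]["schema"])  # old
```

Returning the mismatches rather than raising keeps startup non-destructive while still surfacing configuration drift.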

spec/stored_queries/generic/fulltext_search.yaml

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,3 +1,6 @@
+# Should be REVISED or DEPRECATED.
+# Is currently unused outside testing.
+#
 # Search a collection with a fulltext index with an attribute name and search text
 # Also supports filtering by outer-level attributes
 # Not recommended for fast searching because it can be very slow and even timeout at 60s
```
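
Several of the special characters exercised by this commit are also operators in ArangoDB's `FULLTEXT()` query string. Commas separate terms, `+`, `|` and `-` prefix a term (AND, OR, NOT), and `prefix:`/`complete:` select the match mode. A raw scientific name such as `60:r:e,n,x,z15` can therefore be misparsed. The sketch below shows one way to neutralize those characters. It assumes the fulltext index splits words on non-alphanumeric characters and matches case-insensitively. `fulltext_terms` and `fulltext_query` are hypothetical helpers, not what the stored query does.

```python
import re
from typing import List


def fulltext_terms(search_text: str) -> List[str]:
    """Hypothetical: keep only alphanumeric word tokens, dropping operator
    characters (, + | - :), quotes, spaces and other punctuation.

    Tokens are lowercased (assumption: the index matches case-insensitively)
    and de-duplicated while preserving their first-seen order.
    """
    tokens = [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", search_text)]
    return list(dict.fromkeys(tokens))


def fulltext_query(search_text: str) -> str:
    """Hypothetical: AND together a `prefix:` term for each token
    (a term with no +/|/- prefix is combined with AND)."""
    return ",".join(f"prefix:{tok}" for tok in fulltext_terms(search_text))


print(fulltext_query("low G+C Gram-positive bacterium HTA462"))
# prefix:low,prefix:g,prefix:c,prefix:gram,prefix:positive,prefix:bacterium,prefix:hta462
print(fulltext_query("|Fake|fake|fake| ||fake||"))
# prefix:fake
```

Dropping the characters, rather than escaping them, gives up exact-phrase matching in exchange for a query string that cannot be misread as operators.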

spec/test/data/ncbi_taxon.json

Lines changed: 162 additions & 0 deletions
```diff
@@ -2260,5 +2260,167 @@
 "expired": 1612915015846,
 "release_created": 1541030400000,
 "release_expired": 1612137599999
+},
+{
+"_key": "338794_2018-11-01",
+"_id": "ncbi_taxon/338794_2018-11-01",
+"_rev": "_b2jbO4G--D",
+"id": "338794",
+"scientific_name": "low G+C Gram-positive bacterium HTA462",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": 338794,
+"gencode": 11,
+"first_version": "2018-11-01",
+"last_version": "2021-02-01",
+"created": 1541030460000,
+"expired": 9007199254740991,
+"release_created": 1541030400000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "586732_2018-11-01",
+"_id": "ncbi_taxon/586732_2018-11-01",
+"_rev": "_b2kB1gK--B",
+"id": "586732",
+"scientific_name": "Integrating expression vector pJEB403+drrA",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": 586732,
+"gencode": 11,
+"first_version": "2018-11-01",
+"last_version": "2021-02-01",
+"created": 1541030460000,
+"expired": 9007199254740991,
+"release_created": 1541030400000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "1127597_2018-11-01",
+"_id": "ncbi_taxon/1127597_2018-11-01",
+"_rev": "_b2lFmce--B",
+"id": "1127597",
+"scientific_name": "Fusarium cf. solani 3+4-uuu DPGS-2011",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": 1127597,
+"gencode": 1,
+"first_version": "2018-11-01",
+"last_version": "2021-02-01",
+"created": 1541030460000,
+"expired": 9007199254740991,
+"release_created": 1541030400000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "1173779_2018-11-01",
+"_id": "ncbi_taxon/1173779_2018-11-01",
+"_rev": "_b2lOxFa--_",
+"id": "1173779",
+"scientific_name": "Salmonella enterica subsp. diarizonae serovar 60:r:e,n,x,z15",
+"rank": "no rank",
+"strain": true,
+"aliases": [],
+"ncbi_taxon_id": 1173779,
+"gencode": 11,
+"first_version": "2018-11-01",
+"last_version": "2021-02-01",
+"created": 1541030460000,
+"expired": 9007199254740991,
+"release_created": 1541030400000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "1906029_2018-11-01",
+"_id": "ncbi_taxon/1906029_2018-11-01",
+"_rev": "_b2nDL5---_",
+"id": "1906029",
+"scientific_name": "Nostoc sp. 'Peltigera sp. \"hawaiensis\" P1236 cyanobiont'",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": 1906029,
+"gencode": 11,
+"first_version": "2018-11-01",
+"last_version": "2021-02-01",
+"created": 1541030460000,
+"expired": 9007199254740991,
+"release_created": 1541030400000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "1945188_2018-11-01",
+"_id": "ncbi_taxon/1945188_2018-11-01",
+"_rev": "_b2nJbF2--_",
+"id": "1945188",
+"scientific_name": "Reporter vector p1168hIL6mC/EBP-luc+",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": 1945188,
+"gencode": 11,
+"first_version": "2018-11-01",
+"last_version": "2021-02-01",
+"created": 1541030460000,
+"expired": 9007199254740991,
+"release_created": 1541030400000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "1945295_2018-11-01",
+"_id": "ncbi_taxon/1945295_2018-11-01",
+"_rev": "_b2nJbIK--_",
+"id": "1945295",
+"scientific_name": "Vector pEntry-attR2-IRES-eGFP-luc+-pA-attL3",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": 1945295,
+"gencode": 11,
+"first_version": "2018-11-01",
+"last_version": "2021-02-01",
+"created": 1541030460000,
+"expired": 9007199254740991,
+"release_created": 1541030400000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "2727889_2021-02-01",
+"_id": "ncbi_taxon/2727889_2021-02-01",
+"_rev": "_b2n6us---A",
+"id": "2727889",
+"scientific_name": "Pleurocapsales cyanobacterium 'Beach rock 4+5\"'",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": 2727889,
+"gencode": 11,
+"first_version": "2021-02-01",
+"last_version": "2021-02-01",
+"created": 1612915015847,
+"expired": 9007199254740991,
+"release_created": 1612137600000,
+"release_expired": 9007199254740991
+},
+{
+"_key": "fake_2021-02-01",
+"_id": "ncbi_taxon/fake_2021-02-01",
+"_rev": "fake",
+"id": "fake",
+"scientific_name": "|Fake|fake|fake| ||fake||",
+"rank": "species",
+"strain": false,
+"aliases": [],
+"ncbi_taxon_id": -1,
+"gencode": 11,
+"first_version": "2021-02-01",
+"last_version": "2021-02-01",
+"created": 1612915015847,
+"expired": 9007199254740991,
+"release_created": 1612137600000,
+"release_expired": 9007199254740991
 }
 ]
```
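
All of the new fixture documents use `9007199254740991` for `expired` and `release_expired`. That value is 2**53 - 1 (JavaScript's `Number.MAX_SAFE_INTEGER`), which effectively marks a record as not expired. The sketch below shows how "not expired and compatible with a current timestamp" could be checked against those fields. It assumes inclusive `[created, expired]` bounds in milliseconds. `live_at` and the `"old record"` document are hypothetical, not the stored query's actual filter.

```python
import time
from typing import Iterable, List

# Sentinel used in the fixture for records that have not expired: 2**53 - 1.
NOT_EXPIRED = 2**53 - 1


def live_at(docs: Iterable[dict], ts_ms: int) -> List[dict]:
    """Hypothetical filter: keep docs whose [created, expired] window
    (assumed inclusive, in epoch milliseconds) contains ts_ms."""
    return [d for d in docs if d["created"] <= ts_ms <= d["expired"]]


docs = [
    {"scientific_name": "low G+C Gram-positive bacterium HTA462",
     "created": 1541030460000, "expired": NOT_EXPIRED},
    {"scientific_name": "old record",  # hypothetical expired document
     "created": 1541030460000, "expired": 1612915015846},
]
now_ms = int(time.time() * 1000)
print([d["scientific_name"] for d in live_at(docs, now_ms)])
# ['low G+C Gram-positive bacterium HTA462']
```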

spec/test/stored_queries/test_fulltext_search.py

Lines changed: 24 additions & 2 deletions
```diff
@@ -35,6 +35,7 @@
 ncbi_taxa = json.load(fh)
 
 # scinames_test_all are all the test scinames
+# These are selected from the ncbi_taxon collection
 scinames_test_all = [
 # --- Token preceded by punctuation ---
 "Lactobacillus sp. 'thermophilus'",
@@ -55,7 +56,18 @@
 "Vaccinia virus WR 65-16",
 "Dengue virus 2 Jamaica/1409/1983",
 "Dengue virus 2 Thailand/NGS-C/1944",
-# --- Dups (techinically only applicable to live data) ---
+# --- Escape chars ( ,:+-|"' ) ---
+# --- TODO sample scinames with the escape chars in different variety of syntaxes ---
+"Salmonella enterica subsp. diarizonae serovar 60:r:e,n,x,z15",
+"Fusarium cf. solani 3+4-uuu DPGS-2011",
+"Integrating expression vector pJEB403+drrA",
+"Vector pEntry-attR2-IRES-eGFP-luc+-pA-attL3",
+"low G+C Gram-positive bacterium HTA462",
+"Reporter vector p1168hIL6mC/EBP-luc+",
+"Pleurocapsales cyanobacterium 'Beach rock 4+5\"'",
+"Nostoc sp. 'Peltigera sp. \"hawaiensis\" P1236 cyanobiont'",
+"|Fake|fake|fake| ||fake||",
+# --- Dups (technically only applicable to live data) ---
 "environmental samples",
 "Listeria sp. FSL_L7-0091",
 "Listeria sp. FSL_L7-1519",
@@ -64,7 +76,8 @@
 "Corticiaceae sp.",
 "Escherichia coli",
 ]
-# scinames_test_latest are the test scinames that are compatible with a current timestamp
+# scinames_test_latest are the test scinames that are not expired and
+# compatible with a current timestamp
 scinames_test_latest = [
 "Lactobacillus sp. 'thermophilus'",
 "Rabbit fibroma virus (strain Kasza)",
@@ -79,6 +92,15 @@
 "Vaccinia virus WR 65-16",
 "Dengue virus 2 Jamaica/1409/1983",
 "Dengue virus 2 Thailand/NGS-C/1944",
+"Salmonella enterica subsp. diarizonae serovar 60:r:e,n,x,z15",
+"Fusarium cf. solani 3+4-uuu DPGS-2011",
+"Integrating expression vector pJEB403+drrA",
+"Vector pEntry-attR2-IRES-eGFP-luc+-pA-attL3",
+"low G+C Gram-positive bacterium HTA462",
+"Reporter vector p1168hIL6mC/EBP-luc+",
+"Pleurocapsales cyanobacterium 'Beach rock 4+5\"'",
+"Nostoc sp. 'Peltigera sp. \"hawaiensis\" P1236 cyanobiont'",
+"|Fake|fake|fake| ||fake||",
 "environmental samples",
 "Listeria sp. FSL_L7-0091",
 "Listeria sp. FSL_L7-1519",
```
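
The TODO in the test module asks for more samples of each escape character. The sketch below is one way to see which of the listed characters (`( ,:+-|"' )`) the new scinames already cover. `escape_char_coverage` is a hypothetical helper, not part of the test module. The sciname literals are copied from the diff.

```python
from typing import Dict, List

# Escape characters named in the test comment: space , : + - | " '
ESCAPE_CHARS = " ,:+-|\"'"


def escape_char_coverage(scinames: List[str]) -> Dict[str, List[str]]:
    """Hypothetical: map each escape character to the scinames containing it."""
    return {ch: [name for name in scinames if ch in name] for ch in ESCAPE_CHARS}


new_scinames = [
    "Salmonella enterica subsp. diarizonae serovar 60:r:e,n,x,z15",
    "Fusarium cf. solani 3+4-uuu DPGS-2011",
    "Integrating expression vector pJEB403+drrA",
    "Vector pEntry-attR2-IRES-eGFP-luc+-pA-attL3",
    "low G+C Gram-positive bacterium HTA462",
    "Reporter vector p1168hIL6mC/EBP-luc+",
    "Pleurocapsales cyanobacterium 'Beach rock 4+5\"'",
    "Nostoc sp. 'Peltigera sp. \"hawaiensis\" P1236 cyanobiont'",
    "|Fake|fake|fake| ||fake||",
]
coverage = escape_char_coverage(new_scinames)
# Characters with no sample at all would be the first gaps to fill.
missing = [ch for ch, names in coverage.items() if not names]
print(missing)  # []
```

Every character has at least one sample, but `,`, `:` and `|` each appear in only one sciname. Those are the natural places to add the "different variety of syntaxes" the TODO mentions.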