# Server tests

Python based server tests scenario using [pytest](https://docs.pytest.org/en/stable/).
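
As a rough sketch of the style, each test is a plain pytest function exercising a running `llama-server` over HTTP. The file name, endpoint path, and payload fields below are illustrative assumptions, not the suite's actual fixtures or helpers:

```python
# test_completion_smoke.py -- hypothetical example, for illustration only.
# Assumes a llama-server instance is already listening on localhost:PORT
# and that the `requests` package is installed.
import os

import requests


def test_completion_returns_content():
    base_url = f"http://localhost:{os.environ.get('PORT', '8080')}"
    # POST a tiny completion request and check the response shape;
    # the payload fields are assumptions.
    response = requests.post(
        f"{base_url}/completion",
        json={"prompt": "Hello", "n_predict": 8},
        timeout=30,
    )
    assert response.status_code == 200
    assert "content" in response.json()
```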

Tests target GitHub workflows job runners with 4 vCPU.

Note: if the host architecture inference speed is faster than that of the GitHub runners, the parallel scenario may randomly fail.
To mitigate it, you can increase the `n_predict` and `kv_size` values.

It's possible to override some scenario steps values with environment variables:

| variable                 | description                                                                                      |
|--------------------------|--------------------------------------------------------------------------------------------------|
| `PORT`                   | `context.server_port` to set the listening port of the server during scenario, default: `8080`  |
| `LLAMA_SERVER_BIN_PATH`  | to change the server binary path, default: `../../../build/bin/llama-server`                    |
| `DEBUG`                  | to enable steps and server verbose mode `--verbose`                                             |
| `N_GPU_LAYERS`           | number of model layers to offload to VRAM (`-ngl`, `--n-gpu-layers`)                            |
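
Several overrides can be combined on a single invocation, for example (the values here are arbitrary): `PORT=8081 N_GPU_LAYERS=23 ./tests.sh`.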

To run slow tests:

```shell
SLOW_TESTS=1 ./tests.sh
```
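
A hypothetical sketch of how an individual test can be gated behind `SLOW_TESTS` (illustrative only; this is not necessarily how the suite implements the gate):

```python
import os

import pytest


# Skipped unless SLOW_TESTS is set in the environment, mirroring the
# SLOW_TESTS=1 invocation above. The test body is a placeholder.
@pytest.mark.skipif(
    not os.environ.get("SLOW_TESTS"),
    reason="slow test; set SLOW_TESTS=1 to enable",
)
def test_large_context_completion():
    ...
```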

To run with stdout/stderr display in real time (verbose output, but useful for debugging):

```shell
DEBUG=1 ./tests.sh -s -v -x
```
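
Here `-s` disables pytest's output capture (so the server's stdout/stderr stream to the console as they happen), `-v` prints each test name, and `-x` stops on the first failure.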

To see all available arguments, please refer to the [pytest documentation](https://docs.pytest.org/en/stable/how-to/usage.html).