- 
                Notifications
    You must be signed in to change notification settings 
- Fork 71
Description
Describe the bug
The PyTorch SageMaker endpoint cloudwatch log level is INFO only which cannot be changed without creating a BYO container.
Hence all the access including /ping besides the /invocations are generating logs that clutters the cloudwarch log stream making it difficult to go directly to the errors for troubleshooting. In my understanding, this will incur the cloudwatch cost as well.
2020-08-28 00:31:16,598 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:54100 "GET /ping HTTP/1.1" 200 0Copy2020-08-28 00:31:16,598 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:54100 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:31:21,598 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:54100 "GET /ping HTTP/1.1" 200 0Copy2020-08-28 00:31:21,598 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:54100 "GET /ping HTTP/1.1" 200 0
... repetition of the INFO on /ping.
The AWS support case 7309023801 was opened and it was indicated the log level cannot be changed, or need to build our own container to control the log level.
the sagemaker python inference serving container module with earlier version need to change the log4j from the sagemaker inference toolkit as it was built based on the sagemaker inference module, we need to change the log4j in the sagemaker inference toolkit. So I have built two wheel files for my custom docker container and installed both of them in the Dockerfile:
=========
#RUN pip install --no-cache-dir "sagemaker-pytorch-inference<2"
COPY sagemaker_inference-1.5.3.dev0-py2.py3-none-any.whl /tmp
RUN pip install /tmp/sagemaker_inference-1.5.3.dev0-py2.py3-none-any.whl
COPY sagemaker_pytorch_inference-1.5.1-py2.py3-none-any.whl /tmp
RUN pip install /tmp/sagemaker_pytorch_inference-1.5.1-py2.py3-none-any.whl
=========
Then I built and pushed the docker to my ECR and used that docker image to deploy the model to an endpoint. I was only seeing the logs related to the model server, for example:
=========
EVENTS	1598507953058	2020-08-27 05:59:07,670 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 533	1598507947905
EVENTS	1598508601335	2020-08-27 06:09:55,987 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 62	1598508596097
EVENTS	1598508607396	2020-08-27 06:10:01,440 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 292	1598508602099
=========
The ping health check logs are all gone. Please note that, as the logs for the invocation requests handled by the model server is logged at INFO level, so I would suggest to keep the log4j setting as below:
=========
log4j.logger.com.amazonaws.ml.mms = INFO, mms_log
log4j.logger.ACCESS_LOG = ERROR, access_log
log4j.logger.MMS_METRICS = WARN, mms_metrics
log4j.logger.MODEL_METRICS = WARN, model_metrics
log4j.logger.MODEL_LOG = WARN, model_log
=========
Now let's come back to the issue of changing logging level. The environment variable you used 'SAGEMAKER_CONTAINER_LOG_LEVEL' is the correct one. From our sagemaker sdk source code [2][3], you can see that the service used the same value when it is set in the python sdk. I have used our example notebook and used the parameter in the PyTorchModel function as below:
----
sagemaker_model = PyTorchModel(
                                  model_data = 's3://<mybucket>/pytorch-training-2020-08-26-00-44-33-303/output/model.tar.gz',
                                  role = role,
                                  container_log_level=40,
                                  py_version='py3',
                                  entry_point = 'mnist.py')
----
I can confirm the environment variable set up for the model is 
Key	                                                              Value
SAGEMAKER_CONTAINER_LOG_LEVEL	40
However, this value is only for the logs that are generated by python logging (entrypoint script).  I have tested this with a training job and can see the logs only shows DEBUG or ERROR. I modified the entrypoint script to print the environment variables and also trying to use logger.info and logger.error to add test logs to the model_fn as below:
----
def model_fn(model_dir):
    logger.info("Run model function")
    print(os.environ)
    logger.error("display error message")
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.DataParallel(Net())
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        model.load_state_dict(torch.load(f))
    return model.to(device)
----
I can confirm that the logger.info is not showing in the cloudwatch and the 'display error message' logged by logger.error was showing in the cloudwatch log. This means the container log level is correct.
As in pytorch serving containers, we are using 'mxnet-model-server' which handles the \ping and \invocation http calls for you [8]. The model server will not be controlled by the container logging level. The logs that you have seen in the endpoint shown as [INFO] is defined here [5] in the log4j.properties. For example, in the endpoint logs, you may see the below entry:
2020-08-26 03:27:10,540 [INFO ] pool-1-thread-2 ACCESS_LOG - /10.32.0.2:52222 "GET /ping HTTP/1.1" 200 22
This is defined in the "log4j.logger.ACCESS_LOG = INFO, access_log" section in [5]. 
Therefore, to change the log levels for the serving module you need to build your own pytorch image based on our github repo [6]. But this will require you to make necessary code change to our Dockerfile as well as installing additional modules. 
To reproduce
Deploy a PyTorch Model where Python log level is set to logging.ERROR via SageMaker SDK and refer to the cloudwach log for /aws/sagemaker/Endpoints/<endpoint_name>.
import logging
logging.basicConfig()
logging.getLogger().setLevel(logging.ERROR)
Expected behavior
The log level configuration is reflected and only ERROR will be logged in the cloudwatch.
System information
SageMaker endpoint in us-east-1.
- 
Toolkit version: 
 Not sure
- 
Framework version: 
 PyTorch 1.4.0, 1.5.1
- 
Python version: 
 Python 3.6
- 
CPU or GPU: 
 GPU
- 
Custom Docker image (Y/N): 
 N
Additional context
Endpoint startup message in the cloudwatch.
Warning: Calling MMS with mxnet-model-server. Please move to multi-model-server.
2020-08-28 00:29:08,131 [INFO ] main com.amazonaws.ml.mms.ModelServer -
MMS Home: /opt/conda/lib/python3.6/site-packages
Current directory: /
Temp directory: /home/model-server/tmp
Number of CPUs: 1
Max heap size: 3806 M
Python executable: /opt/conda/bin/python
Config file: /etc/sagemaker-mms.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8080
Model Store: /.sagemaker/mms/models
Initial Models: ALL
Log dir: /logs
Metrics dir: /logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Preload model: false
Prefer direct buffer: false
2020-08-28 00:29:08,222 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-9000-model
2020-08-28 00:29:08,320 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - model_service_worker started with args: --sock-type unix --sock-name /home/model-server/tmp/.mms.sock.9000 --handler sagemaker_pytorch_serving_container.handler_service --model-path /.sagemaker/mms/models/model --model-name model --preload-model false --tmp-dir /home/model-server/tmp
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Listening on port: /home/model-server/tmp/.mms.sock.9000
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - [PID] 43
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - MMS worker started.
2020-08-28 00:29:08,321 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Python runtime: 3.6.10
2020-08-28 00:29:08,322 [INFO ] main com.amazonaws.ml.mms.wlm.ModelManager - Model model loaded.
2020-08-28 00:29:08,328 [INFO ] main com.amazonaws.ml.mms.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2020-08-28 00:29:08,341 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.mms.sock.9000
2020-08-28 00:29:08,406 [INFO ] main com.amazonaws.ml.mms.ModelServer - Inference API bind to: http://0.0.0.0:8080
Model server started.
2020-08-28 00:29:08,408 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Connection accepted: /home/model-server/tmp/.mms.sock.9000.
2020-08-28 00:29:08,409 [WARN ] pool-2-thread-1 com.amazonaws.ml.mms.metrics.MetricCollector - worker pid is not available yet.
2020-08-28 00:29:08,972 [INFO ] W-9000-model-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - PyTorch version 1.5.1 available.
2020-08-28 00:29:42,837 [INFO ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend response time: 34388
2020-08-28 00:29:42,839 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerLifeCycle - attachIOStreams() threadName=W-model-1
2020-08-28 00:29:55,821 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 19
2020-08-28 00:30:00,606 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 1
2020-08-28 00:30:05,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:10,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:15,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:20,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:25,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
2020-08-28 00:30:30,605 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:34704 "GET /ping HTTP/1.1" 200 0
...