# API
LLM Guard can be deployed as an API. We rely on FastAPI and Uvicorn to serve the API.
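A minimal Docker Compose sketch of such a deployment; the image name `laiyer/llm-guard-api` is an assumption, so adjust the image, tag, and port to your setup:

```yaml
services:
  llm-guard-api:
    image: laiyer/llm-guard-api:latest  # assumed image name; replace with your own build if needed
    ports:
      - "8000:8000"                     # APP_PORT defaults to 8000
```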
## Configuration
All configuration is stored in `config/scanners.yml`, and values can also be set via environment variables.
> **Note:** Scanners are executed in the order they appear in the configuration.
### Default environment variables
- `LOG_LEVEL` (str): Log level. Default is `INFO`. If set to `DEBUG`, debug mode is enabled, which makes the Swagger UI available.
- `CACHE_MAX_SIZE` (int): Maximum number of items in the cache. Default is unlimited.
- `CACHE_TTL` (int): Time in seconds after which a cached item expires. Default is 1 hour.
- `SCAN_FAIL_FAST` (bool): Stop scanning after the first failed check. Default is `False`.
- `SCAN_PROMPT_TIMEOUT` (int): Time in seconds after which a prompt scan times out. Default is 10 seconds.
- `SCAN_OUTPUT_TIMEOUT` (int): Time in seconds after which an output scan times out. Default is 30 seconds.
- `APP_PORT` (int): Port to run the API on. Default is `8000`.
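Continuing the Compose sketch above, these defaults can be overridden through the `environment` section (the values here are illustrative):

```yaml
services:
  llm-guard-api:
    image: laiyer/llm-guard-api:latest  # assumed image name, as above
    environment:
      SCAN_FAIL_FAST: "true"   # stop scanning after the first failed check
      CACHE_MAX_SIZE: "1000"   # bound the number of cached results
      CACHE_TTL: "3600"        # expire cached results after one hour
```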
## Best practices
- Enable `SCAN_FAIL_FAST` to avoid unnecessary scans.
- Enable `CACHE_MAX_SIZE` and `CACHE_TTL` to cache results and avoid rescanning identical inputs.
- Enable authentication and rate limiting to prevent abuse.
- Enable lazy loading of models to avoid failed HTTP probes (see below).
- Load models from a local directory to avoid downloading them each time the container starts (see below).
## Load models from a directory
It's possible to load models from a local directory: set `model_path` in each supported scanner to the folder containing the ONNX version of the model. This way, the models won't be downloaded each time the container starts.
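A sketch of what this could look like in `config/scanners.yml`; the scanner name and path are placeholders, and the exact key layout depends on your version of the config schema:

```yaml
input_scanners:
  - type: PromptInjection                        # any scanner that supports model_path
    params:
      model_path: /models/prompt_injection_onnx  # local folder with the ONNX model files
```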
## Lazy loading
You can enable `lazy_load` in the YAML config file to load models on the first request instead of at API startup. That way, you avoid failed HTTP probes caused by long model-loading times.
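For illustration, assuming the application-level settings live under an `app` key in the YAML file (verify this against your `scanners.yml`):

```yaml
app:
  lazy_load: true  # defer model loading until the first scan request
```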
## Observability
There are built-in environment variables to configure observability:
### Logging
Logs are written to stdout in a structured format, which can be easily parsed by log management systems.
### Metrics
The following exporters are available for metrics:
- Console (`console`): Logs metrics to stdout.
- Prometheus (`prometheus`): Exposes metrics on the `/metrics` endpoint.
- OpenTelemetry (`otel_http`): Sends metrics to an OpenTelemetry collector via an HTTP endpoint.
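If the Prometheus exporter is enabled, a standard scrape configuration such as the one below (plain Prometheus config, not specific to LLM Guard; adjust the target to your deployment) will collect the metrics:

```yaml
scrape_configs:
  - job_name: llm-guard-api
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:8000"]  # APP_PORT defaults to 8000
```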
### Tracing
The following exporters are available for tracing:
- Console (`console`): Logs traces to stdout.
- OpenTelemetry (`otel_http`): Sends traces to an OpenTelemetry collector via an HTTP endpoint.
- AWS X-Ray (`xray`): Sends traces to an OpenTelemetry collector in the AWS X-Ray format.
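For the `otel_http` exporter, a minimal OpenTelemetry Collector configuration that accepts traces over OTLP/HTTP and prints them for debugging might look like this (standard Collector config, not specific to LLM Guard):

```yaml
receivers:
  otlp:
    protocols:
      http: {}           # accept OTLP over HTTP
exporters:
  debug: {}              # print received traces to stdout
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
```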