NVIDIA GPU Plugin
When you enable the NVIDIA GPU Plugin cluster add-on, you can pass the following key/value pairs as arguments.
Note that to avoid unexpected interruptions to workloads running on NVIDIA GPU worker nodes, we recommend choosing the version of the NVIDIA GPU Plugin add-on to deploy yourself, rather than specifying that Oracle update the add-on automatically.
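As a sketch of how these key/value pairs are supplied outside the Console, add-on configuration can be expressed as a list of key/value objects and passed to the CLI (for example via a `--configurations file://...` argument to the cluster add-on commands); the specific values below are illustrative, not required settings:

```json
[
  { "key": "numOfReplicas", "value": "1" },
  { "key": "migStrategy", "value": "none" },
  { "key": "failOnInitError", "value": "true" }
]
```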
Key (API and CLI) | Key's Display Name (Console) | Description | Required/Optional | Default Value | Example Value
---|---|---|---|---|---
`affinity` | affinity | A group of affinity scheduling rules. JSON format, in plain text or Base64-encoded. | Optional | null | null
`nodeSelectors` | node selectors | You can use node selectors and node labels to control the worker nodes on which add-on pods run. For a pod to run on a node, the pod's node selector must have the same key/value as the node's label. JSON format, in plain text or Base64-encoded. | Optional | null | `{"foo":"bar", "foo2": "bar2"}` The pod only runs on nodes that have the labels `foo=bar` and `foo2=bar2`.
`numOfReplicas` | numOfReplicas | The number of replicas of the add-on deployment. | Required | `1` Creates one replica of the add-on deployment per cluster. | `2` Creates two replicas of the add-on deployment per cluster.
`rollingUpdate` | rollingUpdate | Controls the desired behavior of a rolling update via `maxSurge` and `maxUnavailable`. JSON format, in plain text or Base64-encoded. | Optional | null | null
`tolerations` | tolerations | You can use taints and tolerations to control the worker nodes on which add-on pods run. For a pod to run on a node that has a taint, the pod must have a corresponding toleration. JSON format, in plain text or Base64-encoded. | Optional | null | `[{"key":"tolerationKeyFoo", "value":"tolerationValBar", "effect":"NoSchedule", "operator":"Equal"}]` Only pods that have this toleration can run on worker nodes that have the corresponding taint.
`topologySpreadConstraints` | topologySpreadConstraints | How to spread matching pods among the given topology. JSON format, in plain text or Base64-encoded. | Optional | null | null
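Arguments that take JSON (such as `tolerations` and `nodeSelectors`) can be supplied either in plain text or Base64-encoded. A minimal sketch of producing the Base64 form in Python (the toleration keys and values are placeholders):

```python
import base64
import json

# Illustrative toleration; key/value are placeholders.
tolerations = [
    {
        "key": "tolerationKeyFoo",
        "value": "tolerationValBar",
        "effect": "NoSchedule",
        "operator": "Equal",
    }
]

# Serialize to JSON, then Base64-encode the UTF-8 bytes.
plain = json.dumps(tolerations)
encoded = base64.b64encode(plain.encode("utf-8")).decode("ascii")
print(encoded)

# Decoding recovers the original structure.
decoded = json.loads(base64.b64decode(encoded))
assert decoded == tolerations
```

Either `plain` or `encoded` can then be pasted as the argument's value.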
Key (API and CLI) | Key's Display Name (Console) | Description | Required/Optional | Default Value | Example Value
---|---|---|---|---|---
`deviceIdStrategy` | Device ID Strategy | Which strategy to use for passing device IDs to the underlying runtime. One of `uuid` or `index`. | Optional | `uuid` | |
`deviceListStrategy` | Device List Strategy | Which strategy to use for passing the device list to the underlying runtime (for example, `envvar` or `volume-mounts`). Multiple values are supported, in a comma-separated list. | Optional | `envvar` | |
`driverRoot` | Driver Root | The root path for the NVIDIA driver installation. | Optional | `/` | |
`failOnInitError` | FailOnInitError | Whether to fail the plugin if an error is encountered during initialization. When set to `true`, an initialization error causes the plugin to fail immediately rather than continue in a degraded state. | Optional | `true` | |
`migStrategy` | MIG Strategy | Which strategy to use for exposing MIG (Multi-Instance GPU) devices on GPUs that support it. One of `none`, `single`, or `mixed`. | Optional | `none` | |
`nvidia-gpu-device-plugin.ContainerResources` | nvidia-gpu-device-plugin container resources | You can specify the resource quantities that the add-on containers request, and set resource usage limits that the add-on containers cannot exceed. JSON format, in plain text or Base64-encoded. | Optional | null | `{"limits": {"cpu": "500m", "memory": "200Mi" }, "requests": {"cpu": "100m", "memory": "100Mi"}}` Creates add-on containers that request 100 millicores of CPU and 100 mebibytes of memory, and limits add-on containers to 500 millicores of CPU and 200 mebibytes of memory.
`passDeviceSpecs` | Pass Device Specs | Whether to pass the paths and desired device node permissions for any NVIDIA devices being allocated to the container. | Optional | `false` | |
`useConfigFile` | Use Config File from ConfigMap | Whether to use a configuration file, derived from a ConfigMap, to configure the NVIDIA Device Plugin for Kubernetes. When set to `true`, the plugin reads its configuration from the ConfigMap (see the example below). | Optional | `false` | |
Example of an `nvidia-device-plugin-config` ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin-config
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    flags:
      migStrategy: "none"
      failOnInitError: true
      nvidiaDriverRoot: "/"
      plugin:
        passDeviceSpecs: false
        deviceListStrategy: envvar
        deviceIDStrategy: uuid
```