Queries Used in the Kubernetes Solution
These queries extract the metrics, events, topology, and analysis data displayed in the various panels of the Kubernetes Solution. The log-based queries, such as those for Kubernetes System, OS health, Events, Workloads Summary, Node Summary, and Pod Summary, must be run in the Log Explorer; the remaining queries are metrics-based and must be run in the Metric Explorer.
Topics:
- Queries for the Right Panel Widgets in Kubernetes Solution
- Queries for the Events Table in Kubernetes Solution
- Queries for the Cluster Topology and Details Tables in Kubernetes Solution
Queries for the Right Panel Widgets in Kubernetes Solution
Topics:
- Kubernetes System
- OS health
- CPU Cores (Used/Allocatable)
- Memory (Used/Allocatable)
- CPU Cores used
- Memory used
- Network: bytes rx
- Network: bytes tx
- Network: Packet Rx Rate
- Network: Packet Tx Rate
- Network: Packet Rx Dropped Rate
- Network: Packet Tx Dropped Rate
- API Server Request Duration
- etcd Request Duration
- Total API Server Requests
- API Response Size
- API Request Execution Duration
Kubernetes System
Displays the overall health or performance metrics of critical Kubernetes system components (such as kube-apiserver, controller manager, or scheduler). This widget helps ensure essential control plane services operate smoothly.
Tabs: Cluster
Filters: None
Scope: Log Explorer
Query:
'Component Type' in ('Kubernetes System', 'Kubernetes Control Plane') and Component != null
| link Time, Component, cluster()
| rename 'Cluster Sample' as 'Error Sample'
| eval Error = if(length('Error Sample') > 50, substr('Error Sample', 0, 50) || ' ...', substr('Error Sample', 0, 50))
| classify correlate = -*, Error Component as Trend
| timestats name = 'All Issues' sum(Count) as 'Issue Trend'
| timestats name = 'Potential Issues' sum(Count) as Issues by Component, Error
| fields -'Potential Issue', -Error
OS health
Provides insights into the operating system’s health on Kubernetes nodes, often aggregating metrics such as CPU, memory, disk, and system process statuses. This ensures the underlying OS does not become a bottleneck for workloads.
Tabs: Cluster, Node
Filters: None
Scope: Log Explorer
Query:
'Component Type' = 'Linux System' and Component != null and (Label = null or (Label != null and 'Problem Priority' != null))
| link includenulls = true Time, Node, Component, Label, cluster()
| stats max('Problem Priority') as 'Problem Priority'
| where 'Problem Priority' != null or 'Potential Issue' = '1'
| eval Error = if(length('Cluster Sample') > 30, substr('Cluster Sample', 0, 30) || ' ...', substr('Cluster Sample', 0, 30))
| classify correlate = -*, Error, Label Component as Trend
| timestats name = 'All Issues' sum(Count) as 'Issue Trend'
| timestats name = 'Issues Trend' sum(Count) as Issues by Component, Error, Label, Node
| fields -'Potential Issue', -Error
CPU Cores (Used/Allocatable)
This widget displays the percentage of CPU cores currently in use compared to the total allocatable cores in your Kubernetes environment. It offers a quick visual indication of overall CPU utilization efficiency across your cluster.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Query:
clusterCPUUtilization[1m]{clusterName = "k8s_solution_development"}.mean()
For an example query with filters, see the Network Bandwidth widget query.
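For illustration, the following is a minimal sketch of the same query with filter dimensions applied, assuming this metric also exposes the namespace and nodeName dimensions used by the pod-level queries later in this section; the values are placeholders, not values from your environment:
clusterCPUUtilization[1m]{clusterName = "k8s_solution_development", namespace = "kube-system", nodeName = "192.0.2.1"}.mean()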
Memory (Used/Allocatable)
Shows the percentage of memory being used compared to the total allocatable memory across the cluster. It helps in assessing memory pressure and ensuring workloads have sufficient memory resources.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Query:
clusterMemoryUtilization[1m]{clusterName = "k8s_solution_development"}.mean()
For an example query with filters, see the Network Bandwidth widget query.
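A similar hedged sketch applies to this query, again assuming the nodeName dimension is available on the metric; the value is a placeholder:
clusterMemoryUtilization[1m]{clusterName = "k8s_solution_development", nodeName = "192.0.2.1"}.mean()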
CPU Cores used
This shows the absolute number of CPU cores currently consumed by workloads within your cluster. Monitoring this helps identify CPU consumption trends and potential hotspots.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Query:
podCpuUsage[1m]{clusterName = "oke-cw22-ll", namespace = "kube-system", nodeName = "192.0.2.1", podName =~"coredns*"}.mean().grouping().sum()
Memory used
Indicates the total amount of memory (in KB, MB, or GB) currently in use across your cluster resources. This helps you monitor memory trends for capacity planning and troubleshooting.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Query:
podMemoryUsage[1m]{clusterName = "oke-cw22-ll", namespace = "kube-system", nodeName = "192.0.2.1", podName =~"coredns*"}.mean().grouping().sum()
Network: bytes rx
Displays the cumulative or current rate of network bytes received by the cluster or a specific node/pod. This metric helps monitor inbound network traffic volume. Run the query against the mgmtagent_kubernetes_metrics namespace.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Sample Query:
container_network_receive_bytes_total[1m]{clusterName="oke-cw22-ll"}.groupBy(interface).rate().filter(x=>x>=0)
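Because the underlying metric is a cumulative counter, the query converts it to a per-second rate; the filter(x=>x>=0) step appears to discard the negative samples that result when the counter resets, for example after a container restart. To narrow the result to a single pod, a sketch such as the following can be used, assuming the podName dimension is populated for this metric as it is for the pod-level CPU and memory queries above; the pod name pattern is a placeholder:
container_network_receive_bytes_total[1m]{clusterName="oke-cw22-ll", podName =~"coredns*"}.groupBy(interface).rate().filter(x=>x>=0)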
Network: bytes tx
Shows the cumulative or current rate of network bytes transmitted from the cluster or a specific node/pod. It helps to track outbound traffic and potential egress issues. Run the query against the mgmtagent_kubernetes_metrics namespace.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Sample Query:
container_network_transmit_bytes_total[1m]{clusterName="oke-cw22-ll"}.groupBy(interface).rate().filter(x=>x>=0)
Network: Packet Rx Rate
Indicates the rate at which network packets are being received. This helps identify trends and spikes in inbound packet volume, which may impact network performance. Run the query against the mgmtagent_kubernetes_metrics namespace.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Sample Query:
container_network_receive_packets_total[1m]{clusterName="oke-cw22-ll"}.groupBy(interface).rate().filter(x=>x>=0)
Network: Packet Tx Rate
Measures the rate of network packets being sent out from the nodes or pods. It helps diagnose outbound network activity and possible saturation. Run the query against the mgmtagent_kubernetes_metrics namespace.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Sample Query:
container_network_transmit_packets_total[1m]{clusterName="oke-cw22-ll"}.groupBy(interface).rate().filter(x=>x>=0)
Network: Packet Rx Dropped Rate
Shows the rate of incoming network packets dropped due to issues like network congestion or buffer overflow. A high drop rate indicates potential networking problems requiring attention. Run the query against the mgmtagent_kubernetes_metrics namespace.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Sample Query:
container_network_receive_packets_dropped_total[1m]{clusterName="oke-cw22-ll"}.groupBy(interface).rate().filter(x=>x>=0)
Network: Packet Tx Dropped Rate
Displays the rate at which outbound network packets are being dropped. Persistent high values can suggest network buffer issues, misconfigurations, or hardware limitations. Run the query against the mgmtagent_kubernetes_metrics namespace.
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace, Workload, Node
Scope: Metric Explorer
Sample Query:
container_network_transmit_packets_dropped_total[1m]{clusterName="oke-cw22-ll"}.groupBy(interface).rate().filter(x=>x>=0)
API Server Request Duration
Displays statistics (such as average and percentile values) on how long API server requests take to complete. Longer durations can signal API server bottlenecks or backend service problems.
Tabs: Cluster, Workload, Node, Pod
Filters: None
Scope: Metric Explorer
Query:
apiserver_request_duration_seconds_sum[5m]{clusterName = "oke-cw22-ll",verb=~"LIST|GET|PUT|POST|PATCH|DELETE|APPLY"}.groupBy(verb).sum() / apiserver_request_duration_seconds_count[5m]{clusterName = "oke-cw22-ll",verb=~"LIST|GET|PUT|POST|PATCH|DELETE|APPLY"}.groupBy(verb).sum()
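This query derives the average request duration per verb by dividing the cumulative duration sum by the request count over the same 5-minute window. To focus on a single verb, a hedged sketch such as the following can be used; GET is only an example value:
apiserver_request_duration_seconds_sum[5m]{clusterName = "oke-cw22-ll", verb = "GET"}.groupBy(verb).sum() / apiserver_request_duration_seconds_count[5m]{clusterName = "oke-cw22-ll", verb = "GET"}.groupBy(verb).sum()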
etcd Request Duration
Shows the latency involved in processing requests to etcd, the distributed key-value store backing Kubernetes. Higher durations here can affect overall control plane responsiveness.
Tabs: Cluster
Filters: None
Scope: Metric Explorer
Query:
etcd_request_duration_seconds_sum[5m]{clusterName = "oke-cw22-ll"}.groupBy(le).sum()/etcd_request_duration_seconds_count[5m]{clusterName = "oke-cw22-ll"}.groupBy(le).sum()
Total API Server Requests
Shows the total number of requests received by the Kubernetes API server over a specified period. High or spiky numbers could indicate increased cluster management activity or potential issues like automated script overloads.
Tabs: Cluster
Filters: None
Scope: Metric Explorer
Query:
apiserver_request_total[5m]{clusterName = "oke-cw22-ll"}.groupBy(code).rate().filter(x=>x>=0)
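Because the query groups by response code, it can also be narrowed to a subset of codes with the wildcard match operator used earlier in this section. The following sketch, with an illustrative pattern, focuses on server-side error responses:
apiserver_request_total[5m]{clusterName = "oke-cw22-ll", code =~ "5*"}.groupBy(code).rate().filter(x=>x>=0)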
API Response Size
Measures the size of responses (in bytes) returned by the Kubernetes API server. Large response sizes may impact network utilization and client performance.
Tabs: Cluster
Filters: None
Scope: Metric Explorer
Query:
apiserver_response_sizes_sum[5m]{clusterName = "oke-cw22-ll"}.groupBy(verb).sum() / apiserver_response_sizes_count[5m]{clusterName= "oke-cw22-ll"}.groupBy(verb).sum()
API Request Execution Duration
Focuses on the time required by the API server to process and execute incoming requests. Monitoring this helps to detect performance degradations or overloaded API servers.
Tabs: Cluster
Filters: None
Scope: Metric Explorer
Query:
apiserver_request_duration_seconds_sum[5m]{clusterName = "oke-cw22-ll", verb=~"LIST|GET|PUT|POST|PATCH|DELETE|APPLY"}.groupBy(verb).sum() / apiserver_request_duration_seconds_count[5m]{clusterName = "oke-cw22-ll", verb=~"LIST|GET|PUT|POST|PATCH|DELETE|APPLY"}.groupBy(verb).sum()
Queries for the Events Table in Kubernetes Solution
Tabs: Cluster, Workload, Node, Pod
Filters: Namespace
Scope: Log Explorer
Query:
'Log Source' = 'Kubernetes Event Object Logs' and 'Kubernetes Event Action' != deleted
| eval Object = 'Involved Object Kind' || ' - ' || 'Involved Object Name'
| link includenulls = true Object, 'Event Source Component', Event
| rename 'Event Source Component' as Component
| stats latest('Kubernetes Event Action') as Status, unique(Namespace) as Namespace, latest(Reason) as Reason, latest('Event Description') as Message, unique('Event Type') as Type, earliest('First Event Time') as 'First Event Time', latest('Last Event Time') as 'Last Event Time', latest('Event Count') as 'Event Count'
| eventstats count(Namespace) as Records by Namespace
| eval Age = 'Last Event Time' - 'First Event Time'
| createtable name = 'Namespace Summary' select Namespace, Records
| createtable name = 'Namespace Details' select Namespace, Type, Reason, 'Last Event Time', Age, Message, Object, Component
| sort Namespace, -'Last Event Time'
| fields -Event, -'Start Time', -'End Time', -'First Event Time', -Count
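The Namespace filter listed above can also be reproduced manually in the Log Explorer by adding a namespace predicate to the first line of the query, as in the following sketch; kube-system is a placeholder value:
'Log Source' = 'Kubernetes Event Object Logs' and 'Kubernetes Event Action' != deleted and Namespace = 'kube-system'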
Queries for the Cluster Topology and Details Tables in Kubernetes Solution
Topics:
- Workloads Summary
- Node Summary
- Pod Summary
Workloads Summary
Tabs: Workload
Filters: Namespace, Workload
Scope: Log Explorer
Query:
'Log Source' in ('Kubernetes DaemonSet Object Logs', 'Kubernetes Deployment Object Logs', 'Kubernetes CronJob Object Logs', 'Kubernetes Job Object Logs', 'Kubernetes StatefulSet Object Logs')
| eval Name = if('Log Source' = 'Kubernetes Deployment Object Logs', Deployment, 'Log Source' = 'Kubernetes DaemonSet Object Logs', DaemonSet, 'Log Source' = 'Kubernetes CronJob Object Logs', CronJob, 'Log Source' = 'Kubernetes Job Object Logs', Job, 'Log Source' = 'Kubernetes StatefulSet Object Logs', StatefulSet, null)
| link Namespace, 'Log Source', Name
| eval Type = if('Log Source' = 'Kubernetes Deployment Object Logs', literal(Deployment), 'Log Source' = 'Kubernetes DaemonSet Object Logs', literal(DaemonSet), 'Log Source' = 'Kubernetes CronJob Object Logs', literal(CronJob), 'Log Source' = 'Kubernetes Job Object Logs', literal(Job), 'Log Source' = 'Kubernetes StatefulSet Object Logs', literal(StatefulSet), null)
| addfields [ 'Log Source' = 'Kubernetes Deployment Object Logs'
| eventstats latest('Available Status') as 'DP Available', latest(Replicas) as 'DP Desired', latest('Ready Replicas') as 'DP Ready', latest('Updated Replicas') as 'DP Updated', latest('Available Replicas') as 'DP Available R', latest('Object Creation Time') as 'DP OCR' by Namespace, Name ], [ 'Log Source' = 'Kubernetes DaemonSet Object Logs'
| eventstats latest('Desired Number Scheduled') as 'DM Desired', latest('Ready Count') as 'DM Ready', latest('Updated Replicas') as 'DM Updated', latest('Current Scheduled') as 'DM Scheduled', latest('Object Creation Time') as 'DM OCR' by Namespace, Name ], [ 'Log Source' = 'Kubernetes CronJob Object Logs'
| eventstats latest(Schedule) as 'CR Schedule', latest(Suspend) as 'CR Suspended', latest('Last Schedule Time') as 'CR Last Schedule', latest('Object Creation Time') as 'CR OCR' by Namespace, Name ], [ 'Log Source' = 'Kubernetes Job Object Logs'
| eventstats latest(Completions) as 'JB Completions', latest(Status) as 'JB Status', latest('Object Creation Time') as 'JB OCR' by Namespace, Name ], [ 'Log Source' = 'Kubernetes StatefulSet Object Logs'
| eventstats latest(Replicas) as 'SS Replicas', latest('Current Replicas') as 'SS Current Replicas', latest('Desired Replicas') as 'SS Desired Replicas', latest('Ready Replicas') as 'SS Ready Replicas', latest('Object Creation Time') as 'SS OCR' by Namespace, Name ]
| eval OCR = if('Log Source' = 'Kubernetes Deployment Object Logs', 'DP OCR', 'Log Source' = 'Kubernetes DaemonSet Object Logs', 'DM OCR', 'Log Source' = 'Kubernetes CronJob Object Logs', 'CR OCR', 'Log Source' = 'Kubernetes Job Object Logs', 'JB OCR', 'Log Source' = 'Kubernetes StatefulSet Object Logs', 'SS OCR', null)
| eval Age = unit('End Time' - OCR, ms)
| eval Status = if('Log Source' = 'Kubernetes Deployment Object Logs' and 'DP Available' = true, Available, 'Log Source' = 'Kubernetes DaemonSet Object Logs' and 'DM Desired' = 'DM Ready', Available, 'Log Source' = 'Kubernetes CronJob Object Logs' and 'CR Suspended' = true, Suspended || ' (Last Scheduled: ' || formatDate('CR Last Schedule') || ')', 'Log Source' = 'Kubernetes CronJob Object Logs' and 'CR Suspended' = false, ' Last Scheduled: ' || formatDate('CR Last Schedule'), 'Log Source' = 'Kubernetes Job Object Logs' and 'JB Status' = complete, Complete, 'Log Source' = 'Kubernetes Job Object Logs' and 'JB Status' = failed, Failed, 'Log Source' = 'Kubernetes Job Object Logs', 'JB Status', 'Log Source' = 'Kubernetes StatefulSet Object Logs' and ('SS Desired Replicas' = 'SS Replicas' and 'SS Desired Replicas' = 'SS Ready Replicas'), Available, 'Not Available')
| createtable name = Workloads select Namespace, Type, Name, Status, Age
| fields -'DP *', -'DM *', -'JB *', -'SS *', -OCR
Node Summary
Tabs: Node
Filters: Namespace, Workload, Node
Scope: Log Explorer
Query:
'Log Source' = 'Kubernetes Node Object Logs'
| link Node
| stats latest('Ready Status') as 'Ready Status', latest('Ready Reason') as 'Ready Reason', latest('Disk Pressure Status') as 'Disk Pressure', latest('Memory Pressure Status') as 'Memory Pressure', latest('PID Pressure Status') as 'PID Pressure', latest(Architecture) as Arch, latest('Operating System Image') as 'OS Image', latest('Kernel Version') as Kernel, latest('Container Runtime Version') as 'Container Runtime', latest('Kubelet Version') as Kubelet, latest('KubeProxy Version') as KubeProxy, latest('CPU Allocatable') as 'CPU A', latest('CPU Capacity') as 'CPU C', latest('Memory Allocatable') as 'Memory A', latest('Memory Capacity') as 'Memory C'
| eval Status = if('Ready Status' = true, Ready, 'Not Ready')
| eval Issues = if('Disk Pressure' != true and 'Memory Pressure' != true and 'PID Pressure' != true and Status = Ready, 'No Issues', 'Disk Pressure' = true and 'Memory Pressure' = true and 'PID Pressure' = true, 'Low on Disk, Memory and PID', 'Disk Pressure' = true and 'Memory Pressure' = true and 'PID Pressure' != true, 'Low on Disk and Memory', 'Disk Pressure' = true and 'Memory Pressure' != true and 'PID Pressure' = true, 'Low on Disk and PID', 'Disk Pressure' = true and 'Memory Pressure' != true and 'PID Pressure' != true, 'Low on Disk', 'Disk Pressure' != true and 'Memory Pressure' = true and 'PID Pressure' = true, 'Low on Memory and PID', 'Disk Pressure' != true and 'Memory Pressure' = true and 'PID Pressure' != true, 'Low on Memory', 'Disk Pressure' != true and 'Memory Pressure' != true and 'PID Pressure' = true, 'Low on PID', Status = 'Not Ready' and 'Ready Reason' != null, 'Ready Reason', Unknown)
| eval CPU = 'CPU C' || ' / ' || 'CPU A' | eval 'Memory (Capacity)' = unit('Memory C', byte)
| eval 'Memory (Allocatable)' = unit('Memory A', byte)
| eval Age = unit('Query End Time' - 'End Time', ms)
| eval 'Kubelet / KubeProxy Versions' = Kubelet || ' / ' || KubeProxy
| eval OS = 'OS Image' || ' (' || Arch || ') ' || Kernel
| createtable name = 'Node Summary' select Node as Name, Status, Issues, Age, OS, 'Container Runtime', 'Kubelet / KubeProxy Versions', CPU, 'Memory (Capacity)', 'Memory (Allocatable)'
Pod Summary
Tabs: Pod
Filters: Namespace, Workload, Node
Scope: Log Explorer
Query:
'Log Source' = 'Kubernetes Pod Object Logs'
| link Pod
| stats latest('Pod Phase') as Status, latest(Node) as Node, latest(Namespace) as Namespace, latest('Pod IP Address') as 'Pod IP', latest(Controller) as Controller, latest('Controller Kind') as 'Controller Kind', latest('Scheduler Name') as Scheduler
| createtable name = 'Pod Summary' select Pod as Name, Status, Node, Namespace, 'Pod IP', Controller, 'Controller Kind', Scheduler
| fields 'End Time' as 'Last Reported', -Count