Troubleshooting with Mesh Doctor
Mesh Doctor is a tool to troubleshoot and debug issues in a Service Mesh setup. Mesh Doctor collects the required configurations and logs, runs an analysis, and generates reports. The reports summarize the state of your Service Mesh and suggest troubleshooting steps to fix any issues.
Using Mesh Doctor
You can use the Mesh Doctor tool in two ways.
- OCI CLI: Run Mesh Doctor from the command line with the debug command.
- Console: Run Mesh Doctor from the Console using the following steps.
  - Navigate to one of the Mesh Doctor supported resources.
  - Click Troubleshoot on the resource's details page.
  - Select the context of the OKE cluster.
  - Click Start Troubleshooting.
The OCI Console opens an OCI Cloud Shell window and runs the Mesh Doctor command in it. When the command completes, Mesh Doctor provides the path to the zipped file with the generated reports. To view the reports, unzip the file in Cloud Shell or download the zipped file from Cloud Shell.
Cloud Shell powers the Mesh Doctor user interface in the Console. When run this way, the command works only with public clusters. With private clusters, the command times out.
Mesh Doctor Command-Line Options
The following table provides a detailed list of all the Mesh Doctor command-line options based on the oci service-mesh debug base command.
Parameter | isOptional | Value Type | Default | Example | Notes |
---|---|---|---|---|---|
kubeconfig | True | FilePath(String) | kubeconfig present in ~/.kube/config | ~/config | Config of the Kubernetes cluster. If the config isn't provided, the default config is used by the command. |
resource-id | True | OCID | Null | ocid1.mesh.oc1.iad.id | Resource to be diagnosed. If the resource isn't provided, the command diagnoses the installation. |
context | True | String | current-context in kube-config | context-aaa | The context of the Kubernetes cluster. |
thread-pool-size | True | Int | 25 | 10 | Number of threads used to parallelize the processing. |
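To show how the options in the table combine, the following Python sketch assembles an argument list for the report command without running it. Only --resource-id appears verbatim in the examples on this page. The spellings --kubeconfig, --context, and --thread-pool-size are assumptions derived from the parameter names in the table, and the function name build_debug_command is hypothetical.

```python
from typing import List, Optional


def build_debug_command(
    resource_id: Optional[str] = None,
    kubeconfig: Optional[str] = None,
    context: Optional[str] = None,
    thread_pool_size: Optional[int] = None,
) -> List[str]:
    """Return an argv list for `oci service-mesh debug report`.

    Every option is optional, matching the isOptional column of the table.
    Options left as None are omitted so the command falls back to its defaults.
    """
    argv = ["oci", "service-mesh", "debug", "report"]
    # Assumption: each table parameter maps to a flag of the same name
    # prefixed with "--". Only --resource-id is confirmed by the examples.
    options = [
        ("--resource-id", resource_id),
        ("--kubeconfig", kubeconfig),
        ("--context", context),
        ("--thread-pool-size", thread_pool_size),
    ]
    for flag, value in options:
        if value is not None:
            argv += [flag, str(value)]
    return argv


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(" ".join(build_debug_command(context="context-aaa", thread_pool_size=10)))
```

Passing the resulting list to subprocess.run would execute the command. Check the flag names against the CLI help output before relying on them.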
Using Mesh Doctor CLI to Troubleshoot Setup
To troubleshoot an entire service mesh setup in the Kubernetes cluster, run the following command.
oci service-mesh debug report
Using Mesh Doctor CLI to Troubleshoot Mesh Resources
The following Mesh Doctor CLI commands provide example use cases.
oci service-mesh debug report --resource-id ocid1.mesh.oc1.iad.aaa...
Bundle file path: /my-home/service-mesh-debug-report_07-01-2022_20-00-00
=============================== Mesh Report Analysis ===============================
OLM version: v0.20.0
| Sidecar Image Versions |
| Version | Count |
| 0.1.520 | 13 |
All sidecars are using same version
| Config Versions |
| Version | Count |
| 5 | 13 |
All configs are of the same version
All Operator Services are installed
All Mesh Webhooks are installed
All Mesh Custom Resources are installed
oci service-mesh mesh-debug report --resource-id ocid1.meshvirtualservice.oc1.iad.aaa...
oci service-mesh mesh-debug report --resource-id ocid1.meshvirtualdeployment.oc1.iad.aaa...
oci service-mesh mesh-debug report --resource-id ocid1.meshingressgateway.oc1.iad.aaa...
The following is a sample Mesh Doctor report run on a mesh.
report-mesh.json
{
"metrics_server": [
{
"labels": {},
"name": "Unavailable",
"namespace": "Unavailable",
"status": "Unavailable",
"version": "Unavailable"
}
],
"oci_cli_version": [
"X.X.X"
],
"oci_service_operator_for_kubernetes": [
{
"labels": {
"control-plane": "controller-manager",
"pod-template-hash": "aaa"
},
"name": "oci-service-operator-controller-manager-aaa-tm52n",
"namespace": "oci-service-operator-system",
"status": {
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2022-04-13T00:06:20Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-04-13T00:06:30Z",
"status": "True",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-04-13T00:06:30Z",
"status": "True",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-04-13T00:06:20Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"containerID": "cri-o://aaa...",
"image": "iad.ocir.io/aaa/oci-service-operator:1.0.X",
"imageID": "iad.ocir.io/aaa/oci-service-operator@sha256:aaa",
"lastState": {},
"name": "manager",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2022-04-13T00:06:24Z"
}
}
}
],
"hostIP": "10.0.10.X",
"phase": "Running",
"podIP": "10.244.2.X",
"podIPs": [
{
"ip": "10.244.2.X"
}
],
"qosClass": "Burstable",
"startTime": "2022-04-13T00:06:20Z"
},
"version": "1.0.X"
}
],
"olm": [
{
"labels": {
"app": "olm-operator",
"pod-template-hash": "aaa"
},
"name": "olm-operator-aaa-k42xw",
"namespace": "olm",
"status": {
"running": {
"startedAt": "2022-04-13T00:05:37Z"
}
},
"version": "v0.20.0"
}
],
"pod_summary": [
{
"labels": {
"app": "productpage",
"pod-template-hash": "aaa",
"version": "v1"
},
"mesh_id": "ocid1.mesh.oc1.iad.aaa...",
"name": "productpage-v1-aaa-f5ptd",
"namespace": "my-namespace",
"proxy_status": {
"running": {
"startedAt": "2022-04-13T05:37:57Z"
}
},
"proxy_version": "0.1.X",
"vd_id": "ocid1.mesh.oc1.iad.aaa...",
"vdb_key": "my-namespace/productpage-v1-binding",
"vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
},
{
"labels": {
"app": "reviews",
"pod-template-hash": "aaa",
"version": "v3"
},
"mesh_id": "ocid1.mesh.oc1.iad.aaa...",
"name": "reviews-v3-aaa-q9z6k",
"namespace": "my-namespace",
"proxy_status": {
"running": {
"startedAt": "2022-04-13T05:37:46Z"
}
},
"proxy_version": "0.1.X",
"vd_id": "ocid1.mesh.oc1.iad.aaa...",
"vdb_key": "my-namespace/reviews-v3-binding",
"vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
},
{
"labels": {
"app": "reviews",
"pod-template-hash": "bbb",
"version": "v2"
},
"mesh_id": "ocid1.mesh.oc1.iad.aaa...",
"name": "reviews-v2-bbb-9rdpw",
"namespace": "my-namespace",
"proxy_status": {
"running": {
"startedAt": "2022-04-13T05:37:40Z"
}
},
"proxy_version": "0.1.X",
"vd_id": "ocid1.mesh.oc1.iad.aaa...",
"vdb_key": "my-namespace/reviews-v2-binding",
"vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
},
{
"labels": {
"app": "reviews",
"pod-template-hash": "ddd",
"version": "v1"
},
"mesh_id": "ocid1.mesh.oc1.iad.aaa...",
"name": "reviews-v1-ddd-kq6qr",
"namespace": "my-namespace",
"proxy_status": {
"running": {
"startedAt": "2022-04-13T05:37:27Z"
}
},
"proxy_version": "0.1.X",
"vd_id": "ocid1.mesh.oc1.iad.aaa...",
"vdb_key": "my-namespace/reviews-v1-binding",
"vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
},
{
"ig_id": "ocid1.meshingressgateway.oc1.iad.aaa...",
"igd_key": "my-namespace/bookinfo-ig-deployment",
"labels": {
"pod-template-hash": "eee",
"servicemesh.oci.oracle.com/ingress-gateway-deployment": "bookinfo-ig-deployment"
},
"mesh_id": "ocid1.mesh.oc1.iad.aaa...",
"name": "bookinfo-ig-deployment-deployment-eee-dj9b5",
"namespace": "my-namespace",
"proxy_status": {
"running": {
"startedAt": "2022-04-13T00:12:15Z"
}
},
"proxy_version": "0.1.X"
},
{
"labels": {
"app": "ratings",
"pod-template-hash": "fff",
"version": "v1"
},
"mesh_id": "ocid1.mesh.oc1.iad.aaa...",
"name": "ratings-v1-fff-67txf",
"namespace": "my-namespace",
"proxy_status": {
"running": {
"startedAt": "2022-04-13T05:35:36Z"
}
},
"proxy_version": "0.1.X",
"vd_id": "ocid1.mesh.oc1.iad.aaa...",
"vdb_key": "my-namespace/ratings-v1-binding",
"vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
},
{
"labels": {
"app": "details",
"pod-template-hash": "aaa",
"version": "v1"
},
"mesh_id": "ocid1.mesh.oc1.iad.aaa...",
"name": "details-v1-aaa-xsmkq",
"namespace": "my-namespace",
"proxy_status": {
"running": {
"startedAt": "2022-04-13T05:38:03Z"
}
},
"proxy_version": "0.1.X",
"vd_id": "ocid1.mesh.oc1.iad.aaa...",
"vdb_key": "my-namespace/details-v1-binding",
"vs_id": "ocid1.meshvirtualservice.oc1.iad.aaa..."
}
],
"sidecar_injection_enabled_namespaces": [
[
"host-mesh-cp-aaa",
"my-namespace"
]
]
}
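A report such as the sample above can also be inspected programmatically. The following Python sketch tallies proxy versions across the pod_summary entries and lists pods whose proxy_status has no running entry. It relies only on the field names visible in the sample report. The function names load_report and summarize_pods are hypothetical, and treating a missing running key as a problem is an assumption about how the report encodes unhealthy proxies.

```python
import json
from collections import Counter
from typing import Any, Dict, List


def load_report(path: str) -> Dict[str, Any]:
    """Load a Mesh Doctor JSON report (for example report-mesh.json) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_pods(report: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize the pod_summary section of a report."""
    pods: List[Dict[str, Any]] = report.get("pod_summary", [])
    # Count how many pods run each proxy version.
    versions = Counter(pod.get("proxy_version", "unknown") for pod in pods)
    not_running = []
    for pod in pods:
        status = pod.get("proxy_status")
        # Assumption: a healthy proxy has a "running" key in proxy_status.
        if not (isinstance(status, dict) and "running" in status):
            not_running.append(f'{pod.get("namespace")}/{pod.get("name")}')
    return {
        "pod_count": len(pods),
        "proxy_versions": dict(versions),
        "mixed_versions": len(versions) > 1,
        "not_running": not_running,
    }
```

For example, summarize_pods(load_report("report-mesh.json")) returns a dictionary with the pod count, the version tally, and any pods that need a closer look.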
Mesh Doctor runs kubectl commands on behalf of the user using the user's existing Kubernetes authorizations. If the required permissions aren't present, the command fails to collect data.
To collect all the required data, users need the following access permissions:
- list, get, exec - for pods in the service mesh.
- list, get - for all mesh resources (CRDs).
- list, get, exec - for the pods in the OLM namespace.
- list - permission for services.
For more information on Kubernetes role-based access control, see Using RBAC Authorization.
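One way to verify these permissions before running Mesh Doctor is the kubectl auth can-i command. The following Python sketch only builds the command lines and does not run them. It makes several assumptions: exec access is checked as the create verb on the pods/exec subresource, which is how Kubernetes RBAC models exec; the OLM namespace is olm, as in the sample report; services are checked in the mesh namespace; and mesh resource names are supplied by the caller because they aren't listed on this page. The function name can_i_commands is hypothetical.

```python
from typing import List, Sequence


def can_i_commands(
    mesh_namespace: str,
    olm_namespace: str = "olm",
    mesh_resources: Sequence[str] = (),
) -> List[List[str]]:
    """Build `kubectl auth can-i` commands for the permissions listed above."""
    commands: List[List[str]] = []

    def add(verb: str, resource: str, namespace: str) -> None:
        commands.append(
            ["kubectl", "auth", "can-i", verb, resource, "--namespace", namespace]
        )

    # list, get, exec on pods in the mesh namespace and in the OLM namespace.
    for namespace in (mesh_namespace, olm_namespace):
        add("list", "pods", namespace)
        add("get", "pods", namespace)
        # Assumption: "exec" maps to creating the pods/exec subresource.
        add("create", "pods/exec", namespace)

    # list, get on each mesh custom resource type named by the caller.
    for resource in mesh_resources:
        add("list", resource, mesh_namespace)
        add("get", resource, mesh_namespace)

    # list on services (assumed to be checked in the mesh namespace).
    add("list", "services", mesh_namespace)
    return commands


if __name__ == "__main__":
    for command in can_i_commands("my-namespace"):
        print(" ".join(command))
```

Each printed command answers yes or no when run against the cluster, which makes a missing permission easy to spot before a report comes back incomplete.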
When Mesh Doctor runs, the tool structures the data returned into a reporting hierarchy. When Mesh Doctor runs on a specific resource, the tool includes only the data for that resource and child data in the report. Mesh Doctor uses the following reporting structure.
Mesh <directory>
- Mesh report
- OCI Service Operator for Kubernetes logs
- Dump of cluster service version
- Custom resource definition (CRD) of mesh if present
- Ingress gateway <directory>
  - Ingress gateway report
  - CRD of ingress gateway if present
  - Ingress gateway deployment
    - CRD of ingress gateway deployment
    - configdump_<podName>_<podNamespace>.json
    - proxylogs_<podName>_<podNamespace>.log
- Virtual service <directory>
  - Virtual service report
  - CRD of virtual service if present
  - Virtual deployment <directory>
    - Virtual deployment report
    - CRD of virtual deployment if present
    - Virtual deployment binding <directory>
      - CRD of virtual deployment binding
      - configdump_<podName>_<podNamespace>.json
      - proxylogs_<podName>_<podNamespace>.log
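Because the per-pod files follow the configdump_<podName>_<podNamespace>.json and proxylogs_<podName>_<podNamespace>.log naming patterns, an unzipped report bundle can be indexed by pod. The following Python sketch walks a bundle directory and groups those files by namespace and pod name. It assumes that pod names and namespaces contain no underscores, which holds for valid Kubernetes names, and it makes no assumption about the directory names in the hierarchy. The function name index_pod_files is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, Tuple

# Matches configdump_<podName>_<podNamespace>.json and
# proxylogs_<podName>_<podNamespace>.log.
_POD_FILE = re.compile(
    r"^(?P<kind>configdump|proxylogs)_(?P<pod>[^_]+)_(?P<ns>[^_]+)\.(?:json|log)$"
)


def index_pod_files(bundle_dir: str) -> Dict[Tuple[str, str], Dict[str, Path]]:
    """Map (namespace, pod name) to that pod's config dump and proxy log paths."""
    index: Dict[Tuple[str, str], Dict[str, Path]] = defaultdict(dict)
    # Search every level, because pod files sit under nested resource directories.
    for path in Path(bundle_dir).rglob("*"):
        if not path.is_file():
            continue
        match = _POD_FILE.match(path.name)
        if match:
            key = (match["ns"], match["pod"])
            index[key][match["kind"]] = path
    return dict(index)
```

Calling index_pod_files on the unzipped bundle directory gives a quick way to find the config dump and proxy logs for a pod named in a report.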