Troubleshooting Kubernetes Monitoring

If you were not able to successfully install LogicMonitor’s Kubernetes Monitoring Integration, use the following steps to troubleshoot:

If the Helm install commands were not successful:

  • Make sure that the Kubernetes RBAC enabled option was selected correctly during install. This option controls the ‘enableRBAC’ Helm chart value, which must match the RBAC setting of your cluster (for example, if RBAC is not enabled for your cluster, this value must be false).
  • Confirm that your LogicMonitor API Tokens are valid and have sufficient permissions.
  • Check the logs for the Argus and Collectorset-Controller pods.
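
The checks above can be run from the command line. The following is a minimal sketch, assuming kubectl is already configured for your cluster. The function names are hypothetical helpers, and the namespace and pod names are placeholders to replace with the values from your own install.

```shell
# Hypothetical helper functions; namespace and pod names are placeholders.

# Report whether the RBAC API group is served by the cluster.
# This is a strong hint (not proof) that RBAC authorization is enabled.
rbac_api_available() {
  if kubectl api-versions | grep -q '^rbac.authorization.k8s.io/'; then
    echo "rbac-api-present"
  else
    echo "rbac-api-absent"
  fi
}

# Print the last 100 log lines of a pod, for example the Argus or
# Collectorset-Controller pod.
# usage: pod_logs <namespace> <pod-name>
pod_logs() {
  kubectl logs --namespace "$1" --tail=100 "$2"
}
```

To find the pod names, run kubectl get pods --namespace <namespace>, then call pod_logs <namespace> <pod-name> for each pod.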

If Helm install commands were successful, but your cluster hasn’t been added to monitoring successfully:

  • Check whether the Collector Group was added. If it wasn’t, there may be an issue with the LogicMonitor API Tokens provided. You can use the LogicMonitor Audit Log to determine whether the request was denied.
  • Check whether Collectors were successfully installed. If the Collector Group was added but Collectors weren’t successfully installed, the issue is unlikely to be your API Tokens. The Collector installation memory requirements may not have been met, so confirm that they are. If the Collectors aren’t properly installed, it’s also a good idea to look at the logs for the Argus, Collectorset-Controller, and Collector pods.
  • If you are using OpenShift, you may need to elevate the permissions of the serviceaccount for the Collector to enable install. You can do this with the following command (assumes the default namespace is used):

    oc adm policy add-scc-to-user anyuid system:serviceaccount:default:collector
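
One way to check whether Collector pods failed to start because of unmet memory requirements is to look for pods stuck in the Pending phase and read their events. This is a sketch under the assumption that kubectl is configured for the cluster. The function names are hypothetical, and the namespace and pod name are placeholders.

```shell
# Hypothetical helper functions; namespace and pod names are placeholders.

# List pods in the namespace that have not been scheduled or started yet.
# usage: pending_pods <namespace>
pending_pods() {
  kubectl get pods --namespace "$1" --field-selector=status.phase=Pending
}

# Show details for one pod. The Events section at the end of the output
# reports scheduling failures such as insufficient memory on the nodes.
# usage: pod_events <namespace> <pod-name>
pod_events() {
  kubectl describe pod --namespace "$1" "$2"
}
```

If a Collector pod is listed as Pending, run pod_events on it and check the Events section for the reason.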

If Helm install commands were successful, and your cluster was added into monitoring, but data collection isn’t working:

  • Ensure you’re using the latest version of the DataSources. We regularly release new versions of our DataSources to include bug fixes and improvements. To import newer versions, please see this article.
  • If data collection still doesn’t work with the latest version of the DataSources, you can modify the DataSource to display the Kubernetes API response and status code. For example, the Pod DataSource could be modified to print out request and response information, as in the following screenshot, to narrow down which error the Kubernetes API returns during data collection attempts. This may help you identify why data collection isn’t working.
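
As a complement to modifying the DataSource, you can query the Kubernetes API directly to see the response and status code for a given path. This is a different technique from the DataSource change described above. It is a sketch that assumes kubectl is configured for the cluster, the function name is hypothetical, and the request uses your own kubectl credentials rather than the Collector’s, so a successful response here does not rule out a permissions problem on the Collector side.

```shell
# Hypothetical helper; the API path passed in is a placeholder.

# Send a raw GET request to the Kubernetes API server.
# With -v=6, kubectl logs the request URL and the HTTP status code
# to stderr; the response body is written to stdout.
# usage: k8s_api_probe <api-path>
k8s_api_probe() {
  kubectl get --raw "$1" -v=6
}
```

For example, k8s_api_probe /api/v1/namespaces/default/pods requests the pod list for the default namespace.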

If you are unable to set up the Argus and Collectorset-Controller pods in GKE due to memory, CPU, or ephemeral storage constraints, use the following configuration:

  • Set the resource limits and requests (memory, CPU, ephemeral storage) for the collector in the Argus configuration file.
    Sample file:
    statefulsetspec:
      template:
        spec:
          containers:
            - name: collector
              resources:
                requests:
                  cpu: 1000m
                  ephemeral-storage: 5Gi
                  memory: 2Gi
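
After applying the configuration, you may want to confirm that the requests reached the running Collector pod. The following sketch assumes kubectl is configured for the cluster. The function name is hypothetical, and the namespace and pod name are placeholders.

```shell
# Hypothetical helper; namespace and pod name are placeholders.

# Print the resources section (requests and limits) of every container
# in the given pod.
# usage: pod_resources <namespace> <pod-name>
pod_resources() {
  kubectl get pod --namespace "$1" "$2" \
    --output "jsonpath={.spec.containers[*].resources}"
}
```

The output should include the cpu, ephemeral-storage, and memory values set in the configuration file.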

The following gRPC connection error may occur while installing Argus:

level=warning msg="Error while creating gRPC connection. Error: context deadline exceeded" argus_pod_id=<pod-id> debug_id=<id> goroutine=1 method=pkg/connection.createGRPCConnection watch=init

Complete the following steps to resolve the issue:

  1. Run the following command to log in to the Argus Pod shell:

    kubectl exec -it <argus_Pod_name> -- /bin/sh

  2. Check the communication between the Argus and Collectorset-Controller Pods by entering the following command:

    curl http://collectorset-controller:50000

  3. If the communication fails and an error occurs, check the restrictions set by your internal network policies.
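
If the cluster uses Kubernetes NetworkPolicy objects, the policies that may restrict traffic between the two Pods can be inspected as sketched below. This assumes kubectl is configured for the cluster. The function names are hypothetical, and the namespace and policy name are placeholders. Restrictions enforced outside Kubernetes, such as firewall rules, are not visible this way.

```shell
# Hypothetical helper functions; namespace and policy names are placeholders.

# List the network policies defined in a namespace.
# usage: list_network_policies <namespace>
list_network_policies() {
  kubectl get networkpolicy --namespace "$1"
}

# Show the pod selector and the ingress and egress rules of one policy.
# usage: show_network_policy <namespace> <policy-name>
show_network_policy() {
  kubectl describe networkpolicy --namespace "$1" "$2"
}
```

Look for policies whose pod selector matches the Argus or Collectorset-Controller Pods and whose rules do not allow traffic on port 50000.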