EKS DNS troubleshooter


EKS DNS troubleshooter is an automated DNS troubleshooting utility for Amazon EKS clusters. It can be used to test, validate, and troubleshoot DNS issues in an EKS cluster.

This tool runs as a pod in the EKS cluster, scans the cluster configuration, performs DNS resolution tests, identifies issues, and then generates a diagnosis report in JSON format.

Spend less time finding the root cause, and more on the Important Stuff!

Scenarios

The tool verifies the following scenarios to validate and troubleshoot DNS in an EKS cluster:

Usage

To deploy the EKS DNS Troubleshooter to an EKS cluster:

  1. Create an IAM OIDC provider and associate it with your cluster (for example, with eksctl as shown below).
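With eksctl this is a one-liner; a sketch assuming your cluster is named dev-cluster, as in the sample report below:

eksctl utils associate-iam-oidc-provider \
    --cluster dev-cluster \
    --approve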
  2. Download the IAM policy for the EKS DNS troubleshooter pod that allows it to make calls to AWS APIs on your behalf. You can view the policy document at the URL below.
curl -o iam-policy.json https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/iam-policy.json
  3. Create an IAM policy called DNSTroubleshooterIAMPolicy using the policy document downloaded in the previous step.
aws iam create-policy \
    --policy-name DNSTroubleshooterIAMPolicy \
    --policy-document file://iam-policy.json

Take note of the policy ARN that is returned.
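If you did not capture the ARN, you can look it up later with the AWS CLI (assuming the policy name used above):

aws iam list-policies \
    --scope Local \
    --query "Policies[?PolicyName=='DNSTroubleshooterIAMPolicy'].Arn" \
    --output text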

  4. Create a Kubernetes service account named eks-dns-ts, a cluster role, and a cluster role binding for the EKS DNS troubleshooter to use with the following command.
kubectl apply -f https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/rbac-role.yaml
  5. Create an IAM role for the EKS DNS Troubleshooter and attach the role to the service account created in the previous step.
    1. Create an IAM role named eks-dns-troubleshooter and attach the DNSTroubleshooterIAMPolicy IAM policy that you created in a previous step. Once you've created the role, note its Amazon Resource Name (ARN). Refer to the Amazon EKS documentation for details on creating an IAM role for service accounts with eksctl, the AWS CLI, or the AWS Management Console (see the eksctl sketch below).
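For example, eksctl can create the role, attach the policy, and annotate the service account in one step. This is a sketch assuming a cluster named dev-cluster and the policy ARN from step 3; if you use it, substep 2 below becomes unnecessary:

eksctl create iamserviceaccount \
    --cluster dev-cluster \
    --namespace default \
    --name eks-dns-ts \
    --role-name eks-dns-troubleshooter \
    --attach-policy-arn arn:aws:iam::111122223333:policy/DNSTroubleshooterIAMPolicy \
    --override-existing-serviceaccounts \
    --approve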
    2. Annotate the Kubernetes service account with the ARN of the role that you created, using the following command (replace the example ARN with your role's ARN).
     kubectl annotate serviceaccount -n default eks-dns-ts \
     eks.amazonaws.com/role-arn=arn:aws:iam::111122223333:role/eks-dns-troubleshooter
    
  6. Deploy the EKS DNS Troubleshooter with the following command.
    kubectl apply -f https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/eks-dns-troubleshooter.yaml
    
  7. Verify that the pod is deployed and check its logs.
kubectl logs -l=app=eks-dns-troubleshooter

The output should be similar to the following.

time="2020-06-27T15:22:34Z" level=info msg="\n\t-----------------------------------------------------------------------\n\tEKS DNS Troubleshooter\n\tRelease:    v1.1.0\n\tBuild:      git-89606ea\n\tRepository: https://github.com/joshisumit/eks-dns-troubleshooter\n\t-----------------------------------------------------------------------\n\t"
{"file":"main.go:81","func":"main","level":"info","msg":"Running on Kubernetes v1.15.11-eks-af3caf","time":"2020-06-27T15:22:34Z"}
{"file":"main.go:100","func":"main","level":"info","msg":"kube-dns service ClusterIP: 10.100.0.10","time":"2020-06-27T15:22:34Z"}
{"file":"verify-configs.go:42","func":"checkServieEndpoint","level":"info","msg":"Endpoints addresses: {[{192.168.27.238  0xc0004488a0 \u0026ObjectReference{Kind:Pod,Namespace:kube-system,Name:coredns-6658f9f447-gm5sn,UID:75d7a6cf-4092-4501-a56f-d3c859cdc5d0,APIVersion:,ResourceVersion:36301646,FieldPath:,}} {192.168.5.208  0xc0004488b0 \u0026ObjectReference{Kind:Pod,Namespace:kube-system,Name:coredns-6658f9f447-gqqk2,UID:f1864697-db9d-460c-b219-d5fbbe7484b8,APIVersion:,ResourceVersion:36301561,FieldPath:,}}] [] [{dns 53 UDP} {dns-tcp 53 TCP}]}","time":"2020-06-27T15:22:34Z"}
...
{"file":"main.go:173","func":"main","level":"info","msg":"DNS Diagnosis completed. Please check diagnosis report in /var/log/eks-dns-diag-summary.json file.","time":"2020-06-27T15:23:34Z"}

Once the pod is running, it validates and troubleshoots DNS, which takes around two minutes, and then generates the diagnosis report in JSON format (see the last line of the log excerpt above).
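If you are scripting around the tool, you can wait for the report to be ready by polling the pod logs for the completion message shown above:

until kubectl logs -l=app=eks-dns-troubleshooter | grep -q "DNS Diagnosis completed"; do
    echo "Waiting for DNS diagnosis to complete..."
    sleep 10
done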

  8. Exec into the pod and fetch the diagnosis report.
POD_NAME=$(kubectl get pods -l=app=eks-dns-troubleshooter -o jsonpath='{.items..metadata.name}')
kubectl exec -ti $POD_NAME -- cat /var/log/eks-dns-diag-summary.json | jq

Or download the diagnosis report JSON file locally:

kubectl exec -ti $POD_NAME -- cat /var/log/eks-dns-diag-summary.json > eks-dns-diag-summary.json
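Once downloaded, jq makes it easy to pull out specific fields, for example the overall status and the high-level analysis (field names as in the sample report below):

jq '{completed: .diagnosisCompletion, analysis: .Analysis, dnsResolution: .corednsChecks.dnstestResults.dnsResolution}' eks-dns-diag-summary.json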

A sample diagnosis report in JSON format:

{
  "diagnosisCompletion": true,
  "diagnosisToolInfo": {
    "release": "v1.1.0",
    "repo": "https://github.com/joshisumit/eks-dns-troubleshooter",
    "commit": "git-89606ea"
  },
  "Analysis": {
    "dnstest": "DNS resolution is working correctly in the cluster",
    "naclRules": "naclRules are configured correctly...NOT blocking any DNS communication",
    "securityGroupConfigurations": "securityGroups are configured correctly...not blocking any DNS communication"
  },
  "eksVersion": "v1.16.8-eks-e16311",
  "corednsChecks": {
    "clusterIP": "10.100.0.10",
    "endpointsIP": [
      "192.168.10.9",
      "192.168.17.192"
    ],
    "notReadyEndpoints": [],
    "namespace": "kube-system",
    "imageVersion": "v1.6.6",
    "recommendedVersion": "v1.6.6",
    "dnstestResults": {
      "dnsResolution": "success",
      "description": "tests the internal and external DNS queries against ClusterIP and two Coredns Pod IPs",
      "domainsTested": [
        "amazon.com",
        "kubernetes.default.svc.cluster.local"
      ],
      "detailedResultForEachDomain": [
        {
          "domain": "amazon.com",
          "server": "10.100.0.10:53",
          "result": "success"
        },
        {
          "domain": "amazon.com",
          "server": "192.168.10.9:53",
          "result": "success"
        },
        {
          "domain": "amazon.com",
          "server": "192.168.17.192:53",
          "result": "success"
        },
        {
          "domain": "kubernetes.default.svc.cluster.local",
          "server": "10.100.0.10:53",
          "result": "success"
        },
        {
          "domain": "kubernetes.default.svc.cluster.local",
          "server": "192.168.10.9:53",
          "result": "success"
        },
        {
          "domain": "kubernetes.default.svc.cluster.local",
          "server": "192.168.17.192:53",
          "result": "success"
        }
      ]
    },
    "replicas": 2,
    "podNames": [
      "coredns-76f4cb57b4-25x8d",
      "coredns-76f4cb57b4-2vs9w"
    ],
    "corefile": ".:53 {\n    log\n    errors\n    health\n    kubernetes cluster.local in-addr.arpa ip6.arpa {\n      pods insecure\n      upstream\n      fallthrough in-addr.arpa ip6.arpa\n    }\n    prometheus :9153\n    forward . /etc/resolv.conf\n    cache 30\n    loop\n    reload\n    loadbalance\n}\n",
    "resolvconf": {
      "SearchPath": [
        "default.svc.cluster.local",
        "svc.cluster.local",
        "cluster.local",
        "eu-west-2.compute.internal"
      ],
      "Nameserver": [
        "10.100.0.10"
      ],
      "Options": [
        "ndots:5"
      ],
      "Ndots": 5
    },
    "errorCheckInCorednsLogs": {
      "errorsInLogs": false
    }
  },
  "eksClusterChecks": {
    "securityGroupChecks": {
      "IsClusterSGRuleCorrect": true,
      "InboundRule": {
        "isValid": "true"
      },
      "OutboundRule": {
        "isValid": "true"
      }
    },
    "naclRulesCheck": true,
    "region": "eu-west-2",
    "securityGroupIds": [
      "eksctl-dev-cluster-nodegroup-ng-1-SG-40FRGTVAFSPT",
      "eksctl-dev-cluster-cluster-ClusterSharedNodeSecurityGroup-1PISH2DINONDS"
    ],
    "clusterName": "dev-cluster",
    "clusterSecurityGroup": "sg-0529eb51ffbae7373"
  }
}
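
To clean up when you are finished, delete the troubleshooter and its RBAC objects using the same manifests you applied earlier:

kubectl delete -f https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/eks-dns-troubleshooter.yaml
kubectl delete -f https://raw.githubusercontent.com/joshisumit/eks-dns-troubleshooter/v1.1.0/deploy/rbac-role.yaml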

Notes

Contribute