ContainerD Kubernetes Syslog Forwarding


You might have heard that starting with version 1.20, Docker is no longer the container runtime in Kubernetes. Although this change didn’t affect the core functionality of Kubernetes or how pods work in clusters, some users relied on resources provided by the Docker engine. A short sentence in the release announcement calls out that a critical component would be affected: logging.

Docker was not the only container runtime at the time of this change. Most managed Kubernetes providers (GKE, EKS, and AKS) handled the upgrade by defaulting new clusters’ runtime to containerd, and their native tooling for exporting logs to their own logging services was migrated accordingly. If you deployed a new cluster on version 1.20, you wouldn’t notice that anything had changed: behind the scenes, the monitoring agents were upgraded along with the clusters to start using containerd as the source for logs. No outages, no missing information.

But for users relying on a third-party logging solution, changing to containerd broke the integration. Loggly, Papertrail, and other Syslog destinations fed by DaemonSet workloads like Logspout were all impacted: they relied on the Docker runtime to grab logs and send them to a syslog server.

One solid option I tried is rkubelog. It is a single-deployment component that fetches logs through the Kubernetes API (the pods and pods/log resources) via a cluster role:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  # ClusterRoles are cluster-scoped, so no namespace is set here
  name: rkubelog-reader
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "watch", "list"]
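
A ClusterRole by itself grants nothing until it is bound to an identity, so the rkubelog pod also needs a ServiceAccount and a ClusterRoleBinding. For completeness, here is a minimal sketch (the ServiceAccount name and namespace are illustrative; the rkubelog project ships its own manifests):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: rkubelog          # illustrative name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: rkubelog-reader-binding
subjects:
- kind: ServiceAccount
  name: rkubelog          # must match the pod's ServiceAccount
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: rkubelog-reader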

I found, though, that it doesn’t fit every need. Here is a short list of scenarios in which rkubelog might not be the right fit for your cluster:

  • You’re not allowed to create Cluster Role bindings in the cluster you’re working on.
  • You’re more comfortable distributing the load from log traffic to a cluster-wide DaemonSet instead of having rkubelog fetch and send everything from a single pod. In my particular case, the amount of logs we generate is around 1TB a month due to the size of the cluster, and the resource allocation for that single pod to work with that amount of logs was significant.
  • You require more control over the tags and source labels when sending logs to a Syslog server. There is no filtering or transformation of data available when using rkubelog. The logs that are retrieved are the ones sent.
  • You require to maintain the same format of host/system labels you had before the migration. rkubelog has little room for customization. Although you can fork the project and change the format of the logs being sent.
  • You require some level of filtering before sending the logs to a destination.

So if you find yourself looking for an alternative, the following approach relies on Filebeat (as a DaemonSet) and Logstash to do the job. For this article, I will use Papertrail as the destination Syslog server.

Filebeat Installation

Basically, Filebeat grabs the logs from every node’s /var/log folder and pushes that data to Logstash. While doing so, it populates some fields in the objects being sent so that Logstash can identify which pod the logs come from. This is a summary of the pipeline used in Filebeat:

  • It excludes some logs based on their filenames. Some exclusions are mandatory for the setup to work, like filebeat.* and logstash.*, since those could cause recursive logging. The rest are up to you: you can ignore logs from the kube-system namespace or from pods that don’t produce relevant data.
  • Then, it drops the ‘host’ field (if it exists) so it can be repopulated from the filename in the next step.
  • The next step generates some fields based on the filename of the log. The format used in this example is /var/log/containers/%{name}_%{host}_%{uuid}.log.
  • Then there is a dissect processor that reshapes every log line, since every line on the host is prefixed with the runtime’s own timestamp and stream metadata. I found this tokenizer generic enough for most cases, but if your cluster produces a different output, this is the place to tweak.
  • From that previous step, the %{parsed} field holds the actual message, so the next two steps drop the current value of the “message” field and replace it with the newly parsed one (a concrete example of this split follows the manifest below).
  • At the very end, you just need to add the Logstash destination address. Commented out, there is an alternative output that prints Filebeat’s events to stdout, which is useful if the pipeline doesn’t parse the message correctly in your scenario.
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: monitoring
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.inputs:
    - type: log
      enabled: true
      symlinks: true
      exclude_files: ['filebeat.*',
                      'logstash.*',
                      'azure.*',
                      'kube.*',
                      'ignite.*',
                      'influx.*',
                      'prometheus.*',
                      'rkubelog.*',
                      'node-exporter.*']
      paths:
        - /var/log/containers/*.log
    processors:
      - drop_fields:
          fields: ["host"]
          ignore_missing: true
      - dissect:
          tokenizer: "/var/log/containers/%{name}_%{host}_%{uuid}.log"
          field: "log.file.path"
          target_prefix: ""
          overwrite_keys: true
      - dissect:
          tokenizer: "%{header} F %{parsed}"
          field: "message"
          target_prefix: ""
          overwrite_keys: true
      - drop_fields:
          fields: ["message"]
          ignore_missing: true
      - rename:
          fields:
            - from: "parsed"
              to: "message"
          ignore_missing: true
          fail_on_error: false

    #output.console:
      #pretty: true
    output.logstash:
      hosts: ["${LOGSTASH_HOST}:${LOGSTASH_PORT}"]
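
To make the dissect steps concrete, here is how a single event would be transformed (all names and values below are illustrative, not taken from a real cluster):

# Harvested file path (log.file.path):
#   /var/log/containers/api-65d9c_production_0d5a3b.log
# First dissect, tokenizer "/var/log/containers/%{name}_%{host}_%{uuid}.log":
#   name: "api-65d9c"   host: "production"   uuid: "0d5a3b"
# Raw containerd line (the original "message" field):
#   2021-03-01T10:15:30.123456789Z stdout F Server started on port 8080
# Second dissect, tokenizer "%{header} F %{parsed}":
#   header: "2021-03-01T10:15:30.123456789Z stdout"
#   parsed: "Server started on port 8080"
# After drop_fields + rename, "message" contains only:
#   Server started on port 8080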

Once the configuration is in place, you just need to create a DaemonSet for Filebeat. Make sure you define the proper tolerations so the DaemonSet is also scheduled on tainted nodes.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: monitoring
  labels:
    k8s-app: filebeat
spec:
  selector:
    matchLabels:
      k8s-app: filebeat
  template:
    metadata:
      labels:
        k8s-app: filebeat
    spec:
      terminationGracePeriodSeconds: 30
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.4.0
        args: [
          "-c", "/etc/filebeat.yml",
          "-e",
        ]
        env:
        - name: LOGSTASH_HOST
          value: "logstash"
        - name: LOGSTASH_PORT
          value: "5100"
        securityContext:
          runAsUser: 0
          #If using Red Hat OpenShift uncomment this:
          #privileged: true
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          readOnly: true
          subPath: filebeat.yml
        - name: data
          mountPath: /usr/share/filebeat/data
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: config
        configMap:
          defaultMode: 0600
          name: filebeat-config
      - name: varlog
        hostPath:
          path: /var/log
      # data folder stores a registry of read status for all files,
      # so we don't send everything again on a Filebeat pod restart
      - name: data
        hostPath:
          path: /var/lib/filebeat-data
          type: DirectoryOrCreate
      tolerations:
      - key: taintedLabel
        operator: Equal
        value: specialNode
        effect: NoSchedule

Logstash Installation

Once Filebeat is running, you need a Logstash deployment pointing to the Syslog server (Papertrail in this example).

I will rely on the output/tcp plugin for this connection. You can also use the output/syslog plugin, but I found the tcp output to be a little more flexible when combined with mutation pipelines. The configuration file for Logstash looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-config
  namespace: monitoring
data:
  logstash.conf: |-
    input {
      beats {
          port => 5100
          host => "0.0.0.0"
          type => "log"
      }
    }
    filter {
      mutate {
        replace => { "message" => "%{name} %{message}" }
      }
    }
    output {
      tcp {
          codec => "line"
          host => "logs.papertrailapp.com"
          port => 59999
      }
    }

Basically, this opens a Filebeat input on port 5100 and mutates the message field so that the name of the pod (populated during the Filebeat parsing step) is prepended to the message. I’m doing this because, in the Papertrail log input, the first word determines the `program`, which grouping and filtering can be based on. This is what I meant by customization: before you send out the logs, you can add as many mutation steps as you need.
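
For example, to discard health-check noise entirely before it reaches Papertrail, you could add a conditional on top of the mutate step (the pod name pattern here is hypothetical):

filter {
  # Drop events from pods whose name starts with "healthcheck" (hypothetical pattern)
  if [name] =~ /^healthcheck/ {
    drop { }
  }
  mutate {
    replace => { "message" => "%{name} %{message}" }
  }
}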

The tcp block at the end establishes the connection to a Papertrail endpoint (replace the host and port with the ones assigned to your account). The Logstash deployment looks like this. Don’t forget that the Logstash pod requires a Service so it can be reached from Filebeat.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash
  namespace: monitoring
  labels:
    k8s.service: logstash
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      k8s.service: logstash
  template:
    metadata:
      labels:
        k8s.service: logstash
    spec:
      containers:
      - image: docker.elastic.co/logstash/logstash:7.4.0
        imagePullPolicy: "Always"
        name: logstash
        ports:
        - containerPort: 5100
        resources:
          limits:
            memory: 1024Mi
          requests:
            memory: 1024Mi
        volumeMounts:
          - mountPath: /usr/share/logstash/pipeline/logstash.conf
            subPath: logstash.conf
            name: logstash-config
      hostname: logstash
      restartPolicy: Always
      volumes:
        - name: logstash-config
          configMap:
            name: logstash-config
---
apiVersion: v1
kind: Service
metadata:
  namespace: monitoring
  labels:
    k8s.service: logstash
  name: logstash
spec:
  ports:
  - port: 5100
    targetPort: 5100
    protocol: TCP
    name: logstash
  selector:
    k8s.service: logstash
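
As a side note, if you prefer the output/syslog plugin mentioned earlier, the output section of logstash.conf could look roughly like this instead. This is only a sketch: it assumes the logstash-output-syslog plugin is installed, the host and port are the same placeholders as above, and whether appname accepts a field reference like %{name} should be verified against the plugin documentation:

output {
  syslog {
    host => "logs.papertrailapp.com"   # placeholder, use your Papertrail endpoint
    port => 59999                      # placeholder, use your assigned port
    protocol => "tcp"
    rfc => "rfc5424"
    appname => "%{name}"               # would replace the mutate step above
  }
}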

More documentation and this example can be found here: https://github.com/miguelcallejasp/logging-filebeat-containerd.


