Direction for finding ZooKeeper issues - clickhouse-operator

Hello, I am running a 1-shard, 2-replica ClickHouse instance using clickhouse-operator. Everything is basically fine, including ReplicatedMergeTree and 3 ZooKeeper instances, and data gets replicated correctly. But both the Grafana clickhouse-operator dashboard and ClickHouse itself report ZooKeeperUserExceptions and corresponding entries in system.errors. I am having trouble finding the root cause and am looking for directions on where to check.

Prometheus is reporting, e.g.:

# HELP chi_clickhouse_event_ZooKeeperUserExceptions 
# TYPE chi_clickhouse_event_ZooKeeperUserExceptions counter
chi_clickhouse_event_ZooKeeperUserExceptions{chi="ugcluster",hostname="chi-ugcluster-production-0-0.production.svc.cluster.local",namespace="production"} 12718
chi_clickhouse_event_ZooKeeperUserExceptions{chi="ugcluster",hostname="chi-ugcluster-production-0-1.production.svc.cluster.local",namespace="production"} 7971
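
These counters are presumably scraped from ClickHouse's system.events, so the same numbers can be read, split into user/hardware/other ZooKeeper exceptions, directly on each replica. A rough sketch, assuming clickhouse-client is usable inside the pods and that the second replica's pod is named chi-ugcluster-production-0-1-0:

# Hedged sketch: read the ZooKeeper exception counters straight from system.events on both replicas
for pod in chi-ugcluster-production-0-0-0 chi-ugcluster-production-0-1-0; do
  echo "== $pod"
  kubectl exec "$pod" -- clickhouse-client --query \
    "SELECT event, value FROM system.events WHERE event LIKE 'ZooKeeper%Exceptions'"
done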

I assume it's related to the ClickHouse KEEPER_EXCEPTION error, which I see confirmed:

SELECT *
FROM system.errors
ORDER BY value DESC

┌─name────────────────────────┬─code─┬─value─┐
│ KEEPER_EXCEPTION            │  999 │ 12718 │
│ NETWORK_ERROR               │  210 │   407 │
│ FILE_DOESNT_EXIST           │  107 │   241 │
│ ALL_CONNECTION_TRIES_FAILED │  279 │   116 │
│ CANNOT_READ_ALL_DATA        │   33 │    68 │
│ TABLE_IS_READ_ONLY          │  242 │    49 │
│ SYNTAX_ERROR                │   62 │    23 │
│ NOT_FOUND_NODE              │  142 │    18 │
│ UNKNOWN_TABLE               │   60 │    14 │
│ NO_REPLICA_HAS_PART         │  234 │     7 │
│ TOO_MANY_ROWS_OR_BYTES      │  396 │     6 │
│ UNKNOWN_IDENTIFIER          │   47 │     4 │
│ NOT_AN_AGGREGATE            │  215 │     4 │
│ BAD_ARGUMENTS               │   36 │     3 │
│ UNKNOWN_DATABASE            │   81 │     3 │
│ UNKNOWN_FUNCTION            │   46 │     1 │
│ CANNOT_OPEN_FILE            │   76 │     1 │
│ NO_ELEMENTS_IN_CONFIG       │  139 │     1 │
│ REPLICA_IS_ALREADY_EXIST    │  253 │     1 │
│ FUNCTION_NOT_ALLOWED        │  446 │     1 │
└─────────────────────────────┴──────┴───────┘
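
Since TABLE_IS_READ_ONLY also shows up, one direction might be to check which replicated tables are currently read-only or have lost their ZooKeeper session. A rough sketch, assuming the zookeeper_exception column is present in this 21.1 build:

# Hedged sketch: list replicas that are read-only, session-expired, or report a ZooKeeper exception
kubectl exec chi-ugcluster-production-0-0-0 -- clickhouse-client --query \
  "SELECT database, table, is_readonly, is_session_expired, queue_size, zookeeper_exception
   FROM system.replicas
   WHERE is_readonly OR is_session_expired OR zookeeper_exception != ''
   FORMAT Vertical"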

The system.zookeeper table also looks good:

SELECT *
FROM system.zookeeper
WHERE path = '/clickhouse/tables/0'

┌─name────────────────────────────┬─value─┬───────czxid─┬───────mzxid─┬───────────────ctime─┬───────────────mtime─┬─version─┬─cversion─┬─aversion─┬─ephemeralOwner─┬─dataLength─┬─numChildren─┬───────pzxid─┬─path─────────────────┐
│ <redacted db.table name> │       │ 25769804336 │ 25769804336 │ 2021-01-27 15:30:38 │ 2021-01-27 15:30:38 │       0 │       13 │        0 │              0 │          0 │          11 │ 25769804343 │ /clickhouse/tables/0 │
└─────────────────────────────────┴───────┴─────────────┴─────────────┴─────────────────────┴─────────────────────┴─────────┴──────────┴──────────┴────────────────┴────────────┴─────────────┴─────────────┴──────────────────────┘
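
The replica registration under that table's znode can be inspected the same way; <db.table> below is just a placeholder for the redacted znode name shown above:

# Hedged sketch: list the replicas registered under the (redacted) table znode
kubectl exec chi-ugcluster-production-0-0-0 -- clickhouse-client --query \
  "SELECT name, ctime, mtime, numChildren
   FROM system.zookeeper
   WHERE path = '/clickhouse/tables/0/<db.table>/replicas'"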

The ClickHouse error logs are not showing anything related to ZooKeeper either, e.g. via kubectl exec chi-ugcluster-production-0-0-0 -- cat /var/log/clickhouse-server/clickhouse-server.err.log

The ZooKeeper logs also don't show any errors, mainly just the following liveness probe checks:

$ kubectl logs zk-0
...
2021-01-28 14:35:56,051 [myid:0] - INFO  [NIOWorkerThread-2:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:41174
2021-01-28 14:36:01,420 [myid:0] - INFO  [NIOWorkerThread-1:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:41186
2021-01-28 14:36:06,051 [myid:0] - INFO  [NIOWorkerThread-2:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:41188
2021-01-28 14:36:11,420 [myid:0] - INFO  [NIOWorkerThread-1:NIOServerCnxn@507] - Processing ruok command from /127.0.0.1:41200
...
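
Since the zoo.cfg below whitelists all four-letter-word commands, the ensemble state can also be checked from the ZooKeeper side with the same nc that the probes use. A rough sketch:

# Hedged sketch: ask each ZooKeeper node for its self-reported state via the mntr command
for i in 0 1 2; do
  echo "== zk-$i"
  kubectl exec "zk-$i" -- bash -c \
    'echo mntr | nc -w 2 localhost 2181 | egrep "zk_server_state|zk_synced_followers|zk_outstanding_requests|zk_num_alive_connections"'
done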

This is the zk.yaml (zk-0/1/2 are running in the same production namespace as ClickHouse):

apiVersion: v1
kind: ConfigMap
metadata:
  name: zk
  namespace: production
data:
  run.sh: |
    #!/bin/bash

    HOSTNAME=`hostname -s`
    echo "My hostname: $HOSTNAME"
    if [[ $HOSTNAME =~ (.*)-([0-9]+)$ ]]; then
      ORD=${BASH_REMATCH[2]}
      export ZOO_MY_ID=$((ORD))
    else
      echo "Failed to get index from hostname $HOST"
      exit 1
    fi

    echo $ZOO_MY_ID > /zk/data/myid

    /docker-entrypoint.sh ./bin/zkServer.sh start-foreground
  zoo.cfg: |
    dataDir=/zk/data
    dataLogDir=/zk/datalog
    clientPort=2181
    clientPortAddress=0.0.0.0
    maxClientCnxns=0
    tickTime=2000
    initLimit=5
    syncLimit=2
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=0
    standaloneEnabled=true
    admin.enableServer=true
    4lw.commands.whitelist=*
    server.0=zk-0.zk.production.svc:2888:3888;2181
    server.1=zk-1.zk.production.svc:2888:3888;2181
    server.2=zk-2.zk.production.svc:2888:3888;2181
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: zk
  name: zk
  namespace: production
spec:
  podManagementPolicy: Parallel
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: zk
  serviceName: zk
  template:
    metadata:
      labels:
        app: zk
      name: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: zk
            topologyKey: kubernetes.io/hostname
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                matchExpressions:
                - key: node.kubernetes.io/instance-type
                  operator: In
                  values:
                  - s3.large.4
                  - s3.large.8
      containers:
      - command:
        - bash
        - /run.sh
        image: zookeeper:3.6.1
        imagePullPolicy: Always
        livenessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok
          failureThreshold: 6
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: zk
        ports:
        - containerPort: 2181
          name: client
          protocol: TCP
        - containerPort: 2888
          name: follower
          protocol: TCP
        - containerPort: 3888
          name: election
          protocol: TCP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - echo "ruok" | timeout 2 nc -w 2 localhost 2181 | grep imok
          failureThreshold: 6
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
        volumeMounts:
        - mountPath: /run.sh
          name: config
          subPath: run.sh
        - mountPath: /conf/zoo.cfg
          name: config
          subPath: zoo.cfg
        - mountPath: /zk/data
          name: data
        - mountPath: /zk/datalog
          name: datalog
      volumes:
      - configMap:
          defaultMode: 420
          name: zk
        name: config
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datalog
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      volumeMode: Filesystem
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 20Gi
      volumeMode: Filesystem
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: zk
  name: zk
  namespace: production
spec:
  clusterIP: None
  ports:
  - name: prometheus
    port: 7000
  - name: tcp-client
    port: 2181
    protocol: TCP
    targetPort: client
  - name: follower
    port: 2888
    protocol: TCP
    targetPort: follower
  - name: tcp-election
    port: 3888
    protocol: TCP
    targetPort: election
  publishNotReadyAddresses: true
  selector:
    app: zk
  sessionAffinity: None
  type: ClusterIP
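
For the NETWORK_ERROR / ALL_CONNECTION_TRIES_FAILED counters, one quick check is whether every ClickHouse pod can reach every ZooKeeper node on 2181. A rough sketch using bash's /dev/tcp, assuming bash is available in the clickhouse image, so nothing extra needs to be installed:

# Hedged sketch: probe TCP reachability from a ClickHouse pod to each ZooKeeper node
for zk in zk-0.zk.production.svc zk-1.zk.production.svc zk-2.zk.production.svc; do
  kubectl exec chi-ugcluster-production-0-0-0 -- bash -c \
    "timeout 2 bash -c 'exec 3<>/dev/tcp/$zk/2181' && echo $zk reachable || echo $zk NOT reachable"
done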

I am mainly asking for directions and hope to track down the issue myself from there.

$ kubectl describe deployment clickhouse-operator -n kube-system
Name:                   clickhouse-operator
Namespace:              kube-system
CreationTimestamp:      Wed, 23 Dec 2020 15:38:43 +0100
Labels:                 app=clickhouse-operator
                        clickhouse.altinity.com/app=chop
                        clickhouse.altinity.com/chop=0.13.0

Asked Oct 05 '21 16:10 by ben-efiz

6 Answers:

@ben-efiz try to execute grep -r -i zoo /var/log/clickhouse-server/ on each clickhouse server pod

A constantly incrementing ZooKeeperUserExceptions counter is not good.

Answered Jan 28 '21 at 18:02 by Slach

Thanks @Slach, I ran the grep on each pod, e.g. kubectl exec chi-ugcluster-production-0-0-0 -- grep -r -i zoo /var/log/clickhouse-server/, and the only lines I get back are expected logs, I guess:

/var/log/clickhouse-server/clickhouse-server.log:2021.01.28 19:55:45.194849 [ 65 ] {} <Debug> <redacted table> (ebe486c8-9083-4626-9459-3936538867f9): Removing 125 old parts from ZooKeeper
/var/log/clickhouse-server/clickhouse-server.log:2021.01.28 19:55:45.209041 [ 65 ] {} <Debug> <redacted table> (ebe486c8-9083-4626-9459-3936538867f9): Removed 125 old parts from ZooKeeper. Removing them from filesystem.
/var/log/clickhouse-server/clickhouse-server.log:2021.01.28 19:56:18.413144 [ 58 ] {} <Debug> <redacted table> (ebe486c8-9083-4626-9459-3936538867f9): Removing 47 old parts from ZooKeeper
/var/log/clickhouse-server/clickhouse-server.log:2021.01.28 19:56:18.424520 [ 58 ] {} <Debug> <redacted table> (ebe486c8-9083-4626-9459-3936538867f9): Removed 47 old parts from ZooKeeper. Removing them from filesystem.
/var/log/clickhouse-server/clickhouse-server.log:2021.01.28 19:56:51.519125 [ 65 ] {} <Debug> <redacted table> (ebe486c8-9083-4626-9459-3936538867f9): Removing 145 old parts from ZooKeeper
/var/log/clickhouse-server/clickhouse-server.log:2021.01.28 19:56:51.544077 [ 65 ] {} <Debug> <redacted table> (ebe486c8-9083-4626-9459-3936538867f9): Removed 145 old parts from ZooKeeper. Removing them from filesystem.

The KEEPER_EXCEPTIONs in ClickHouse are still growing though. StorageReplicatedMergeTree.cpp points to the message "Cannot allocate block number in ZooKeeper", but I cannot find that message anywhere (using grep as above).
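
"Cannot allocate block number in ZooKeeper" seems to be raised on inserts, so another place that might show what keeps failing against ZooKeeper is system.replication_queue. A rough sketch:

# Hedged sketch: look for replication queue entries that keep retrying or carry an exception
kubectl exec chi-ugcluster-production-0-0-0 -- clickhouse-client --query \
  "SELECT database, table, type, num_tries, last_exception, postpone_reason
   FROM system.replication_queue
   WHERE last_exception != '' OR num_tries > 1
   ORDER BY num_tries DESC
   LIMIT 10
   FORMAT Vertical"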

Side topic: I would like to set the logger level to something less verbose than debug. I found that clickhouse-operator defines a ConfigMap, so I guess I would need to pull the YAML, change it, and apply it again. Do I also have to apply something to the ClickHouse YAMLs for it to be rebuilt and take effect?
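
Regarding the side topic: instead of editing the operator-generated ConfigMap by hand, the level can probably be set on the ClickHouseInstallation resource itself so the operator rolls it out. A rough sketch, assuming this operator version (0.13.0) honours server-level keys like logger/level under spec.configuration.settings (worth checking the operator docs):

# Hedged sketch: set a less verbose logger level via the ClickHouseInstallation resource
kubectl patch chi ugcluster --type merge \
  -p '{"spec":{"configuration":{"settings":{"logger/level":"information"}}}}'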

Answered Jan 29 '21 at 13:16 by ben-efiz

I suggest setting the logger level to trace; debug is not enough. clickhouse-server rotates log files by default: https://clickhouse.tech/docs/en/operations/server-configuration-parameters/settings/#serverconfigurationparameters-logger

Answered Jan 29 '21 at 14:25 by Slach

@ben-efiz do you have enough free space on the ZooKeeper disks? How do you manage your ZooKeeper installation?

Answered Jan 29 '21 at 14:26 by Slach

> @ben-efiz do you have enough free space on the ZooKeeper disks? How do you manage your ZooKeeper installation?

OK, I will set the logger to trace for a while directly on the instances. ZooKeeper is managed by the YAML above, on the same cluster and in the same namespace as the ClickHouse instances. Regarding storage, I have one 20GB data mount and one 20GB datalog mount (it was recommended to keep these separate). They are barely used yet (probably because it has not been running that long):

$ kubectl exec zk-0 -- df -h
Filesystem                                                                                        Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:1-131077-0fcba805b9d13040d0facea513c66ba72f0c3d146890f068bf5ecee9baa6bdd8  9.8G  289M  9.0G   4% /
tmpfs                                                                                              64M     0   64M   0% /dev
tmpfs                                                                                             3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/vgpaas-dockersys                                                                       18G   51M   17G   1% /data
/dev/mapper/vgpaas-kubernetes                                                                     9.8G   37M  9.2G   1% /run.sh
/dev/sda                                                                                           20G  325M   20G   2% /zk/datalog
/dev/sdb                                                                                           20G   69M   20G   1% /zk/data
shm                                                                                                64M     0   64M   0% /dev/shm
tmpfs                                                                                             3.9G   12K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                                                                             3.9G     0  3.9G   0% /proc/acpi
tmpfs                                                                                             3.9G     0  3.9G   0% /proc/scsi
tmpfs                                                                                             3.9G     0  3.9G   0% /sys/firmware

Answered Jan 29 '21 at 15:35 by ben-efiz

I found a warning:

<Warning> Application: Listen [::]:8123 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: Address family for hostname not supported (version 21.1.2.15 (official build)). If it is an IPv6 or IPv4 address and your host has disabled IPv6 or IPv4, then consider to specify not disabled IPv4 or IPv6 address to listen in <listen_host> element of configuration file. Example for disabled IPv6: <listen_host>0.0.0.0</listen_host> . Example for disabled IPv4: <listen_host>::</listen_host>

And that's actually exactly how it is set up by clickhouse-operator?

$ kubectl exec chi-ugcluster-production-0-0-0 -- cat /etc/clickhouse-server/config.d/01-clickhouse-listen.xml
<yandex>
    <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
    <listen_host>::</listen_host>
    <listen_host>0.0.0.0</listen_host>
    <listen_try>1</listen_try>
</yandex>
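
With listen_try set, ClickHouse only warns and keeps running if one of the two listen_host addresses cannot be bound, so this warning most likely just means IPv6 is disabled in the pod and is probably unrelated to the ZooKeeper counters. A quick check, assuming the sysctl is visible inside the container:

# Hedged sketch: "1" here (or a missing file) means IPv6 is disabled, which would explain the warning
kubectl exec chi-ugcluster-production-0-0-0 -- cat /proc/sys/net/ipv6/conf/all/disable_ipv6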

Answered Jan 29 '21 at 15:44 by ben-efiz