Set up Grafana OnCall @Home

Hello visitor! I hope you are fine.

It’s nice to meet you here.

Oh, you are passionate about alerting? Like me! What a coincidence!

OK, I know, you are here because of the title…

I’m so happy to share with you my passion. What to say to start… hum… did you hear about Grafana OnCall?!

Will it continue like this for long?

It’s totally awesome, let’s talk about it. Come on, grab a lemonade and get ready for the fun!

Fun, really? Oh, God!

What is Grafana OnCall?

Grafana OnCall is an open-source on-call management system developed by Grafana Labs, the company famous for products like Grafana and Loki. Their products are awesome, open-source (and we love open-source) and self-hostable.

If you don’t know Grafana’s products well, you could spend the rest of the day exploring them. It will be worth it!

An on-call management system becomes useful when you have several alerts to manage, on-call rotations to handle and a lot of people to organize! Grafana and Alertmanager are great for forwarding alerts to a notification channel, but that remains limited for professional usage. Grafana OnCall introduces on-call schedules, SMS/phone alerts and complex incident management.

General setup

Required material

  • Kubernetes cluster (Suggestions: k3s or k3d)
  • A configured Grafana instance (OnCall’s chart provides one, but I prefer to integrate OnCall with my own instance, don’t you?)

OnCall Installation

Default

# Add the Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Default configuration
helm install \
  --wait \
  --set base_url=example.com \
  --set grafana."grafana\.ini".server.domain=example.com \
  release-oncall \
  grafana/oncall

Suggested values

That’s probably the main reason why I am writing this article. I had to figure out a lot of specific configuration, and there are not many articles or GitHub issues about it. Here is my final values file with the working configuration.

base_url: oncall.example.org
base_url_protocol: https

env:
  # https://community.grafana.com/t/grafana-oss-oncall-invalid-token-for-cloud/109543/8
  # Author's note: This seems to be the URL for every European user. It may be different in your case.
  # Anyway, when you create a token on Grafana Cloud, they give you a URL... that is probably the correct one to use.
  GRAFANA_CLOUD_ONCALL_API_URL: https://oncall-prod-eu-west-0.grafana.net/oncall

ui:
  # Author's note: when enabled, it requires a Docker image which is not public
  enabled: false # Seems to be a private project

oncall:
  devMode: false
  mirageCipherIV: random-16-bytes-string

ingress:
  enabled: true
  # Author's note: add your favorite ingress configuration
ingress-nginx:
  enabled: false

externalGrafana:
  # Author's note: I use my own Grafana instance. Don't forget to add the port when
  # you use internal DNS!
  url: http://my-grafana-service.monitoring:3000
database:
  type: postgresql

# It was easier to configure, and I think it's better to keep the persistence
# layer in a secured Postgres anyway, so we don't have to worry about our test project's persistence, PVCs, etc.
externalPostgresql:
  host: db_host
  port: 5432
  db_name: db_name
  user: db_user
  password: db_password
  # Use an existing secret for the database password
  existingSecret:
  # The key in the secret containing the database password
  passwordKey:
  # Extra options (see example below)
  # Reference: https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
  options:
  # options: >-
  #   sslmode=verify-full
  #   sslrootcert=/mnt/postgres-tls/ca.crt
  #   sslcert=/mnt/postgres-tls/client.crt
  #   sslkey=/mnt/postgres-tls/client.key

# Author's note: we use our own Grafana instance
grafana:
  enabled: false
cert-manager:
  enabled: false
mariadb:
  enabled: false
# Author's note: we use our own PostgreSQL instance
postgresql:
  enabled: false
redis:
  enabled: true
  architecture: standalone
  auth:
    enabled: true
  master:
    # Author's note: personal opinion, I don't like stateful apps,
    # so when it's not necessary, I let applications lose their data.
    # Please do as you want!
    persistence:
      enabled: false

# Author's note: RabbitMQ was pretty hard to configure correctly. This is my working configuration.
# Tip: if you need to change something, make sure the PVC is destroyed and recreated,
# to guarantee a blank and fresh configuration.
rabbitmq:
  auth:
    username: user
    password: long-and-strong-password
  rbac:
    create: false
  clustering:
    enabled: false
  serviceAccount:
    create: true
    name: rabbitmq-ha
    automountServiceAccountToken: true
  networkPolicy:
    enabled: true
    allowExternal: true
  persistence:
    enabled: true
    size: 500Mi
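
The mirageCipherIV value above is only a placeholder. Here is a minimal sketch of how you could generate one, assuming that any 16-character string is accepted. The helper name generate_mirage_iv is my own invention for this example; it is not part of the chart.

```shell
# Hypothetical helper: print 16 random hexadecimal characters
# (8 random bytes, each rendered as two hex digits).
generate_mirage_iv() {
  head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Print a line ready to paste under the "oncall:" key of the values file.
MIRAGE_CIPHER_IV="$(generate_mirage_iv)"
echo "mirageCipherIV: ${MIRAGE_CIPHER_IV}"
```

Once the values are saved in a file (values.yaml is an assumed name), they can be passed to the install command shown earlier with the -f flag, for example: helm install -f values.yaml release-oncall grafana/oncall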

Grafana configuration

Our Grafana instance needs the OnCall plugin installed. You can do that with an environment variable. If you don’t have one already, I strongly recommend adding a PVC to your Grafana instance: the OnCall configuration requires a manual operation, and we want to do it only once!

  • You can pin a specific version of the plugin
  • vv1.9.22 (yes, with a double “v”) is expected. Don’t ask why.
GF_INSTALL_PLUGINS: "grafana-oncall-app vv1.9.22"

Starting from version 1.9.0, a manual step is necessary to configure OnCall.

  • root and secret are the admin credentials for my Grafana instance. Change them to match yours.
  • In the payload, onCallApiUrl and grafanaUrl should be edited to match your configuration. They are the in-cluster DNS addresses of the services.
# With port forward
curl -X POST 'http://root:secret@localhost:3000/api/plugins/grafana-oncall-app/settings' -H "Content-Type: application/json" -d '{"enabled":true, "jsonData":{"stackId":5, "orgId":100, "onCallApiUrl":"http://grafana-oncall-on-call-engine.monitoring:8080", "grafanaUrl":"http://my-grafana-service.monitoring:3000"}}'
curl -X POST 'http://root:secret@localhost:3000/api/plugins/grafana-oncall-app/resources/plugin/install'
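
If you prefer not to edit a long one-liner, the same two calls can be wrapped in a small script. This is only a sketch: the function and variable names (build_oncall_payload, configure_oncall_plugin, GRAFANA_ADMIN_URL, and so on) are hypothetical, and the URLs and credentials are the example values from above, so adapt them to your setup.

```shell
# Assumed values, copied from the example above; adapt them to your cluster.
GRAFANA_ADMIN_URL="http://root:secret@localhost:3000"
ONCALL_API_URL="http://grafana-oncall-on-call-engine.monitoring:8080"
GRAFANA_URL="http://my-grafana-service.monitoring:3000"

# Build the JSON payload for the plugin settings endpoint.
# $1 = OnCall engine URL, $2 = Grafana URL (both in-cluster addresses).
build_oncall_payload() {
  printf '{"enabled":true,"jsonData":{"stackId":5,"orgId":100,"onCallApiUrl":"%s","grafanaUrl":"%s"}}' \
    "$1" "$2"
}

# Send the settings, then trigger the plugin install step.
# --fail makes curl return a non-zero exit code on HTTP errors,
# so the second call only runs if the first one succeeded.
configure_oncall_plugin() {
  curl --fail -X POST "${GRAFANA_ADMIN_URL}/api/plugins/grafana-oncall-app/settings" \
    -H "Content-Type: application/json" \
    -d "$(build_oncall_payload "$ONCALL_API_URL" "$GRAFANA_URL")" &&
  curl --fail -X POST "${GRAFANA_ADMIN_URL}/api/plugins/grafana-oncall-app/resources/plugin/install"
}

# Print the payload so you can review it before sending anything.
build_oncall_payload "$ONCALL_API_URL" "$GRAFANA_URL"
```

To use it, open the port forward in another terminal (for my service that would be something like kubectl port-forward -n monitoring svc/my-grafana-service 3000:3000, assuming the service name and namespace from the in-cluster address), then call configure_oncall_plugin.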