# K3s Observability Stack

Self-hosted observability stack on K3s using the LGTM suite — Loki (logs), Grafana (dashboards), Tempo (traces), and Prometheus (metrics) — with Alloy as the unified collector and Pyroscope for continuous profiling. No Grafana Cloud required.

All UIs are exposed via Traefik with TLS and restricted to the WireGuard VPN subnet.
## Stack
| Component | Role |
|---|---|
| kube-prometheus-stack | Prometheus + Alertmanager + Grafana + node-exporter |
| Loki | Log storage |
| Tempo | Trace storage |
| Alloy | OTLP receiver + auto-collection (daemonset) |
| Pyroscope | Continuous profiling (pprof) |
## Prerequisites

- K3s running with Traefik and cert-manager configured (see the K3s Setup article)
- A `cloudflare-cluster-issuer` ClusterIssuer
- `dashboard-auth` and `dashboard-allowlist` Middleware in `kube-system` (see K3s Setup)
- Helm installed on the operator machine
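Before deploying, you can sanity-check the prerequisites. This assumes the resource names from the K3s Setup article:

```shell
# The cert-manager ClusterIssuer should exist and report Ready
kubectl get clusterissuer cloudflare-cluster-issuer

# Both Traefik middlewares should exist in kube-system
kubectl get middleware -n kube-system dashboard-auth dashboard-allowlist
```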
## Helm Repos

```sh
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add grafana-community https://grafana-community.github.io/helm-charts
helm repo update
```
## Deploy

```sh
kubectl create namespace monitoring
```
### 1. Prometheus + Alertmanager + Grafana

`kube-prom-values.yaml`:

```yaml
grafana:
  enabled: true
  adminPassword: "YOUR_GRAFANA_PASSWORD"
  persistence:
    enabled: true
    size: 2Gi
  grafana.ini:
    server:
      root_url: "https://grafana.webux.dev"
    users:
      allow_sign_up: false
    auth.anonymous:
      enabled: false
  additionalDataSources:
    - name: Loki
      type: loki
      uid: loki   # explicit uid so tracesToLogsV2 below can reference it
      url: http://loki-gateway.monitoring.svc.cluster.local
      isDefault: false
    - name: Tempo
      type: tempo
      url: http://tempo.monitoring.svc.cluster.local:3200
      isDefault: false
      jsonData:
        tracesToLogsV2:
          datasourceUid: loki
        tracesToMetrics:
          datasourceUid: prometheus
    - name: Pyroscope
      type: grafana-pyroscope-datasource
      url: http://pyroscope.monitoring.svc.cluster.local:4040
      isDefault: false
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          folder: Kubernetes
          type: file
          options:
            path: /var/lib/grafana/dashboards/default
  dashboards:
    default:
      node-exporter:
        gnetId: 1860
        revision: 37
        datasource: Prometheus
      traefik:
        gnetId: 17346
        revision: 9
        datasource: Prometheus
prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 20Gi
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi
nodeExporter:
  enabled: true
kubeStateMetrics:
  enabled: true
```

```sh
helm install kube-prom prometheus-community/kube-prometheus-stack \
  -n monitoring -f kube-prom-values.yaml
```
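Once the chart is installed, it's worth confirming the release is healthy before layering on the rest of the stack. The label value below assumes the `kube-prom` release name used above:

```shell
# All kube-prometheus-stack pods should reach Running/Ready
kubectl get pods -n monitoring -l "release=kube-prom"

# Confirm the remote-write receiver flag landed on the Prometheus resource
# (Tempo's metrics-generator and Alloy both depend on it)
kubectl get prometheus -n monitoring -o yaml | grep enableRemoteWriteReceiver
```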
### 2. Loki (logs)

Single-binary mode — ideal for single/dual-node setups.

`loki-values.yaml`:

```yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  storage:
    type: filesystem
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: filesystem
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
  persistence:
    enabled: true
    size: 20Gi
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0
lokiCanary:
  enabled: false
test:
  enabled: false
```

```sh
helm install loki grafana/loki \
  -n monitoring -f loki-values.yaml
```
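A quick smoke test against the gateway, assuming `kubectl port-forward` access from the operator machine; the `job` label is arbitrary:

```shell
# Forward the Loki gateway locally (service port 80)
kubectl port-forward -n monitoring svc/loki-gateway 3100:80 &

# Push one test log line (timestamp must be in nanoseconds)
curl -s -X POST http://localhost:3100/loki/api/v1/push \
  -H "Content-Type: application/json" \
  -d "{\"streams\":[{\"stream\":{\"job\":\"smoke-test\"},\"values\":[[\"$(date +%s)000000000\",\"hello loki\"]]}]}"

# Query it back over the default range
curl -s -G http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={job="smoke-test"}'
```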
### 3. Tempo (traces)

`tempo-values.yaml`:

```yaml
tempo:
  ingester:
    lifecycler:
      ring:
        replication_factor: 1
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  metricsGenerator:
    enabled: true
    remoteWriteUrl: "http://kube-prom-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090/api/v1/write"
persistence:
  enabled: true
  size: 10Gi
```

```sh
helm install tempo grafana/tempo \
  -n monitoring -f tempo-values.yaml
```
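To confirm Tempo is up, you can hit its readiness and echo endpoints through a port-forward (assumes the monolithic `tempo` service on port 3200, matching the datasource URL above):

```shell
kubectl port-forward -n monitoring svc/tempo 3200:3200 &

# Readiness check — returns "ready" once the ingester ring is healthy
curl -s http://localhost:3200/ready

# The query frontend's echo endpoint responds with "echo"
curl -s http://localhost:3200/api/echo
```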
### 4. Alloy (collector)

Alloy runs as a DaemonSet, collecting pod logs from every node, scraping Traefik metrics, receiving OTLP from apps, and forwarding profiles to Pyroscope.

`alloy-values.yaml`:

```yaml
alloy:
  extraPorts:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP
  mounts:
    extra:
      - name: varlog
        mountPath: /var/log
        readOnly: true
  configMap:
    content: |-
      // RECEIVERS - Accept OTLP from apps
      otelcol.receiver.otlp "default" {
        grpc { endpoint = "0.0.0.0:4317" }
        http { endpoint = "0.0.0.0:4318" }
        output {
          metrics = [otelcol.processor.batch.default.input]
          logs    = [otelcol.processor.batch.default.input]
          traces  = [otelcol.processor.batch.default.input]
        }
      }

      // PROCESSORS
      otelcol.processor.batch "default" {
        output {
          metrics = [otelcol.exporter.prometheus.default.input]
          logs    = [otelcol.processor.transform.loki_hints.input]
          traces  = [otelcol.exporter.otlphttp.tempo.input]
        }
      }

      // Promote OTLP resource attributes to Loki stream labels
      otelcol.processor.transform "loki_hints" {
        log_statements {
          context = "resource"
          statements = [
            `set(attributes["loki.resource.labels"], "service.name,k8s.namespace.name,deployment.environment")`,
          ]
        }
        output {
          logs = [otelcol.exporter.loki.default.input]
        }
      }

      // EXPORTERS
      otelcol.exporter.prometheus "default" {
        forward_to = [prometheus.remote_write.default.receiver]
        resource_to_telemetry_conversion = true
      }

      prometheus.remote_write "default" {
        endpoint {
          url = "http://kube-prom-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090/api/v1/write"
        }
      }

      otelcol.exporter.loki "default" {
        forward_to = [loki.write.default.receiver]
      }

      loki.write "default" {
        endpoint {
          url = "http://loki-gateway.monitoring.svc.cluster.local/loki/api/v1/push"
        }
      }

      otelcol.exporter.otlphttp "tempo" {
        client {
          endpoint = "http://tempo.monitoring.svc.cluster.local:4318"
        }
      }

      // AUTO-COLLECT - Pod logs from all namespaces
      discovery.kubernetes "pods" {
        role = "pod"
      }

      discovery.relabel "pod_logs" {
        targets = discovery.kubernetes.pods.targets
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          target_label  = "namespace"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label  = "pod"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_container_name"]
          target_label  = "container"
        }
        rule {
          source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_name"]
          separator     = "/"
          target_label  = "job"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
          separator     = "/"
          action        = "replace"
          replacement   = "/var/log/pods/*$1/*.log"
          target_label  = "__path__"
        }
      }

      loki.source.file "pod_logs" {
        targets    = discovery.relabel.pod_logs.output
        forward_to = [loki.write.default.receiver]
      }

      // SCRAPE - Traefik metrics (pod direct — metrics port not on service)
      discovery.kubernetes "traefik" {
        role = "pod"
        namespaces { names = ["kube-system"] }
        selectors {
          role  = "pod"
          label = "app.kubernetes.io/name=traefik"
        }
      }

      discovery.relabel "traefik" {
        targets = discovery.kubernetes.traefik.targets
        rule {
          source_labels = ["__meta_kubernetes_pod_container_port_name"]
          action        = "keep"
          regex         = "metrics"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_ip"]
          target_label  = "__address__"
          replacement   = "$1:9100"
        }
      }

      prometheus.scrape "traefik" {
        targets    = discovery.relabel.traefik.output
        forward_to = [prometheus.remote_write.default.receiver]
        job_name   = "traefik"
      }

      // AUTO-PROFILE - Scrape pprof from annotated pods
      // Add to your pod spec:
      //   annotations:
      //     profiles.grafana.com/scrape: "true"
      //     profiles.grafana.com/port: "6060"
      discovery.kubernetes "pprof_pods" {
        role = "pod"
      }

      discovery.relabel "pprof_pods" {
        targets = discovery.kubernetes.pprof_pods.targets
        rule {
          source_labels = ["__meta_kubernetes_pod_annotation_profiles_grafana_com_scrape"]
          action        = "keep"
          regex         = "true"
        }
        // Point the scrape address at the annotated port
        rule {
          source_labels = ["__meta_kubernetes_pod_ip", "__meta_kubernetes_pod_annotation_profiles_grafana_com_port"]
          separator     = ":"
          target_label  = "__address__"
        }
        rule {
          source_labels = ["__meta_kubernetes_namespace"]
          target_label  = "namespace"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_name"]
          target_label  = "pod"
        }
        rule {
          source_labels = ["__meta_kubernetes_pod_label_app"]
          target_label  = "app"
        }
      }

      pyroscope.scrape "pprof" {
        targets    = discovery.relabel.pprof_pods.output
        forward_to = [pyroscope.write.default.receiver]
        profiling_config {
          profile.process_cpu { enabled = true }
          profile.memory      { enabled = true }
          profile.goroutine   { enabled = true }
          profile.block       { enabled = true }
          profile.mutex       { enabled = true }
        }
      }

      pyroscope.write "default" {
        endpoint {
          url = "http://pyroscope.monitoring.svc.cluster.local:4040"
        }
      }
controller:
  type: daemonset
  volumes:
    extra:
      - name: varlog
        hostPath:
          path: /var/log
rbac:
  create: true
serviceAccount:
  create: true
```

```sh
helm install alloy grafana/alloy \
  -n monitoring -f alloy-values.yaml
```
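After the install, you can verify the DaemonSet landed on every node and inspect the component pipeline through Alloy's built-in UI (default HTTP port 12345):

```shell
# One Alloy pod per node, each with the varlog hostPath mount
kubectl get pods -n monitoring -l app.kubernetes.io/name=alloy -o wide

# Alloy's UI shows the component graph and per-component health
kubectl port-forward -n monitoring ds/alloy 12345:12345
# then open http://localhost:12345 in a browser
```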
### 5. Certificates and Ingress

Replace `webux.dev` with your actual domain.

`ingress.yaml`:

```yaml
---
# Grafana
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: grafana-cert
  namespace: monitoring
spec:
  secretName: grafana-tls
  issuerRef:
    name: cloudflare-cluster-issuer
    kind: ClusterIssuer
  dnsNames:
    - grafana.webux.dev
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`grafana.webux.dev`)
      kind: Rule
      middlewares:
        - name: dashboard-allowlist
          namespace: kube-system
      services:
        - name: kube-prom-grafana
          port: 80
  tls:
    secretName: grafana-tls
---
# Alertmanager
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: alertmanager-cert
  namespace: monitoring
spec:
  secretName: alertmanager-tls
  issuerRef:
    name: cloudflare-cluster-issuer
    kind: ClusterIssuer
  dnsNames:
    - alerts.webux.dev
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`alerts.webux.dev`)
      kind: Rule
      middlewares:
        - name: dashboard-auth
          namespace: kube-system
        - name: dashboard-allowlist
          namespace: kube-system
      services:
        - name: kube-prom-kube-prometheus-alertmanager
          port: 9093
  tls:
    secretName: alertmanager-tls
---
# Prometheus
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: prometheus-cert
  namespace: monitoring
spec:
  secretName: prometheus-tls
  issuerRef:
    name: cloudflare-cluster-issuer
    kind: ClusterIssuer
  dnsNames:
    - prometheus.webux.dev
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: prometheus
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`prometheus.webux.dev`)
      kind: Rule
      middlewares:
        - name: dashboard-auth
          namespace: kube-system
        - name: dashboard-allowlist
          namespace: kube-system
      services:
        - name: kube-prom-kube-prometheus-prometheus
          port: 9090
  tls:
    secretName: prometheus-tls
---
# Pyroscope
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: pyroscope-cert
  namespace: monitoring
spec:
  secretName: pyroscope-tls
  issuerRef:
    name: cloudflare-cluster-issuer
    kind: ClusterIssuer
  dnsNames:
    - pyroscope.webux.dev
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: pyroscope
  namespace: monitoring
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`pyroscope.webux.dev`)
      kind: Rule
      middlewares:
        - name: dashboard-auth
          namespace: kube-system
        - name: dashboard-allowlist
          namespace: kube-system
      services:
        - name: pyroscope
          port: 4040
  tls:
    secretName: pyroscope-tls
```

```sh
kubectl apply -f ingress.yaml

# Watch certificates get issued (~1-2 min)
kubectl get certificate -n monitoring -w

# Watch everything come up
kubectl get pods -n monitoring -w
```
## Access (via WireGuard only)
| UI | URL |
|---|---|
| Grafana | https://grafana.webux.dev (admin + your password) |
| Alertmanager | https://alerts.webux.dev |
| Prometheus | https://prometheus.webux.dev |
| Pyroscope | https://pyroscope.webux.dev |
## Connecting Apps to Alloy

Add these environment variables to any app deployment to send OTLP telemetry:

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://alloy.monitoring.svc.cluster.local:4317"
  - name: OTEL_SERVICE_NAME
    value: "your-app-name"
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "deployment.environment=production"
```

To enable pprof scraping via Pyroscope, annotate the pod:

```yaml
annotations:
  profiles.grafana.com/scrape: "true"
  profiles.grafana.com/port: "6060"
```
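Before relying on the annotations, it helps to confirm the app actually serves pprof on the declared port. The deployment name below is a placeholder; the paths are the standard Go `net/http/pprof` routes that Pyroscope scrapes:

```shell
# Forward the app's pprof port locally (substitute your deployment name)
kubectl port-forward deploy/your-app-name 6060:6060 &

# The pprof index should respond with a list of available profiles
curl -s http://localhost:6060/debug/pprof/ | head

# A 5-second CPU profile, as the Pyroscope scraper would collect it
curl -s -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=5"
```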
## What Gets Collected Automatically
- All pod logs across every namespace
- Node metrics (CPU, RAM, disk, network)
- Kubernetes metrics (pod restarts, deployments, etc.)
- Traefik request metrics
- App traces, metrics, and logs via OTLP
- Go pprof profiles from annotated pods