1. Background
Recent work has required Dockerizing our company's existing application services and orchestrating them with Kubernetes; this post records the deployment process.
Prometheus Operator is a Prometheus-based Kubernetes monitoring solution developed by CoreOS, and arguably the most feature-complete open-source option available. For more information, see https://github.com/coreos/prometheus-operator
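To give a sense of what the Operator adds: scrape targets are declared as ServiceMonitor custom resources instead of hand-edits to prometheus.yml, and the Operator regenerates the Prometheus configuration from them. A minimal sketch (the application name and port below are hypothetical):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app          # hypothetical application
  namespace: monitoring
  labels:
    release: kube            # with this chart's defaults, must match the Helm release name
spec:
  selector:
    matchLabels:
      app: example-app       # selects the Service to scrape
  endpoints:
    - port: web              # a named port on that Service
      interval: 30s
```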
2. Articles in This Series
- Building a Highly Available Kubernetes Cluster, Part 1: Base Environment Initialization
- Building a Highly Available Kubernetes Cluster, Part 2: Initializing the Cluster with Kubeadm
- Building a Highly Available Kubernetes Cluster, Part 3: Ingress, Dashboard, Metrics-server
- Building a Highly Available Kubernetes Cluster, Part 4: Rook-Ceph
- Building a Highly Available Kubernetes Cluster, Part 5: Harbor
- Building a Highly Available Kubernetes Cluster, Part 6: Prometheus
- Building a Highly Available Kubernetes Cluster, Part 7: ELK-stack
3. Deployment
3.1 Project Repository
Most of the services installed with Helm have configuration files in this project; they can be used as-is or with minor modifications.
Add the repo and download the chart:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Download the chart
helm pull prometheus-community/prometheus-operator
# Extract the chart
tar zxvf prometheus-operator-9.3.2.tgz
3.2 Preparing Docker Images
Images required for deployment:
quay.io/prometheus/alertmanager:v0.21.0
squareup/ghostunnel:v1.5.2
jettech/kube-webhook-certgen:v1.2.1
quay.io/coreos/prometheus-operator:v0.38.1
docker.io/jimmidyson/configmap-reload:v0.3.0
quay.io/coreos/prometheus-config-reloader:v0.38.1
k8s.gcr.io/hyperkube:v1.16.12
quay.io/prometheus/prometheus:v2.18.2
Pull the images from a mirror registry, retag them to their canonical names, and remove the mirror tags:
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/alertmanager:v0.21.0
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/ghostunnel:v1.5.2
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/kube-webhook-certgen:v1.2.1
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus-operator:v0.38.1
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/configmap-reload:v0.3.0
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus-config-reloader:v0.38.1
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus:v2.18.2
docker pull registry.cn-beijing.aliyuncs.com/fcu3dx/hyperkube:v1.16.12
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/alertmanager:v0.21.0 quay.io/prometheus/alertmanager:v0.21.0
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/ghostunnel:v1.5.2 squareup/ghostunnel:v1.5.2
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/kube-webhook-certgen:v1.2.1 jettech/kube-webhook-certgen:v1.2.1
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus-operator:v0.38.1 quay.io/coreos/prometheus-operator:v0.38.1
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/configmap-reload:v0.3.0 docker.io/jimmidyson/configmap-reload:v0.3.0
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus-config-reloader:v0.38.1 quay.io/coreos/prometheus-config-reloader:v0.38.1
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus:v2.18.2 quay.io/prometheus/prometheus:v2.18.2
docker tag registry.cn-beijing.aliyuncs.com/fcu3dx/hyperkube:v1.16.12 k8s.gcr.io/hyperkube:v1.16.12
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/alertmanager:v0.21.0
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/ghostunnel:v1.5.2
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/kube-webhook-certgen:v1.2.1
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus-operator:v0.38.1
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/configmap-reload:v0.3.0
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus-config-reloader:v0.38.1
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/prometheus:v2.18.2
docker rmi registry.cn-beijing.aliyuncs.com/fcu3dx/hyperkube:v1.16.12
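If you prefer not to maintain 24 nearly identical commands, the same pull/tag/rmi sequence can be driven by one loop. This is a sketch (the script name sync-images.sh is hypothetical); it defaults to a dry run that only prints the docker commands, so run it as `DRY_RUN= bash sync-images.sh` to actually execute them:

```shell
#!/usr/bin/env bash
set -euo pipefail

MIRROR="registry.cn-beijing.aliyuncs.com/fcu3dx"
# Default to a dry run: prefix every docker command with "echo".
# Invoke with DRY_RUN= (empty) to really run docker.
DRY_RUN="${DRY_RUN-echo}"

sync_images() {
  # Each line: <mirror image:tag> <canonical image:tag>
  while read -r short canonical; do
    [ -z "$short" ] && continue
    $DRY_RUN docker pull "$MIRROR/$short"
    $DRY_RUN docker tag "$MIRROR/$short" "$canonical"
    $DRY_RUN docker rmi "$MIRROR/$short"
  done <<'EOF'
alertmanager:v0.21.0 quay.io/prometheus/alertmanager:v0.21.0
ghostunnel:v1.5.2 squareup/ghostunnel:v1.5.2
kube-webhook-certgen:v1.2.1 jettech/kube-webhook-certgen:v1.2.1
prometheus-operator:v0.38.1 quay.io/coreos/prometheus-operator:v0.38.1
configmap-reload:v0.3.0 docker.io/jimmidyson/configmap-reload:v0.3.0
prometheus-config-reloader:v0.38.1 quay.io/coreos/prometheus-config-reloader:v0.38.1
prometheus:v2.18.2 quay.io/prometheus/prometheus:v2.18.2
hyperkube:v1.16.12 k8s.gcr.io/hyperkube:v1.16.12
EOF
}

sync_images
```

Review the printed commands first; the image list must be kept in sync with the chart's values.yaml when versions change.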
3.3 Create Secrets
- The certificates were created with cfssl; see section 4.3.2, "Creating Certificates", in Part 3 of this series (Ingress, Dashboard, Metrics-server). I used *.kube.uat when creating the certificate, so any domain ending in kube.uat can use it.
- Create secrets from the certificate
The following four secrets are needed:
- alertmanager-general-tls
- grafana-general-tls
- thanos-gateway-tls
- prometheus-general-tls
# When deploying with Helm, create the namespace "monitoring" first; otherwise everything is deployed into the "default" namespace.
kubectl create namespace monitoring
# Change to the certificate directory
cd /etc/kubernetes/pki/
# Create the TLS secrets in the monitoring namespace
kubectl -n monitoring create secret tls alertmanager-general-tls --key kube-uat-key.pem --cert kube-uat.pem
kubectl -n monitoring create secret tls grafana-general-tls --key kube-uat-key.pem --cert kube-uat.pem
kubectl -n monitoring create secret tls thanos-gateway-tls --key kube-uat-key.pem --cert kube-uat.pem
kubectl -n monitoring create secret tls prometheus-general-tls --key kube-uat-key.pem --cert kube-uat.pem
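For reference, each `kubectl create secret tls` command above is equivalent to applying a manifest like the following (the data values are placeholders for the real base64-encoded PEM files):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-general-tls
  namespace: monitoring
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded kube-uat.pem>
  tls.key: <base64-encoded kube-uat-key.pem>
```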
3.4 Create a StorageClass in Rook-Ceph
Prometheus needs persistent storage, mounted here via PVCs. First create a StorageClass; the PVs will then be created automatically on that StorageClass at deploy time.
Create the file prometheus-storageclass.yaml, adapted from the stock ./rook/cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  # Change this to the pool name you want, so it is easy to tell apart from other pools.
  name: prometheus-replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # The name of the StorageClass to create. All Prometheus data is stored on this StorageClass.
  name: prometheus-rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  # The storage pool backing this StorageClass, created in the step above.
  pool: prometheus-replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  # Filesystem type: xfs
  csi.storage.k8s.io/fstype: xfs
allowVolumeExpansion: true
reclaimPolicy: Delete
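To sanity-check the StorageClass before deploying the chart, you can create a throwaway PVC against it and confirm it becomes Bound (the claim name here is hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storageclass-test   # hypothetical; delete after checking
  namespace: monitoring
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: prometheus-rook-ceph-block
  resources:
    requests:
      storage: 1Gi
```

After `kubectl apply -f`, `kubectl -n monitoring get pvc storageclass-test` should show STATUS Bound within a few seconds; delete the claim afterwards.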
3.5 Modify the Configuration and Deploy
3.5.1 Modify the Configuration File
The changes are mainly to the domain names and the persistent-storage settings; everything else is left at its defaults.
nameOverride: ""
namespaceOverride: ""
kubeTargetVersionOverride: ""
fullnameOverride: ""
commonLabels: {}
defaultRules:
  create: true
  rules:
    alertmanager: true
    etcd: true
    general: true
    k8s: true
    kubeApiserver: true
    kubeApiserverAvailability: true
    kubeApiserverError: true
    kubeApiserverSlos: true
    kubelet: true
    kubePrometheusGeneral: true
    kubePrometheusNodeAlerting: true
    kubePrometheusNodeRecording: true
    kubernetesAbsent: true
    kubernetesApps: true
    kubernetesResources: true
    kubernetesStorage: true
    kubernetesSystem: true
    kubeScheduler: true
    kubeStateMetrics: true
    network: true
    node: true
    prometheus: true
    prometheusOperator: true
    time: true
  appNamespacesTarget: ".*"
  labels: {}
  annotations: {}
additionalPrometheusRules: []
global:
  rbac:
    create: true
    pspEnabled: true
    pspAnnotations: {}
  imagePullSecrets: []
alertmanager:
  enabled: true
  apiVersion: v2
  serviceAccount:
    create: true
    name: ""
    annotations: {}
  podDisruptionBudget:
    enabled: false
    minAvailable: 1
    maxUnavailable: ""
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'null'
      routes:
        - match:
            alertname: Watchdog
          receiver: 'null'
    receivers:
      - name: 'null'
  tplConfig: false
  templateFiles: {}
  ingress:
    enabled: true
    annotations: {}
    labels: {}
    hosts:
      - alertmanager.kube.uat
    paths:
      - /
    tls:
      - secretName: alertmanager-general-tls
        hosts:
          - alertmanager.kube.uat
  secret:
    annotations: {}
  ingressPerReplica:
    enabled: false
    annotations: {}
    labels: {}
    hostPrefix: ""
    hostDomain: ""
    paths: []
    tlsSecretName: ""
    tlsSecretPerReplica:
      enabled: false
      prefix: "alertmanager"
  service:
    annotations: {}
    labels: {}
    clusterIP: ""
    port: 9093
    targetPort: 9093
    nodePort: 30903
    externalIPs: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    type: ClusterIP
  servicePerReplica:
    enabled: false
    annotations: {}
    port: 9093
    targetPort: 9093
    nodePort: 30904
    loadBalancerSourceRanges: []
    type: ClusterIP
  serviceMonitor:
    interval: ""
    selfMonitor: true
    metricRelabelings: []
    relabelings: []
  alertmanagerSpec:
    podMetadata: {}
    image:
      repository: quay.io/prometheus/alertmanager
      tag: v0.21.0
      sha: ""
    useExistingSecret: false
    secrets: []
    configMaps: []
    logFormat: logfmt
    logLevel: info
    replicas: 1
    retention: 120h
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: prometheus-rook-ceph-block
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    externalUrl:
    routePrefix: /
    paused: false
    nodeSelector: {}
    resources: {}
    podAntiAffinity: ""
    podAntiAffinityTopologyKey: kubernetes.io/hostname
    affinity: {}
    tolerations: []
    securityContext:
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
      fsGroup: 2000
    listenLocal: false
    containers: []
    priorityClassName: ""
    additionalPeers: []
    portName: "web"
grafana:
  enabled: true
  namespaceOverride: ""
  defaultDashboardsEnabled: true
  adminPassword: prom-operator
  ingress:
    enabled: true
    annotations: {}
    labels: {}
    hosts:
      - grafana.kube.uat
    path: /
    tls:
      - secretName: grafana-general-tls
        hosts:
          - grafana.kube.uat
  sidecar:
    dashboards:
      enabled: true
      label: grafana_dashboard
      annotations: {}
    datasources:
      enabled: true
      defaultDatasourceEnabled: true
      annotations: {}
      createPrometheusReplicasDatasources: false
      label: grafana_datasource
  extraConfigmapMounts: []
  additionalDataSources: []
  service:
    portName: service
  serviceMonitor:
    interval: ""
    selfMonitor: true
    metricRelabelings: []
    relabelings: []
kubeApiServer:
  enabled: true
  tlsConfig:
    serverName: kubernetes
    insecureSkipVerify: false
  relabelings: []
  serviceMonitor:
    interval: ""
    jobLabel: component
    selector:
      matchLabels:
        component: apiserver
        provider: kubernetes
    metricRelabelings: []
kubelet:
  enabled: true
  namespace: kube-system
  serviceMonitor:
    interval: ""
    https: true
    cAdvisor: true
    probes: true
    resource: true
    resourcePath: "/metrics/resource/v1alpha1"
    cAdvisorMetricRelabelings: []
    probesMetricRelabelings: []
    cAdvisorRelabelings:
      - sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
    probesRelabelings:
      - sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
    resourceRelabelings:
      - sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
    metricRelabelings: []
    relabelings:
      - sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
kubeControllerManager:
  enabled: true
  endpoints: []
  service:
    port: 10257
    targetPort: 10257
  serviceMonitor:
    interval: ""
    https: false
    insecureSkipVerify: null
    serverName: null
    metricRelabelings: []
    relabelings: []
coreDns:
  enabled: true
  service:
    port: 9153
    targetPort: 9153
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    relabelings: []
kubeDns:
  enabled: false
  service:
    dnsmasq:
      port: 10054
      targetPort: 10054
    skydns:
      port: 10055
      targetPort: 10055
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    relabelings: []
    dnsmasqMetricRelabelings: []
    dnsmasqRelabelings: []
kubeEtcd:
  enabled: true
  endpoints:
    - 152.17.0.151
    - 152.17.0.152
    - 152.17.0.153
  service:
    port: 2379
    targetPort: 2379
  serviceMonitor:
    interval: ""
    scheme: http
    insecureSkipVerify: false
    serverName: ""
    caFile: ""
    certFile: ""
    keyFile: ""
    metricRelabelings: []
    relabelings: []
kubeScheduler:
  enabled: true
  endpoints: []
  service:
    port: 10259
    targetPort: 10259
  serviceMonitor:
    interval: ""
    https: false
    insecureSkipVerify: null
    serverName: null
    metricRelabelings: []
    relabelings: []
kubeProxy:
  enabled: true
  endpoints: []
  service:
    port: 10249
    targetPort: 10249
  serviceMonitor:
    interval: ""
    https: false
    metricRelabelings: []
    relabelings: []
kubeStateMetrics:
  enabled: true
  serviceMonitor:
    interval: ""
    metricRelabelings: []
    relabelings: []
kube-state-metrics:
  namespaceOverride: ""
  rbac:
    create: true
  podSecurityPolicy:
    enabled: true
nodeExporter:
  enabled: true
  jobLabel: jobLabel
  serviceMonitor:
    interval: ""
    scrapeTimeout: ""
    metricRelabelings: []
    relabelings: []
prometheus-node-exporter:
  namespaceOverride: ""
  podLabels:
    jobLabel: node-exporter
  extraArgs:
    - --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+)($|/)
    - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
prometheusOperator:
  enabled: true
  manageCrds: true
  tlsProxy:
    enabled: true
    image:
      repository: squareup/ghostunnel
      tag: v1.5.2
      sha: ""
      pullPolicy: IfNotPresent
    resources: {}
  admissionWebhooks:
    failurePolicy: Fail
    enabled: true
    patch:
      enabled: true
      image:
        repository: jettech/kube-webhook-certgen
        tag: v1.2.1
        sha: ""
        pullPolicy: IfNotPresent
      resources: {}
      priorityClassName: ""
      podAnnotations: {}
      nodeSelector: {}
      affinity: {}
      tolerations: []
  namespaces: {}
  denyNamespaces: []
  serviceAccount:
    create: true
    name: ""
  service:
    annotations: {}
    labels: {}
    clusterIP: ""
    nodePort: 30080
    nodePortTls: 30443
    additionalPorts: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    type: ClusterIP
    externalIPs: []
  createCustomResource: true
  cleanupCustomResource: false
  podLabels: {}
  podAnnotations: {}
  kubeletService:
    enabled: true
    namespace: kube-system
  serviceMonitor:
    interval: ""
    scrapeTimeout: ""
    selfMonitor: true
    metricRelabelings: []
    relabelings: []
  resources: {}
  hostNetwork: false
  nodeSelector: {}
  tolerations: []
  affinity: {}
  securityContext:
    fsGroup: 65534
    runAsGroup: 65534
    runAsNonRoot: true
    runAsUser: 65534
  image:
    repository: quay.io/coreos/prometheus-operator
    tag: v0.38.1
    sha: ""
    pullPolicy: IfNotPresent
  configmapReloadImage:
    repository: docker.io/jimmidyson/configmap-reload
    tag: v0.3.0
    sha: ""
  prometheusConfigReloaderImage:
    repository: quay.io/coreos/prometheus-config-reloader
    tag: v0.38.1
    sha: ""
  configReloaderCpu: 100m
  configReloaderMemory: 25Mi
  hyperkubeImage:
    repository: k8s.gcr.io/hyperkube
    tag: v1.16.12
    sha: ""
    pullPolicy: IfNotPresent
prometheus:
  enabled: true
  annotations: {}
  serviceAccount:
    create: true
    name: ""
  service:
    annotations: {}
    labels: {}
    clusterIP: ""
    port: 9090
    targetPort: 9090
    externalIPs: []
    nodePort: 30090
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    type: ClusterIP
    sessionAffinity: ""
  servicePerReplica:
    enabled: false
    annotations: {}
    port: 9090
    targetPort: 9090
    nodePort: 30091
    loadBalancerSourceRanges: []
    type: ClusterIP
  podDisruptionBudget:
    enabled: false
    minAvailable: 1
    maxUnavailable: ""
  thanosIngress:
    enabled: false
    annotations: {}
    labels: {}
    servicePort: 10901
    hosts:
      - thanos-gateway.kube.uat
    paths:
      - /
    tls:
      - secretName: thanos-gateway-tls
        hosts:
          - thanos-gateway.kube.uat
  ingress:
    enabled: true
    annotations: {}
    labels: {}
    hosts:
      - prometheus.kube.uat
    paths:
      - /
    tls:
      - secretName: prometheus-general-tls
        hosts:
          - prometheus.kube.uat
  ingressPerReplica:
    enabled: false
    annotations: {}
    labels: {}
    hostPrefix: ""
    hostDomain: ""
    paths: []
    tlsSecretName: ""
    tlsSecretPerReplica:
      enabled: false
      prefix: "prometheus"
  podSecurityPolicy:
    allowedCapabilities: []
  serviceMonitor:
    interval: ""
    selfMonitor: true
    scheme: ""
    tlsConfig: {}
    bearerTokenFile:
    metricRelabelings: []
    relabelings: []
  prometheusSpec:
    disableCompaction: false
    apiserverConfig: {}
    scrapeInterval: ""
    evaluationInterval: ""
    listenLocal: false
    enableAdminAPI: false
    image:
      repository: quay.io/prometheus/prometheus
      tag: v2.18.2
      sha: ""
    tolerations: []
    alertingEndpoints: []
    externalLabels: {}
    replicaExternalLabelName: ""
    replicaExternalLabelNameClear: false
    prometheusExternalLabelName: ""
    prometheusExternalLabelNameClear: false
    externalUrl: ""
    nodeSelector: {}
    secrets: []
    configMaps: []
    query: {}
    ruleNamespaceSelector: {}
    ruleSelectorNilUsesHelmValues: true
    ruleSelector: {}
    serviceMonitorSelectorNilUsesHelmValues: true
    serviceMonitorSelector: {}
    serviceMonitorNamespaceSelector: {}
    podMonitorSelectorNilUsesHelmValues: true
    podMonitorSelector: {}
    podMonitorNamespaceSelector: {}
    retention: 10d
    retentionSize: ""
    walCompression: false
    paused: false
    replicas: 1
    logLevel: info
    logFormat: logfmt
    routePrefix: /
    podMetadata: {}
    podAntiAffinity: ""
    podAntiAffinityTopologyKey: kubernetes.io/hostname
    affinity: {}
    remoteRead: []
    remoteWrite: []
    remoteWriteDashboards: false
    resources: {}
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: prometheus-rook-ceph-block
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
    volumes: []
    volumeMounts: []
    additionalScrapeConfigs: []
    additionalScrapeConfigsSecret: {}
    additionalPrometheusSecretsAnnotations: {}
    additionalAlertManagerConfigs: []
    additionalAlertRelabelConfigs: []
    securityContext:
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
      fsGroup: 2000
    priorityClassName: ""
    thanos: {}
    containers: []
    initContainers: []
    portName: "web"
  additionalServiceMonitors: []
  additionalPodMonitors: []
3.5.2 Deploy
helm -n monitoring install kube prometheus-community/kube-prometheus-stack -f gitlab/helm-charts/charts/kube-prometheus-stack/values.yaml
3.5.3 Verify
Once the pods are running, the following endpoints should be reachable:
- https://alertmanager.kube.uat
- https://grafana.kube.uat
- https://thanos-gateway.kube.uat
- https://prometheus.kube.uat
Please credit the source when reposting. Feel free to verify the sources cited in this article, and to point out anything incorrect or unclear, either in the comments below or by email to long@longger.xin