Implementing a Global View and High Availability for Prometheus

By Ricardo Castro. Translated and compiled by 沈建苗.

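Keeping systems running reliably is a key task for site reliability engineers, and much of it comes down to collecting metrics, creating alerts, and graphing data. It is critical to collect system metrics from multiple locations and services and to correlate them, both to understand how the system behaves and to support troubleshooting.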

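Prometheus, a Cloud Native Computing Foundation (CNCF) project, has become one of the most popular open source solutions for application and system monitoring. A single instance can handle millions of time series, but as systems grow, Prometheus needs to scale to handle the increased load. Because vertical scaling eventually hits a limit, you need another solution.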

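This article walks step by step through converting a simple Prometheus setup into a Thanos deployment. You will then be able to run reliable queries against multiple Prometheus instances from a single endpoint and handle a highly available Prometheus setup seamlessly.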

Implementing a Global View and High Availability

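Thanos provides a set of components that together form a highly available metrics system with virtually unlimited storage capacity. It can be added on top of an existing Prometheus deployment to provide a global query view, data backup, and access to historical data. These capabilities can be used independently of one another, so you can introduce Thanos features only as you need them.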

Initial Cluster Setup

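You will deploy Prometheus in a Kubernetes cluster and simulate the desired scenario there. The kind tool is a good way to start a Kubernetes cluster locally. You will use the following configuration.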

# config.yaml

kind: Cluster

apiVersion: kind.x-k8s.io/v1alpha4

name: thanos-demo

nodes:

  - role: control-plane

    image: kindest/node:v1.23.0@sha256:2f93d3c7b12a3e93e6c1f34f331415e105979961fcddbe69a4e3ab5a93ccbb35

  - role: worker

    image: kindest/node:v1.23.0@sha256:2f93d3c7b12a3e93e6c1f34f331415e105979961fcddbe69a4e3ab5a93ccbb35

  - role: worker

    image: kindest/node:v1.23.0@sha256:2f93d3c7b12a3e93e6c1f34f331415e105979961fcddbe69a4e3ab5a93ccbb35

With this configuration in place, you are ready to start the cluster.

~ kind create cluster --config config.yaml

Creating cluster "thanos-demo" ...

✓ Ensuring node image (kindest/node:v1.23.0)

✓ Preparing nodes

✓ Writing configuration

✓ Starting control-plane

✓ Installing CNI

✓ Installing StorageClass

✓ Joining worker nodes

Set kubectl context to "kind-thanos-demo"

You can now use your cluster with:

kubectl cluster-info --context kind-thanos-demo

Have a nice day!

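Once the cluster is up and running, check the installation to make sure you are ready to launch Prometheus. You need kubectl to interact with the Kubernetes cluster.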

~ kind get clusters

thanos-demo

~ kubectl get nodes

NAME                      STATUS   ROLES                AGE    VERSION

thanos-demo-control-plane Ready    control-plane,master 119s   v1.23.0

thanos-demo-worker        Ready    <none>                88s   v1.23.0

thanos-demo-worker2       Ready    <none>                88s   v1.23.0

~ kubectl get pods -o name -A

pod/coredns-64897985d-mz8bv

pod/coredns-64897985d-pxzkq

pod/etcd-thanos-demo-control-plane

pod/kindnet-27cdw

pod/kindnet-42kcv

pod/kindnet-5rlcj

pod/kube-apiserver-thanos-demo-control-plane

pod/kube-controller-manager-thanos-demo-control-plane

pod/kube-proxy-49mgg

pod/kube-proxy-nhvkm

pod/kube-proxy-z4fpn

pod/kube-scheduler-thanos-demo-control-plane

pod/local-path-provisioner-5bb5788f44-hj5c4
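The readiness check above can also be scripted. The following is a minimal sketch and not part of the original walkthrough: it assumes you pass it the text printed by `kubectl get nodes -o json`, and it reports the nodes whose Ready condition is not "True". The function name `not_ready_nodes`, the helper `_node`, and the sample data are made up for illustration.

```python
import json
from typing import List


def not_ready_nodes(nodes_json: str) -> List[str]:
    """Return the names of nodes whose Ready condition is not "True".

    nodes_json is expected to be the text printed by
    `kubectl get nodes -o json` (a NodeList object).
    """
    names = []
    for node in json.loads(nodes_json)["items"]:
        conditions = node.get("status", {}).get("conditions", [])
        is_ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not is_ready:
            names.append(node["metadata"]["name"])
    return names


def _node(name: str, ready: str) -> dict:
    # Hypothetical, heavily trimmed node object used only by the sample below.
    return {
        "metadata": {"name": name},
        "status": {"conditions": [{"type": "Ready", "status": ready}]},
    }


# Hypothetical sample: worker2 is marked not ready on purpose.
SAMPLE = json.dumps({"items": [
    _node("thanos-demo-control-plane", "True"),
    _node("thanos-demo-worker", "True"),
    _node("thanos-demo-worker2", "False"),
]})
```

With this sample, `not_ready_nodes(SAMPLE)` returns only `thanos-demo-worker2`. Against the healthy cluster shown above, the list should be empty.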

Initial Prometheus Setup

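Your goal is to deploy Thanos on top of an existing Prometheus installation and extend its functionality. With that in mind, start by launching three Prometheus servers. There are several reasons to run multiple Prometheus instances, such as sharding, high availability, or aggregating queries from multiple locations.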

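For this scenario, picture the following setup: a cluster in the United States runs one Prometheus server, and a cluster in Europe runs two replicas of a Prometheus server that scrape the same targets.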

To deploy Prometheus, you will use the kube-prometheus-stack chart, which requires Helm. Once Helm is installed, add the repository that hosts kube-prometheus-stack.

~ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

"prometheus-community" has been added to your repositories

~ helm repo update

Hang tight while we grab the latest from your chart repositories...

...Successfully got an update from the "prometheus-community" chart repository

Update Complete. ⎈Happy Helming!⎈

Since you actually have only one Kubernetes cluster, you will simulate multiple regions by deploying Prometheus in different namespaces. Create one namespace for europe and another for united-states.

~ kubectl create namespace europe

namespace/europe created

~ kubectl create namespace united-states

namespace/united-states created

With the regions in place, you are ready to deploy Prometheus.

# prometheus-europe.yaml

nameOverride: "eu"

namespaceOverride: "europe"

nodeExporter:

  enabled: false

grafana:

  enabled: false

alertmanager:

  enabled: false

kubeStateMetrics:

  enabled: false

prometheus:

  prometheusSpec:

    replicas: 2

    replicaExternalLabelName: "replica"

    prometheusExternalLabelName: "cluster"



# prometheus-united-states.yaml

nameOverride: "us"

namespaceOverride: "united-states"

nodeExporter:

  enabled: false

grafana:

  enabled: false

alertmanager:

  enabled: false

kubeStateMetrics:

  enabled: false

prometheus:

  prometheusSpec:

    replicaExternalLabelName: "replica"

    prometheusExternalLabelName: "cluster"

Using the configurations above, deploy the Prometheus instances in each region.

~ helm -n europe upgrade -i prometheus-europe prometheus-community/kube-prometheus-stack -f prometheus-europe.yaml

Release "prometheus-europe" does not exist. Installing it now.

NAME: prometheus-europe

LAST DEPLOYED: Sat Jan 22 18:26:22 2022

NAMESPACE: europe

STATUS: deployed

REVISION: 1

TEST SUITE: None

NOTES:

kube-prometheus-stack has been installed. Check its status by running:

  kubectl --namespace europe get pods -l "release=prometheus-europe"





~ helm -n united-states upgrade -i prometheus-united-states prometheus-community/kube-prometheus-stack -f prometheus-united-states.yaml

Release "prometheus-united-states" does not exist. Installing it now.

NAME: prometheus-united-states

LAST DEPLOYED: Sat Jan 22 18:26:48 2022

NAMESPACE: united-states

STATUS: deployed

REVISION: 1

TEST SUITE: None

NOTES:

kube-prometheus-stack has been installed. Check its status by running:

  kubectl --namespace united-states get pods -l "release=prometheus-united-states"


Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

Now make sure the Prometheus instances are running as expected.

~ kubectl -n europe get pods -l app.kubernetes.io/name=prometheus                                                                

NAME                                        READY   STATUS    RESTARTS   AGE

prometheus-prometheus-europe-prometheus-0   2/2     Running   0          18s

prometheus-prometheus-europe-prometheus-1   2/2     Running   0          18s

~ kubectl -n united-states get pods -l app.kubernetes.io/name=prometheus

NAME                                               READY   STATUS    RESTARTS   AGE

prometheus-prometheus-united-states-prometheus-0   2/2     Running   0          39s

You can now query any metric on each individual instance, but you cannot run multicluster queries.
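Querying a single instance goes through the Prometheus HTTP API endpoint `/api/v1/query`. The sketch below shows one way to do that from Python using only the standard library. It rests on assumptions that are not in the original walkthrough: that you have forwarded a local port to one instance (for example with `kubectl -n europe port-forward svc/prometheus-operated 9090:9090`, where port 9090 is an assumption), and the helper names `instant_query_url` and `instant_query` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL of an instant query against the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run an instant query and return the list under data.result.

    Raises RuntimeError if the API does not report status "success".
    """
    url = instant_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = json.load(response)
    if body.get("status") != "success":
        raise RuntimeError("query failed: %r" % (body.get("error"),))
    return body["data"]["result"]
```

With the port-forward in place, `instant_query("http://localhost:9090", "up")` would return the `up` series known to that one instance only. Seeing both regions at once requires a separate call per instance, which is the gap the following sections close.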

Deploying Thanos Sidecar

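kube-prometheus-stack supports deploying Thanos as a sidecar, meaning it is deployed alongside Prometheus itself. The Thanos sidecar exposes Prometheus through the StoreAPI, a generic gRPC API that allows Thanos components to fetch metrics from a variety of systems.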

# prometheus-europe.yaml

nameOverride: "eu"

namespaceOverride: "europe"

nodeExporter:

  enabled: false

grafana:

  enabled: false

alertmanager:

  enabled: false

kubeStateMetrics:

  enabled: false

prometheus:

  prometheusSpec:

    replicas: 2

    replicaExternalLabelName: "replica"

    prometheusExternalLabelName: "cluster"

    thanos:

      baseImage: quay.io/thanos/thanos

      version: v0.24.0



# prometheus-united-states.yaml

nameOverride: "us"

namespaceOverride: "united-states"

nodeExporter:

  enabled: false

grafana:

  enabled: false

alertmanager:

  enabled: false

kubeStateMetrics:

  enabled: false

prometheus:

  prometheusSpec:

    replicaExternalLabelName: "replica"

    prometheusExternalLabelName: "cluster"

    thanos:

      baseImage: quay.io/thanos/thanos

      version: v0.24.0

With the updated configuration, you are ready to upgrade Prometheus.

~ helm -n europe upgrade -i prometheus-europe prometheus-community/kube-prometheus-stack -f prometheus-europe.yaml

Release "prometheus-europe" has been upgraded. Happy Helming!

NAME: prometheus-europe

LAST DEPLOYED: Sat Jan 22 18:42:24 2022

NAMESPACE: europe

STATUS: deployed

REVISION: 2

TEST SUITE: None

NOTES:

kube-prometheus-stack has been installed. Check its status by running:

  kubectl --namespace europe get pods -l "release=prometheus-europe"



~ helm -n united-states upgrade -i prometheus-united-states prometheus-community/kube-prometheus-stack -f prometheus-united-states.yaml

Release "prometheus-united-states" has been upgraded. Happy Helming!

NAME: prometheus-united-states

LAST DEPLOYED: Sat Jan 22 18:43:06 2022

NAMESPACE: united-states

STATUS: deployed

REVISION: 2

TEST SUITE: None

NOTES:

kube-prometheus-stack has been installed. Check its status by running:

  kubectl --namespace united-states get pods -l "release=prometheus-united-states"



Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

Verify that each Prometheus pod now has an additional container running alongside the others (READY shows 3/3 instead of 2/2).

~ kubectl -n europe get pods -l app.kubernetes.io/name=prometheus                                                                

NAME                                        READY   STATUS    RESTARTS   AGE

prometheus-prometheus-europe-prometheus-0   3/3     Running   0          48s

prometheus-prometheus-europe-prometheus-1   3/3     Running   0          65s

~ kubectl -n united-states get pods -l app.kubernetes.io/name=prometheus

NAME                                               READY   STATUS    RESTARTS   AGE

prometheus-prometheus-united-states-prometheus-0   3/3     Running   0          44s

Deploying Thanos Querier to Enable a Global View

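Querier implements the Prometheus HTTP v1 API so that data in a Thanos cluster can be queried with PromQL. It lets you fetch metrics from a single endpoint. It first gathers the data needed to evaluate the query from the underlying StoreAPIs, then evaluates the query, and finally returns the result.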

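You used kube-prometheus-stack to deploy the Thanos sidecar. Unfortunately, that chart does not support the other Thanos components. For those, you will use the Banzai Cloud Helm Charts repository. As before, start by adding the repository.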

~ helm repo add banzaicloud https://kubernetes-charts.banzaicloud.com

"banzaicloud" has been added to your repositories

~ helm repo update

Hang tight while we grab the latest from your chart repositories...

...Successfully got an update from the "prometheus-community" chart repository

...Successfully got an update from the "banzaicloud" chart repository

Update Complete. ⎈Happy Helming!⎈

To simulate a centralized monitoring solution, create a monitoring namespace.

~ kubectl create namespace monitoring

namespace/monitoring created

The following configuration sets up Thanos Querier and points it at the Prometheus instances.

# query.yaml

store: # https://thanos.io/tip/components/store/

  enabled: false

compact: # https://thanos.io/tip/components/compact.md/

  enabled: false

bucket: # https://thanos.io/v0.8/components/bucket/

  enabled: false

rule: # https://thanos.io/tip/components/rule/

  enabled: false

sidecar: # https://thanos.io/tip/components/sidecar/

  enabled: false

queryFrontend: # https://thanos.io/tip/components/query-frontend.md/

  enabled: false

query: # https://thanos.io/tip/components/query/

  enabled: true

  replicaLabels:

    - replica

  stores:

    - "dnssrv+_grpc._tcp.prometheus-operated.europe.svc.cluster.local"

    - "dnssrv+_grpc._tcp.prometheus-operated.united-states.svc.cluster.local"
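The store addresses above carry a `dnssrv+` prefix, which tells Thanos to discover the StoreAPI endpoints through a DNS SRV lookup instead of treating the value as a fixed host and port. Thanos also documents the `dns+` and `dnssrvnoa+` prefixes. The sketch below only illustrates how such an address breaks down; it is not how Thanos itself parses them, and the function names are made up for illustration.

```python
from typing import Tuple

# Prefixes documented for Thanos DNS service discovery, mapped to the
# kind of lookup each one requests. Longer prefixes are listed first.
_PREFIXES = (
    ("dnssrvnoa+", "SRV lookup without a follow-up A/AAAA lookup"),
    ("dnssrv+", "SRV lookup"),
    ("dns+", "A/AAAA lookup"),
)


def parse_store_address(address: str) -> Tuple[str, str]:
    """Split a store address into (lookup kind, name to resolve)."""
    for prefix, kind in _PREFIXES:
        if address.startswith(prefix):
            return kind, address[len(prefix):]
    return "static address", address


def split_srv_name(name: str) -> Tuple[str, str, str]:
    """Split an SRV name of the form _service._proto.domain into its parts."""
    service, proto, domain = name.split(".", 2)
    return service.lstrip("_"), proto.lstrip("_"), domain
```

For the europe entry, `parse_store_address` yields an SRV lookup for `_grpc._tcp.prometheus-operated.europe.svc.cluster.local`, and `split_srv_name` breaks that into the port name `grpc`, the protocol `tcp`, and the service domain `prometheus-operated.europe.svc.cluster.local`. Because the lookup targets the service rather than a single pod, both europe replicas are discovered from one entry.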

With the configuration above, you are ready to deploy Querier.

~ helm -n monitoring upgrade -i thanos banzaicloud/thanos -f query.yaml

Release "thanos" does not exist. Installing it now.

NAME: thanos

LAST DEPLOYED: Sat Jan 22 18:48:03 2022

NAMESPACE: monitoring

STATUS: deployed

REVISION: 1

TEST SUITE: None



~ kubectl -n monitoring port-forward svc/thanos-query-http 10902:10902

Forwarding from 127.0.0.1:10902 -> 10902

Forwarding from [::1]:10902 -> 10902

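With port-forward running, you can connect to the cluster. Make sure you can run multicluster queries. When you deployed Prometheus, you set replicaExternalLabelName: "replica" and prometheusExternalLabelName: "cluster". Deduplication relies on these settings. With deduplication enabled, metrics coming from the europe cluster are deduplicated, because Thanos assumes they come from the same high-availability group: their label sets are identical apart from the replica label.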
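The idea behind deduplication can be shown with a small sketch: drop the replica label from every series, then treat series whose remaining labels match as one. This is a deliberate simplification. The real Querier merges the samples of the replicas rather than keeping the first series it sees, and the label values below are assumptions about what the operator generates, not output captured from the cluster.

```python
from typing import Dict, Iterable, List, Set, Tuple

Series = Tuple[Dict[str, str], float]


def deduplicate(series: Iterable[Series], replica_labels: Set[str]) -> List[Series]:
    """Collapse series that differ only in the given replica labels.

    Simplified: keeps the first series seen for each remaining label set.
    """
    merged = {}
    for labels, value in series:
        key = tuple(sorted(
            (name, val) for name, val in labels.items()
            if name not in replica_labels
        ))
        merged.setdefault(key, (dict(key), value))
    return list(merged.values())


# Assumed label values for the `up` metric from the three Prometheus pods.
SERIES = [
    ({"__name__": "up", "cluster": "europe/prometheus-europe-prometheus",
      "replica": "prometheus-prometheus-europe-prometheus-0"}, 1.0),
    ({"__name__": "up", "cluster": "europe/prometheus-europe-prometheus",
      "replica": "prometheus-prometheus-europe-prometheus-1"}, 1.0),
    ({"__name__": "up", "cluster": "united-states/prometheus-united-states-prometheus",
      "replica": "prometheus-prometheus-united-states-prometheus-0"}, 1.0),
]
```

Calling `deduplicate(SERIES, {"replica"})` leaves two series, one per cluster, while passing an empty set leaves all three. That mirrors what toggling deduplication in the Querier UI does to the europe replicas.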

Deploying Thanos Query Frontend to Improve the Read Path

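The last piece is to deploy Query Frontend, a service that can be placed in front of Querier to improve the read path. It is based on the Cortex Query Frontend component and supports features such as splitting, retries, caching, and slow query logging.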
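Of these features, splitting is the easiest to picture: a long range query is cut into smaller sub-queries on fixed interval boundaries, so the pieces can be run separately and cached. The sketch below illustrates the boundary arithmetic only. It uses half-open ranges in Unix seconds and ignores step alignment, so it is a simplified model under those assumptions rather than the Query Frontend implementation, and the function name is made up.

```python
from typing import List, Tuple

DAY = 24 * 60 * 60  # seconds in one day


def split_by_interval(start: int, end: int, interval: int = DAY) -> List[Tuple[int, int]]:
    """Split the half-open range [start, end) on multiples of interval.

    All values are in seconds. Returns an empty list when start == end.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    if end < start:
        raise ValueError("end must not be before start")
    ranges = []
    current = start
    while current < end:
        # Next multiple of interval strictly after the current position.
        boundary = (current // interval + 1) * interval
        sub_end = min(boundary, end)
        ranges.append((current, sub_end))
        current = sub_end
    return ranges
```

A query from noon on day 0 to noon on day 2, `split_by_interval(DAY // 2, 2 * DAY + DAY // 2)`, becomes three sub-ranges: half a day, one full day, and half a day. Since the full day always falls on the same boundaries, a repeated dashboard query can reuse its cached result.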

# query.yaml

store:

  enabled: false

compact:

  enabled: false

bucket:

  enabled: false

rule:

  enabled: false

sidecar:

  enabled: false

queryFrontend:

  enabled: true

query:

  enabled: true

  replicaLabels:

    - replica

  stores:

    - "dnssrv+_grpc._tcp.prometheus-operated.europe.svc.cluster.local"

    - "dnssrv+_grpc._tcp.prometheus-operated.united-states.svc.cluster.local"

With the earlier configuration updated to enable Query Frontend, you are ready to upgrade the deployment.

~ helm -n monitoring upgrade -i thanos banzaicloud/thanos -f query.yaml

Release "thanos" has been upgraded. Happy Helming!

NAME: thanos

LAST DEPLOYED: Sat Jan 22 18:56:29 2022

NAMESPACE: monitoring

STATUS: deployed

REVISION: 2

TEST SUITE: None



~ kubectl -n monitoring port-forward svc/thanos-query-frontend-http 10902:10902

Forwarding from 127.0.0.1:10902 -> 10902

Forwarding from [::1]:10902 -> 10902

Using port-forward again, you can access Query Frontend.

Query Frontend is the entry point for sending queries to multiple Prometheus instances. Services that run such queries, such as Grafana, should query through Query Frontend.

Reference:

https://thenewstack.io/implement-global-view-and-high-availability-for-prometheus/
