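I. Test Objective
This performance test stress-tests the MQ message-handling capacity of Kafka on a single server in the production environment. It covers both writing (producing) and consuming MQ messages, and uses the results at the 100K, 1M and 10M message levels to evaluate whether Kafka's performance meets the project's requirements. (The project expects Kafka to handle MQ message volumes in the hundreds of millions.)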
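II. Test Scope and Method
2.1 Scope Overview
The tests use the performance-test scripts bundled with Kafka, issuing message-write and message-consume requests from the command line. They simulate write and consume scenarios at different message volumes and, based on Kafka's results, evaluate whether Kafka can handle 100 million messages or more.
2.2 Performance Test Scenarios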
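2.2.1 Kafka Message Write Stress Test
Test scenario | MQ messages | Messages written per second | Record size (bytes)
Kafka message write test | 100K | 2000 | 1000
Kafka message write test | 1M | 5000 | 1000
Kafka message write test | 10M | 5000 | 1000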
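2.2.2 Kafka Message Consumption Stress Test
Test scenario | MQ messages consumed
Kafka message consumption test | 100K
Kafka message consumption test | 1M
Kafka message consumption test | 10M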
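2.3 Test Method Overview
2.3.1 Objective
Verify the message write and consume capacity of Kafka on a single server, and use the results to assess whether the current Kafka cluster setup can handle message volumes in the hundreds of millions.
2.3.2 Method
On the server, use Kafka's bundled perf-test scripts to simulate 100K, 1M and 10M message writes, and observe how Kafka handles each volume: messages produced per second, throughput, and message latency. The messages are written to a topic named test_perf; consume requests are then issued against that topic to observe Kafka's consumption capacity at each volume.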
Load-test commands:
Test item | Messages | Command
Write MQ messages | 100K | ./kafka-producer-perf-test.sh --topic test_perf --num-records 100000 --record-size 1000 --throughput 2000 --producer-props bootstrap.servers=10.150.30.60:9092
Write MQ messages | 1M | ./kafka-producer-perf-test.sh --topic test_perf --num-records 1000000 --record-size 1000 --throughput 5000 --producer-props bootstrap.servers=10.150.30.60:9092
Write MQ messages | 10M | ./kafka-producer-perf-test.sh --topic test_perf --num-records 10000000 --record-size 1000 --throughput 5000 --producer-props bootstrap.servers=10.150.30.60:9092
Consume MQ messages | 100K | ./kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test_perf --fetch-size 1048576 --messages 100000 --threads 1
Consume MQ messages | 1M | ./kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test_perf --fetch-size 1048576 --messages 1000000 --threads 1
Consume MQ messages | 10M | ./kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test_perf --fetch-size 1048576 --messages 10000000 --threads 1
Working directory for the scripts: the bin directory of the Kafka installation on the server.
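The six scenarios in the command table can also be run back to back. The following is a minimal bash sketch, not part of the original procedure, assuming it is sourced from Kafka's bin directory and that the broker is reachable at localhost:9092 (as in the test runs in section IV). The function names run_perf_matrix and run_or_print and the DRY_RUN, BROKER and TOPIC variables are hypothetical. The write tests run before the consume tests because the consumer test needs test_perf to already contain messages.

```shell
#!/usr/bin/env bash
# Sketch: run the producer and consumer perf scenarios from section 2.3.2 in order.
# Assumptions (not from Kafka itself): the function names, the DRY_RUN switch,
# and running from Kafka's bin directory with the broker on localhost:9092.

BROKER="${BROKER:-localhost:9092}"
TOPIC="${TOPIC:-test_perf}"

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_perf_matrix() {
  # "num-records:throughput" pairs for the write tests; record size is 1000 bytes.
  local scenario num rate
  for scenario in 100000:2000 1000000:5000 10000000:5000; do
    num="${scenario%%:*}"
    rate="${scenario##*:}"
    run_or_print ./kafka-producer-perf-test.sh --topic "$TOPIC" \
      --num-records "$num" --record-size 1000 --throughput "$rate" \
      --producer-props "bootstrap.servers=$BROKER"
  done
  # Consume only after the writes above, so that the topic already holds data.
  for num in 100000 1000000 10000000; do
    run_or_print ./kafka-consumer-perf-test.sh --broker-list "$BROKER" \
      --topic "$TOPIC" --fetch-size 1048576 --messages "$num" --threads 1
  done
}
```

Running `DRY_RUN=1 run_perf_matrix` prints the six commands without executing them, which is a cheap way to review the parameters before a multi-hour run (the 10M write alone takes at least 2,000 s at 5,000 messages/sec).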
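III. Test Environment
3.1 Machine Configuration
Host | Count | Resources | Operating system
MQ message service/processing | 1 | Hardware: 1 CPU core, 4 GB RAM, 40 GB disk; Software: standalone Kafka (kafka_2.12-2.1.0) | ubuntu-16.04.5-server-amd64
3.2 Test Tools
Kafka load-test tool | Perf-test scripts bundled with Kafka
3.3 Test Environment Setup
Only a standalone Kafka is used here; to keep setup quick, the ZooKeeper bundled with Kafka is used.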
Create a working directory:
mkdir /opt/kafka_server_test
dockerfile
FROM ubuntu:16.04
# Switch the apt sources to the Aliyun mirror
ADD sources.list /etc/apt/sources.list
ADD kafka_2.12-2.1.0.tgz /
# Install the JDK
RUN apt-get update && apt-get install -y openjdk-8-jdk --allow-unauthenticated && apt-get clean
EXPOSE 9092
# Add the startup script
ADD run.sh .
RUN chmod 755 run.sh
ENTRYPOINT [ "/run.sh"]
run.sh
#!/bin/bash
# Start the bundled ZooKeeper
cd /kafka_2.12-2.1.0
bin/zookeeper-server-start.sh config/zookeeper.properties &
# Start Kafka
sleep 3
bin/kafka-server-start.sh config/server.properties
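run.sh relies on a fixed `sleep 3` to give ZooKeeper time to start before the broker, which may not be enough on a slow 1-core machine. One hedged alternative, sketched below, is to poll ZooKeeper's client port (2181, the clientPort set in Kafka's bundled config/zookeeper.properties) until it accepts connections. It uses bash's built-in /dev/tcp redirection; the helper name wait_for_port and the timeout values are assumptions for illustration, not part of Kafka.

```shell
#!/usr/bin/env bash
# Sketch: wait until host:port accepts TCP connections, or give up after a timeout.
# Relies on bash's /dev/tcp redirection; wait_for_port is a hypothetical helper name.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" waited=0
  # Try to open a TCP connection in a subshell; success means the port is listening.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $host:$port" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

In run.sh, `sleep 3` could then be replaced by `wait_for_port localhost 2181 60 || exit 1`, so the broker starts as soon as ZooKeeper is listening and the container exits visibly if it never comes up.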
sources.list
deb http://mirrors.aliyun.com/ubuntu/ xenial main restricted
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates main restricted
deb http://mirrors.aliyun.com/ubuntu/ xenial universe
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates universe
deb http://mirrors.aliyun.com/ubuntu/ xenial multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-updates multiverse
deb http://mirrors.aliyun.com/ubuntu/ xenial-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu xenial-security main restricted
deb http://mirrors.aliyun.com/ubuntu xenial-security universe
deb http://mirrors.aliyun.com/ubuntu xenial-security multiverse
The directory structure is as follows:
./
├── dockerfile
├── kafka_2.12-2.1.0.tgz
├── run.sh
└── sources.list
Build the image
docker build -t kafka_server_test /opt/kafka_server_test
Start Kafka
docker run -d -it kafka_server_test
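IV. Test Results
4.1 Notes on the Results
This test stress-tests Kafka's message-handling capacity on one server of the Kafka cluster. For message writes, the focus is whether write latency meets the requirements; for message consumption, the focus is Kafka's processing throughput.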
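4.2.1 Writing MQ Messages
Test item | Configured total messages | Configured message size (bytes) | Configured messages/sec | Actual messages written/sec | 95th-percentile latency (ms)
Write MQ messages | 100K | 1000 | 2000 | 1999.84 | 1
Write MQ messages | 1M | 1000 | 5000 | 4999.84 | 1
Write MQ messages | 10M | 1000 | 5000 | 4999.99 | 1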
Test runs
The Kafka container was started above; check that it is running:
root@ubuntu:/opt# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                                            NAMES
5ced2eb77349        kafka_server_test   "/run.sh"           34 minutes ago      Up 34 minutes       0.0.0.0:2181->2181/tcp, 0.0.0.0:9092->9092/tcp   youthful_bhaskara
Enter the container and change to Kafka's bin directory:
root@ubuntu:/opt# docker exec -it 5ced2eb77349 /bin/bash
root@5ced2eb77349:/# cd /kafka_2.12-2.1.0/
root@5ced2eb77349:/kafka_2.12-2.1.0# cd bin/
1. Writing 100K messages
Command:
./kafka-producer-perf-test.sh --topic test_perf --num-records 100000 --record-size 1000 --throughput 2000 --producer-props bootstrap.servers=localhost:9092
Output:
records sent, 1202.4 records/sec (1.15 MB/sec), 1678.8 ms avg latency, 2080.0 max latency.
records sent, 2771.8 records/sec (2.64 MB/sec), 1300.4 ms avg latency, 2344.0 max latency.
records sent, 2061.6 records/sec (1.97 MB/sec), 17.1 ms avg latency, 188.0 max latency.
records sent, 1976.6 records/sec (1.89 MB/sec), 10.0 ms avg latency, 177.0 max latency.
records sent, 2025.2 records/sec (1.93 MB/sec), 15.4 ms avg latency, 253.0 max latency.
records sent, 2000.8 records/sec (1.91 MB/sec), 6.1 ms avg latency, 163.0 max latency.
records sent, 1929.7 records/sec (1.84 MB/sec), 3.7 ms avg latency, 128.0 max latency.
records sent, 2072.0 records/sec (1.98 MB/sec), 14.1 ms avg latency, 163.0 max latency.
records sent, 2001.6 records/sec (1.91 MB/sec), 4.5 ms avg latency, 116.0 max latency.
records sent, 1997.602877 records/sec (1.91 MB/sec), 290.41 ms avg latency, 2344.00 ms max latency, 2 ms 50th, 1992 ms 95th, 2177 ms 99th, 2292 ms 99.9th.
2. Writing 1M messages
Command:
./kafka-producer-perf-test.sh --topic test_perf --num-records 1000000 --record-size 1000 --throughput 5000 --producer-props bootstrap.servers=localhost:9092
Output:
records sent, 2158.5 records/sec (2.06 MB/sec), 2134.9 ms avg latency, 2869.0 max latency.
records sent, 7868.4 records/sec (7.50 MB/sec), 1459.2 ms avg latency, 2815.0 max latency.
records sent, 4991.0 records/sec (4.76 MB/sec), 20.3 ms avg latency, 197.0 max latency.
records sent, 4972.3 records/sec (4.74 MB/sec), 61.8 ms avg latency, 395.0 max latency.
records sent, 4880.2 records/sec (4.65 MB/sec), 64.7 ms avg latency, 398.0 max latency.
records sent, 5085.9 records/sec (4.85 MB/sec), 17.7 ms avg latency, 180.0 max latency.
records sent, 5030.8 records/sec (4.80 MB/sec), 14.7 ms avg latency, 157.0 max latency.
records sent, 5056.0 records/sec (4.82 MB/sec), 1.4 ms avg latency, 58.0 max latency.
records sent, 5001.0 records/sec (4.77 MB/sec), 0.8 ms avg latency, 58.0 max latency.
records sent, 5002.0 records/sec (4.77 MB/sec), 0.6 ms avg latency, 25.0 max latency.
records sent, 5000.0 records/sec (4.77 MB/sec), 0.6 ms avg latency, 14.0 max latency.
records sent, 5002.0 records/sec (4.77 MB/sec), 0.6 ms avg latency, 19.0 max latency.
records sent, 5005.0 records/sec (4.77 MB/sec), 1.2 ms avg latency, 57.0 max latency.
records sent, 5003.0 records/sec (4.77 MB/sec), 1.3 ms avg latency, 55.0 max latency.
records sent, 5000.0 records/sec (4.77 MB/sec), 0.9 ms avg latency, 44.0 max latency.
records sent, 5003.0 records/sec (4.77 MB/sec), 0.6 ms avg latency, 49.0 max latency.
records sent, 4988.0 records/sec (4.76 MB/sec), 1.1 ms avg latency, 49.0 max latency.
records sent, 5014.0 records/sec (4.78 MB/sec), 0.8 ms avg latency, 44.0 max latency.
records sent, 5001.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 10.0 max latency.
records sent, 5009.8 records/sec (4.78 MB/sec), 0.5 ms avg latency, 25.0 max latency.
records sent, 5001.2 records/sec (4.77 MB/sec), 0.5 ms avg latency, 7.0 max latency.
records sent, 5002.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 49.0 max latency.
records sent, 5005.0 records/sec (4.77 MB/sec), 0.6 ms avg latency, 25.0 max latency.
records sent, 5006.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 14.0 max latency.
records sent, 5005.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 19.0 max latency.
records sent, 4976.1 records/sec (4.75 MB/sec), 0.6 ms avg latency, 14.0 max latency.
records sent, 5036.0 records/sec (4.80 MB/sec), 0.6 ms avg latency, 18.0 max latency.
records sent, 4999.8 records/sec (4.77 MB/sec), 0.5 ms avg latency, 14.0 max latency.
records sent, 4980.2 records/sec (4.75 MB/sec), 0.5 ms avg latency, 14.0 max latency.
records sent, 5026.0 records/sec (4.79 MB/sec), 0.5 ms avg latency, 14.0 max latency.
records sent, 5003.0 records/sec (4.77 MB/sec), 0.4 ms avg latency, 10.0 max latency.
records sent, 5000.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 16.0 max latency.
records sent, 5007.0 records/sec (4.78 MB/sec), 0.5 ms avg latency, 42.0 max latency.
records sent, 5001.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 24.0 max latency.
records sent, 5002.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 14.0 max latency.
records sent, 5009.0 records/sec (4.78 MB/sec), 0.5 ms avg latency, 10.0 max latency.
records sent, 5006.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 18.0 max latency.
records sent, 5001.0 records/sec (4.77 MB/sec), 0.4 ms avg latency, 6.0 max latency.
records sent, 5000.0 records/sec (4.77 MB/sec), 128.2 ms avg latency, 955.0 max latency.
records sent, 4999.375078 records/sec (4.77 MB/sec), 88.83 ms avg latency, 2869.00 ms max latency, 1 ms 50th, 327 ms 95th, 2593 ms 99th, 2838 ms 99.9th.
3. Writing 10M messages
Command:
./kafka-producer-perf-test.sh --topic test_perf --num-records 10000000 --record-size 1000 --throughput 5000 --producer-props bootstrap.servers=localhost:9092
Output:
records sent, 1053.0 records/sec (1.00 MB/sec), 1952.7 ms avg latency, 3057.0 max latency.
records sent, 4173.8 records/sec (3.98 MB/sec), 4585.7 ms avg latency, 5256.0 max latency.
records sent, 9765.2 records/sec (9.31 MB/sec), 2621.9 ms avg latency, 4799.0 max latency.
...
records sent, 5000.8 records/sec (4.77 MB/sec), 0.6 ms avg latency, 79.0 max latency.
records sent, 4999.2 records/sec (4.77 MB/sec), 0.5 ms avg latency, 54.0 max latency.
records sent, 5003.0 records/sec (4.77 MB/sec), 0.5 ms avg latency, 19.0 max latency.
records sent, 4996.445029 records/sec (4.76 MB/sec), 310.11 ms avg latency, 22474.00 ms max latency, 1 ms 50th, 1237 ms 95th, 7188 ms 99th, 20824 ms 99.9th.
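Parameters of kafka-producer-perf-test.sh (using the 1M-message write as an example):
--topic: topic name; test_perf in this example
--num-records: total number of messages to send; 1000000 in this example
--record-size: size of each record in bytes; 1000 in this example
--throughput: maximum number of records sent per second (a throttle); 5000 in this example
--producer-props bootstrap.servers=localhost:9092: producer configuration; bootstrap.servers is the address of the Kafka broker the producer connects to (in this test, one server of the cluster). To find it, check server.properties in Kafka's config directory (for this project, /usr/local/kafka/config); the broker port is set by the listeners property and defaults to 9092.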
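The MB/sec figure the producer tool prints follows from these parameters: records/sec multiplied by the record size, divided by 1,048,576 bytes per MB (the tool's figures are consistent with binary megabytes, e.g. 4999.375078 × 1000 / 1048576 ≈ 4.77). With throttling, the minimum run time is num-records / throughput, e.g. 200 s for the 1M test. The awk-based helpers below are a sketch of that arithmetic only; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: arithmetic behind kafka-producer-perf-test.sh's summary figures.
# Function names are illustrative only.

# MB/sec = records/sec * record size (bytes) / 1048576 (binary megabytes).
mb_per_sec() {
  awk -v rps="$1" -v size="$2" 'BEGIN { printf "%.2f\n", rps * size / 1048576 }'
}

# Minimum duration of a throttled run in seconds = num-records / throughput.
min_duration_sec() {
  awk -v n="$1" -v tp="$2" 'BEGIN { printf "%.0f\n", n / tp }'
}
```

For example, `mb_per_sec 1997.602877 1000` reproduces the 1.91 MB/sec reported for the 100K run, and `min_duration_sec 10000000 5000` shows that the 10M run cannot finish in under 2,000 s at the configured throttle.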
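Interpreting the write results:
Taking the 1M-message write as an example, on average 4.77 MB of data was written to Kafka per second, or about 4999.375 messages/sec. The average write latency was 88.83 ms and the maximum latency was 2869 ms.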
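The figures above come from the final line of the producer output. A sketch of extracting them with awk, assuming the comma-separated format shown in the outputs in this section (fields such as `4999.375078 records/sec (4.77 MB/sec)` and `327 ms 95th`); parse_producer_summary is a hypothetical helper name, and only lines containing the `99.9th` percentile field are treated as summaries.

```shell
#!/usr/bin/env bash
# Sketch: extract throughput and latency figures from the summary line printed by
# kafka-producer-perf-test.sh. Assumes the comma-separated format shown above.
# Only the final summary line carries percentile fields such as "99.9th".
parse_producer_summary() {
  awk -F', ' '/99\.9th/ {
    rps = avgl = maxl = p95 = p99 = ""
    for (i = 1; i <= NF; i++) {
      split($i, w, " ")   # w[1] is the numeric part of the field
      if ($i ~ /records\/sec/)        rps = w[1]
      else if ($i ~ /ms avg latency/) avgl = w[1]
      else if ($i ~ /ms max latency/) maxl = w[1]
      else if ($i ~ /ms 95th/)        p95 = w[1]
      else if ($i ~ /ms 99th/)        p99 = w[1]
    }
    printf "records_per_sec=%s avg_ms=%s max_ms=%s p95_ms=%s p99_ms=%s\n", rps, avgl, maxl, p95, p99
  }'
}
```

Usage: pipe the tool's output through it, e.g. `./kafka-producer-perf-test.sh ... | parse_producer_summary`; the per-interval progress lines are ignored.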
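4.2.2 Consuming MQ Messages
Test item | Messages consumed | Data consumed (MB) | Data consumed per second (MB/s) | Messages consumed per second | Time taken (s)
Consume MQ messages | 100K | 95.36 | 137 | 143899.3 | 0.695
Consume MQ messages | 1M | 953.66 | 177.19 | 185804.5 | 5.38
Consume MQ messages | 10M | 9536.73 | 198.25 | 207878.6 | 48.11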
Test runs
1. Consuming 100K messages
./kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test_perf --fetch-size 1048576 --messages 100000 --threads 1
Note: this script has no --zookeeper option; the referenced article is wrong on this point.
The command above can only be run after the 100K-message write has been performed; otherwise it fails with the following error:
[2018-12-06 05:47:52,832] WARN [Consumer clientId=consumer-1, groupId=perf-consumer-19548] Error while fetching metadata with correlation id 18 : {test_perf=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient)
WARNING: Exiting before consuming the expected number of messages: timeout (10000 ms) exceeded. You can use the --timeout option to increase the timeout.
Normal output:
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2018-12-06 05:50:41:276, 2018-12-06 05:50:45:281, 95.3674, 23.8121, 100000, 24968.7890, 78, 3927, 24.2851, 25464.7313
2. Consuming 1M messages
./kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test_perf --fetch-size 1048576 --messages 1000000 --threads 1
Output:
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2018-12-06 05:59:32:360, 2018-12-06 05:59:51:624, 954.0758, 49.5264, 1000421, 51932.1532, 41, 19223, 49.6320, 52042.9173
3. Consuming 10M messages
./kafka-consumer-perf-test.sh --broker-list localhost:9092 --topic test_perf --fetch-size 1048576 --messages 10000000 --threads 1
Output:
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2018-12-06 06:35:54:143, 2018-12-06 06:38:05:585, 9536.9539, 72.5564, 10000221, 76080.8646, 39, 131403, 72.5779, 76103.4451
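The consumer tool prints a CSV header and one data line, and its rates can be cross-checked against the raw fields. In the 1M run, end.time minus start.time is 19.264 s; 954.0758 MB / 19.264 s ≈ 49.53 MB/s and 1000421 / 19.264 s ≈ 51932 messages/s, matching MB.sec and nMsg.sec, while fetch.time.ms (19223) equals that interval minus rebalance.time.ms (41). The bash/awk sketch below recomputes these values from a data line. It assumes start and end fall on the same day (the date part is ignored), and the function name consumer_rates is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: recompute elapsed time and rates from a kafka-consumer-perf-test.sh data line.
# Assumes start and end are on the same day; only the HH:MM:SS:mmm part is used.
consumer_rates() {
  awk -F', ' '
    # Convert "YYYY-MM-DD HH:MM:SS:mmm" to milliseconds since midnight.
    function to_ms(ts,   t, hms) {
      split(ts, t, " ")             # t[2] = "HH:MM:SS:mmm"
      split(t[2], hms, ":")
      return ((hms[1] * 60 + hms[2]) * 60 + hms[3]) * 1000 + hms[4]
    }
    # Data lines start with a digit; the header line starts with "start.time".
    NF >= 10 && $1 ~ /^[0-9]/ {
      elapsed = (to_ms($2) - to_ms($1)) / 1000
      # $3 = data.consumed.in.MB, $5 = data.consumed.in.nMsg, $8 = fetch.time.ms
      printf "elapsed_s=%.3f MB_per_s=%.2f msgs_per_s=%.2f fetch_MB_per_s=%.2f\n",
        elapsed, $3 / elapsed, $5 / elapsed, $3 / ($8 / 1000)
    }'
}
```

Feeding it the full output (header plus data line) prints one summary line; for the 1M run above it reports 19.264 s, 49.53 MB/s, 51932.15 messages/s and a fetch rate of 49.63 MB/s, in line with the tool's own columns.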
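Parameters of kafka-consumer-perf-test.sh:
--broker-list: Kafka broker address; localhost:9092 in this example
--topic: topic name; test_perf in this example, i.e. the messages written in 4.2.1
--fetch-size: amount of data fetched per request, in bytes; 1048576 (1 MB) in this example
--messages: total number of messages to consume; 1000000 (1M) in this example
Taking the 1M-message consumption as an example, 954.07 MB of data was consumed in total at 49.52 MB/s, and 1000421 messages were consumed at 51932.15 messages/sec (slightly more than the 1,000,000 requested, because the tool processes every record in the last batch it fetches).
V. Result Analysis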
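According to the 4.2.1 summary table, with writes set to 5,000 messages/sec the 95th-percentile latency is at most 1 ms, which is within the acceptable range and shows that messages are written promptly.
For consumption, a rate above 200,000 messages/sec when working through a 10M-message backlog can be considered ideal; the 10M run in the 4.2.2 summary table reached 207878.6 messages/sec.
Kafka's capacity at the 100K, 1M and 10M levels provides a basis for assessing whether the Kafka cluster can handle message volumes in the hundreds of millions. Because this test ran on a single server, the effect of network bandwidth is largely excluded, so the single-server results are a valuable reference for judging whether the cluster will meet real production requirements after go-live.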
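A rough linear extrapolation makes the "hundreds of millions" question concrete. Assuming throughput stays at the measured single-server rates (which these tests do not verify beyond 10M messages), 100,000,000 messages at the 5,000 messages/sec write throttle take 20,000 s (about 5.6 hours), while at the roughly 207,879 messages/sec consumption rate from the 4.2.2 table they take about 481 s. The helpers below are a sketch of that arithmetic and of the 200,000 messages/sec consumption target; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: linear extrapolation of processing time from a measured rate.
# This assumes the rate stays constant at larger volumes, which is not tested here.
seconds_for() {
  # $1 = number of messages, $2 = measured messages per second
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.0f\n", n / r }'
}

# Succeeds if a measured consumption rate meets the 200,000 msg/s target
# stated in the analysis above.
meets_consume_target() {
  awk -v r="$1" 'BEGIN { exit !(r >= 200000) }'
}
```

For example, `seconds_for 100000000 5000` prints 20000, and `meets_consume_target 207878.6 && echo ok` prints ok. A linear model ignores effects such as disk usage growth and log retention, so it should be treated as an order-of-magnitude estimate only.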
References for this article: