redis analyst (9)- redis cluster issues/puzzles on production

The previous article enumerated many problems that internet companies have shared about running redis cluster. This article lists some issues that appeared after our own deployment went to production, along with a few small puzzles.

Issue 1: an auto failover occurred

Symptom: after monitoring redis for about half a month, we happened to notice that one master had automatically triggered a failover.
Cause:

1.1 Check the logs around the time of the incident:

Node 912a1efa1f4085b4b7333706e546f64d16580761 reported node 3892d1dfa68d9976ce44b19e532d9c0e80a0357d as not reachable.

The log shows a node reported as not reachable, which immediately suggests two possible factors:
(1) redis is single-process and single-threaded, so any single time-consuming operation can block for too long and eventually trigger a failover;
(2) network problems.
Factor (1) was ruled out first: analysis of the application showed no special operations, only the most ordinary commands at a very low request rate. Factor (2) could not be ruled out.

1.2 Check system resource usage around the time of the incident:

We checked all the common metrics; everything was normal except the 1-minute load average. The cause was pinned down: system load was too high (reaching 7), which effectively froze the system, so the other nodes considered this node dead.

Resolution: given that all other metrics (cpu/memory/disk) were normal, the other 2 masters had always been fine, the traffic was very small, and the incident was sporadic, we attributed the problem to this particular virtual machine. We reported the issue and migrated the VM; after migration the system load stayed flat and no further failover occurred.

Puzzle 2: the slaves' ops are far lower than the masters' ops

Symptom: all of our operations are known to be deletes, with no queries at all, so it was curious why master and slave ops differ so much: the first 3 nodes (the masters) reach 100 ops, while the slaves stay below 10.

Explanation: among CRUD operations, being a C/U/D command alone does not guarantee the command is "propagated" to the slaves; an additional condition is that the database must actually have changed. In our current workload, still in a testing phase, the dominant operations are deletes, and they almost all delete keys that do not exist, so nothing changes (i.e. dirty stays 0 in the code below).
How the dirty delta is computed:

    /* Call the command. */
    c->flags &= ~(REDIS_FORCE_AOF|REDIS_FORCE_REPL);
    // remember the old dirty counter value
    dirty = server.dirty;
    // record when the command starts executing
    start = ustime();
    // run the command's implementation
    c->cmd->proc(c);
    // compute how long the command took
    duration = ustime()-start;
    // compute the dirty delta produced by the command
    dirty = server.dirty-dirty;

The command is only propagated when the dirty value has changed:

  
        // if the database was modified (i.e. dirty is non-zero), enable REPL and AOF propagation
        if (dirty)
            flags |= (REDIS_PROPAGATE_REPL | REDIS_PROPAGATE_AOF);

        if (flags != REDIS_PROPAGATE_NONE)
            propagate(c->cmd,c->db->id,c->argv,c->argc,flags);

And not every operation changes the dirty value:

void delCommand(redisClient *c) {
    int deleted = 0, j;

    for (j = 1; j < c->argc; j++) {

        // try to delete the key; dirty only changes when the key actually existed
        if (dbDelete(c->db,c->argv[j])) {
            // bump server.dirty
            server.dirty++;
        }
    }
}
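
Viewed from the client side, this also explains the gap: a minimal Jedis sketch (node address is a placeholder) showing that DEL on a missing key returns 0 and therefore never reaches the slave:

import redis.clients.jedis.Jedis;

public class DirtyDemo {
    public static void main(String[] args) {
        // placeholder host/port; connect directly to one node
        try (Jedis jedis = new Jedis("127.0.0.1", 7001)) {
            // DEL on a key that does not exist returns 0: dbDelete() fails,
            // server.dirty is unchanged, so nothing is propagated to the slave
            long deleted = jedis.del("no-such-key");
            System.out.println("deleted = " + deleted); // prints 0
        }
    }
}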

Issue 3: socket timeout

Symptom: reviewing the last week of data-access logs, there were 4 socket timeout errors:


redis.clients.jedis.exceptions.JedisConnectionException: java.net.SocketTimeoutException: Read timed out
at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:202)
at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
at redis.clients.jedis.Protocol.process(Protocol.java:151)
at redis.clients.jedis.Protocol.read(Protocol.java:215)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
at redis.clients.jedis.Connection.getIntegerReply(Connection.java:265)
at redis.clients.jedis.Jedis.del(Jedis.java:197)
at redis.clients.jedis.JedisCluster$110.execute(JedisCluster.java:1205)
at redis.clients.jedis.JedisCluster$110.execute(JedisCluster.java:1202)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:120)
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:31)
at redis.clients.jedis.JedisCluster.del(JedisCluster.java:1207)
at com.webex.dsagent.client.redis.RedisClientImpl.deleteSelectedTelephonyPoolsInfo(RedisClientImpl.java:77)
at sun.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)

Configuration:
connectionTimeout=800
soTimeout=1000

BTW: the call indeed took >1000ms:
"componentType":"Redis","totalDurationInMS":1163
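
For context, connectionTimeout and soTimeout above correspond to JedisCluster constructor arguments in Jedis 2.x; a minimal sketch, with the seed node and maxAttempts made up:

import java.util.Collections;
import java.util.Set;

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class ClusterClientFactory {
    public static JedisCluster build() {
        // placeholder seed node; the client discovers the rest of the cluster
        Set<HostAndPort> nodes =
                Collections.singleton(new HostAndPort("10.224.2.141", 7001));
        int connectionTimeout = 800; // ms spent establishing the connection
        int soTimeout = 1000;        // ms waiting on a reply; the read timeout above fires here
        int maxAttempts = 5;         // retries inside JedisClusterCommand.runWithRetries (made up)
        return new JedisCluster(nodes, connectionTimeout, soTimeout, maxAttempts,
                new GenericObjectPoolConfig());
    }
}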

Cause: all 4 errors occurred around the same point in time on the same day, and computing the slot from the keys showed they all mapped to the same machine, so we attributed it to the network or the virtual machine; it could not be reproduced.

redis analyst (8)- redis cluster issues in other companies

Redis Cluster release history:

3.0Beta1(2014.2) ->
3.0.0rc(2014.10) ->
3.0GA(2015.4) ->
3.2.11(2017.9)

This article has two parts: first, a survey of problems summarized by several internet companies; second, my own rules for using redis.

(1) Internet company practice

a. Meituan

As of 2016: 100+ clusters and thousands of nodes, used mainly for caching and storage; a single cluster reaches 1TB+ of memory and 1.5 billion+ keys, with throughput in the millions of QPS.

b. Vipshop

As of 2016: dozens of production cluster deployments with about 2,000 instances, the largest single cluster reaching 250+ instances; used mainly as storage for backend services, with no pure-cache scenarios.

c. Dajie.com

commands/day: 2 billion+, Instances: 300+, Servers: a few dozen, Memory: 1T

d. Ele.me

e. China Southern Airlines

Problems they shared:
(1) Don't deploy master and slave on the same machine: don't span datacenters, do span racks, same cabinet is fine
(2) A node falsely judged as Fail triggers a switchover: because redis is single-process/single-threaded, one slow operation exceeding cluster-node-timeout may cause a false failure detection and a failover
(3) Memory usage on one node suddenly spikes
(4) A node stops responding to requests, or halts entirely
(5) Security vulnerabilities
(6) Periodic connection timeouts
(7) Memory is exhausted and the system reports nothing but "Cluster down", with no clearer error
(8) The AOF file grows until it fills the disk
(9) A master restart wipes out all of the slave's data

Comments:
(1) Easy to overlook.
(2) What counts as a single slow operation? Some commands get slower as the number of keys grows, e.g. flushall, which was synchronous before 4.0, so with a very large dataset it may run long enough to trigger a failover.
(3) Someone had run the monitor command against that node. One way to find the clue: is client_longest_output_list in the info output too large?
(4) Out of disk space. If only rdb is used, in principle the rdb file should not exceed the maxmemory setting, but once aof is enabled that bound no longer applies. Note that even with rdb/aof disabled for a pure-cache setup, rdb files are still produced: a full resync by default first produces an rdb file and then transfers it. Even the later diskless (socket) transfer still produced an rdb file in our test; we did not dig into why. To be safe, with only rdb enabled, keeping somewhat more disk space than maxmemory should be fine.
(5) The report mentioned two causes: Redis running without authentication, and Redis started as root; normally this should not happen.
(6) Similar to (2): single process, single thread, hit by a slow operation.
(7) Turned out maxmemory was not set; nothing more to say.
(8) Either the disk is too small, or the aof rewrite rules are not configured sensibly.
(9) The official redis documentation also warns about this: a crashed rdb-disabled redis server gets auto-restarted so fast that it comes back before cluster-node-timeout marks it as failed, so no failover happens; when it reconnects to its slave, its own dataset is empty (nothing was persisted), and it promptly wipes out all of the backup's data.

Combining this with my recent study and experience, here are some "rules" for using redis:
(1) In deployment, don't put all eggs in one basket: between masters, and between a master and its slave
(2) Set cluster-require-full-coverage no, so one dead master does not paralyze the whole cluster
(3) Avoid all time-consuming operations: keys, flushall, monitor, transactions, etc.
(4) Protect the server: always set maxmemory, and set up a daemon to ensure automatic recovery
(5) Ensure enough memory and avoid swap: set maxmemory < 1/2 of hardware memory, to leave room for copy-on-write (COW) during dumps
(6) Never run ad-hoc commands such as monitor in production
(7) Use more machines (<1000) rather than bigger memory: fewer machines means each one matters more, and huge memory per node is hard to manage
(8) Don't mix storage and cache in the same cluster; unless the size is under control, don't use it as storage
(9) Watch for "dark horses": hot keys and oversized values
(10) Save memory: use TTLs as short as possible; delete data proactively to free memory instead of relying on "random eviction"; compress; tune data structures (see the sketch after this list)
(11) Establish conventions: key naming rules; database usage rules (databases 0-15; cluster does not support multiple databases)
(12) Use SSDs to speed up full resync, restart recovery, and so on
(13) With Redis Cluster, forget about read/write splitting
(14) Do a "realistic" capacity evaluation: with 10G of memory, is the dumped data really some fixed size? What structures and orders of magnitude does the business data actually have?
(15) Automatic daemon restart plus a non-persistent Redis: restarting too fast wipes all data
(16) Prepare the OS: disable Transparent HugePages (memory is allocated dynamically at runtime, so THP introduces allocation delays); ulimit -n 65535; vm.overcommit_memory = 1 (never refuse a memory request)
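
As referenced in rule (10), a minimal Jedis sketch of the short-TTL plus proactive-delete pattern (key name, TTL, and node address are made up for illustration):

import redis.clients.jedis.Jedis;

public class TtlDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 7001)) {
            // write with a short TTL so the key frees itself even if we forget it
            jedis.setex("session:42", 300, "payload");
            // ... use the value ...
            // delete proactively once done, instead of waiting for eviction
            jedis.del("session:42");
        }
    }
}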

Finally, a summary of troubleshooting methods (leaving aside the operating system and application layers):

(1) Logs:

loglevel verbose
logfile "/var/redis/log/redis.log"

(2) Tools:

2.1 command: https://redis.io/commands#server

cluster info;

info;

role;

127.0.0.1:7001> role
1) "slave"
2) "10.224.91.234"
3) (integer) 7001
4) "connected"
5) (integer) 23871

cluster keyslot "key"

127.0.0.1:7001> cluster keyslot "key"
(integer) 13592
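
The slot can also be computed on the client side; Jedis ships a CRC16 helper (redis.clients.util.JedisClusterCRC16) that its own cluster routing uses. A small sketch:

import redis.clients.util.JedisClusterCRC16;

public class SlotDemo {
    public static void main(String[] args) {
        // slot = CRC16(key) mod 16384; should match what CLUSTER KEYSLOT
        // returns on the server for the same key
        System.out.println(JedisClusterCRC16.getSlot("key"));
    }
}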

cluster nodes;

cluster slaves 2776d678f92e1a5ee5cdefd4c81df7615c799265

127.0.0.1:7001> cluster slaves 2776d678f92e1a5ee5cdefd4c81df7615c799265
1) "45f14c44358f77d0e1a754863000460dad847fae 10.224.91.233:7001 slave 2776d678f92e1a5ee5cdefd4c81df7615c799265 0 1519982493781 6 connected"

slowlog

latency

monitor


127.0.0.1:7001>monitor
OK
1519981033.980566 [0 10.224.91.234:7001] "PING"
1519981044.015475 [0 10.224.91.234:7001] "PING"
1519981044.838077 [0 127.0.0.1:37727] "AUTH" "P@ss123"

client list

127.0.0.1:7001> CLIENT LIST
id=8 addr=10.224.91.234:7001 fd=7 name= age=16484 idle=3 flags=M db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=277 addr=127.0.0.1:37725 fd=18 name= age=475 idle=3 flags=O db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=monitor
id=278 addr=127.0.0.1:37727 fd=19 name= age=454 idle=369 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping
id=281 addr=127.0.0.1:37741 fd=20 name= age=331 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client
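
The same listing is reachable programmatically, e.g. to scan for a stray monitor client; a minimal Jedis sketch (node address is a placeholder):

import redis.clients.jedis.Jedis;

public class ClientListDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 7001)) {
            // CLIENT LIST returns one line per connection;
            // cmd=monitor identifies a client stuck in monitor mode
            for (String line : jedis.clientList().split("\n")) {
                if (line.contains("cmd=monitor")) {
                    System.out.println("monitor client: " + line);
                }
            }
        }
    }
}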

 

2.2 troubleshooting tools:

redis-check-rdb

./redis-check-rdb /opt/redis/dump.rdb
[offset 0] Checking RDB file /opt/redis/dump.rdb
[offset 26] AUX FIELD redis-ver = '3.2.8'
[offset 40] AUX FIELD redis-bits = '64'
[offset 52] AUX FIELD ctime = '1519965014'
[offset 67] AUX FIELD used-mem = '2511240'
[offset 76] Checksum OK
[offset 76] \o/ RDB looks OK! \o/
[info] 0 keys read
[info] 0 expires
[info] 0 already expired

redis-check-aof

redis-trib.rb (info/check)

 

2.3 debug tools for learning:

 
debug

127.0.0.1:7001> debug sleep 1
OK
(1.00s)

redis analyst (7)- redis server/client source code analysis

Although redis/jedis is a fairly small codebase, one article can hardly cover all the design and implementation details, so instead I pick questions likely to come up in daily work and read the code with those questions in mind:

1 If current memory usage is 10G, will the rdb dump on disk necessarily be 10G too?

Not necessarily. On one hand compression may be enabled (it is by default: rdbcompression yes); on the other hand, if a lot of already-expired data sits in memory, it is filtered out. The dump implementation iterates over all databases (16 by default) and saves each key/value pair:

int rdbSaveKeyValuePair(rio *rdb, robj *key, robj *val,
                        long long expiretime, long long now)
{
    /* Save the expire time */
    if (expiretime != -1) {
        /* If this key is already expired skip it */
        if (expiretime < now) return 0;
        if (rdbSaveType(rdb,RDB_OPCODE_EXPIRETIME_MS) == -1) return -1;
        if (rdbSaveMillisecondTime(rdb,expiretime) == -1) return -1;
    }

    /* Save type, key, value */
    if (rdbSaveObjectType(rdb,val) == -1) return -1;
    if (rdbSaveStringObject(rdb,key) == -1) return -1;
    if (rdbSaveObject(rdb,val) == -1) return -1;
    return 1;
}

2 How monitor is implemented

void monitorCommand(client *c) {
    /* ignore MONITOR if already slave or in monitor mode */
    if (c->flags & CLIENT_SLAVE) return;

    c->flags |= (CLIENT_SLAVE|CLIENT_MONITOR);
    listAddNodeTail(server.monitors,c); // append this client to the server.monitors list
    addReply(c,shared.ok);
}

When a command is executed:

void call(client *c, int flags) {
    long long dirty, start, duration;
    int client_old_flags = c->flags;

    /* Sent the command to clients in MONITOR mode, only if the commands are
     * not generated from reading an AOF. */
    if (listLength(server.monitors) &&
        !server.loading &&
        !(c->cmd->flags & (CMD_SKIP_MONITOR|CMD_ADMIN)))
    {
        replicationFeedMonitors(c,server.monitors,c->db->id,c->argv,c->argc); // feed the command to every monitoring client
    }
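
On the client side, Jedis wraps MONITOR in a blocking callback; a minimal sketch (node address is a placeholder):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisMonitor;

public class MonitorDemo {
    public static void main(String[] args) {
        Jedis jedis = new Jedis("127.0.0.1", 7001);
        // MONITOR turns this connection into an entry in server.monitors;
        // every command then arrives here via replicationFeedMonitors()
        jedis.monitor(new JedisMonitor() {
            @Override
            public void onCommand(String command) {
                System.out.println(command);
            }
        });
    }
}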

The maximum output list size can be checked via info, and client list also helps when troubleshooting.

The client_longest_output_list value is obtained by iterating over all clients and taking the maximum:

void getClientsMaxBuffers(unsigned long *longest_output_list,
                          unsigned long *biggest_input_buffer) {
    client *c;
    listNode *ln;
    listIter li;
    unsigned long lol = 0, bib = 0;

    listRewind(server.clients,&li);
    while ((ln = listNext(&li)) != NULL) {
        c = listNodeValue(ln);

        if (listLength(c->reply) > lol) lol = listLength(c->reply);
        if (sdslen(c->querybuf) > bib) bib = sdslen(c->querybuf);
    }
    *longest_output_list = lol;
    *biggest_input_buffer = bib;
}
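
From a client, the same value can be read out of INFO; a minimal Jedis sketch pulling client_longest_output_list from the clients section (node address is a placeholder):

import redis.clients.jedis.Jedis;

public class InfoClientsDemo {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("127.0.0.1", 7001)) {
            // INFO clients includes client_longest_output_list,
            // filled in by getClientsMaxBuffers() above
            for (String line : jedis.info("clients").split("\r\n")) {
                if (line.startsWith("client_longest_output_list")) {
                    System.out.println(line);
                }
            }
        }
    }
}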

redis analyst (6)- simple performance test for mini cluster

Strictly speaking, benchmarking a distributed system like this is not very meaningful: when monitoring metrics reveal a performance bottleneck, you simply add nodes, and over a wide range (as long as the node count is not extreme) processing capacity grows roughly linearly. Quoting the author directly:

“High performance and linear scalability up to 1000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.”

So why run a simple performance test at all? Because on a first redis cluster deployment, the question of what scale is needed to meet the requirements cannot be avoided. It is called a simple test because the real environment (hardware and software configuration, the application's actual read/write ratio, payload sizes, traffic fluctuation, number of client machines and threads, and so on) is far from something that can be faithfully simulated locally. So this is only a rough test (focused on speed), to build an intuition and answer the sizing question.


127.0.0.1:7001>cluster nodes
87903653548352f6c3fdc0fa1ad9fc68de147fcd 10.224.2.142:7001 myself,slave 798f74b21c15120517d44bacfc7b5319b484244b 0 0 2 connected
3892d1dfa68d9976ce44b19e532d9c0e80a0357d 10.224.2.141:7001 slave b2b98976dfbc9f85bf714b385e508b3441c51338 0 1517983027812 17 connected
798f74b21c15120517d44bacfc7b5319b484244b 10.224.2.145:7001 master - 0 1517983028313 5 connected 5461-10922
b2b98976dfbc9f85bf714b385e508b3441c51338 10.224.2.144:7001 master - 0 1517983027813 17 connected 10923-16383
b71b412857c43e05a32a796fbc0de2e7d667cb67 10.224.2.143:7001 slave 912a1efa1f4085b4b7333706e546f64d16580761 0 1517983026306 6 connected
912a1efa1f4085b4b7333706e546f64d16580761 10.224.2.146:7001 master - 0 1517983026809 6 connected 0-5460

 

Redis Cluster
  Node number: Mini Redis cluster (6 nodes: 3*2)
  Hardware: Cpu: 2 * Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz; Memory: 4G (3924392)
  Software:
    # disable saving cache content to disk
    save ""
    appendonly no
    # memory limit
    maxmemory 1G
    maxmemory-policy allkeys-lru
    (sized so that max memory is never reached)

Test machines
  Node number: 15
  Hardware: Cpu: 2 * Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz; Memory: 8G (8180636)
  Software: Jedis connection pool: DEFAULT_MAX_TOTAL = 8

Test method
  Write: String set(final String key, final String value); key size about 30, value size about 30
    Test1: 10 (machines) * 50 (threads per machine) * 20K (write operations) = 10000K
    Test2: 15 (machines) * 50 (threads per machine) * 20K (write operations) = 15000K
  Read: String get(final String key); no hits; key size about 30
    Test1: 10 (machines) * 50 (threads per machine) * 20K (read operations) = 10000K, with 10000K records already in the database
    Test2: 15 (machines) * 50 (threads per machine) * 20K (read operations) = 15000K, with 15000K records already in the database

Test result
  OPS: Read 100K; Write 50K
  Success ratio: Read 100%; Write 100%
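
For reference, a simplified sketch of what one test machine ran for the write test; thread and operation counts follow the table above, while the seed node, key prefix, and payload are made up:

import java.util.Collections;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class WriteLoadTest {
    public static void main(String[] args) throws InterruptedException {
        // one seed node is enough; JedisCluster discovers the rest
        JedisCluster cluster = new JedisCluster(
                Collections.singleton(new HostAndPort("10.224.2.141", 7001)));
        ExecutorService pool = Executors.newFixedThreadPool(50); // 50 threads per machine
        for (int t = 0; t < 50; t++) {
            final int threadId = t;
            pool.submit(() -> {
                for (int i = 0; i < 20000; i++) { // 20K writes per thread
                    // key and value both around 30 bytes, as in the table
                    cluster.set("loadtest:" + threadId + ":" + i,
                                "vvvvvvvvvvvvvvvvvvvvvvvvvvvvvv");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}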

The chart below shows that the keys in the test were distributed quite evenly, so the load was spread across the 3 masters. Although the total tps of the test is high, with three nodes sharing the work each individual node is actually comfortable, which highlights the advantage of the distributed design.

[figure: operations per second]

Test conclusion: the minimal redis cluster (3*2), considering speed alone and ignoring capacity limits, can already support a very high TPS. Note again that this is an idealized test, so treat it only as a reference for real-world sizing.

Aside: what happens if reads and writes are mixed 1:1 and both tests run concurrently? Can it still reach 100K reads/s and 50K writes/s? The results:

Total OPS only reaches about 45K, with read and write throughput roughly 1:1. This number is a more useful practical reference than testing either operation alone.

[figure: concurrent write/read]

redis analyst (5)- redis configure/os configure

1 OS Configuration:

After a normal startup there are some warnings that need to be resolved by changing operating system parameters:

13538:M 16 Sep 23:37:48.481 * Increased maximum number of open files to 10032 (it was originally set to 1024).
13538:M 16 Sep 23:37:48.485 # Not listening to IPv6: unsupported
13538:M 16 Sep 23:37:48.486 * Node configuration loaded, I'm 41a1d429a927a35e70c80a8945549ac0bf390c6d
13538:M 16 Sep 23:37:48.489 # Not listening to IPv6: unsupported
13538:M 16 Sep 23:37:48.490 # Server started, Redis version 3.2.8
13538:M 16 Sep 23:37:48.490 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
13538:M 16 Sep 23:37:48.490 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.

The main parameters to change:

echo "ulimit -n 65535"
ulimit -n 65535
ulimit -n

echo "change vm.overcommit"
sed -i "/vm.overcommit_memory/d" /etc/sysctl.conf
echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
sysctl vm.overcommit_memory=1
	  
echo "Disable Transparent Huge Pages (THP) support "
echo never > /sys/kernel/mm/transparent_hugepage/enabled
sed -i "/transparent_hugepage/d" /etc/rc.local
echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local

2 Redis Configuration:

https://github.com/antirez/redis/blob/3.2/redis.conf

redis 3.2 has roughly 50+ configuration options; any option left unset gets a default value. For example, for slave-priority:


#define CONFIG_DEFAULT_SLAVE_PRIORITY 100 	

server.slave_priority = CONFIG_DEFAULT_SLAVE_PRIORITY;


 else if (!strcasecmp(argv[0],"slave-priority") && argc == 2) {
            server.slave_priority = atoi(argv[1]);

So you can explicitly set only the options you need and leave the rest unset, which keeps the configuration much shorter.

The stock defaults:

bind 127.0.0.1
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
supervised no
pidfile /var/run/redis_6379.pid
loglevel notice
logfile ""
databases 16
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir ./
slave-serve-stale-data yes
slave-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
slave-priority 100
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

2.1 Disabling rdb:
After shutdown:
[root@redis005 redis]# ll
total 8
drwxr-xr-x 2 wbx-redis wbx-group 4096 Sep 12 12:13 bin
-rw-r----- 1 root root 76 Sep 12 12:24 dump.rdb
[root@redis005 redis]# more dump.rdb
REDIS0007 redis-ver3.2.8
redis-bitseúѷused-memx4

After restarting:
[root@redis005 redis]# ll
total 34056
drwxr-xr-x 2 wbx-redis wbx-group 4096 Sep 12 12:13 bin
-rw------- 1 root root 34866203 Sep 13 14:52 dump.rdb

5148:S 13 Sep 14:52:29.035 * Full resync from master: e7d510571e12e91919a06113f5ebe75bb41e62c5:1
5148:S 13 Sep 14:52:32.091 * MASTER SLAVE sync: receiving 34866203 bytes from master
5148:S 13 Sep 14:52:33.853 * MASTER SLAVE sync: Flushing old data
5148:S 13 Sep 14:52:33.853 * MASTER SLAVE sync: Loading DB in memory
5148:S 13 Sep 14:52:35.298 * MASTER SLAVE sync: Finished with success
5148:S 13 Sep 14:52:35.801 - DB 0: 338506 keys (0 volatile) in 524288 slots HT.
5148:S 13 Sep 14:52:35.801 - 1 clients connected (0 slaves), 96852208 bytes in use
5148:S 13 Sep 14:52:39.615 - Accepted 127.0.0.1:20931

3 Open ports:

Suppose redis.conf contains:
port 7001

Then the additional cluster bus port is 10000+7001 = 17001.

The Cluster bus
Every Redis Cluster node has an additional TCP port for receiving incoming connections from other Redis Cluster nodes. This port is at a fixed offset from the normal TCP port used to receive incoming connections from clients. To obtain the Redis Cluster port, 10000 should be added to the normal commands port. For example, if a Redis node is listening for client connections on port 6379, the Cluster bus port 16379 will also be opened.

The netstat output then looks like this:

[root@redis001 ~]# netstat -nap|grep redis
tcp        0      0 0.0.0.0:17001               0.0.0.0:*                   LISTEN      29780/redis-server  
tcp        0      0 0.0.0.0:7001                0.0.0.0:*                   LISTEN      29780/redis-server  
tcp        0      0 10.224.2.141:35973          10.224.2.145:17001          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:17001          10.224.2.144:48823          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:17001          10.224.2.145:28201          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:48422          10.224.2.144:7001           ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:37238          10.224.2.143:17001          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:17001          10.224.2.146:40126          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:20402          10.224.2.146:17001          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:47831          10.224.2.142:17001          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:17001          10.224.2.143:35657          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:26000          10.224.2.144:17001          ESTABLISHED 29780/redis-server  
tcp        0      0 10.224.2.141:17001          10.224.2.142:59406          ESTABLISHED 29780/redis-server