how to decide how many Cassandra nodes to deploy

Every so often I need to estimate how many Cassandra nodes production requires, so here are a few simple formulas:

For Write:

Peak WPS for the application * (keyspace replication factor / total Cassandra nodes per DC)  <  max WPS capacity of a single Cassandra node

For Read:

Preconditions:

(1) Read consistency level: LOCAL_QUORUM

(2) Every table uses the default configuration, i.e. a read repair chance of 0.1, so 10% of reads touch all RF replicas and the other 90% touch only a quorum (RF/2 + 1):

Peak QPS for the application * ((replication factor * 0.1 + (replication factor / 2 + 1) * 0.9) / total Cassandra nodes per DC)  <  max QPS capacity of a single Cassandra node

For example, one application on 7 nodes with replication factor 3:

For Write:

Peak WPS for the application * (3 / 7)  <  max WPS capacity of a single Cassandra node

For Read:

Peak QPS for the application * (2.1 / 7)  <  max QPS capacity of a single Cassandra node

In short, when deploying a new Cassandra DC:

Number of Cassandra nodes > max(3 * peak WPS for the application / max WPS capacity of a single node, 2.1 * peak QPS for the application / max QPS capacity of a single node), where the factors 3 and 2.1 correspond to the RF = 3, LOCAL_QUORUM case above.
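As a quick illustration with made-up numbers: suppose the application peaks at 20,000 WPS and 50,000 QPS, and a single node can sustain about 10,000 WPS and 15,000 QPS. Then nodes > max(3 * 20,000 / 10,000, 2.1 * 50,000 / 15,000) = max(6, 7) = 7, i.e. the new DC needs at least 8 nodes.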

redis analysis (2) - use cluster

1 At least 6 nodes are required (3 masters and 3 slaves):

If you use the following configuration:

# Settings
PORT=30000
TIMEOUT=2000
NODES=4              # total number of nodes, masters included
REPLICAS=1

it reports an error:

[root@wbxperf001 create-cluster]# ./create-cluster start
Starting 30001
Starting 30002
Starting 30003
Starting 30004
[root@wbxperf001 create-cluster]# ./create-cluster create
*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 4 nodes and 1 replicas per node.
*** At least 6 nodes are required.

Root cause, from redis-trib.rb:

    def check_create_parameters
        # masters = total nodes / (replicas per master + 1); must be at least 3
        masters = @nodes.length/(@replicas+1)
        if masters < 3
            puts "*** ERROR: Invalid configuration for cluster creation."
            puts "*** Redis Cluster requires at least 3 master nodes."
            puts "*** This is not possible with #{@nodes.length} nodes and #{@replicas} replicas per node."
            puts "*** At least #{3*(@replicas+1)} nodes are required."
            exit 1
        end
    end
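
With one replica per master, the smallest configuration that passes this check is therefore six nodes; for example, the same create-cluster settings as above with:

# Settings
PORT=30000
TIMEOUT=2000
NODES=6              # 6 / (1 + 1) = 3 masters, the minimum allowed
REPLICAS=1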

2 If you kill a master process, its slave takes over as master; when you start the old master again, it rejoins as a slave.

[root@wbxperf001 ~]# cluster nodes
1fbaae0a5e98f2f4bb299139ffd126811d68fbbe 127.0.0.1:30005 slave b067d0613418238a688f18ebd6a8e3612c1bb54b 0 1496327104008 5 connected
d8fd8928d10e11c9eddb0e80f40a5345f092d40c 127.0.0.1:30004 slave 7cf0fd6e2dce793161f0764ff7ef6e7f83b54c05 0 1496327104008 4 connected
c5a97527cdaa550d89e7282c0e888ea3b0d53a29 127.0.0.1:30006 slave b934ab47ef37a374498067f864a067a36674debc 0 1496327103909 6 connected
b934ab47ef37a374498067f864a067a36674debc 127.0.0.1:30003 myself,master - 0 0 3 connected 10923-16383
7cf0fd6e2dce793161f0764ff7ef6e7f83b54c05 127.0.0.1:30001 master - 0 1496327104009 1 connected 0-5460
b067d0613418238a688f18ebd6a8e3612c1bb54b 127.0.0.1:30002 master - 0 1496327104008 2 connected 5461-10922

[root@wbxperf001 ~]# ps -ef|grep redis
root 9590 1 0 06:22 ? 00:01:33 ../../src/redis-server *:30001 [cluster]
root 9592 1 0 06:22 ? 00:01:31 ../../src/redis-server *:30002 [cluster]
root 9598 1 0 06:22 ? 00:01:32 ../../src/redis-server *:30003 [cluster]
root 9602 1 0 06:22 ? 00:01:32 ../../src/redis-server *:30004 [cluster]
root 9606 1 0 06:22 ? 00:01:27 ../../src/redis-server *:30005 [cluster]
root 9610 1 0 06:22 ? 00:01:31 ../../src/redis-server *:30006 [cluster]

[root@wbxperf001 ~]# kill -1 9592

[root@wbxperf001 ~]# cluster nodes
1fbaae0a5e98f2f4bb299139ffd126811d68fbbe 127.0.0.1:30005 master - 0 1496327180237 7 connected 5461-10922
d8fd8928d10e11c9eddb0e80f40a5345f092d40c 127.0.0.1:30004 slave 7cf0fd6e2dce793161f0764ff7ef6e7f83b54c05 0 1496327180237 4 connected
c5a97527cdaa550d89e7282c0e888ea3b0d53a29 127.0.0.1:30006 slave b934ab47ef37a374498067f864a067a36674debc 0 1496327180237 6 connected
b934ab47ef37a374498067f864a067a36674debc 127.0.0.1:30003 myself,master - 0 0 3 connected 10923-16383
7cf0fd6e2dce793161f0764ff7ef6e7f83b54c05 127.0.0.1:30001 master - 0 1496327180237 1 connected 0-5460
b067d0613418238a688f18ebd6a8e3612c1bb54b 127.0.0.1:30002 master,fail - 1496327177426 1496327177227 2 disconnected

After 30002 is started again, it stays a slave.

So if you want to promote it back to master instead of leaving it as a slave, use CLUSTER FAILOVER:


[root@wbxperf001 src]# ./redis-cli  -p 30002
127.0.0.1:30002> cluster failover

127.0.0.1:30002> cluster nodes
c5a97527cdaa550d89e7282c0e888ea3b0d53a29 127.0.0.1:30006 slave b934ab47ef37a374498067f864a067a36674debc 0 1496328987578 6 connected
b934ab47ef37a374498067f864a067a36674debc 127.0.0.1:30003 master - 0 1496328987578 3 connected 10923-16383
7cf0fd6e2dce793161f0764ff7ef6e7f83b54c05 127.0.0.1:30001 master - 0 1496328987578 1 connected 0-5460
b067d0613418238a688f18ebd6a8e3612c1bb54b 127.0.0.1:30002 myself,master - 0 0 8 connected 5461-10922
d8fd8928d10e11c9eddb0e80f40a5345f092d40c 127.0.0.1:30004 slave 7cf0fd6e2dce793161f0764ff7ef6e7f83b54c05 0 1496328987578 4 connected
1fbaae0a5e98f2f4bb299139ffd126811d68fbbe 127.0.0.1:30005 slave b067d0613418238a688f18ebd6a8e3612c1bb54b 0 1496328987578 8 connected

3 If one of the masters goes down, whether the whole cluster goes down depends on the parameter cluster-require-full-coverage:

cluster-require-full-coverage <yes/no>: If this is set to yes, as it is by default, the cluster stops accepting writes if some percentage of the key space is not covered by any node. If the option is set to no, the cluster will still serve queries even if only requests about a subset of keys can be processed.
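
For example, if you prefer the cluster to keep serving the slots that are still covered after a failure, a one-line sketch of the corresponding redis.conf setting on every node:

cluster-require-full-coverage no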

4 What does Redis Cluster not support?
(1) Multi-key operations (across hash slots).
(2) SELECTing another database; only database 0 is used:

Redis Cluster does not support multiple databases like the stand alone version of Redis. There is just database 0 and the SELECT command is not allowed.

5 Some commands cannot be run on just any node:

127.0.0.1:7001> cluster failover
(error) ERR You should send CLUSTER FAILOVER to a slave

6 Does redis-benchmark support testing a cluster?

No. By default it does not display server errors (you need -e for that), so nothing fails visibly and you may believe it supports cluster testing.
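
A hypothetical invocation (host, port and request count are illustrative) that makes server errors such as MOVED redirections visible:

./redis-benchmark -h 127.0.0.1 -p 30001 -n 10000 -e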

" -e                 If server replies with errors, show them on stdout.\n"
"                    (no more than 1 error per second is displayed)\n"

The way the number of displayed errors is throttled is interesting:

                if (config.showerrors) {               /* only when -e is given */
                    static time_t lasterr_time = 0;
                    time_t now = time(NULL);
                    redisReply *r = reply;
                    /* print at most one server error per second */
                    if (r->type == REDIS_REPLY_ERROR && lasterr_time != now) {
                        lasterr_time = now;
                        printf("Error from server: %s\n", r->str);
                    }
                }

7 Running keys * on an arbitrary node of the cluster

It returns only the data in the slots that node is responsible for, not the data of the whole cluster.

Also, with set key value, if the key should not be stored on the current node, a MOVED error is returned:

127.0.0.1:7001> keys *
1) "fujian1234"
127.0.0.1:7001> 
127.0.0.1:7001> 
127.0.0.1:7001> set fujian1234 value
(error) MOVED 15336 10.224.2.144:7001

With the -c option, the redirect is followed automatically:
The redis-cli utility in the unstable branch of the Redis repository at GitHub implements a very basic cluster support when started with the -c switch.

[root@redis001 ~]# redis-cli  -p 7001 -a P@ss123 -c
127.0.0.1:7001> get fujian1234
-> Redirected to slot [15336] located at 10.224.2.144:7001
"value"

In addition, a read on a cluster slave may return a MOVED error even when the key belongs to the slot range that slave replicates. You can again use -c, or in this case issue READONLY first.

MOVED error on a slave:
[root@redis004 bin]# ./redis-cli  -p 7001 -a P@ss123 
127.0.0.1:7001> get fujian1234
(error) MOVED 15336 10.224.2.141:7001
127.0.0.1:7001> readonly
OK
127.0.0.1:7001> get fujian1234
"xinxiu"

Read queries against a Redis Cluster slave node are disabled by default, but you can use the READONLY command to change this behavior on a per- connection basis. The READWRITE command resets the readonly mode flag of a connection back to readwrite.
READWRITE Disables read queries for a connection to a Redis Cluster slave node.

This raises another question: what is the relationship and difference between the slave-read-only config option and the READONLY command?

Quoting one answer:


Please take note that slave-read-only config refers to replication and READONLY refers to the redis-cluster command.

If you are not using redis-cluster, you can safely ignore the READONLY command documentation. Refer to https://raw.githubusercontent.com/antirez/redis/2.8/redis.conf instead. Writes should not replicate nor require lookups to the master. My wireshark dumps on redis with slave-read-only no shows no indication of any communication with master as a consequence of writes to the slave itself.

If you are using redis-cluster on the other hand, and referring to the READWRITE behavior: Cluster nodes' communication with each other for hash slot updates and other cluster specific messages are optimized to use minimal bandwidth and the least processing time. Communicating hash slot updates most likely do not happen for every write on the slave.

In actual testing, slave-read-only has no effect as far as Redis Cluster is concerned.

how to troubleshoot http request latency

Curl

1 Create a file such as curl-format.txt

[root@001 jiafu]# cat curl-format.txt 
 time_namelookup:  %{time_namelookup}\n
       time_connect:  %{time_connect}\n
    time_appconnect:  %{time_appconnect}\n
   time_pretransfer:  %{time_pretransfer}\n
      time_redirect:  %{time_redirect}\n
 time_starttransfer:  %{time_starttransfer}\n
                    ----------\n
         time_total:  %{time_total}\n

2 Run curl with that format file

[root@001 jiafu]#  curl -w "@curl-format.txt" -o /dev/null -s https://www.baidu.com  
 time_namelookup:  0.001
       time_connect:  0.044
    time_appconnect:  0.295
   time_pretransfer:  0.296
      time_redirect:  0.000
time_starttransfer:  0.345
                    ----------
         time_total:  0.345

3 Analyze the result

time_appconnect The time, in seconds, it took from the start until the SSL/SSH/etc connect/handshake to the remote host was completed. (Added in 7.19.0)

time_connect The time, in seconds, it took from the start until the TCP connect to the remote host (or proxy) was completed.

time_namelookup The time, in seconds, it took from the start until the name resolving was completed.

time_pretransfer The time, in seconds, it took from the start until the file transfer was just about to begin. This includes all pre-transfer commands and negotiations that are specific to the particular protocol(s) involved.

time_redirect The time, in seconds, it took for all redirection steps include name lookup, connect, pretransfer and transfer before the final transaction was started. time_redirect shows the complete execution time for multiple redirections. (Added in 7.12.3)

time_starttransfer The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.

time_total The total time, in seconds, that the full operation lasted.
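
Reading the sample output above: DNS lookup took 0.001 s, the TCP connect completed at 0.044 s (so the TCP handshake itself took about 0.043 s), the TLS handshake completed at 0.295 s (about 0.251 s for TLS), and the first byte was about to arrive at 0.345 s, so the server needed roughly 0.345 - 0.296 = 0.049 s to produce the response.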

tcpdump

As shown in the packet capture below (screenshot omitted), together with the HTTPS handshake flow (screenshot omitted).

traceroute

[sbalabrahman@host ~]$ traceroute www.baidu.com
traceroute to www.baidu.com (145.20.193.63), 30 hops max, 60 byte packets
1  sjc02-wxp00-csw02-vl4085.xxx.com (10.255.29.5)  2.085 ms  2.156 ms  2.289 ms
2  sjc02-wxp00-srt01-vl4000.yyy.com (10.255.3.242)  0.930 ms  1.084 ms  0.878 ms

jvm startup parameters for troubleshooting

I always forget these and end up looking them up on the spot every time, so here is a consolidated list:

1.enable heapdump:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -XX:OnOutOfMemoryError="sh ~/cleanup.sh"

2.enable remote debug:

-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=4000,suspend=n

3.enable GC log:
-Xloggc:/logs/`date +%F_%H-%M-%S`-gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCCause
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=2M

4.enable JMX:
-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=8091 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.password.file=/conf/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/conf/jmxremote.access

5.enable Jprofiler:
-agentpath:/opt/jprofiler/bin/linux-x64/libjprofilerti.so=port=8849

6.enable EMMA:
-Demma.rt.control=true -Xverify:none -Demma.coverage.out.file=/opt/codecoverage_emma.ec

known issues involving MDC

When using MDC, the hope is that every log line of a request can be bound to that request's tracking ID, but things often go the other way. There are two situations you cannot fully control:

(1) Symptom: some log lines are always bound to one particular tracking ID.

When you use a third-party package (or one provided by someone else) that does its work on asynchronous threads, the first request triggers the creation of the first worker thread, and that thread's tracking ID will be the same as the first request's (a newly created thread inherits its creator's inheritable thread-locals):

From the Thread constructor:

        if (parent.inheritableThreadLocals != null)
            this.inheritableThreadLocals =
                ThreadLocal.createInheritedMap(parent.inheritableThreadLocals);

As a result, as long as that thread stays alive, its logs stay bound to the first request.

Because the body of the Callable or Runnable task is not within our control, there is no later opportunity to change it.

	private static final ExecutorService pool = Executors.newFixedThreadPool(3);

	public static String checkAsync(final String checkItem) {

			Future<String> checkFuture = pool.submit(new Callable<String>() {
				public String call() throws Exception {
					......  // third-party code, not ours to modify; if it were our own,
					        // a plain MDC.put("TrackingID", trackingID) here would fix it,
					        // or better, the standard approach supported by slf4j:
				}
			});

“In such cases, it is recommended that MDC.getCopyOfContextMap() is invoked on the original (master) thread before submitting a task to the executor. When the task runs, as its first action, it should invoke MDC.setContextMapValues() to associate the stored copy of the original MDC values with the new Executor managed thread.”
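
A minimal sketch of that recommendation, usable only when the task body is under our control; the class name, checkAsync() and doCheck() are illustrative, and it assumes org.slf4j.MDC:

    import java.util.Map;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import org.slf4j.MDC;

    public class MdcAwareCheck {
        private static final ExecutorService pool = Executors.newFixedThreadPool(3);

        // Copy the caller's MDC before submitting, re-install it as the task's first
        // action, and clear it when the task finishes so the next task on this pooled
        // thread does not inherit a stale TrackingID.
        public static Future<String> checkAsync(final String checkItem) {
            final Map<String, String> mdcCopy = MDC.getCopyOfContextMap();
            return pool.submit(new Callable<String>() {
                public String call() throws Exception {
                    if (mdcCopy != null) {
                        MDC.setContextMap(mdcCopy);
                    }
                    try {
                        return doCheck(checkItem);   // stand-in for the real work
                    } finally {
                        MDC.clear();
                    }
                }
            });
        }

        private static String doCheck(String checkItem) {
            return checkItem;                        // placeholder
        }
    }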

The situation in this code example is not too dangerous, because once a pool thread is created it never dies; at worst, the very first request appears to have an enormous number of log lines while later requests cannot be matched at all. But if the thread pool reclaims idle threads after a timeout, many different requests may each end up with a pile of mismatched log lines under their tracking ID.

Workaround: if you do not want one tracking ID pinned forever, clear the tracking ID from the MDC before calling that API, so that the threads it creates do not inherit it (since the thread does not belong to any single request, give up the binding altogether), and restore it after the call returns. After this change, however, the logs produced during the call carry no tracking ID at all. So there is no perfect fix: either you get many log lines that match the wrong request, or you get none.
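
A minimal sketch of this workaround, assuming org.slf4j.MDC; thirdPartyCheck() is a hypothetical stand-in for the external call:

    public static void callWithoutBinding(String checkItem) {
        // Drop TrackingID so any thread the third-party call creates does not inherit it,
        // then restore it afterwards for our own subsequent logs.
        String saved = MDC.get("TrackingID");
        MDC.remove("TrackingID");
        try {
            thirdPartyCheck(checkItem);
        } finally {
            if (saved != null) {
                MDC.put("TrackingID", saved);
            }
        }
    }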

(2) Symptom: within a request, the tracking ID is lost halfway through, or changes to something else.

This happens when a third-party library is called and that library does some special handling of its own, for example:


	public String call(String checkItem) {
		return call(checkItem, null);
	}

	public String call(String checkItem, Map config) {
		String trackingID = (config == null) ? null : (String) config.get("TrackingID");
		if (trackingID == null)
			trackingID = "";
		// The caller did not pass a tracking ID explicitly, so this line wipes out
		// the tracking ID that was set earlier in the request (it becomes "").
		MDC.put("TrackingID", trackingID);
		......
	}
        

Workarounds: (1) pass the tracking ID explicitly instead of calling call(String checkItem) directly; (2) since MDC is already in use, the library could first check whether the MDC already holds a value and treat that as the passed-in tracking ID, rather than blindly overwriting it. A sketch of (2) follows.
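
A minimal sketch of workaround (2), as it could look inside the library (again assuming org.slf4j.MDC):

                        String trackingID = (config == null) ? null : (String) config.get("TrackingID");
                        if (trackingID == null)
                            trackingID = MDC.get("TrackingID");   // fall back to what the caller already set
                        if (trackingID == null)
                            trackingID = "";
                        MDC.put("TrackingID", trackingID);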

Problems like these show up easily in calls to third-party libraries, and without reading their code it is hard to predict whether a tracking ID will be cleared or permanently bound. Either way, be aware that MDC is not a perfect solution, because many third-party calls are opaque and unmodifiable from your side.