Redis Cluster 101
Source: https://raw.githubusercontent.com/antirez/redis/3.0/00-RELEASENOTES
Official page: http://redis.io/topics/cluster-tutorial

Redis cluster tutorial

This document is a gentle introduction to Redis Cluster that does not require understanding complex distributed systems concepts. It provides instructions on how to set up a cluster, test it, and operate it, without going into the details that are covered in the Redis Cluster specification, but just describing how the system behaves from the point of view of the user.

Note that if you plan to run a serious Redis Cluster deployment, the more formal specification is highly suggested reading.

Redis Cluster is currently alpha quality code. Please get in touch on the Redis mailing list or open an issue in the Redis GitHub repository if you find any problem.

Redis Cluster 101

Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes. Commands dealing with multiple keys are not supported by the cluster, because this would require moving data between Redis nodes, making Redis Cluster unable to provide Redis-like performance and predictable behavior under load. Redis Cluster also provides some degree of availability during partitions, that is, in practical terms, the ability to continue operations when some nodes fail or are not able to communicate.

So in practical terms, what do you get with Redis Cluster?

- The ability to automatically split your dataset among multiple nodes.
- The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.
Redis Cluster data sharding

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot. There are 16384 hash slots in Redis Cluster, and to compute the hash slot of a given key we simply take the CRC16 of the key modulo 16384. Every node in a Redis Cluster is responsible for a subset of the hash slots, so for example you may have a cluster with 3 nodes, where:

- Node A contains hash slots from 0 to 5500.
- Node B contains hash slots from 5501 to 11000.
- Node C contains hash slots from 11001 to 16383.
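For example, once the cluster we build later in this tutorial is running, you can ask any node which slot a given key maps to with the CLUSTER KEYSLOT command. The slot shown here for the key foo is the same slot redis-cli gets redirected to in the example later on:

```
$ redis-cli -p 7000 cluster keyslot foo
(integer) 12182
```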
This allows adding and removing nodes in the cluster easily. For example, if I want to add a new node D, I need to move some hash slots from nodes A, B, C to D. Similarly, if I want to remove node A from the cluster, I can just move the hash slots served by A to B and C. When node A is empty I can remove it from the cluster completely. Because moving hash slots from one node to another does not require stopping operations, adding and removing nodes, or changing the percentage of hash slots held by nodes, does not require any downtime.

Redis Cluster master-slave model

In order to remain available when a subset of nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-slave model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional slaves).

In our example cluster with nodes A, B, C, if node B fails the cluster is not able to continue, since we no longer have a way to serve hash slots in the range 5501-11000. However, if when the cluster is created (or at a later time) we add a slave node to every master, so that the final cluster is composed of A, B, C that are masters, and A1, B1, C1 that are slaves, the system is able to continue if node B fails. Node B1 replicates B, so the cluster will elect node B1 as the new master and will continue to operate correctly. However, note that if nodes B and B1 fail at the same time, Redis Cluster is not able to continue to operate.

Redis Cluster consistency guarantees

Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will forget a write that was acknowledged by the system. The first reason why Redis Cluster can lose writes is that it uses asynchronous replication. This means that during writes the following happens:

- Your client writes to the master B.
- The master B replies OK to your client.
- The master B propagates the write to its slaves B1, B2 and B3.
As you can see, B does not wait for an acknowledgement from B1, B2, B3 before replying to the client, since this would be a prohibitive latency penalty for Redis. So if your client writes something, B acknowledges the write, but crashes before being able to send the write to its slaves, one of the slaves (that did not receive the write) can be promoted to master, losing the write forever.

This is very similar to what happens with most databases that are configured to flush data to disk every second, so it is a scenario you are already able to reason about because of past experience with traditional database systems not involving distributed systems. Similarly you can improve consistency by forcing the database to flush data to disk before replying to the client, but this usually results in prohibitively low performance. Basically there is a trade-off between performance and consistency.

Note: Redis Cluster in the future will allow users to perform synchronous writes when absolutely needed.

There is another scenario where Redis Cluster will lose writes, which happens during a network partition where a client is isolated with a minority of instances including at least a master.

Take as an example our 6-node cluster composed of A, B, C, A1, B1, C1, with 3 masters and 3 slaves. There is also a client, that we will call Z1. After a partition occurs, it is possible that on one side of the partition we have A, C, A1, B1, C1, and on the other side we have B and Z1. Z1 is still able to write to B, which will accept its writes. If the partition heals in a very short time, the cluster will continue normally. However, if the partition lasts enough time for B1 to be promoted to master on the majority side of the partition, the writes that Z1 is sending to B will be lost.

Note that there is a maximum window to the amount of writes Z1 will be able to send to B: if enough time has elapsed for the majority side of the partition to elect a slave as master, every master node in the minority side stops accepting writes. This amount of time is a very important configuration directive of Redis Cluster, and is called the node timeout. After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly, after node timeout has elapsed without a master node being able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.

Creating and using a Redis Cluster

To create a cluster, the first thing we need is to have a few empty Redis instances running in cluster mode. This basically means that clusters are not created using normal Redis instances; a special mode needs to be configured so that the Redis instance will enable the Cluster-specific features and commands. The following is a minimal Redis cluster configuration file:
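```
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
```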
As you can see, what enables the cluster mode is simply the cluster-enabled directive. Every instance also contains the path of a file where the configuration for this node is stored, which by default is nodes.conf. This file is never touched by humans; it is simply generated at startup by the Redis Cluster instances, and updated every time it is needed.

Note that the minimal cluster that works as expected requires at least three master nodes. For your first tests it is strongly suggested to start a six-node cluster with three masters and three slaves.

To do so, enter a new directory, and create the following directories named after the port number of the instance we'll run inside each of them. Something like:
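```
mkdir cluster-test
cd cluster-test
mkdir 7000 7001 7002 7003 7004 7005
```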
Create a redis.conf file inside each of the directories, from 7000 to 7005. As a template for your configuration file just use the small example above, but make sure to replace the port number 7000 with the right port number according to the directory name.

Now copy your redis-server executable, compiled from the latest sources in the unstable branch at GitHub, into the cluster-test directory, and finally open 6 terminal tabs in your favorite terminal application.

Start every instance like that, one in every tab:
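```
cd 7000
../redis-server ./redis.conf
```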
As you can see from the logs of every instance, since no nodes.conf file existed, every node assigns itself a new ID:
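```
[82462] 26 Nov 11:56:55.329 * No cluster configuration found, I'm 97a3a64667477371c4479320d683e4c8db5858b1
```

Your instances will of course print different IDs; the ID shown above is the one used in the examples throughout the rest of this tutorial.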
This ID will be used forever by this specific instance, in order for the instance to have a unique name in the context of the cluster. Every node remembers every other node using these IDs, and not by IP or port. IP addresses and ports may change, but the unique node identifier will never change for all the life of the node. We call this identifier simply the Node ID.

Creating the cluster

Now that we have a number of instances running, we need to create our cluster by writing some meaningful configuration to the nodes. This is very easy to accomplish as we are helped by the Redis Cluster command line utility called redis-trib, a Ruby program which executes special commands on instances in order to create new clusters, check or reshard an existing cluster, and so forth.

The redis-trib utility is in the src directory of the Redis source code distribution. To create your cluster simply type:
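```
./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005
```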
The command used here is create, since we want to create a new cluster. The option --replicas 1 means that we want a slave for every master created. The other arguments are the list of addresses of the instances I want to use to create the new cluster.

Obviously the only setup matching our requirements is to create a cluster with 3 masters and 3 slaves. redis-trib will propose a configuration. Accept it by typing yes. The cluster will be configured and joined, which means that the instances will be bootstrapped into talking with each other. Finally, if everything went well, you'll see a message like this:
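```
[OK] All 16384 slots covered
```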
This means that there is at least one master instance serving each of the 16384 slots available.

Playing with the cluster

At this stage one of the problems with Redis Cluster is the lack of client library implementations. I'm aware of the following implementations:

- redis-rb-cluster is a Ruby implementation written by @antirez as a reference for other languages. It is a simple wrapper around the original redis-rb, implementing the minimal semantics to talk with the cluster efficiently.
- redis-py-cluster is a port of redis-rb-cluster to Python.
- The popular Predis has support for Redis Cluster.
- The most used Java client, Jedis, recently added support for Redis Cluster; see the Jedis Cluster section in the project README.
- StackExchange.Redis offers support for C# (and should work fine with most .NET languages: VB, F#, etc.).
- The redis-cli utility in the unstable branch of the Redis repository at GitHub implements very basic cluster support when started with the -c switch.
An easy way to test Redis Cluster is either to try any of the above clients, or simply the redis-cli command line utility. The following is an example of an interaction using redis-cli:
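```
$ redis-cli -c -p 7000
redis 127.0.0.1:7000> set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:7002
OK
redis 127.0.0.1:7002> set hello world
-> Redirected to slot [866] located at 127.0.0.1:7000
OK
redis 127.0.0.1:7000> get foo
-> Redirected to slot [12182] located at 127.0.0.1:7002
"bar"
redis 127.0.0.1:7002> get hello
-> Redirected to slot [866] located at 127.0.0.1:7000
"world"
```

The nodes you are redirected to depend on how the hash slots were assigned in your cluster.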
The redis-cli cluster support is very basic, so it always uses the fact that Redis Cluster nodes are able to redirect a client to the right node. A serious client is able to do better than that, and cache the map between hash slots and node addresses, to directly use the right connection to the right node. The map is refreshed only when something changes in the cluster configuration, for example after a failover, or after the system administrator changed the cluster layout by adding or removing nodes.

Writing an example app with redis-rb-cluster

Before going forward and showing how to operate the Redis Cluster, doing things like a failover or a resharding, we need to create some example application, or at least to be able to understand the semantics of a simple Redis Cluster client interaction. In this way we can run an example and at the same time try to make nodes fail, or start a resharding, to see how Redis Cluster behaves under real-world conditions. It is not very helpful to see what happens while nobody is writing to the cluster.

This section explains some basic usage of redis-rb-cluster showing two examples. The first is the following, and is the example.rb file inside the redis-rb-cluster distribution:
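```ruby
require './cluster'

startup_nodes = [
    {:host => "127.0.0.1", :port => 7000},
    {:host => "127.0.0.1", :port => 7001}
]
rc = RedisCluster.new(startup_nodes,32,:timeout => 0.1)

last = false

while not last
    begin
        last = rc.get("__last__")
        last = 0 if !last
    rescue => e
        puts "error #{e.to_s}"
        sleep 1
    end
end

((last.to_i+1)..1000000000).each{|x|
    begin
        rc.set("foo#{x}",x)
        puts rc.get("foo#{x}")
        rc.set("__last__",x)
    rescue => e
        puts "error #{e.to_s}"
    end
    sleep 0.1
}
```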
The application does a very simple thing: it sets keys in the form foo<number> to number, one after the other. So if you run the program, the result is the following stream of commands: SET foo0 0, SET foo1 1, SET foo2 2, and so forth.
The program looks more complex than it should usually be, as it is designed to show errors on the screen instead of exiting with an exception, so every operation performed against the cluster is wrapped by begin/rescue blocks.

Line 7 is the first interesting line in the program. It creates the Redis Cluster object, using as arguments a list of startup nodes, the maximum number of connections this object is allowed to take against different nodes, and finally the timeout after which a given operation is considered to have failed.

The startup nodes don't need to be all the nodes of the cluster. The important thing is that at least one node is reachable. Also note that redis-rb-cluster updates this list of startup nodes as soon as it is able to connect with the first node. You should expect such behavior from any other serious client.

Now that we have the Redis Cluster object instance stored in the rc variable, we are ready to use the object as if it were a normal Redis object instance.

This is exactly what happens in lines 11 to 19: when we restart the example we don't want to start again with foo0, so we store the counter inside Redis itself. The code above is designed to read this counter, or, if the counter does not exist, to assign it the value of zero.

However, note how it is a while loop, as we want to try again and again even if the cluster is down and is returning errors. Normal applications don't need to be so careful.

Lines 21 to 30 start the main loop where the keys are set, or an error is displayed. Note the sleep call at the end of the loop: normally writes are slowed down in order for the example application to be easier to follow by humans.

Starting the application produces the following output:
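```
ruby ./example.rb
1
2
3
4
5
6
7
8
9
^C (I stopped the program here)
```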
This is not a very interesting program, and we'll use a better one in a moment, but we can already see what happens during a resharding while the program is running.

Resharding the cluster

Now we are ready to try a cluster resharding. To do this, please keep the example.rb program running, so that you can see if there is some impact on the running program. Also, you may want to comment out the sleep call in order to have some more serious write load during the resharding.

Resharding basically means moving hash slots from one set of nodes to another set of nodes, and like cluster creation, it is accomplished using the redis-trib utility. To start a resharding just type:
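```
./redis-trib.rb reshard 127.0.0.1:7000
```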
You only need to specify a single node; redis-trib will find the other nodes automatically.

Currently redis-trib is only able to reshard with administrator support: you can't just say "move 5% of slots from this node to the other one" (but this is pretty trivial to implement). So it starts with questions. The first is how big a resharding you want to do:
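```
How many slots do you want to move (from 1 to 16384)?
```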
We can try to reshard 1000 hash slots, which should already contain a non-trivial amount of keys if the example is still running without the sleep call.

Then redis-trib needs to know what the target of the resharding is, that is, the node that will receive the hash slots. I'll use the first master node, 127.0.0.1:7000, but I need to specify the Node ID of the instance. This was already printed in a list by redis-trib, but I can always find the ID of a node with the following command if I need to:
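```
$ redis-cli -p 7000 cluster nodes | grep myself
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5460
```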
Ok, so my target node is 97a3a64667477371c4479320d683e4c8db5858b1.

Now you'll get asked from which nodes you want to take those keys. I'll just type all in order to take a bit of hash slots from all the other master nodes.

After the final confirmation you'll see a message for every slot that redis-trib is going to move from one node to another, and a dot will be printed for every actual key moved from one side to the other.

While the resharding is in progress you should be able to see your example program running unaffected. You can stop and restart it multiple times during the resharding if you want.

At the end of the resharding, you can test the health of the cluster with the following command:
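```
./redis-trib.rb check 127.0.0.1:7000
```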
All the slots will be covered as usual, but this time the master at 127.0.0.1:7000 will have more hash slots, something around 6461.

A more interesting example application

So far so good, but the example application we used is not very good. It writes to the cluster uncritically, without ever checking whether what was written is the right thing. From our point of view the cluster receiving the writes could just always write the key foo to 42 for every operation, and we would not notice at all.

So in the redis-rb-cluster repository there is a more interesting application, called consistency-test.rb. It uses a set of counters, by default 1000, and sends INCR commands in order to increment the counters. However, instead of just writing, the application does two additional things:

- When a counter is updated using INCR, the application remembers the write.
- It also reads a random counter before every write, and checks if the value is what we expect it to be, comparing it with the value it has in memory.
What this means is that this application is a simple consistency checker, and is able to tell you if the cluster lost some write, or if it accepted a write that we did not receive an acknowledgement for. In the first case we'll see a counter having a value that is smaller than the one we remember, while in the second case the value will be greater.

Running the consistency-test application produces a line of output every second:
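```
$ ruby consistency-test.rb
925 R (0 err) | 925 W (0 err) |
5030 R (0 err) | 5030 W (0 err) |
9261 R (0 err) | 9261 W (0 err) |
13517 R (0 err) | 13517 W (0 err) |
17780 R (0 err) | 17780 W (0 err) |
22025 R (0 err) | 22025 W (0 err) |
25818 R (0 err) | 25818 W (0 err) |
```

The actual read and write rates will of course depend on your machine.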
The line shows the number of Reads and Writes performed, and the number of errors (queries not accepted because of errors, since the system was not available).

If some inconsistency is found, new lines are added to the output. This is what happens, for example, if I reset a counter manually while the program is running:
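The counter key name and the numbers below are illustrative; in your run they will differ:

```
$ redis-cli -h 127.0.0.1 -p 7000 set key_217 0
OK

(in the other tab I see...)

94774 R (0 err) | 94774 W (0 err) |
98821 R (0 err) | 98821 W (0 err) |
102886 R (0 err) | 102886 W (0 err) | 144 lost |
107046 R (0 err) | 107046 W (0 err) | 144 lost |
```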
When I set the counter to 0 the real value was 144, so the program reports 144 lost writes (INCR commands that are not remembered by the cluster).

This program is much more interesting as a test case, so we'll use it to test the Redis Cluster failover.

Testing the failover

Note: during this test, you should keep a tab open with the consistency test application running.

In order to trigger the failover, the simplest thing we can do (that is also the semantically simplest failure that can occur in a distributed system) is to crash a single process, in our case a single master. We can identify a master and crash it with the following command:
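```
$ redis-cli -p 7000 cluster nodes | grep master
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385482984082 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 master - 0 1385482983582 0 connected 11423-16383
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
```

Node IDs, timestamps, and exact slot ranges will differ in your own cluster.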
Ok, so 7000, 7001, and 7002 are masters. Let's crash node 7002 with the DEBUG SEGFAULT command:
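```
$ redis-cli -p 7002 debug segfault
Error: Server closed the connection
```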
Now we can look at the output of the consistency test to see what it reported:
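```
18849 R (0 err) | 18849 W (0 err) |
23151 R (0 err) | 23151 W (0 err) |
27302 R (0 err) | 27302 W (0 err) |

... many error warnings here ...

29659 R (578 err) | 29660 W (577 err) |
33749 R (578 err) | 33750 W (577 err) |
37918 R (578 err) | 37919 W (577 err) |
42077 R (578 err) | 42078 W (577 err) |
```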
As you can see, during the failover the system was not able to accept 578 reads and 577 writes, however no inconsistency was created in the database. This may sound unexpected, as in the first part of this tutorial we stated that Redis Cluster can lose writes during the failover because it uses asynchronous replication. What we did not say is that this is not very likely to happen, because Redis sends the reply to the client, and the commands to replicate to the slaves, at about the same time, so there is a very small window in which to lose data. However, the fact that it is hard to trigger does not mean that it is impossible, so this does not change the consistency guarantees provided by Redis Cluster.

We can now check what the cluster setup is after the failover (note that in the meantime I restarted the crashed instance so that it rejoins the cluster as a slave):
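```
$ redis-cli -p 7000 cluster nodes
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385503418521 0 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385503419023 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385503419023 3 connected 11423-16383
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385503417005 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385503418016 3 connected
```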
Now the masters are running on ports 7000, 7001 and 7005. What was previously a master, that is, the Redis instance running on port 7002, is now a slave of 7005.

The output of the CLUSTER NODES command may look intimidating, but it is actually pretty simple, and is composed of the following tokens:

- Node ID
- ip:port
- flags: master, slave, myself, fail, ...
- if it is a slave, the Node ID of the master
- Time of the last pending PING still waiting for a reply
- Time of the last PONG received
- Configuration epoch for this node (see the Cluster specification)
- Status of the link to this node
- Slots served
Manual failover

Sometimes it is useful to force a failover without actually causing any problem on a master. For example, in order to upgrade the Redis process of one of the master nodes, it is a good idea to failover it in order to turn it into a slave with minimal impact on availability.

Manual failovers are supported by Redis Cluster using the CLUSTER FAILOVER command, which must be executed in one of the slaves of the master you want to failover.

Manual failovers are special, and safer compared to failovers resulting from actual master failures, since they occur in a way that avoids data loss in the process, by switching clients from the original master to the new master only when the system is sure that the new master processed all the replication stream from the old one.

This is what you see in the slave log when you perform a manual failover:
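```
# Manual failover user request accepted.
# Received replication offset for paused master manual failover: 347540
# All master replication stream processed, manual failover can start.
# Start of election delayed for 0 milliseconds (rank #0, offset 347540).
# Starting a failover election for epoch 7545.
# Failover election won: I'm the new master.
```

The replication offsets and epoch numbers will of course differ in your setup.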
Basically, clients connected to the master we are failing over are stopped. At the same time the master sends its replication offset to the slave, which waits to reach that offset on its side. When the replication offset is reached, the failover starts, and the old master is informed about the configuration switch. When the clients are unblocked on the old master, they are redirected to the new master.

Adding a new node

Adding a new node is basically the process of adding an empty node and then moving some data into it, in case it is a new master, or telling it to set up as a replica of a known node, in case it is a slave.

We'll show both, starting with the addition of a new master instance.

In both cases the first step to perform is adding an empty node. This is as simple as starting a new node on port 7006 (we already used 7000 to 7005 for our existing 6 nodes) with the same configuration used for the other nodes, except for the port number. So, in order to conform with the setup we used for the previous nodes:

- Create a new tab in your terminal application.
- Enter the cluster-test directory.
- Create a directory named 7006.
- Create a redis.conf file inside, similar to the one used for the other nodes but using 7006 as the port number.
- Finally start the server with ../redis-server ./redis.conf
At this point the server should be running. Now we can use redis-trib as usual in order to add the node to the existing cluster:
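```
./redis-trib.rb add-node 127.0.0.1:7006 127.0.0.1:7000
```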
As you can see, I used the add-node command, specifying the address of the new node as the first argument, and the address of a random existing node in the cluster as the second argument.

In practical terms redis-trib here did very little to help us: it just sent a CLUSTER MEET message to the node, something that is also possible to accomplish manually. However redis-trib also checks the state of the cluster before operating, so it is a good idea to perform cluster operations always via redis-trib, even when you know how the internals work.

Now we can connect to the new node to see if it really joined the cluster:
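```
redis 127.0.0.1:7006> cluster nodes
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385543178575 0 connected 5960-10921
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385543179583 0 connected
f093c80dde814da99c5cf72a7dd01590792b783b :0 myself,master - 0 0 0 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543178072 3 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385543178575 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 127.0.0.1:7000 master - 0 1385543179080 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385543177568 3 connected 11423-16383
```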
Note that since this node is already connected to the cluster, it is already able to redirect client queries correctly, and is generally speaking part of the cluster. However it has two peculiarities compared to the other masters:

- It holds no data, as it has no assigned hash slots.
- Because it is a master without assigned slots, it does not participate in the election process when a slave wants to become a master.
Now it is possible to assign hash slots to this node using the resharding feature of redis-trib. It is basically useless to show this, as we already did it in a previous section; there is no difference, it is just a resharding having the empty node as a target.

Adding a new node as a replica

Adding a new replica can be performed in two ways. The obvious one is to use redis-trib again, but with the --slave option, like this:
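```
./redis-trib.rb add-node --slave 127.0.0.1:7006 127.0.0.1:7000
```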
Note that the command line here is exactly like the one we used to add a new master, so we are not specifying to which master we want to add the replica. In this case, what happens is that redis-trib will add the new node as a replica of a random master among the masters with fewer replicas.

However, you can specify exactly which master you want to target with your new replica with the following command line:
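```
./redis-trib.rb add-node --slave --master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7006 127.0.0.1:7000
```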
This way we assign the new replica to a specific master.

A more manual way to add a replica to a specific master is to add the new node as an empty master, and then turn it into a replica using the CLUSTER REPLICATE command.

For example, in order to add a replica for the node 127.0.0.1:7005, which is currently serving hash slots in the range 11423-16383 and has a Node ID of 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e, all I need to do is to connect with the new node (already added as an empty master) and send the command:
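```
redis 127.0.0.1:7006> cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
```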
That's it. Now we have a new replica for this set of hash slots, and all the other nodes in the cluster already know about it (after a few seconds needed to update their config). We can verify with the following command:
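```
$ redis-cli -p 7000 cluster nodes | grep slave | grep 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
f093c80dde814da99c5cf72a7dd01590792b783b 127.0.0.1:7006 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617702 3 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617198 3 connected
```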
The node 3c3a0c... now has two slaves, running on ports 7002 (the existing one) and 7006 (the new one).

Removing a node

To remove a slave node, just use the del-node command of redis-trib:
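```
./redis-trib.rb del-node 127.0.0.1:7000 <node-id>
```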
The first argument is just a random node in the cluster; the second argument is the ID of the node you want to remove.

You can remove a master node in the same way as well, however in order to remove a master node it must be empty. If the master is not empty, you need to reshard data away from it to all the other master nodes first.

An alternative to removing a master node is to perform a manual failover of it over one of its slaves, and remove the node after it turned into a slave of the new master. Obviously this does not help when you want to reduce the actual number of masters in your cluster; in that case, a resharding is needed.

Replicas migration

In Redis Cluster it is possible to reconfigure a slave to replicate from a different master at any time just using the following command:
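```
CLUSTER REPLICATE <master-node-id>
```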
However, there is a special scenario where you want replicas to move from one master to another one automatically, without the help of the system administrator. The automatic reconfiguration of replicas is called replicas migration, and is able to improve the reliability of a Redis Cluster.

Note: you can read the details of replicas migration in the Redis Cluster Specification (http://redis.io/topics/cluster-spec); here we'll only provide some information about the general idea and what you should do in order to benefit from it.

The reason why you may want to let your cluster replicas move from one master to another under certain conditions is that usually the Redis Cluster is as resistant to failures as the number of replicas attached to a given master.

For example, a cluster where every master has a single replica can't continue operations if the master and its replica fail at the same time, simply because there is no other instance to have a copy of the hash slots the master was serving. However, while netsplits are likely to isolate a number of nodes at the same time, many other kinds of failures, like hardware or software failures local to a single node, are a very notable class of failures that are unlikely to happen at the same time. So it is possible that in your cluster where every master has a slave, the slave is killed at 4am, and the master is killed at 6am. This still will result in a cluster that can no longer operate. To improve the reliability of the system we have the option to add additional replicas to every master, but this is expensive. Replica migration allows adding more slaves to just a few masters. So you have 10 masters with 1 slave each, for a total of 20 instances. However you add, for example, 3 more instances as slaves of some of your masters, so certain masters will have more than a single slave.

With replicas migration what happens is that if a master is left without slaves, a replica from a master that has multiple slaves will migrate to the orphaned master. So after your slave goes down at 4am as in the example we made above, another slave will take its place, and when the master fails as well at 6am, there is still a slave that can be elected so that the cluster can continue to operate.

So what should you know about replicas migration, in short?

- The cluster will try to migrate a replica from the master that has the greatest number of replicas at a given moment.
- To benefit from replica migration you just have to add a few more replicas to a single master in your cluster; it does not matter which master.
- There is a configuration parameter that controls the replica migration feature, called cluster-migration-barrier; you can read more about it in the example redis.conf file provided with Redis Cluster.