I've been knee-deep in caching lately because I've been looking into an issue involving AWS ElastiCache and Enyim. I've learned a lot, so I'll share some of it. Let's start at the beginning.
Enyim is a .NET client for memcached, which is a cache server: basically an in-memory key-value store with an expiration timer on each value, so entries can be dropped after a set amount of time to free memory.
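To make the "timer on each value" idea concrete, here is a toy sketch in Python. It is not how memcached is implemented; the class name TtlStore, its methods, and the injectable clock are all made up for illustration.

```python
import time


class TtlStore:
    """Toy in-memory key-value store with a per-entry expiry (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be demonstrated without sleeping
        self._items = {}      # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # The entry has outlived its timer: drop it and report a miss.
            del self._items[key]
            return None
        return value


# Drive the store with a fake clock so the example is deterministic.
fake_now = [0.0]
store = TtlStore(clock=lambda: fake_now[0])
store.set("my data key", "someData", ttl_seconds=30)
before = store.get("my data key")   # still within the 30 second window
fake_now[0] = 31.0
after = store.get("my data key")    # the timer has run out by now
```

Here `before` holds the stored value and `after` is a miss, because the fake clock has moved past the entry's expiry.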
memcached itself doesn't support clustering, but Enyim can be configured to work with clusters. So you can have 2 or more memcached instances on one or more servers and use Enyim to talk to them all as if they were one cache...sort of.
Abstractions leak, and Enyim, as an abstraction over multiple cache nodes, leaks in its own way. When a node goes down, you get nothing back in the client. Well, that's not quite true: you get Success == false and StatusCode == null. Those are just the default values for the Result object. When you do get that, it means a cache node went down in your cluster, but not to worry!
An interesting thing about those cache clusters is how Enyim manages to use them. Enyim uses a hashing algorithm (actually several you can select from), which shards based on the key and the number of nodes in the cluster, so which node gets picked depends on the key. Additionally, provided the servers are added to the cache client in the same order, the chosen node will be the same no matter where you are calling from. You could have any number of producers and consumers of the cache, and each will pick the right node.
Let's say you've got 4 nodes and you have a key 'my data key'.
Let's say you use Enyim to Store("my data key", someData, ...) and hypothetically that key hashes to a match for node 3.
Now when your consumer of that data (it could be on a different server in a different part of the world, for all it cares) calls Get("my data key"), that key will also hash to node 3 and you'll get 'someData' in the response!
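The producer/consumer agreement described above comes down to "same key + same ordered node list = same node". Here is a minimal Python sketch of that idea. It uses plain hash-modulo, which is an assumption made for brevity and not one of Enyim's actual node locators; the node addresses and the pick_node function are hypothetical.

```python
import hashlib

# Hypothetical node addresses. Every client must list them in the same order.
NODES = ["cache-1:11211", "cache-2:11211", "cache-3:11211", "cache-4:11211"]


def pick_node(key, nodes):
    """Map a key to one node with a stable hash (MD5 is an arbitrary choice here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


# The producer and the consumer run this independently, possibly on
# different machines, and still land on the same node for the same key.
producer_choice = pick_node("my data key", NODES)
consumer_choice = pick_node("my data key", NODES)
```

Because the index depends on the position in the list, two clients configured with the same servers in a different order could disagree about where a key lives, which is why the ordering matters.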
Internally, Enyim calls those nodes servers. It marks each server as IsActive when it makes a successful socket connection to the cache instance. When you use the Enyim CacheClient to make a call to the cache and the socket cannot connect, it marks the server/node as inactive (IsActive = false) and adds it to the dead server queue.
Don't worry though, it keeps trying to reconnect to the servers in the dead queue. When the connection is re-established, the node automatically becomes available in the cache cluster again. There is a caveat when a call fails, though. When your client calls and the node is down, the node isn't marked dead BEFORE the call, but after it. It's up to you to test for that condition (a null StatusCode) and retry the call so that it WILL use the next node in the hash ring.
In the same scenario as above, let's say node 3 is unreachable. You can see how network connectivity could be an issue if the clients and nodes aren't co-located: a node could be down for some clients but not others. Let's ignore that issue for this scenario and say the server is being rebooted for patching or something.
Here's what will happen: your consumer makes the first call and the key resolves to node 3. Enyim attempts to establish a connection to that node. When the socket connection times out, it marks the server as "dead" and returns Success == false with no StatusCode set in the response object. The VERY next call for that cache key will resolve to the next server for as long as the node is down.
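That sequence is easier to see in a small simulation. The Python below is a toy model of the behavior described in this post, not Enyim's code: ToyClient, Node, Result, the status code values, and the "walk forward to the next active node" rule are all assumptions made for illustration.

```python
import hashlib


class Node:
    def __init__(self, name):
        self.name = name
        self.reachable = True   # stands in for "the socket can connect"
        self.is_active = True   # what the client currently believes
        self.data = {}


class Result:
    def __init__(self, success=False, status_code=None, value=None):
        self.success = success
        self.status_code = status_code   # None mimics the "never set" default
        self.value = value


class ToyClient:
    def __init__(self, nodes):
        self.nodes = nodes
        self.dead = []   # nodes waiting for a reconnect attempt

    def locate(self, key):
        # Start at the hashed position, then walk forward to the first
        # node the client still believes is active.
        start = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % len(self.nodes)
        for offset in range(len(self.nodes)):
            node = self.nodes[(start + offset) % len(self.nodes)]
            if node.is_active:
                return node
        return None

    def get(self, key):
        node = self.locate(key)
        if node is None:
            return Result()
        if not node.reachable:
            # The failure is discovered DURING the call, so this call is lost.
            node.is_active = False
            self.dead.append(node)
            return Result()
        if key in node.data:
            return Result(True, 0, node.data[key])   # 0: hypothetical "ok" code
        return Result(False, 1)                      # 1: hypothetical "miss" code


client = ToyClient([Node("node-%d" % i) for i in range(1, 5)])
home = client.locate("my data key")
home.reachable = False                 # simulate the reboot

first = client.get("my data key")      # hits the down node, which is then marked dead
second = client.get("my data key")     # routed to the next node in the ring
```

In this model `first` comes back with success False and no status code, while `second` is an ordinary cache miss from a live node, which is the signal to repopulate the cache.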
With ElastiCache in the mix, there's another layer of abstraction in play. ElastiCache offers its own grouping and node discovery so that you don't have to specify each node address manually. This means you can add nodes without reconfiguring the client. When using this option, you should definitely use Amazon's cluster configuration for Enyim. It handles the node discovery for you, meaning you only configure 1 endpoint per cluster. This is a good thing, but what they haven't put in the mix is handling node failures on their end. It would be nice if they did, but they don't. So just think about that...
Still, when you are using cache clusters in ElastiCache, the best way to handle a node reboot is to simply make the call, and the amazing Enyim will do its alchemy and switch to a live node until the dead one is back up and running again. It works the same when adding and removing nodes too. It's all automagic!
When using Enyim, just put some retry logic in your code, preferably in your own wrapper class so you aren't repeating yourself everywhere and forgetting to do it somewhere.
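As a rough idea of what that wrapper could look like, here is a language-neutral sketch in Python rather than real Enyim code. The Result tuple, call_with_retry, and the FlakyStore stand-in are hypothetical; the only thing taken from the behavior described above is the retry condition (Success is false and StatusCode was never set).

```python
from collections import namedtuple

# Hypothetical stand-in for the client's result object.
Result = namedtuple("Result", "success status_code value")


def call_with_retry(operation, max_attempts=2):
    """Run operation(); retry only when the result looks like a dead-node failure."""
    result = operation()
    attempts = 1
    while attempts < max_attempts and not result.success and result.status_code is None:
        # No status code means the node never answered. By now the client has
        # marked it dead, so the next attempt should route to a live node.
        result = operation()
        attempts += 1
    return result


class FlakyStore:
    """Fake store call: the first attempt hits a down node, the second succeeds."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            return Result(False, None, None)
        return Result(True, 0, None)   # 0: hypothetical "ok" code


flaky = FlakyStore()
final = call_with_retry(flaky)
```

A failure that does carry a status code (a plain cache miss, say) is returned as-is without a retry, so the wrapper only spends the extra call on the dead-node case.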