I remember first using Cassandra in production in 2012. I think the version was 0.7. There was no CQL interface, and I had to do everything from the Cassandra CLI (Command Line Interface). I asked for access to the system and was given a link to OpsCenter and an IP address for one of the nodes. My first question was "What is my login?" Silly question! This was a sub-1.0 version, after all. Anyone with the IP address of a node could get unrestricted, clear-text access to customer data in Cassandra. Things have changed a lot since then.
DSE Security Options
The current version of DSE (DSE 4.8 with OpsCenter 5.2 as of this blog) contains a host of security features:
- Cassandra Authentication
- Cassandra Object Authorization
- Data Encryption
  - At Rest
  - In Flight
    - Client to Node
    - Node to Node
- Audit Logging
- Encryption of Sensitive Configuration Values
- OpsCenter Authentication
- Cassandra Authentication
- OpsCenter Role Based Authorization
- OpsCenter HTTPS
- OpsCenter SSL
I recently worked for a client who wanted to enable as much of the above security as possible. I very quickly learned a lot about certificates, and that Secure Sockets Layer (SSL) encryption is very complicated in distributed systems. The complexity comes from the fact that every Cassandra node can be both a client and a server, and every node communicates with every other node. Additionally, other sets of components communicate on the cluster, and those connections also need to be encrypted. That is a LOT of pipes to encrypt, as illustrated below.
... and the blowup for the OpsCenter Agent SSL connections.
I know that the diagrams are very busy - busy is the point. On top of all that, this is a simple diagram that only illustrates the following:
- 4 Cassandra Nodes
- 1 Cassandra Client
- 1 OpsCenter Web Client
- 1 OpsCenter Machine
- 4 OpsCenter Agents (One on each node)
- OpsCenter collects data from one cluster and stores it in another
In a real production system there are likely to be many more nodes, many more clients and multiple data centers. Even with this simple example we have at least 39 encrypted connections:
Let me know if you find any more!
I explicitly list out each directional connection because the direction determines how you set up the certificates and keys for SSL encryption: it depends on who is the client and who is the server for each communication. My point is that a lot of communication channels are affected by SSL encryption. There are a lot of moving parts, which makes the setup difficult to configure, maintain and troubleshoot when something goes wrong.
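To get a feel for how quickly the number of pipes grows, here is a minimal Python sketch. It is not the full list of 39 connections; it only counts two kinds of channel, under the assumptions that every node opens a connection to every other node and that every client connects to every node. The function names are hypothetical.

```python
def node_to_node_connections(nodes: int) -> int:
    """Directional node-to-node connections in a full mesh.

    Assumption: each node acts as a client towards every other node,
    so A -> B and B -> A are counted as two separate connections.
    """
    return nodes * (nodes - 1)


def client_to_node_connections(clients: int, nodes: int) -> int:
    """Client-to-node connections, assuming each client connects to every node."""
    return clients * nodes


if __name__ == "__main__":
    # The simple example in this post: 4 nodes and 1 Cassandra client.
    print(node_to_node_connections(4))       # 12
    print(client_to_node_connections(1, 4))  # 4
    # Doubling the cluster more than quadruples the node-to-node mesh.
    print(node_to_node_connections(8))       # 56
```

The node-to-node count grows roughly with the square of the cluster size, which is why the diagrams get busy so fast.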
I am by no means a security expert; however, I have learned enough about how it works to set it up in our cluster. Below is a simple glossary to get us started.
- Secure Sockets Layer (SSL): A protocol that supports encrypted communications between a client and a server.
- Client: The machine that initiates communication.
- Server: The machine that responds to the client request.
- Certificate: A file that contains a public key and an identifier for the owner of that key.
- Public Key: The key made available by a server to all clients that will access it.
- Private Key: A key kept confidential to the owner.
- Keystore: A file that contains private keys and certificates with their corresponding public keys.
- Truststore: A file that contains certificates with their corresponding public keys.
The general process for establishing encrypted connections is as follows:
- A certificate is created on a server. The certificate is self-signed, which eliminates the need for a third-party Certificate Authority.
- The public key and certificate are exported from the server and distributed to all clients.
- The server's certificate, with its public key, is imported into each client's truststore.
- The client contacts the server and the two agree on the protocol details to use for the connection.
- The server responds with its public key and certificate.
- The client verifies that it trusts the server by verifying the certificate against its truststore.
- The client generates a random session key for encryption, encrypts it using the server's public key and sends it to the server.
- The server decrypts the session key using its private key.
- The client sends a request to the server. The request is encrypted with the session key.
- The server decrypts the message using the session key.
- The server responds to the client. The response is encrypted with the session key.
- The client decrypts the message using the session key.
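The steps above can be sketched in a few lines of Python. This is a toy model, not real SSL: the key pair uses tiny textbook primes, the "certificate" is just the public key, and the symmetric cipher is a hash-based XOR keystream reused in both directions. All names here are hypothetical and chosen only to mirror the steps in the list.

```python
import hashlib
import secrets

# Toy RSA key pair from tiny textbook primes (p=61, q=53).
# ILLUSTRATION ONLY: real certificates use keys of 2048 bits or more.
N = 61 * 53   # public modulus
E = 17        # public exponent, shipped with the "certificate"
D = 2753      # private exponent, never leaves the server


def keystream_xor(session_key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 derived keystream.

    Applying it twice with the same key returns the original bytes.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(f"{session_key}:{counter}".encode()).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def handshake_and_exchange(request: bytes, truststore: set) -> bytes:
    """Walk through the handshake and one request/response round trip."""
    # The server responds with its public key and certificate.
    server_certificate = (N, E)

    # The client verifies the certificate against its truststore.
    if server_certificate not in truststore:
        raise ConnectionError("server certificate is not trusted")

    # The client generates a random session key and encrypts it
    # with the server's public key.
    session_key = secrets.randbelow(N - 2) + 2
    wrapped_key = pow(session_key, E, N)

    # The server decrypts the session key using its private key.
    server_session_key = pow(wrapped_key, D, N)

    # The client sends a request encrypted with the session key,
    # and the server decrypts it.
    received = keystream_xor(server_session_key, keystream_xor(session_key, request))

    # The server responds, encrypted with the session key,
    # and the client decrypts the response.
    encrypted_response = keystream_xor(server_session_key, b"ack: " + received)
    return keystream_xor(session_key, encrypted_response)


if __name__ == "__main__":
    print(handshake_and_exchange(b"SELECT now()", truststore={(N, E)}))
```

Calling the function with an empty truststore raises `ConnectionError`, which is roughly what a client sees when a node's certificate was never imported into its truststore.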
In the end, the client did not go live with the Proof of Concept because they realized that they had a few more decisions to make around security. For example, what impact does SSL have on the transaction throughput of the application, and how does that differ between intra-datacenter and inter-datacenter communications? In addition, the infrastructure team had to figure out how to manage certificates in a distributed environment. However, getting all of the connections in DSE encrypted with SSL was most definitely a learning experience. Below are the highlights of the lessons learned.
- cqlsh uses port 9042. It does not use port 9160 as stated in the documentation.
- The encrypted LDAP password for OpsCenter cannot contain "$". It will not be processed properly by OpsCenter even if you escape the value. There is no fix other than changing the password.
- The cassandra-stress tool does not work with SSL. There are two known bugs: CASSANDRA-9325 and CASSANDRA-10445.
- You must enable SSL client-to-node encryption on both clusters if you are monitoring one cluster with OpsCenter and storing data in a separate cluster. This is another known issue that you will not find documented anywhere.
- Maintaining SSL certificates is a fairly maintenance-intensive process because it is decentralized security. Consider using Kerberos, which provides centralized security.
- Make sure to change the default expiration time for certificates if you are following the DataStax instructions (with keytool, the `-validity` option sets this and defaults to 90 days). If you don't, you will end up redoing all your work in 90 days.
- Consider creating one keystore and one truststore and distributing them to all nodes. If this is not acceptable from a security perspective, then maintain one truststore that you update as you add new nodes, and distribute that truststore to all nodes.
- Don't forget that all your application clients have to be configured to use SSL or they will not be able to connect to the cluster. This includes cqlsh, OpsCenter and OpsCenter Agents.
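Two of the lessons above, certificate expiration and client configuration, lend themselves to a short Python sketch using only the standard library `ssl` module. It is a sketch under assumptions, not the DataStax procedure: the function names are hypothetical, and `truststore_pem` assumes you have exported the trusted certificates to a PEM file rather than a Java keystore.

```python
import ssl
from typing import Optional

SECONDS_PER_DAY = 86400


def build_client_context(truststore_pem: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side SSL context that verifies the server certificate.

    truststore_pem is a hypothetical path to a PEM file holding the
    certificates the client should trust (for example, the node certificates).
    """
    # PROTOCOL_TLS_CLIENT turns on certificate verification and hostname
    # checking by default. With self-signed node certificates, the hostname
    # check only passes if the certificate names match the node addresses.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if truststore_pem is not None:
        context.load_verify_locations(cafile=truststore_pem)
    return context


def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Days left before a certificate expires.

    not_after uses the certificate time format that the ssl module reports,
    such as "Jan  5 09:34:43 2018 GMT".
    """
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / SECONDS_PER_DAY


if __name__ == "__main__":
    import time

    context = build_client_context()
    print(context.verify_mode == ssl.CERT_REQUIRED)
    print(days_until_expiry("Jan  5 09:34:43 2018 GMT", time.time()))
```

The `notAfter` string can be read from a live connection with `SSLSocket.getpeercert()`, so a check like this could run periodically against each node and warn well before the certificates lapse.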