Published Tue 30th Jan 2024
This post is a continuation of a story about our road to Senceive.io. If you’ve missed the first part which covers the genesis of the project and challenges associated with its architecture, you can find it here.
A cloud-native architecture
Like practically all modern software, Senceive.io stands on the shoulders of giants. It depends on a wide variety of software services, frameworks, and packages so that we can solve the problem efficiently.
We had set ourselves a goal for Senceive.io Cloud of maintaining data flow even through single datacentre outages and through the majority of system upgrades. Nobody likes having to schedule or receive notice of major downtime. Nor do they like seeing that a whole datacentre has gone offline for the last hour, taking down the service without warning!
Practically speaking, this requires most of the components in the system to operate in triplicate – everything from the load balancers connecting the servers to the internet through to the database containing the configuration you applied yesterday.
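Running three replicas rather than two is not arbitrary: with an odd number of nodes, a majority quorum can always be formed, and the cluster can keep operating through the loss of one member. A minimal sketch of the arithmetic (illustrative only, not code from the actual system):

```python
def quorum(replicas: int) -> int:
    """Smallest majority of a replica set."""
    return replicas // 2 + 1

def tolerated_failures(replicas: int) -> int:
    """How many replicas can fail while a majority remains."""
    return replicas - quorum(replicas)

# Three nodes tolerate one failure; five tolerate two.
for n in (1, 3, 5):
    print(f"{n} replicas: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

This is why two replicas buy you almost nothing over one: a majority of two is two, so either failure stops the cluster making progress.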
This is quite fiddly to manage: ensuring that servers are deployed in the correct locations with the correct resources, that they are talking to each other efficiently, and that the system recovers from an interruption correctly without intervention. Upgrades need to be very carefully managed to ensure that the replicas aren’t allowed to get too far out of step or out of sync with each other. The list goes on, and I’ll gladly admit we’re not experts at this level of detail.
Conceptual diagram of availability zones and how they relate to Regions, physical datacentres, and data/service replicas.
When taking a cloud-native approach, the general industry practice for achieving these goals is to maximise the use of the ‘serverless’ platform services offered by most cloud hosts, minimising total operating cost.
The idea is that the service ‘just works’ natively on the host platform, so while it might cost a little more than manually setting up your own servers, fewer hours should need to be spent maintaining the service, reducing overall running costs. (Somewhat counter to this was an interesting article from Amazon Prime Video published earlier this year, but I digress…)
If we were to take the maximum-Azure approach with Senceive.io, we would very rapidly have found ourselves building Senceive.io on top of products including:
- Azure IoT Hub,
- Blob Storage,
- CosmosDB,
- Azure Functions,
As well as other services which are either not available on premises or too expensive there to be feasible for anything but the largest projects.
The most critical core component we needed to choose was the message broker. This sits at the core of Senceive.io and also forms the basis of the event stream API.
Using different brokers for the Cloud and Local products would have significantly complicated matters for Consumers (event stream clients), and probably would have also resulted in fragmentation where many would end up only supporting one or the other Senceive.io product. After evaluating a few different options, we settled on RabbitMQ.
RabbitMQ is a very mature (v1.0 shipped in 2007) and widely used message broker which scales well, from high-performance, high-availability three- or five-node clusters down to single-node embedded systems.
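One reason a broker like RabbitMQ suits an event stream API is its topic-exchange routing model: consumers bind with a pattern where `*` matches exactly one dot-separated word and `#` matches zero or more. The sketch below implements that matching rule in plain Python purely for illustration — the routing keys shown (`site.42.reading` and so on) are made-up examples, not Senceive.io's actual key scheme:

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic matching: '*' matches one word, '#' matches
    zero or more words. Words are separated by dots."""
    def match(p: list, k: list) -> bool:
        if not p:
            return not k                      # pattern exhausted: key must be too
        if p[0] == "#":
            # '#' absorbs zero words, or one word and tries again
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False                      # key exhausted but pattern remains
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])        # word matched, advance both
        return False
    return match(pattern.split("."), routing_key.split("."))

print(topic_matches("site.*.reading", "site.42.reading"))  # one-word wildcard
print(topic_matches("site.#", "site.1.tilt.reading"))      # multi-word wildcard
```

A consumer interested in every reading from every site could bind `*.*.reading`, while one mirroring the whole stream binds `#` — the broker does this filtering, so clients never see traffic they didn't ask for.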
The main configuration database
The other major critical component to be considered was the main configuration database. This won’t see a high transactional load and can afford to aim for eventual consistency; it is more important that it is highly available at all times to support data handling (message routing, device authentication, etc.).
This database doesn’t face the outside world but does support the fairly complicated structures of the API. Using a different database for the Cloud and Local products would have increased the risk of subtle bugs due to differences in behaviour, some being minor, others requiring separate code paths in the API which would need to be tested separately.
In this case we settled on the MongoDB document database. Once again, this is a mature product (v1.0 shipped in 2009) which is widely used and well supported.
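In MongoDB terms, that availability-over-strict-consistency trade-off is expressed through connection-string options: allowing reads from secondaries keeps reads flowing if the primary is briefly unavailable, while majority write concern ensures configuration changes survive a node failure. A hedged sketch of how such a connection string could be assembled — the host names, replica set name, and database name here are all invented for illustration:

```python
from urllib.parse import urlencode

def mongo_uri(hosts: list, replica_set: str, db: str = "senceiveio") -> str:
    """Build a MongoDB replica-set connection string that favours
    availability: reads may be served by secondaries, writes are
    acknowledged by a majority of nodes."""
    opts = urlencode({
        "replicaSet": replica_set,
        "readPreference": "secondaryPreferred",  # keep reads flowing without a primary
        "w": "majority",                         # writes survive the loss of one node
        "retryWrites": "true",                   # transparently retry on failover
    })
    return f"mongodb://{','.join(hosts)}/{db}?{opts}"

print(mongo_uri(["db-a:27017", "db-b:27017", "db-c:27017"], "rs0"))
```

The price of `secondaryPreferred` is that a read may briefly return slightly stale data — exactly the eventual consistency this database can afford.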
Critically for our cloud platform, both of these components are available in Platform-as-a-Service form from companies who are experienced with the underlying software and running highly available clusters. The companies we chose were also able to deploy in the same Azure region as the core processing components, eliminating network latency and traffic cost concerns.
The core processing components
Of course, Senceive.io is not just off-the-shelf software packages; it is also composed of a set of in-house developed components, which must also be highly available and scalable.
It was clear from very early in the design stages that deploying the components in containers within a Kubernetes cluster was going to be the most sensible (not to mention portable!) approach to deploying and maintaining the cloud system. The Azure Kubernetes Service product is well established, inexpensive, and minimises the need for us to spend time deploying and maintaining the cluster.
Simplified block diagram of Senceive.io Cloud showing internal components and hosted services
Developing and deploying containers is hardly a new approach and offers a huge amount of flexibility. We really take advantage of their isolated nature to use a mix of different languages and runtimes in the system: Golang containers running on a minimal ‘scratch’ image give maximum efficiency for the backend components which do most of the data-processing work, while Python containers make development of the API and management parts easier.
By keeping these containers as stateless as possible, scaling the system up is as simple as increasing the replica count in the Kubernetes configuration.
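To make that concrete, here is a minimal Kubernetes Deployment manifest generated in Python — the component name and image reference are hypothetical placeholders, not our actual deployment, but they show how the whole scaling decision collapses to one `replicas` field when the workload is stateless:

```python
import json

def deployment(name: str, image: str, replicas: int) -> dict:
    """Minimal Kubernetes Deployment manifest for a stateless worker.
    Scaling out is just a change to the 'replicas' value."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

# Hypothetical worker: bump replicas from 3 to 6 and Kubernetes does the rest.
manifest = deployment("ingest-worker", "registry.example/ingest:1.0", 3)
print(json.dumps(manifest, indent=2))
```

If the containers held state (local caches, in-flight sessions on disk), this would not work: Kubernetes freely creates and destroys replicas, so anything a replica must remember has to live in the shared services — the broker and the database.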
As a further example of a seemingly minor decision affecting portability between cloud and on-premises deployments: instead of using an Azure-only static HTTP web-hosting product to serve the web user interface application, we decided to build a simple container for this purpose which can then be used in both cloud and local environments.
But Kubernetes is a cloudy thing, isn’t it?
One concern around basing the system on Kubernetes was packaging and shipping the Senceive.io Local product.
We recognise that not all of our customers will have staff experienced in Kubernetes administration. We also recognise that we need to be able to remotely support Senceive.io Local without direct remote administration access as it will usually be deployed on networks which are isolated from the public internet, or at least on networks where access is tightly restricted.
Simplifying support
To simplify support, we decided on a ‘batteries included’ approach to packaging.
The software bundle for Senceive.io Local needed to include all software packages required to get the system working on an air-gapped Linux machine, including the Kubernetes base platform and all other dependencies like RabbitMQ and MongoDB.
Scripts and documentation were needed to take any moderately-experienced Linux administrator through the process of set up and common troubleshooting topics.
Simplified block diagram of Senceive.io Local highlighting components in common with the Cloud version
To achieve these goals, we again looked to what others were doing.
There were certainly not many options available when we started development in 2019, and the one we chose and implemented originally (called ‘gravity’) happened to be deprecated some time before we reached market. This led us to rework the packaging arrangement based on another packaging system (‘kurl’) to ensure that we shipped a product we could properly support into the future. (Zarf looks like a new and interesting option in this space which we might keep an eye on as it matures.)
So in the end…
Just like every other product design, there are engineering compromises that have to be navigated.
I hope that we’ve managed to hit the right balance between the competing requirements of the cloud and local environments. By using components that are at home in both environments and taking advantage of the ecosystem continually growing around Kubernetes, we have been able to package up a full set of software which can work in any corporate or industrial networking environment.
And it goes without saying that we will be listening carefully to feedback from our customers so that we can learn from their experiences and continually improve Senceive.io for both Cloud and Local users.