Cloud Strategy, Practical examples

App scaling with operational excellence

June 14, 2020

This post continues the app scaling series, which covers motivations, clear paths, trade-offs, and pitfalls for a cloud strategy. The previous post, an overview of why to prepare and how to start, can be found here.

As soon as your application starts scaling and more actions are needed every day to adapt the app to new scenarios, automation becomes both handy and necessary.

Benefits of automating infrastructure

  • A fast, repeatable way to deploy a new environment. It saves you time when deploying multiple environments such as PROD, DEV, and QA.
  • Keeps PROD, QA, and DEV consistent with each other. This helps your engineers narrow down problems and solve issues faster.
  • Immutable infrastructure – the dark old days when nobody knew how a server was still working are gone. With immutable infrastructure, you stop relying on manual intervention to fix things and reserve it for hot-fixes only.
  • Define your workflows as code. Code is more reliable than anyone’s memory.
  • Easily track changes over time (you also achieve more coverage for auditing with this step).
  • Your infrastructure-as-code doubles as documentation that you can review and share when asking for support.

Automating infrastructure

There are a few tools to automate infrastructure creation, and each cloud provider has its own: CloudFormation on AWS, Azure Resource Manager (ARM) templates on Azure, and Cloud Deployment Manager on Google Cloud. But you may be in an organization that wants as little lock-in as possible because of past experience. In that case HashiCorp’s stack, specifically Terraform here, comes in handy.

Use cases + Tactics

  1. Saving costs with environments – have your DEV and QA environments shut down at the end of every day to cut cloud costs.
  2. Watching infrastructure – since your infrastructure may change while it is running (upscaling, downscaling, termination, etc.), you can have a job that watches specific parts of your app that should be kept in a certain configuration. Example: for scaling systems, you can use Terraform to always keep one instance pre-warmed and ready to be attached to an auto-scaling (AS) group when the application requires it, instead of waiting for every instance’s configuration to warm up. Once your app needs to scale, that instance is added to the AS group, and some time after that Terraform provisions a new instance proactively for when the load suffers another spike.
  3. Configuration management – applying the immutable infrastructure concept here, you have one single source of truth for your environment. Example: you had to apply a hot-fix in production to prevent an error from happening. Right after the hot-fix, update your infrastructure-as-code to include that fix so you won’t forget to replicate it to new environments.
  4. Orchestration – let’s say you have your infrastructure primarily on AWS but are using Google for ML. Terraform can orchestrate the creation of all of it for you. It saves you the time of going to each cloud and running CloudFormation, Cloud Deployment Manager, and so on.
  5. Security and Compliance – having your infrastructure as code makes it easier for your team to ensure they are following all the security and compliance definitions. The code can also be versioned, which gives you an audit trail.
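To make the "watching infrastructure" and "configuration management" ideas more concrete, here is a minimal Python sketch of a drift check: it compares the configuration you declared with the one that is actually running. This is only an illustration of the concept, not how Terraform is implemented (in practice `terraform plan` does this comparison for you). The function name `find_drift` and the example settings are hypothetical.

```python
from typing import Any, Dict, List


def find_drift(desired: Dict[str, Any], actual: Dict[str, Any]) -> List[str]:
    """Return readable differences between the declared and the running settings."""
    drift = []
    # Walk over every setting known to either side, in a stable order.
    for key in sorted(set(desired) | set(actual)):
        if key not in actual:
            drift.append(f"{key}: missing (expected {desired[key]!r})")
        elif key not in desired:
            drift.append(f"{key}: unmanaged (found {actual[key]!r})")
        elif desired[key] != actual[key]:
            drift.append(f"{key}: expected {desired[key]!r}, found {actual[key]!r}")
    return drift


# Hypothetical example: what the code declares vs. what is running after a hot-fix.
desired_state = {"instance_type": "t3.medium", "min_size": 2, "port": 443}
actual_state = {"instance_type": "t3.medium", "min_size": 1, "port": 443, "debug": True}

for line in find_drift(desired_state, actual_state):
    print(line)
```

Each reported line is either something to revert or, in the hot-fix case, something to write back into your infrastructure-as-code.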

Example with Terraform

The code found here will deploy the above infrastructure in a matter of a few minutes. It is an example of Terraform provisioning infrastructure that follows AWS best practices when you are still using EC2 instances for compute.

Do not forget to add CloudFront and Route 53 to your stack if you are going to use it in a real environment.

Cloud Strategy, IT is business, Practical examples

App scaling with fast and reliable architecture

May 9, 2020

The reality of the pandemic gave a huge push to digital businesses. The companies succeeding right now have strong strategies for digital interaction. More on: McKinsey, McKinsey, The Economist, and CNN Business. But how are they scaling their business (and their applications) so fast and so reliably? The answer is to prepare your applications (aka our Digital Products) with a fast and reliable architecture. This is a paradigm shift for most companies.

Why should we prepare to scale

Costs. An application that is not prepared to scale according to demand will cost more to run than one that is. This is the scenario where you should make the most of the cloud's advantages. An example: websites that sell concert tickets spend most of their time working under low or regular demand. But when a well-known group announces a new concert, thousands of people rush to the site to get a ticket (find a similar reading here). A useful analogy: if you don’t prepare your application to scale according to demand, it’s like always driving an RV, even when you are just going to the supermarket instead of going on vacation. You don’t need to carry 5 or 6 people with you to go to the supermarket. Likewise, your app doesn’t need to run at full capacity at 3 am.

As we can see in the image below, people use the internet with different intensity depending on the time of day.

Seattle internet consumption in the first week of January 2020 (from Cloudflare)
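The cost argument can be checked with a little arithmetic. The sketch below compares the instance-hours of a fleet sized for the daily peak with a fleet that follows demand hour by hour. All numbers are made up for illustration (they are not taken from the chart above), and the function names are hypothetical.

```python
import math
from typing import List


def fixed_capacity_hours(hourly_demand: List[int], per_instance_capacity: int) -> int:
    """Instance-hours when the fleet is sized for the peak and never changes."""
    instances = math.ceil(max(hourly_demand) / per_instance_capacity)
    return instances * len(hourly_demand)


def scaled_capacity_hours(hourly_demand: List[int], per_instance_capacity: int,
                          min_instances: int = 1) -> int:
    """Instance-hours when the fleet is resized every hour to match demand."""
    return sum(
        max(min_instances, math.ceil(demand / per_instance_capacity))
        for demand in hourly_demand
    )


# Assumed demand in requests per second for each of the 24 hours of a day.
demand = [100] * 6 + [400] * 6 + [1000] * 6 + [400] * 6
capacity = 200  # assumed requests per second that one instance can serve

fixed = fixed_capacity_hours(demand, capacity)
scaled = scaled_capacity_hours(demand, capacity)
print(f"fixed: {fixed} instance-hours, scaled: {scaled} instance-hours")
```

With these assumed numbers the always-on fleet uses 120 instance-hours per day and the demand-following fleet uses 60, which is the "RV to the supermarket" effect in numbers.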

Peaks of usage. Maintaining a portal with around 100 visits per day (like my own) is fine. But a portal with one million views in the same timeframe needs a different approach. More important than that, be prepared for peaks of usage to maintain the brand’s reliability and the company's growth. Zoom is an excellent example of successful application scaling. But such cases are a minority amid hundreds of bad examples that are impacting our lives. For example, see New Jersey’s government asking for help with a very old application.

How to prepare a fast and reliable architecture

Architecting for scalability

Use the advantages of the cloud’s existing tools. All cloud players have efficient tools for load balancing the application’s requests. Azure Load Balancer, Google Cloud Load Balancing, and AWS Elastic Load Balancing are straightforward to set up. Once the load-balancing rules are defined, auto-scaling groups increase the application's capacity to handle requests. With auto-scaling groups you can set different behaviors for your app, based both on user demand and on patterns you already know exist (driving an RV at 3 am). If all of this is new to you, keep in mind that new solutions bring new challenges. Listed below are a few things to look at when setting up auto-scaling behavior:

  • Speed to start up a new server – When you need to scale, you will probably need to scale FAST. To help with that, have pre-built images (AWS AMIs) to speed up your new servers' boot time. Kubernetes orchestrating your app will also help with this.
  • Keeping database consistency – Luckily the big cloud players have solutions that keep databases synchronized between your different availability zones almost seamlessly. But once you start working with multiple regions, this becomes one more thing you need to plan for and handle.
  • Keep low latency between different regions – Multiple regions can solve latency for your users, but they move the latency problem to you. Whether you are building a disaster/recovery plan or just moving infrastructure closer to your users to reduce latency, the latency between regions has to be mitigated both in your databases and in your app communications.

The attention points above pay off. Once everything is set, the cloud can largely maintain itself: watching for alerts on CPU, memory, network, and other usage metrics and triggering self-healing actions becomes part of its daily routine.
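As a rough illustration of the kind of decision an auto-scaling group makes from those alerts, here is a small Python sketch of a proportional scaling rule: grow or shrink the fleet so that average CPU moves back toward a target, within fixed bounds. The function name, the 60% target, and the size limits are assumptions for the example, not the defaults of any cloud provider.

```python
import math


def desired_instances(current: int, cpu_percent: float, target_percent: float = 60.0,
                      min_size: int = 1, max_size: int = 10) -> int:
    """Suggest a fleet size that brings average CPU back toward the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    # Scale the fleet in proportion to how far CPU is from the target.
    wanted = math.ceil(current * cpu_percent / target_percent)
    # Never go below the minimum or above the maximum group size.
    return max(min_size, min(max_size, wanted))


print(desired_instances(4, 90.0))   # load spike: scale out
print(desired_instances(4, 30.0))   # quiet period: scale in
print(desired_instances(8, 120.0))  # capped by max_size
```

The bounds matter as much as the rule itself: the minimum keeps the app reachable at 3 am, and the maximum keeps a traffic spike from turning into a billing spike.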

Architecting for reliability

To increase your app's reliability, here are three good strategies to apply:

  • Testing at the infrastructure and app level. Adding several layers of tests and health checks is the most basic action for reliability.
  • Architecting for multi-region. Pilot light (slower recovery), warm standby, and active/active multi-region (faster recovery) architectures are good approaches to failover and disaster/recovery plans. The fastest one (active/active) requires identical infrastructure to be deployed in two regions. An intelligent DNS routing rule also has to be set.
  • Reducing risk with infrastructure deploy automation. Examples of such services are CloudFormation (AWS), Azure Resource Manager templates (Microsoft), and Cloud Deployment Manager (Google). They help you create a single infrastructure description to be used across multiple regions.
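The failover side of multi-region can be sketched in a few lines of Python. The function below picks the first healthy region from a priority list, which is roughly what a failover DNS routing rule decides from health-check results. In a real setup a managed DNS service would do this for you; the function name is hypothetical and the region names are only examples.

```python
from typing import Dict, List, Optional


def pick_region(priority: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first region in priority order whose health check passes."""
    for region in priority:
        # A region with no health-check result is treated as unhealthy.
        if healthy.get(region, False):
            return region
    return None  # nothing healthy: time for the disaster/recovery plan


regions = ["us-east-1", "eu-west-1"]  # primary first, standby second

print(pick_region(regions, {"us-east-1": True, "eu-west-1": True}))
print(pick_region(regions, {"us-east-1": False, "eu-west-1": True}))
```

In an active/active design both regions would receive traffic while healthy, so the rule would return every healthy region instead of only the first one.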

Architecture is a living subject, just like digital products are. Pursuing scalability and reliability in the same environment is how you achieve a fast and reliable architecture.