
Cloud Strategy, IT is business, Practical examples

Make your Jenkins as code and gain speed

November 8, 2020

TL;DR: an example of Jenkins as code. At the end of the article there's a step-by-step guide to configuring your Jenkins as code using Ansible and the Configuration as Code plug-in. The final version of the OS image will have Docker, kubectl, Terraform, Liquibase, HAProxy (for TLS), Google SSO instructions, and Java installed for running the pipelines.

Why have our Jenkins coded?

One key benefit of having the infrastructure and OS level coded is the safety it gives to software administrators. Think with me: what happens if your Jenkins suddenly stops working? What if something happens and nobody can log into it anymore? If these questions give you chills, let's code our Jenkins!

What we will cover

This article covers the tools presented in the image above:
    • Vagrant for local tests.
    • Packer for creating your OS image with your Jenkins ready to use.
    • Ansible for installing everything you need on your OS image (Jenkins, kubectl, Terraform, etc.).
    • JCasC (Jenkins Configuration as Code) to configure your Jenkins after it is installed.

You can also find some useful content for the Terraform part here and here.

See all the code for this article here: https://github.com/guisesterheim/Jenkins-As-Code

Special thanks to the many Ansible roles I was able to find on GitHub, and to geerlingguy for many of the playbooks we're using here.

1. How to run it

Running locally with Vagrant to test your configuration

The Vagrantfile is used for local tests only; it is a pre-step before creating the image on your cloud with Packer.

Vagrant commands:

  1. Have (1) Vagrant installed (sudo apt install vagrant) and (2) Oracle's VirtualBox installed
  2. How to run: navigate to the root of this repo and run sudo vagrant up. This will create a virtual machine and install everything listed in the Vagrantfile. After everything completes, you will have a Jenkins accessible from your host machine at localhost:5555 and localhost:6666
  3. How to SSH into the created machine: run sudo vagrant ssh
  4. How to destroy the VM: run sudo vagrant destroy

Using Packer to build your AMI or Azure VM image

Packer is a tool for creating an OS image (a VM image on Azure or an AMI on AWS)

Running packer:

  1. packer build -var 'client_id=<client_id>' -var 'client_secret=<client_secret>' -var 'subscription_id=<subscription_id>' -var 'tenant_id=<tenant_id>' packer_config.json
  2. Once you have your AMI or Az VM Image created, go to your cloud console and create a new machine pointing to the newly created image.

Check out the file packer_config.json to see how Packer will create your OS image, along with the Azure instructions for it.

PS: This specific packer_config.json file is configured to create an image on Azure. You can change it to run on AWS if you need to.

2. Let’s configure our Jenkins as Code!

I'm listing here a few key configurations among the several you will find in each of these Ansible playbooks (a hedged sketch of such version pinning follows the list):

  1. Java version: on ansible_config/site.yml
  2. Liquibase version: on ansible_config/roles/ansible-role-liquibase/defaults/main.yml
  3. Docker edition and version
  4. Terraform version
  5. Kubectl packages (adding kubeadm or minikube, for example) on ansible_config/roles/ansible-role-kubectl/tasks/main.yml
  6. Jenkins configs (more on this below)
  7. HAProxy for handling TLS (https) (more on this below)
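
For illustration, pinning these versions in a role's defaults usually looks like the sketch below. The variable names here are assumptions for illustration; check each role's own defaults/main.yml before overriding anything:

# Hypothetical defaults excerpt; the real variable names depend on each role
liquibase_version: "3.10.3"
terraform_version: "0.13.5"
docker_edition: "ce"                          # community edition
docker_package: "docker-{{ docker_edition }}"
java_packages:
  - openjdk-11-jdk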

3. Configuring your Jenkins

Jenkins pipelines and credentials files

This Jenkins is configured automatically using the Jenkins Configuration as Code plugin. All the configuration is listed in the file jenkins.yaml in the repo root. In that file, you can add your pipelines and the credentials for those pipelines to consume. Full documentation and possibilities can be found here: https://www.jenkins.io/projects/jcasc/

Below is the example you will find on the main repo:

  1. You can define your credentials in block one. There are a few possible credential types. Check them all in the plugin's docs
  2. Block two creates a folder
  3. Block three creates one example pipeline job, fetching it from a private GitLab repo using the credentials defined in block one (see the sketch below)
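
Here is a minimal, hedged sketch of what those three blocks can look like in jenkins.yaml. The credential ID, job names, and GitLab URL are illustrative, not the repo's actual values:

credentials:
  system:
    domainCredentials:
      - credentials:
          - usernamePassword:
              scope: GLOBAL
              id: gitlab-credentials        # block 1: referenced by the job below
              username: jenkins-bot
              password: "${GITLAB_TOKEN}"   # JCasC resolves this from an environment variable
jobs:
  - script: |
      // block 2: creates a folder
      folder('pipelines')
  - script: |
      // block 3: an example pipeline job pulled from a private GitLab repo
      pipelineJob('pipelines/example-app') {
        definition {
          cpsScm {
            scm {
              git {
                remote {
                  url('https://gitlab.mycompany.com/team/example-app.git')
                  credentials('gitlab-credentials')
                }
                branch('*/master')
              }
            }
            scriptPath('Jenkinsfile')
          }
        }
      }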

Jenkins configuration

The plugins that this Jenkins will have installed can be found at: ansible_config/roles/ansible-role-jenkins/defaults/main.yml. If you need to list your currently installed plugins, you can find a how-to here: https://stackoverflow.com/questions/9815273/how-to-get-a-list-of-installed-jenkins-plugins-with-name-and-version-pair

In the image below we can see:

  1. Your hostname: change it to a permanent hostname instead of localhost once you are configuring TLS
  2. The plugins list you want to have installed on your Jenkins

You can change Jenkins' default admin password in the file ansible_config/roles/ansible-role-jenkins/defaults/main.yml via the attribute "jenkins_admin_password". Check the image below:

  1. You can change admin user and password
  2. Another configuration you will change when activating TLS (https)
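
For reference, a hedged sketch of what those defaults can look like (values are illustrative and the plugin list is shortened):

jenkins_hostname: jenkins.mycompany.com   # permanent hostname once TLS is active (localhost for local tests)
jenkins_admin_username: admin             # change these before building a real image
jenkins_admin_password: change-me
jenkins_plugins:                          # shortened list
  - configuration-as-code
  - workflow-aggregator
  - git
  - google-login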

Jenkins’ configuration-as-code plug-in:

For JCasC to work properly, the file jenkins.yaml in the project root must be added to Jenkins' home (default /var/lib/jenkins/). This example contains the credentials to be used by the pipelines as well as the pipelines themselves. There are a few more options in the JCasC docs.
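
If you are wiring this up yourself, a hedged sketch of an Ansible task that puts the file in place could look like the following (the handler name is an assumption):

- name: Copy the JCasC file into Jenkins home
  ansible.builtin.copy:
    src: jenkins.yaml
    dest: /var/lib/jenkins/jenkins.yaml   # JCasC picks this up from JENKINS_HOME by default
    owner: jenkins
    group: jenkins
    mode: "0640"
  notify: restart jenkins                 # assumes the role defines this handler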

Activating TLS (https) and Google SSO

  1. As shown in the "Jenkins configuration" images above: go to ansible_config/roles/ansible-role-jenkins/defaults/main.yml. Uncomment line 15 and change it to your final URL. Comment line 16
  2. Go to ansible_config/roles/ansible-role-haproxy/templates/haproxy.cfg. Change line 33 to use your organization's final URL
  3. Rebuild your image with Packer (IMPORTANT! Your new image won't work locally anymore because you changed the Jenkins configuration)
  4. Go to your cloud and deploy a new instance using your just-created image

3.1 – TLS: Once you have your machine up and running, connect through SSH to perform the last manual steps: TLS and Google SSO authentication:

  1. Generate the .pem certificate file with the command cat STAR.mycompany.com.crt STAR.mycompany.com.key > fullkey.pem. Remember to remove the empty line left inside the generated fullkey.pem between the two certificates. To inspect the file, use cat fullkey.pem
  2. Move the generated file to the folder /home/ubuntu/jenkins/ on your running instance
  3. Restart HAProxy with sudo service haproxy restart

Done! Your Jenkins is ready to run under https with valid certificates. Just point your DNS to the running machine and you’re done.

3.2 – Google SSO:

  1. Log in to Jenkins using the regular admin credentials. Go to "Manage Jenkins" > "Global Security". Under "Authentication" select "Login with Google" and fill it in as below:
  • Client id = client_id generated on your G Suite account.
  • Client secret = client_secret
  • Google Apps Domain = mycompany.com

PS: More information on how to generate a client ID and client secret on the plugin’s page: https://github.com/jenkinsci/google-login-plugin
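
If you prefer to keep even this step coded, the google-login plugin can also be configured through JCasC in jenkins.yaml. A hedged sketch, with placeholder values:

jenkins:
  securityRealm:
    googleOAuth2:
      clientId: "<client_id generated on your G Suite account>"
      clientSecret: "<client_secret>"
      domain: "mycompany.com"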

Cloud Strategy, Practical examples

Build Azure Service Bus Queues using Terraform

September 12, 2020

TL;DR: 7 resources will be added to your Azure account. 1 – Configure Terraform to save state lock files on Azure Blob Storage. 2 – Use Terraform to create and keep track of your Service Bus Queues

You can find all the source code for this project on this GitHub repo: https://github.com/guisesterheim/TerraformServiceBusQueues

Azure Service Bus offers two ways of interacting with it: Queues and Topics (similar to SQS and SNS on AWS, respectively). Take a look at the docs on the difference between them and check which one fits your needs. This article covers Queues only.

What are we creating?

The GRAY area in the image above shows what this Terraform repo will create. The retry-queue automation in item 4 is also created by this Terraform. Below is how information should flow in this infrastructure:

  1. Microservice 1 generates messages and posts them to the messagesQueue.
  2. Microservice 2 listens to messages from the Queue and processes them. If it fails to process one, it posts the message back to the same queue (up to 5 times).
  3. If it fails more than 5 times, it posts the message to the Error Messages Queue.
  4. The Error Messages Queue automatically posts the errored messages back to the regular queue after one hour (this parameter can be changed in the file modules/queue/variables.tf)
  5. Whether there's an error or a success, Microservice 2 should always post log information to the Logging Microservice

Starting Terraform locally

To keep track of your Infrastructure with Terraform, you will have to let Terraform store your tfstate file in a safe place. The command below will start Terraform and store your tfstate in Azure Blob Storage. Use the following command to start your Terraform repo:

terraform init \
    -backend-config "container_name=<your folder inside Azure Blob Storage>" \
    -backend-config "storage_account_name=<your Azure Storage Name>" \
    -backend-config "key=<file name to be stored>" \
    -backend-config "subscription_id=<subscription ID of your account>" \
    -backend-config "client_id=<your username>" \
    -backend-config "client_secret=<your password>" \
    -backend-config "tenant_id=<tenant id>" \
    -backend-config "resource_group_name=<resource group name to find your Blob Storage>"

If you don’t have the information for the variables above, take a look at this post to create your user for your Terraform+Azure interaction.

Should everything go well, you will get a screen similar to the one below, and we are ready to plan our infrastructure deployment!

Planning your Service Bus deploy

The next step is to plan your deployment. Use the following command so Terraform can prepare to deploy your resources:

terraform plan \
     -var 'client_id=<client id>' \
     -var 'client_secret=<client secret>' \
     -var 'subscription_id=<subscription id>' \
     -var 'tenant_id=<tenant id>' \
     -var-file="rootVars.tfvars" \
     -var-file="rootVars-<environment>.tfvars" \
     -out tfout.log

Some of the information above is the same as we used in terraform init, so go ahead and copy it. The rest is:

  • -VAR-FILE – The first var file has common variables for all our environments.
  • -VAR-FILE – The second var file has specific values for the current environment. Take a look at the rootVars-<all>.tfvars files.
  • TFOUT.LOG – This is the name of the file in which Terraform will store the plan to achieve your Terraform configuration

Should everything go well, you'll have a screen close to the one below, and we'll be ready to finally create your Service Bus Queues!

Take a look at the "outputs" section. This is the information Terraform will give back to us so our DEV team can use it.

Deploying your Service Bus infrastructure

All the hard work is done. Just run the command below, wait a few minutes, and your Service Bus Queues will be running

terraform apply tfout.log

Once the deployment is done you should see a screen like this:

Once you are done, you will have the connection strings so the DEV team can configure the microservices to use your Queue.

To read more on how to integrate your applications to the Queues, Microsoft has the docs for Java, Node, PHP, and a few others.

Cloud Strategy, Practical examples

Build and configure an AKS on Azure using Terraform

September 9, 2020

TL;DR: 3 resources will be added to your Azure account. 1 – Configure Terraform to save state lock files on Azure Blob Storage. 2 – Use Terraform to create and keep track of your AKS. 3 – How to configure kubectl locally to set up your Kubernetes.

This article follows the best practices and benefits of infrastructure automation described here. Infrastructure as code, immutable infrastructure, more speed, reliability, auditing, and documentation are the concepts this article will help you achieve.

You can find all the source code for this project on this GitHub repo: https://github.com/guisesterheim/TerraformAKS

Creating a user for your Azure account

Terraform has a good how-to for you to authenticate. In this link you'll find how to retrieve the following required authentication data:

subscription_id, tenant_id, client_id, and client_secret.

To find the remaining container_name, storage_account_name, key, and resource_group_name, create your own Blob Storage container in Azure and use the names as suggested below:

  • The top red mark is your storage_account_name
  • In the middle you have your container_name
  • The last one is your key (file name)

Starting Terraform locally

To keep track of your Infrastructure with Terraform, you will have to let Terraform store your tfstate file in a safe place. The command below will start Terraform and store your tfstate in Azure Blob Storage. So navigate to folder tf_infrastructure and use the following command to start your Terraform repo:

terraform init \
    -backend-config "container_name=<your folder inside Azure Blob Storage>" \
    -backend-config "storage_account_name=<your Azure Storage Name>" \
    -backend-config "key=<file name to be stored>" \
    -backend-config "subscription_id=<subscription ID of your account>" \
    -backend-config "client_id=<your username>" \
    -backend-config "client_secret=<your password>" \
    -backend-config "tenant_id=<tenant id>" \
    -backend-config "resource_group_name=<resource group name to find your Blob Storage>"

Should everything go well, you should see a screen similar to the one below, and we are ready to plan our infrastructure deployment!

Planning your deploy – Terraform plan

The next step is to plan your deploy. Use the following command so Terraform can prepare to deploy your resources:

terraform plan \
    -var 'client_id=<client_id>' \
    -var 'client_secret=<client_secret>' \
    -var 'subscription_id=<subscription_id>' \
    -var 'tenant_id=<tenant_id>' \
    -var 'timestamp=<timestamp>' \
    -var 'acr_reader_user_client_id=<User client ID to read ACR>' \
    -var 'acr_reader_user_secret_key=<User secret to read ACR>' \
    -var-file="<your additional vars file name. Suggestion: rootVars-dev.tfvars>" \
    -out tfout.log

Some of the information above is the same as we used in terraform init, so go ahead and copy it. The rest is:

  • TIMESTAMP – This is the timestamp of when you are running this terraform plan. It is intended to help with the blue/green deployment strategy. The timestamp is a simple string that will be appended to your resource group name, which will have the following format: “fixedRadical-environment-timestamp”. You can check how it’s built in the file tf_infrastructure/modules/common/variables.tf
  • ACR_READER_USER_CLIENT_ID – This is the client_id used by your Kubernetes cluster to read the ACR (Azure Container Registry) and retrieve your Docker images for deployment. You should use a new one with fewer privileges than the main client_id we’re using.
  • ACR_READER_USER_SECRET_KEY – This is the client secret (password) of the above client_id.
  • -VAR-FILE – Terraform allows us to add variables in a file instead of on the command line as we’ve been doing. Do not store sensitive information inside this file. You have an example in the tf_infrastructure/rootVars-dev.tfvars file
  • TFOUT.LOG – This is the name of the file in which Terraform will store the plan to achieve your Terraform configuration

Should everything go well, you'll have a screen close to the one below, and we'll be ready to finally create your AKS!

Take a look at the “node_labels” tag on AKS and also on the additional node pool. We will use this in the Kubernetes config file below to tell Kubernetes in which node pool to deploy our Pods.

Deploying the infrastructure – Terraform apply

All the hard work is done. Just run the command below, wait for about 10 minutes, and your AKS will be running

terraform apply tfout.log

Once the deployment is done you should see a screen like this:

Configuring kubectl to work connected to AKS

Azure CLI does the heavy lifting in this part. Run the command below to make your kubectl command-line tool point to the newly deployed AKS:

az aks get-credentials --name $(terraform output aks_name) --resource-group $(terraform output resource_group_name)

If you don’t have the Azure CLI configured yet, follow the instructions here.

Applying our configuration to Kubernetes

Now navigate back in your terminal to the folder kubernetes_deployment. Let's apply the commands and then run through the files to understand what's going on:

1. PROFILE=dev
2. kubectl apply -f k8s_deployment-dev.yaml
3. kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.34.1/deploy/static/provider/cloud/deploy.yaml

PROFILE=DEV

PROFILE=dev – this sets an environment variable in your terminal to be read by kubectl and applied to the Docker containers. I used a Spring application, so you can see it being used in k8s_deployment-dev.yaml here:

  1. Kubernetes will grab our PROFILE=dev environment variable and pass on to Spring Boot.
  2. The path where Kubernetes will pull our images from using ACR credentials.
  3. Liveness probe teaches Kubernetes how to understand if that container is running or not.
  4. NodeSelector tells Kubernetes in which node pool (using the node_labels we highlighted above) the Pods should run (see the excerpt below).
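
A hedged excerpt of what such a deployment looks like in k8s_deployment-dev.yaml; the image path, port, and label values are illustrative:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: company
spec:
  replicas: 2
  selector:
    matchLabels:
      app: company
  template:
    metadata:
      labels:
        app: company
    spec:
      nodeSelector:
        agentpool: apppool                             # 4. matches the node_labels set by Terraform
      containers:
        - name: company
          image: myregistry.azurecr.io/company:latest  # 2. image pulled from ACR
          env:
            - name: PROFILE
              value: "dev"                             # 1. passed on to Spring Boot
          livenessProbe:                               # 3. how Kubernetes checks the container is alive
            httpGet:
              path: /actuator/health
              port: 8080
            initialDelaySeconds: 60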

Configure K8S

kubectl apply -f k8s_deployment-dev.yaml

Kubernetes allows us to store all our configuration in a single file. This is the file. You will see two deployments (pod instructions): company and customer. You will also see one service exposing each of them: company-service and customer-service.

  • The services (example below) use the ClusterIP strategy. It tells Kubernetes to create an internal Load Balancer to balance requests to your pods. The port tells which port receives requests and the targetPort tells which port in the service will handle requests. More info here.
Services example
  • The Ingress strategy is the most important part:
  1. nginx is the class for your ingress strategy. It uses the nginx implementation to load balance requests internally.
  2. /$1$2$3 is what Kubernetes should forward as the request URL to our pods. $1 means (api/company) highlighted in item 5, $2 means (/|$), and $3 means (.*)
  3. /$1/swagger-ui.html is the default app root for our Pods
  4. Redirect from www – true – self-explanatory
  5. Path is the URL structure to pass on as variables to item 2
  • To add TLS to our Kubernetes you have to generate your certificate and paste the key and crt into the highlighted areas below in base64 format. An example on Linux is like the first image below. When adding the info to the file, remember to paste it as a single row without spaces, line breaks or anything else. The second image shows where to put the crt and key, respectively. A sketch of these pieces follows below.
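
A hedged sketch of those three pieces (service, ingress, and TLS secret), with illustrative names and ports, using the networking.k8s.io/v1beta1 API that matches the nginx-ingress 0.34.x era:

apiVersion: v1
kind: Service
metadata:
  name: company-service
spec:
  type: ClusterIP                  # internal load balancer for the pods
  selector:
    app: company
  ports:
    - port: 80                     # receives requests
      targetPort: 8080             # pod port that handles them
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    kubernetes.io/ingress.class: nginx                        # item 1
    nginx.ingress.kubernetes.io/rewrite-target: /$1$2$3       # item 2
    nginx.ingress.kubernetes.io/from-to-www-redirect: "true"  # item 4
spec:
  tls:
    - secretName: tls-secret
  rules:
    - http:
        paths:
          - path: /(api/company)(/|$)(.*)    # item 5: capture groups become $1, $2, $3
            backend:
              serviceName: company-service
              servicePort: 80
---
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
type: kubernetes.io/tls
data:
  tls.crt: <base64 of your crt, as a single line>
  tls.key: <base64 of your key, as a single line>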

Apply nginx Load Balancer

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v0.34.1/deploy/static/provider/cloud/deploy.yaml 

This will apply nginx version 0.34.1 to handle our ingress strategy.

Testing our Kubernetes deployment

After all this configuration run the command below to wait for Kubernetes to assign an IP to our ingress strategy:

kubectl get ingress --watch

You will get an output like this:

Once you have the IP, you can paste it into Chrome, add the path to your specific service, and you will get your application's output.

Cloud Strategy, IT is business

A Decision Matrix for Public Cloud Adoption

August 3, 2020

Every cloud journey has an important decision point: which cloud are we going to? This is a decision matrix recently developed as the first step of an important cloud journey about to start. There are three main areas in this article: the Goals of the journey, the Adopted Criteria, and the Final Decision.

1 – Goals for the cloud migration

  • Overall company speed – essential for keeping a competitive time to market.
  • Team autonomy – one more important move to keep time to market as fast as possible and foster DevOps adoption.
  • Cost savings – use the cloud benefit of pay-as-you-go pricing.
  • Security – improve security while handing over a few of the key concerns to the cloud provider.
  • Better infrastructure cost management
  • Keep key auditing aspects valid – e.g., PCI compliance.

2 – Criteria list

The following items are those important for this scenario's migration. A total of fourteen criteria were analyzed to achieve a better overall understanding.

Five is the highest possible score. One is the lowest. Any value in between is valid.

Criteria | Weight
Cost | 5
Feature count | 1
Oracle migration ease | 2
Available SDKs | 1
DDoS protection | 1
Overall security | 5
Machine Learning and Data Science features | 1
Community support | 3
Professionals availability | 3
Professionals cost | 5
Companies that already are in each cloud (benchmark) | 1
Internal team knowledge | 5
Auditing capabilities | 5
Cloud transition supporting products | 5
Dedicated links with specific protocol availability | 5
GDPR and LGPD compliance | 3
Cloud support | 3

2.1. Cost

The values were converted from US dollars to Brazilian reais at an exchange rate of BRL 5.25 to USD 1.00. RI = Reserved Instance. OD = On-demand instance

Why this criterion is important: since the decision to move to the cloud has already been made, the goal of this criterion is to evaluate which cloud is the cheapest for this specific scenario's needs.

Cloud | Score given | Score comments
AWS | 5 | AWS has higher values in smaller machines and lower values in bigger machines
Azure | 5 | Azure has higher values in bigger machines and lower values for smaller machines
GCP | 3 | There are some lacking machine types.

2.2. Feature count

Why this criterion is important: innovation appetite of each cloud provider.

Cloud | Services qty | Source | Given score | Score comments
AWS | 212 | TechRadar | 1 | Is the most mature cloud. Has a count method similar to Google's
Azure | 600 | Azure docs | 1 | Has a smaller overall feature count than AWS, but counts it in a different granularity
GCP | 90 | GCP docs | 0 | Has more basic features and has great benefits for companies that are born in the cloud

2.3. Oracle migration ease

Why this criterion is important: needless to say.

Cloud | Availability | Source | Given score | Score comments
AWS | Available | AWS Docs | 2 | There's a tool to migrate and convert the database
Azure | Not available | Azure Docs | 1 | There's a tool to migrate only
GCP | Not available | – | 0 | There are no tools to help in this criterion

2.4. Available SDKs

Why this criterion is important: SDKs are important for applications under development.

Cloud | Availability | Source | Given score | Score comments
AWS | Available | AWS PHP, AWS Java | 1 | SDKs for the main needed languages are present
Azure | Available | Azure Docs | 1 | SDKs for the main needed languages are present
GCP | Available | GCP PHP, GCP Java | 1 | SDKs for the main needed languages are present

2.5. DDoS protection

Why this criterion is important: DDoS is a common attack for digital products. This is an important feature thinking about the future.

Cloud | Availability | Source | Given score | Score comments
AWS | Available | AWS Shield | 1 | There are standard and advanced protection
Azure | Available | Azure Docs | 1 | There are standard and advanced protection
GCP | Available | GCP Armor | 1 | There is standard protection

2.6. Security overall

Why this criterion is important: there are some key security features for which my company is audited by third-party partners, and we must stay compliant with them.

Source: three main sources from security experts' blogs were used for this evaluation.

Sub criterion | Cloud | Given score | Score comments
Overall Security | AWS | 1.25 | AWS gets the highest score according to specialists due to the granularity it allows
Overall Security | Azure | 1 |
Overall Security | GCP | 1 |
Ease to configure security | AWS | 0.5 |
Ease to configure security | Azure | 0.75 |
Ease to configure security | GCP | 1.25 | Google gets a higher score due to ease of configuration and abstraction capacity
Security Investment | AWS | 1.25 | AWS is the one that invests the most in security
Security Investment | Azure | 1 |
Security Investment | GCP | 1 |
Security community support | AWS | 1.25 | AWS has a bigger community
Security community support | Azure | 1 |
Security community support | GCP | 0.75 |

2.7. Machine Learning and Data Science features

Why this criterion is important: looking to the future, it's important to think about new services to be consumed. This feature received a low maximum score because it is not something critical for this stage of the cloud adoption.

Cloud | Availability | Source | Given score | Score comments
AWS | Available | Machine Learning as a service comparison | 1 | They all have pros and cons and specific ML/DS initiatives
Azure | Available | – | 1 | They all have pros and cons and specific ML/DS initiatives
GCP | Available | – | 1 | They all have pros and cons and specific ML/DS initiatives

2.8. Community

Why this criterion is important: a strong community makes it easier to find solutions for the problems that will come in the future.

Cloud | Source | Given score | Score comments
AWS | Community comparison AWS vs Azure vs Google | 3 | Biggest and most mature community
Azure | – | 2 | More than 80% of the Fortune 500 uses it
GCP | – | 1 | It's growing

2.9. Professionals availability

Why this criterion is important: the ability to hire qualified professionals for the specific cloud vendor is crucial for the application lifecycle. This research was performed on LinkedIn with the query “certified cloud architect <vendor>”.

Cloud | Source | Given score | Score comments
AWS | LinkedIn | 3 | 183k people found
Azure | LinkedIn | 2 | 90k people found
GCP | LinkedIn | 1 | 22k people found

2.10. Professionals cost

Why this criterion is important: as important as professionals' availability, the cost involved in hiring each of these professionals is also something to keep in mind.

Cloud | Source | Given score | Score comments
AWS | Glassdoor | 5 | There was no difference found between each professional
Azure | Computerworld (Portuguese only) | 5 | There was no difference found between each professional
GCP | – | 5 | There was no difference found between each professional

2.11. Companies already present in each cloud

Why this criterion is important: looking at other companies helps us understand where the biggest and most innovative players are heading. And if they are heading there, there must be a good reason for it.

Cloud | Brands found | Source | Given score | Score comments
AWS | Facebook, Amazon, Disney, Netflix, Twitter | Who is using AWS | 1 |
Azure | Pixar, Dell, BMW, Apple | Who is using Azure | 1 |
GCP | Spotify, Natura, SBT | – | 1 |

2.12. Internal team knowledge

Why this criterion is important: the more internal knowledge there is for a cloud adoption, the faster it will be to achieve a good level of maturity.

Cloud | Source | Given score | Score comments
AWS | Internal knowledge | 4 | Developers know AWS better
Azure | Internal knowledge | 4 | The infrastructure team knows Azure better
GCP | Internal knowledge | 0 | Nobody has ever worked with GCP

2.13. Auditing capabilities

Why this criterion is important: auditing capabilities are important to stay compliant with some existing contracts.

Cloud | Availability | Source | Given score
AWS | Available | AWS Config | 5
Azure | Available | Azure Docs | 5
GCP | Available | GCP Audit | 5

2.14. Cloud migration products

Why this criterion is important: since this is intended to be a company-wide adoption, some areas will have more or less maturity to migrate to a new paradigm of cloud-native software development. The more the cloud provider can assist with simpler migration strategies such as an "AS IS" move, the better for this criterion.

Cloud | Source | Given score | Score comments
AWS | AWS CAF | 4 | There is more manual work to perform to achieve data sync
Azure | Azure Migration Journey | 5 | Since the company has a big number of Windows-based services, Microsoft's native tools have an advantage
GCP | – | 3 | No resources were found to keep cloud and on-premises workloads working together

3 – The final result

Below is the final result of this comparison. I intend it to help your own cloud journey decisions, but please do not stick only to the criteria presented here. Always take a look at what makes sense for your company and business cases.

This adoption must also come hand in hand with an internal plan to improve people's knowledge of the selected cloud. The cloud brings several benefits compared to on-premises services, but like everything in life there are trade-offs, and new challenges will appear.

Criterion | AWS | Azure | GCP
Cost | 5 | 5 | 3
Feature count | 1 | 1 | 0
Oracle migration ease | 2 | 1 | 0
Available SDKs | 1 | 1 | 1
DDoS protection | 1 | 1 | 1
Security overall | 4 | 3 | 4
Machine Learning and Data Science features | 1 | 1 | 1
Community | 3 | 2 | 1
Professionals available | 3 | 2 | 1
Professionals cost | 5 | 5 | 5
Companies already present in each cloud | 1 | 1 | 1
Internal team knowledge | 4 | 4 | 0
Auditing capabilities | 5 | 5 | 5
Cloud migration products | 4 | 5 | 3
Grand total | 40 | 37 | 26
Cloud Strategy, Practical examples

App scaling with operational excellence

June 14, 2020

This is a continuation of the app scaling series showing motivations, clear ways, trade-offs, and pitfalls for a cloud strategy. The last post is an overview of why to prepare and how to start. It can be found here.

As soon as your application starts scaling and more actions are needed every day to evolve the app to new scenarios, automation becomes both handy and necessary.

Benefits of automating infrastructure

  • Fastest possible solution for deploying a new workflow environment. It saves you time when deploying multiple environments like PROD, DEV, and QA.
  • Ensures PROD, QA, and DEV are exactly the same. This will help your engineers narrow down problems and solve issues faster.
  • Immutable infrastructure – the old dark days when nobody knew how a server was still working are gone. With immutable infrastructure, stop using human interference to fix things, and use it only for hot-fixes.
  • Define your workflows as code. Code is more reliable than anyone's memory.
  • Easily track changes over time (you also achieve more auditing coverage with this step).
  • Your infrastructure-as-code is documentation you can review and ask for support on if needed.

Automating infrastructure

There are a few tools to automate infrastructure creation, and each cloud provider has its own: CloudFormation on AWS, Resource Templates on Azure, and Cloud Deployment on Google. But you may be in an organization that wants as little lock-in as possible due to past experience. That's where HashiCorp's stack, specifically Terraform here, comes in handy.

Use cases + Tactics

  1. Saving costs with environments – have your DEV and QA environments shut down at the end of every day to save costs on the cloud.
  2. Watching infrastructure – since your infrastructure may change during execution (upscaling, downscaling, termination, etc.), you can have a job watching specific parts of your app that should be kept in a certain configuration. Example: for scaling systems, you can use Terraform to always have one instance pre-heated and ready to be attached to an auto-scaling (AS) group when the application requires it, instead of waiting out the warm-up time of every instance's configuration. Once your app needs to scale, that instance will be added to the AS group, and some time after that Terraform will proactively provision a new instance for when the load suffers another spike.
  3. Configuration management – applying the immutable infrastructure concept here, you will have one single source of truth for your environment. Example: you had to apply one hot-fix in production to prevent an error from happening. Right after the hot-fix, update your infrastructure-as-code to include that fix so you won't forget to replicate it to new environments.
  4. Orchestration – let's say you have your infrastructure primarily on AWS but are using Google for ML. Terraform will orchestrate the creation of them all for you. It saves you the time of going to each cloud and activating CloudFormation, Cloud Deployment, and so on.
  5. Security and compliance – having your infrastructure as code will make it easier for your team to ensure they are following all the security and compliance definitions. The code is also versionable, ensuring auditing capabilities.

Example with Terraform

The code found here will deploy the above infrastructure in a matter of minutes. It is an example of Terraform provisioning AWS best-practice infrastructure when still using EC2 instances as your computing option.

Do not forget to add CloudFront and Route 53 to your stack if you are going to use it in a real environment.

Cloud Strategy, IT is business, Practical examples

App scaling with fast and reliable architecture

May 9, 2020

The pandemic reality gave a huge push to digital businesses. The companies succeeding right now have strong strategies for digital interaction. More on: McKinsey, McKinsey, The Economist, and CNN Business. But how are they scaling their businesses (and their applications) so fast and in a reliable manner? By preparing their applications (aka our Digital Products) with a fast and reliable architecture. This is a paradigm shift for most companies.

Why should we prepare to scale

Costs. An application not prepared to scale according to its demand will cost more to keep than one that is. The cloud advantage must be used to its fullest in this scenario. An example: websites that sell tickets for concerts spend a huge portion of their time working under low or regular demand. But when a well-known group announces a new concert, thousands of people rush to their environment to get a ticket (find a similar reading here). A useful analogy: if you don't prepare your application to scale according to demand, it's like always driving an RV even when you are just going to the supermarket instead of going on vacation. You don't need to carry 5 or 6 people with you to go to the supermarket. Likewise, your app doesn't need to be fully armed at 3 a.m.

As we can see in the image below, people use the internet with different intensity according to the time of day.

Seattle internet consumption on the first week of January 2020 (from CloudFlare)

Peaks of usage. Maintaining a portal with around 100 visits per day (like my own) is fine. But a different approach will be needed for another one with one million views in the same timeframe. More important than that, be prepared for peaks of usage to maintain the brand's reliability and the company's growth. Zoom is an excellent, successful example of application scaling. But they are the minority amid hundreds of bad examples that are impacting our lives (e.g., check New Jersey's government asking for help with a very, very old application).

How to prepare a fast and reliable architecture

Architecting for scalability

Use the advantages of the cloud's existing tools. All cloud players have efficient tools for load balancing the application's requests. Microsoft's Load Balancer, Google Load Balancing, and AWS Elastic Load Balancing are very easy to set up. Once the load-balancing rules are defined, auto-scaling groups improve the application's power to handle requests. Using auto-scaling groups you can set different behaviors for your app, based both on demand from users and on patterns you already know exist (the 3 a.m. RV). If all of this is new to you, keep in mind that new solutions bring new challenges. Listed below are a few things you have to look at when setting up auto-scaling behavior:

  • Speed to start up a new server – When you need to scale, you will probably need to scale FAST. To help with that, have pre-built images (AWS AMIs) to speed up your new servers' boot time. Kubernetes orchestrating your app will also help with this.
  • Keeping database consistency – Luckily, the big cloud players have solutions to keep databases synchronized between your different availability zones almost seamlessly. But once you start working with multiple regions, this becomes one more thing to plan for and handle.
  • Keeping low latency between different regions – Multiple regions can solve latency for your users, but they bring latency to you. Once again, this is about multiple regions (whether you are building a disaster/recovery plan or just moving infrastructure closer to your users to reduce latency). The latency between regions has to be mitigated both in the databases and in your app's communications.

The attention points above pay off. Once you have it all set, the cloud can keep itself running. Watching for alerts on CPU, memory, network, and other usage, and triggering self-healing actions will be part of its day.

Architecting for reliability

To increase your app's reliability, I list a few good strategies to apply:

  • On the infrastructure and app levels, adding several layers of tests and health checks is the most basic action for reliability.
  • Architecting for multi-region. Pilot light (slower), warm standby, and active/active multi-region (faster) architectures are good approaches for failover and disaster/recovery plans. The fastest one (active/active) requires the same infrastructure to be deployed identically in two regions. An intelligent DNS routing rule also has to be set.
  • Reducing risk with infrastructure deployment automation. Examples are services like CloudFormation (AWS), Resource Templates (MS), and Cloud Deployment (Google). They help you create a single point of infrastructure description to be used across multiple regions.

Architecture is a living subject, just like digital products. Looking for scalability and reliability in the same environment will get you to a fast and reliable architecture.