The Cloud Native Computing Foundation as a good source for tracking new moves in the cloud industry
The content
Cloud Native Computing Foundation: for an application to be considered truly Cloud Native, it needs to be:
Built for fault tolerance
Horizontally scalable
Written in a manner that takes full advantage of what cloud providers have to offer.
Cloud Native Applications prioritize the following:
Speed
Short cycles
Microservices
Loosely coupled
DevOps
Pets vs. cattle as ways of handling our servers:
As a developer, you care about the application being hand-cared for: when it is sick, you take care of it, and if it dies, it is not easy to replace. It’s like naming a pet and taking care of it; if one day it goes missing, everyone will notice. In the case of cattle, however, you expect that there will always be sick and dead cows as part of daily business; in response, you build redundancy and fault tolerance into the system so that ‘sick cows’ do not affect your business. Basically, each server is identical, and if you need more, you create more, so that if any particular one becomes unavailable, no one will notice.
Cloud native action spectrum:
Cloud native roadmap of adoption (the majority of companies are on step 4):
There’s a landscape map on the Cloud Native Computing Foundation site listing tons of vendors for each specific need: http://landscape.cncf.io
Exercises and Assignments
Assignment: Create a presentation showing the push you are planning for your company. Think about steps, risks, mitigations, and how you plan to lead the journey. Think about the presentation as if you were presenting it to your CEO or a client.
Several cases of Agile adoption in big and mid-size companies, along with key benefits, challenges, and outputs of an Agile adoption
The content
Today, over 50% of the Fortune 500 companies from the year 2000 no longer exist. GE is stumbling. BlackBerry (RIM) is gone, and so is most of Nokia, which had risen to a $150 billion corporation. (…) John Boyd developed a methodology for operating in such situations, called the OODA Loop. The speed of executing the loop is the essential element of survival. It involves testing one’s premises by actual Observation, Orienting your corporation with respect to the situation, then Deciding on a course of action, and then executing that plan by Acting. This is the meaning of being Agile. (…) Data is the new gold.
MIT – Cloud & DevOps course – 2020
Agile Adoption
Pros of agile software development:
Customers have frequent and early opportunities to see the work being delivered and to make decisions and changes throughout the development of the project.
The customer gains a strong sense of ownership by working extensively and directly with the project team throughout the project.
If time to market is a greater concern than releasing a full feature set at initial launch, Agile is best. It will quickly produce a basic version of working software that can be built upon in successive iterations.
Development is often more user-focused, likely a result of more frequent direction from the customer.
Cons of agile software development:
Agile requires a high degree of customer involvement in the project. This may be a problem for customers who simply don’t have the time or interest for this type of participation.
Agile works best when the development team is completely dedicated to the project.
The close working relationships in an Agile project are easiest to manage when the team members are located in the same physical space, which is not always possible.
The iterative nature of Agile development may lead to frequent refactoring if the full system scope is not considered in the initial architecture and design. Without this refactoring, the system can suffer from a reduction in overall quality. This becomes more pronounced in larger-scale implementations, or with systems that include a high level of integration.
Managing Complexity of Organizations and operations
As companies grow, their complexity grows, and they have to manage that complexity, otherwise it turns into chaos. They usually manage it by putting processes in place: you have to sign X documents, follow Y procedures, etc. The problem is that this curtails employee freedom, and the side effect is that high-performing employees tend to leave the company.
Netflix’s solution to this scenario was different. They decided to let the smart workers manage the complexity instead of putting processes in place.
The problem with the traditional approach is that when the market shifts, we’re unable to move fast. We have so many processes and such a fixed culture that our teams won’t adapt, and innovative people won’t stick around in these environments.
That leads us to three bad options for managing our growing organizations:
Stay a creative, small company (less impact)
Avoid rules (and suffer the chaos)
Use process (and cripple flexibility and ability to thrive when the market changes)
Back to the Netflix case: they believed that high-performing people can contain the chaos. With the right people, instead of a culture of process adherence, you get a culture of creativity and self-discipline, freedom, and responsibility.
Comparing the waterfall and agile software development models
Assignment: Write a summary of two articles suggested by MIT that highlight the complexity some companies faced in turning Agile and how they are thriving.
Introduced the serverless paradigm: pros and cons, limits, and the evolution that led to it
The content
Serverless computing is a cloud-based execution model in which the management of the server architecture and the application development are distinctly divided. The connection is frictionless: the application does not need to know what it is being run or provisioned on, just as the architecture does not need to know what is being run on it.
The journey that led us to serverless (image below).
A true microservice:
Does not share data structures or database schemas
Does not share internal representations of objects
Can be updated without notifying other teams
Serverless implications:
Your functions become stateless: you have to assume your function will always run in a new, recently deployed container.
Cold starts: since your function may run in a new container each time, you have to expect some latency while the container is spun up. After the first execution the container is kept around for a while, and subsequent calls become “warm starts”. See the sketch below.
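To make the statelessness and cold-start points concrete, here is a minimal sketch of what an AWS Lambda-style handler in Node.js might look like (the handler shape follows Lambda’s Node.js convention; the response fields are illustrative). Code outside the handler runs once per container, so it survives warm starts but is lost whenever a new container spins up:

```javascript
// Module scope: runs once per container, i.e., on every cold start.
const startedAt = Date.now();
let invocationCount = 0; // not durable state: it resets whenever a new container is created

exports.handler = async (event) => {
  invocationCount += 1;
  return {
    statusCode: 200,
    body: JSON.stringify({
      coldStart: invocationCount === 1,       // true only for the first call in this container
      containerAgeMs: Date.now() - startedAt, // grows across warm starts
      invocationCount,                        // shows why durable state must live elsewhere
    }),
  };
};
```

Any state that must survive between invocations (sessions, counters) therefore has to live in an external store such as a database or cache.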
Serverless pros:
Cloud provider takes care of most back-end services
Autoscaling of services
Pay as you go and for what you use
Many aspects of security provided by cloud provider
Patching and library updates
Software services, such as user identity, chatbots, storage, messaging, etc
Shorter lead times
Serverless cons:
Managing state is difficult (which makes debugging difficult)
Complex message routing and event propagation (makes bugs harder to track)
Introduced the regular high-ground phases of a Digital Transformation (and cases for exploring them), which are:
1 – Initial Cloud Project
2 – Foundation
3 – Massive Migration
4 – Reinvention
The content
Cloud computing services are usually divided into three categories (plus a catch-all, XaaS):
IaaS – using the computational power of cloud computing data centers to run your previous on-prem workloads.
PaaS – using pre-built components to speed up your software development. Examples: Lambda, EKS, AKS, S3, etc.
SaaS – third-party applications allowing you to solve business problems. Examples: Salesforce, Gmail, etc.
XaaS – Anything as a service.
An abstraction of Overall Phases of adoption:
1 – Initial Cloud Project – Decide and execute the first project
2 – Foundation – Building blocks: find the next steps to solve the pains of the organization. Provide an environment that makes going to the cloud more attractive to the business units. Examples: increase security, increase observability, reduce costs.
1st good practice: During this phase, you can create a “Cloud Center of Excellence” committee to start creating tools to make the cloud shift more appealing to the rest of the organization.
2nd good practice: Build reference architectures to guide people with less knowledge.
3rd good practice: Teach best practices to other business units as they engage.
3 – Migration – Move massively to the cloud
One possible strategy is to move as-is and then modernize the application later (see the step below).
4 – Reinvention – modernize the apps (here you start converting proprietary software to open source and adopting Machine Learning, Data Science, etc.).
See the picture below for an illustration of these 4 steps:
Phases of Digital Transformation, and time and value comparison
The pace of adoption is always calm, even for aggressive companies: Netflix took 7 years to become a cloud-first company.
Microsoft’s principles for their “shift left” on the number and coverage of tests:
Tests should be written at the lowest level possible.
Write once, run anywhere including the production system.
The product is designed for testability.
Test code is product code, only reliable tests survive.
Testing infrastructure is a shared service.
Test ownership follows product ownership.
See below two pictures of (1) how Microsoft evolved their testing process model and (2) the results they achieved.
1 – How Microsoft evolved its testing model
2 – Microsoft results
Layers of test (based on Microsoft example):
L0 – Broad class of rapid in-memory unit tests. An L0 test is a unit test to most people — that is, a test that depends on code in the assembly under test and nothing else.
L1 – An L1 test might require the assembly plus SQL or the file system.
L2 – Functional tests run against a ‘testable’ service deployment: a functional test category that requires a service deployment but may have key service dependencies stubbed out in some way.
L3 – This is a restricted class of integration tests that run against production. They require full product deployment.
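As a rough illustration of the lowest layer, here is a minimal sketch of an L0-style test using Node’s built-in test runner (the priceWithTax function is a made-up unit under test): it exercises pure in-memory code, with no SQL, file system, or deployed service involved.

```javascript
// l0.test.mjs -- run with: node --test
import { test } from 'node:test';
import assert from 'node:assert/strict';

// Hypothetical unit under test: pure, in-memory, no external dependencies.
function priceWithTax(price, taxRate) {
  if (price < 0) throw new RangeError('price must be non-negative');
  return price * (1 + taxRate);
}

test('applies tax to a positive price', () => {
  assert.equal(priceWithTax(100, 0.5), 150);
});

test('rejects negative prices', () => {
  assert.throws(() => priceWithTax(-1, 0.5), RangeError);
});
```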
“The best way to avoid failure is to fail constantly.” (Netflix)
“The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most, in the event of an unexpected outage.” (Netflix)
The DevOps revolution: importance of continuous feedback, data-driven decisions, pillars of DevOps and metrics
Main quote
Today, software development is no longer characterized by designers throwing their software ‘over the wall’ to testers, who in turn repeat the process with software operations. These roles are now disappearing: today software engineers design, develop, test, and deploy their software by leveraging powerful Continuous Integration and Continuous Delivery (CI/CD) tools.
Delivery lead time (measured in hours) – e.g.: how much time passes between a task being registered in the management tool and it reaching production?
Deployment frequency – how many deploys to the production environment we make per week.
Time to restore service – how many minutes we take to put the service back to work when something breaks.
Change fail rate – how many of our deploys to the production environment cause a failure.
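As a toy illustration of how these four metrics could be computed from raw deploy records (all data, field names, and numbers below are hypothetical):

```javascript
// Hypothetical deploy log: one record per production deploy.
const deploys = [
  { startedWorkAt: '2020-03-02T09:00Z', deployedAt: '2020-03-03T17:00Z', failed: false },
  { startedWorkAt: '2020-03-04T10:00Z', deployedAt: '2020-03-06T12:00Z', failed: true, restoredAt: '2020-03-06T12:45Z' },
];

const hours = (a, b) => (new Date(b) - new Date(a)) / 36e5; // ms per hour

// Delivery lead time: task registered -> running in production (hours).
const leadTimes = deploys.map(d => hours(d.startedWorkAt, d.deployedAt));

// Time to restore service: minutes from failure to recovery.
const restoreMinutes = deploys
  .filter(d => d.failed)
  .map(d => hours(d.deployedAt, d.restoredAt) * 60);

// Change fail rate: share of deploys that caused a failure.
// (Deployment frequency would be deploys.length divided by the observed weeks.)
const changeFailRate = deploys.filter(d => d.failed).length / deploys.length;

console.log({ leadTimes, restoreMinutes, changeFailRate });
// -> { leadTimes: [32, 50], restoreMinutes: [45], changeFailRate: 0.5 }
```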
Importance of information flow. Companies have to foster an environment of continuous feedback and empowerment. It allows everybody to solve problems and suggest innovation within their area of work.
Data-driven decision making
Pillars of well-designed DevOps:
Security
Reliability
Performance Efficiency
Cost Optimization
Operational Excellence
A good example of a well-designed pipeline abstraction:
Version control – retrieve the most recent code from version control.
Build – build the optimized artifact that will be used to deploy.
Unit test – run automated unit tests (written by the same developer who created the feature).
Deploy – deploy to an instance or environment where it can receive a new round of tests.
Autotest – run the other layers of tests (stress, chaos, end-to-end, etc.).
Deploy to production – deploy to the final, real environment.
Measure & Validate – save the metrics of that deploy.
There are companies that are up to 400x faster at going from an idea to deploying it in production than traditional organizations.
Several analogies between the Toyota Production System (and the cases below) and DevOps:
Just in Time
Intelligent Automation
Continuous Improvement
Respect for People
Theory of Constraints:
You must focus on your constraint
It addresses the bottlenecks in your pipeline
Lean Engineering:
Identify the constraint
Exploit the constraint
Align and manage the systems around the constraint
Elevate the performance of the constraint
Repeat the process
DevOps is also about culture. Ron Westrum’s categories for culture evolution (pathological, bureaucratic, and generative):
Typical CI Pipeline:
Exercises and Assignments
Assignment: Creating a CircleCI automated pipeline for CI (Continuous Integration) to check out, build, install dependencies (Node app), and run tests.
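For reference, a minimal .circleci/config.yml along the lines of what this assignment asks for might look like the sketch below (the image tag and npm scripts are assumptions; adjust them to the actual project):

```yaml
version: 2.1
jobs:
  build-and-test:
    docker:
      - image: cimg/node:16.14   # assumed Node image; pick the version your app needs
    steps:
      - checkout                 # pull the code from version control
      - run:
          name: Install dependencies
          command: npm ci
      - run:
          name: Run tests
          command: npm test      # assumes a "test" script in package.json
workflows:
  ci:
    jobs:
      - build-and-test
```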
Docker, Containers Orchestration and Public Key Infrastructure (PKI)
The content
How the stack of software components used to run an application has become more and more complex compared to past years.
In past years, a huge number of web applications ran on top of LAMP (Linux, Apache, MySQL, and PHP/Perl). Nowadays we have several different possible approaches for each one of the layers of this acronym.
Containers are the most recent evolution in how we run our apps. They followed these steps:
The dark age: undergoing painful moments to get your app running on a new machine (probably spending more time getting the app to run than actually writing it).
Virtualization: using VMs to run our apps, but with the trade-off of VMs’ slowness.
Containers: a lightweight solution that allows us to write our code on one operating system and then easily rerun it on another.
The difference between Virtual Machines and Docker:
The analogy between how humanity solved the problem of transporting goods across the globe using (real, physical) containers and how software developers use the container abstraction to make our lives way easier when running an application for the first time.
Kubernetes is introduced, along with its benefits:
Less work for DevOps teams.
Easy to collect metrics.
Automation of several tasks like metrics collection, scaling, monitoring, etc.
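As a small taste of what that automation looks like, a minimal Kubernetes Deployment manifest might look like the sketch below (the name, image, and port are illustrative). Declaring three replicas is enough for Kubernetes to keep three instances running and replace any that die:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app              # illustrative name
spec:
  replicas: 3                  # Kubernetes keeps 3 instances alive, replacing failed ones
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
        - name: hello-app
          image: example/hello-app:1.0   # hypothetical image
          ports:
            - containerPort: 3000
```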
Public key infrastructure:
The ever-growing amount of machine-to-machine interaction requires more sophisticated authentication methods than username and password.
Private and public keys are used to sign/verify and encrypt/decrypt messages and communications.
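A minimal Node.js sketch of the sign/verify side of this, using the standard crypto module (the message is illustrative; in real PKI the public key is distributed inside a certificate signed by a certificate authority):

```javascript
const crypto = require('node:crypto');

// Generate an RSA key pair (in real PKI, the public key travels in a certificate).
const { publicKey, privateKey } = crypto.generateKeyPairSync('rsa', { modulusLength: 2048 });

const message = Buffer.from('deploy request #42'); // hypothetical machine-to-machine message

// The machine holding the private key signs the message...
const signature = crypto.sign('sha256', message, privateKey);

// ...and any peer holding the public key can verify it was not tampered with.
console.log(crypto.verify('sha256', message, publicKey, signature)); // true
```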
Exercises and Assignments
Exercise 1: Running a simple node app with docker (building and running the app)
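For a sense of what that exercise involves, a minimal Dockerfile for a Node app might look like this sketch (the base image, port, and entry-point file name are assumptions):

```dockerfile
# Minimal sketch of a Dockerfile for a Node app
FROM node:16-alpine

WORKDIR /app

# Install dependencies first so Docker can cache this layer
COPY package*.json ./
RUN npm ci

# Copy the rest of the source
COPY . .

EXPOSE 3000
# Assumes the app's entry point is server.js and it listens on port 3000
CMD ["node", "server.js"]
```

It would then be built and run with something like `docker build -t my-app .` followed by `docker run -p 3000:3000 my-app`.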
The course has two big parts: (1) technical base and (2) business applications and strategies
The second module introduces the benefits, trade-offs, and new problems of developing applications for scale. It also covers the complexity of asynchronous development.
One more technical assignment is present, and it’s based on Node to focus on JavaScript, since it’s the most used programming language nowadays.
The content
To start with, they covered the whole web concept (since its creation by Tim Berners-Lee).
The creation of JavaScript (the most used language for web applications worldwide).
How Google changed the game by creating Chrome and the V8 engine.
The creation of Node.js.
Implementing a simple web server at DigitalOcean.
The evolution of complexity from the web’s first steps to where we are today: open source, the JSON format, IoT, and more recently Big Data and Machine Learning.
The world of computing in an asynchronous architecture (see the sketch below).
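To give a flavor of that asynchronous model, here is a tiny Node.js sketch (the timing is arbitrary): the runtime registers a callback for the slow operation and keeps executing instead of blocking.

```javascript
console.log('1: kick off a slow operation');

// setTimeout stands in for any slow I/O (network call, disk read, etc.)
setTimeout(() => console.log('3: slow operation finished'), 100);

console.log('2: the event loop is free to do other work meanwhile');

// Prints 1, 2, 3 -- the single thread never sits idle waiting for I/O.
```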
Exercises and Assignments
Exercise 1: forking a project at GitHub.com and sending a pull request back.
Assignment 1: Running a simple Node application locally (a Pac-Man game) to understand the communication between the client (browser) and the server (Node.js), and also retrieving metrics through an endpoint using JSON as the communication format.
The first module brings everybody’s knowledge up to date on the evolution of the internet and software development practices.
Technically speaking, the assignments of the first module are simple.
The content
Disclaimer: I won’t post the course content and deeper details here for obvious reasons. Everything mentioned here is my learning and key takeaways from each class/content.
The first module is very introductory. Concepts like the creation of the internet, and how information flow evolved from the first internet connection to the cloud, are covered very briefly.
More than being introductory, it is very straightforward and hands-on (which I consider great). There are forum discussions for the participants to get to know each other, and an open Q&A about the exercises and assignments.
Exercises and Assignments
Exercise 1: examining a small JSON file in the Chrome console to understand the JSON format and key JavaScript concepts.
Exercise 2: examining a BIG JSON file in the Chrome console to show how things can eventually get complex.
Exercise 3: running a simple Node app to analyze the BIG JSON file from exercise 2 (see the sketch at the end of this section).
Assignment 2: Creating a simple static personal website at GitHub.io. For this one, I went a bit further and added a small set of free, pre-built static CSS and HTML to achieve something better than just a “hello world”: https://guisesterheim.github.io/
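Back to exercise 3: a minimal sketch of what analyzing a big JSON file with Node boils down to (the file name and the items field are made up):

```javascript
// analyze.js -- run with: node analyze.js
const fs = require('node:fs');

// Parse the whole file into plain JavaScript objects...
const data = JSON.parse(fs.readFileSync('big.json', 'utf8'));

// ...then traverse it like any other in-memory structure.
console.log('top-level keys:', Object.keys(data));
console.log('records:', Array.isArray(data.items) ? data.items.length : 'n/a');
```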