The cloud journey, still a common topic in the IT departments of companies that grew up with on-premise systems, can have very different levels of complexity. For young companies with a small number of applications, these projects are usually simple. For companies with a long history, it is common for the project to take more than a year just to migrate the applications and routines; when the goal becomes updating the apps to take advantage of being in the cloud, the timeline gets even longer. In both scenarios, many aspects must be analyzed, observed and monitored to guarantee success. Below I list 7 important points to help plan these projects:
Security: set policies and protect applications
As with most system attacks, social engineering applied to the cloud context is the biggest concern for most professionals. Weak passwords and lack of care when defining policies for which user has access to which actions in the environment are the main problems. So create well-defined policies and responsibilities for every single user who will have access to the environment. Each user must have permission to do only what is expected of them. Superadmin access should be restricted to one or two people in the whole company, the ones with the trust and maturity to orchestrate the whole team.
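To make the least-privilege idea concrete, here is a minimal sketch of building a policy document that grants a user only the actions expected from them. The helper, bucket name and actions are hypothetical; the document format is the standard AWS IAM one, and in a real project it would be handed to boto3's `iam.create_policy`:

```python
import json

def least_privilege_policy(actions, resource_arn):
    """Build an IAM policy document granting ONLY the given actions
    on ONE resource -- this statement allows nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": sorted(actions),
            "Resource": resource_arn,
        }],
    }

# Hypothetical example: an analyst may read one bucket, nothing more.
policy = least_privilege_policy(
    ["s3:GetObject", "s3:ListBucket"],
    "arn:aws:s3:::reports-bucket/*",
)
print(json.dumps(policy, indent=2))
# In a real environment, attach it with:
#   boto3.client("iam").create_policy(PolicyName=...,
#                                     PolicyDocument=json.dumps(policy))
```

The point is that the permission list is explicit and short; anything not named in `Action` is denied by default.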
Another important aspect of cloud security is defining, together with the development and architecture teams, the company's API access policies. An API that can change sensitive data must have its access, protocols and security designed and restricted to its purpose only.
Validation to avoid data loss
For data migration projects, whether databases or files of any kind, validations must be done to ensure data integrity. Such a move may aim at keeping backups, high availability, improving performance, etc.
For large-volume migrations, such as terabytes of data, it's important to check that the number of files at the origin matches the destination. Another check is the total size (counting bytes here) of each file: is it identical on both ends? For databases the same verification is useful at first sight, but since the database will probably be changed to work properly in the new environment (whether moving to a managed service or not), validating the application's integrity and behavior is imperative, and must be done by people with deep knowledge of how it works.
Legal: check whether the law allows the business
Speaking about Brazil specifically, legal issues have been blocking governmental migrations for years. They also directly affect financial companies, because the law forces them to keep their data physically stored in Brazilian territory. Even with datacenters in Brazil, migrations are sometimes still not possible, because cloud providers perform automatic data replication, and some of them cannot guarantee that the data will be located only in the selected region.
For other applications, not related to the government or the financial sector, the set of laws known as the "Marco Civil da Internet" frees most companies from those restrictions, and it is a good starting point for legal consultation.
Transferring applications from on-premise to the cloud in AS-IS format is an alternative when the business requires aggressive deadlines. But just changing the server's location won't solve problems related to the application architecture, such as performance, scalability, stability, telemetry, etc. When the cloud journey starts to look at the way applications were developed, many new benefits can be explored:
Architecture: coupling/lock-in – use managed services or not?
A very helpful alternative is the use of managed services. Services like AWS Lambda, Amazon RDS, Google App Engine, Google Cloud Spanner, and others bring at least two strong benefits:
- More speed for the development team.
Using Amazon RDS as an example: the development time spent connecting an application to this service, instead of to a regular self-installed database, is drastically reduced. This is possible thanks to the service's native features, such as automatic backups, fast restoration of the service in case of failure, and automatic replication. These benefits don't have to be developed, because they are already available within the service and can be enabled through simple configuration.
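As a sketch of how little is needed to get those features, the parameters below follow boto3's `rds.create_db_instance` call; the identifier, engine and sizes are hypothetical. Backups and a standby replica are turned on by configuration, not by custom code:

```python
# Hypothetical configuration for boto3's rds.create_db_instance call.
db_config = {
    "DBInstanceIdentifier": "orders-db",  # hypothetical name
    "Engine": "postgres",
    "DBInstanceClass": "db.t3.medium",
    "AllocatedStorage": 100,              # GiB
    "BackupRetentionPeriod": 7,           # daily automatic backups, kept 7 days
    "MultiAZ": True,                      # automatic replication to a standby
}
# In a real project:
#   import boto3
#   boto3.client("rds").create_db_instance(**db_config)
```

Two dictionary keys replace what would otherwise be weeks of backup and failover engineering on a self-installed database.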
- Cost savings
The abstraction these services provide over how they operate allows operation and configuration costs to be reduced significantly. The execution strategy for these services becomes the cloud provider's responsibility, and the provider can use resources in the best possible way, because it has knowledge of the entire system.
These and other points must be evaluated against the important trade-off of coupling and lock-in. The more managed services are used, the more dependent the application becomes, and the bigger the lock-in with that cloud provider. The company running the application must have these scenarios mapped, so that when a request comes to move the application somewhere else (e.g. another cloud provider), it already knows the time invested will be bigger, because an AS-IS move won't be possible.
Architecture: costs – understand the application and find the best service for each business requirement
Understanding how the application works and choosing the best service to solve each kind of problem is a way to reduce costs significantly. One easy example is the use of hardware on demand through automation. These features are available at the administrator console level and can also be triggered by APIs, which allows code to manage hardware and reduce that application's costs.
As shown in the image above, most applications have consumption peaks. Proper architecture and application planning allow hardware usage to scale elastically, provisioning resources only during the time they are needed.
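To make the elasticity idea concrete, here is a toy sketch of the kind of rule an autoscaler applies; all thresholds are hypothetical, and in practice this decision would be delegated to a service such as AWS Auto Scaling rather than written by hand:

```python
import math

def desired_instances(current, cpu_percent, target=60, minimum=2, maximum=20):
    """Toy autoscaling rule: keep average CPU near `target` percent by
    scaling the instance count proportionally, clamped to
    [minimum, maximum] so the fleet never disappears or explodes."""
    wanted = math.ceil(current * cpu_percent / target)
    return max(minimum, min(maximum, wanted))

print(desired_instances(4, 90))   # peak: 4 machines at 90% CPU -> grow to 6
print(desired_instances(4, 15))   # quiet hours: shrink, but keep the minimum of 2
```

Outside the peaks the fleet shrinks, which is exactly where the cost savings come from.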
Another useful example is using more than one cloud for the same application or service. We can find benefits in provisioning hardware at AWS, through its services that reduce development time, while consuming Machine Learning APIs available at Google to handle images, for instance. This multi-cloud concept must always be considered when looking for the best trade-off between cost and business needs.
Architecture: telemetry – register and automate everything to understand "where" every application is and keep tracking it
When applications start growing, and the hardware and resources to be operated increase, telemetry practices become mandatory. An example: an application that runs 10 microservices, each of them distributed across 5 machines. When a problem happens, it's impossible to log into each of the 50 machines looking for where the failure happened. To solve that problem, some proactive and reactive actions can be taken. Below are some examples:
- Reactive: make the application detect the problem programmatically and create a ticket for the DBA team to analyze;
- Proactive: make the application detect a software failure programmatically and take action to solve it. Example: check the health of the services inside a bigger one, drilling down until finding that the root cause is a database that stopped responding. Restarting that database and creating a record for deeper analysis by a human can be automated.
- Reactive: register all application logs in one single location. This way, when a software engineer needs to investigate the problem, they access one single place, not those 50 instances.
- Reactive: identify that the hardware usage of one specific service is beyond its regular average and raise a warning about what may be a hacker attack.
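The proactive drill-down described above can be sketched as a recursive walk over a service health tree. The data format here (`name`/`healthy`/`deps` dicts) is hypothetical; a real system would feed it from health-check endpoints:

```python
def find_root_cause(service):
    """Drill down a service health tree and return the name of the
    deepest unhealthy dependency -- the candidate root cause.
    `service` is a dict with 'name', 'healthy' and a 'deps' list."""
    if service["healthy"]:
        return None
    for dep in service.get("deps", []):
        cause = find_root_cause(dep)
        if cause is not None:
            return cause
    return service["name"]  # unhealthy, with no unhealthy dependency below it

# Hypothetical snapshot: checkout is failing because its database is down.
system = {
    "name": "checkout", "healthy": False, "deps": [
        {"name": "payments", "healthy": True, "deps": []},
        {"name": "orders", "healthy": False, "deps": [
            {"name": "orders-db", "healthy": False, "deps": []},
        ]},
    ],
}
print(find_root_cause(system))  # -> orders-db
```

Once the root cause is identified, the automatic action (restarting the database and filing a record for human analysis) can be triggered from the same place.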
Architecture: latency – check business tolerance and mitigate it
Whenever a cloud move is planned or ongoing, latency is on the table. It will always exist, but it's necessary to check how much the business can tolerate and find ways to keep latency from affecting the system:
- Latency between request and response, for AS-IS migrations:
For systems that only distribute information, a CDN configuration may be the complete solution. For systems with transactional flows, it may be necessary to keep them in nearby datacenters or to review architecture practices.
- Latency within the application:
If there are many microservices, running on many instances, with different databases holding decentralized information and more than one cloud involved, then beyond the client-server latency there will be latency inside the application itself. This latency has to be addressed through architecture practices: caches, queues, etc. can be used to prevent it from frustrating the end user.