Browsing Category

Practical examples


5 mistakes big companies make when moving to the cloud without a good plan

February 21, 2018

A lot has been said, and is still being said, about the journey to the cloud. Big companies worldwide already plan their applications with a cloud strategy from the very start. The potential gains in security, scalability, autonomy and related areas are no longer debated during project discussions, and the cloud no longer raises doubts or distrust. Since moving to the cloud is an old subject, and beyond the points of attention already raised, I discuss here some mistakes I’ve seen, as a way to support and contribute to new migrations:


Treating the cloud as a side need

The cloud should be the main actor in your applications. It’s very common to see companies start their journey with simple backup or storage routines in the cloud, so they can begin decommissioning their own datacenters. This is an important step, often needed to reassure a board that is skeptical of change. But rollback, backup, disaster recovery and many other routines, which used to be hard for OPS teams to implement, are now trivial for the big cloud players. Using the cloud with only this objective underestimates its potential.

Fully managed services offer a very high level of automation for the points above. Take advantage of this process automation instead of wasting time and money researching, planning and implementing it yourself. Even services that aren’t fully managed have a lot of automation built in. Backup, rollback and DR are now the basics.


Decision matrix negligence

Choosing one cloud provider over another is very important and often underestimated. Selecting a cloud provider is like a marriage, and it can lead to a troubled relationship! I’ve seen companies trying to escape traumatic 10- or 20-year contracts with giants like Oracle or IBM while neglecting the cloud decision matrix. They ended up signing new 10- or 20-year contracts with provider X or Y.

Big players like Google, IBM and SAP allow a very high level of customization, so keep that in mind during the decision. They let you configure everything a big application needs, covering dependencies, integrations and relationships with other systems, specific engineering practices, etc. It’s also very common for complex applications to benefit from multi-cloud operations. For instance, infrastructure that is cheaper on provider X can be combined with FaaS big data services from provider Y, which is faster at processing large data volumes. This way a single application can be distributed across more than one cloud provider and more than one geographic region, supporting business objectives like cost savings, latency reduction and access to specific features from niche players.
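To make the multi-cloud idea concrete, here is a minimal sketch, assuming the only criterion is a list price per service. The provider names, service names and prices are hypothetical; a real decision would also weigh the other points of the decision matrix, such as latency, features and contract terms.

```python
# Hypothetical monthly prices in USD, per service and provider.
# These numbers are invented for illustration; they are not real quotes.
PRICES: dict[str, dict[str, float]] = {
    "compute": {"provider_x": 1200.0, "provider_y": 1500.0},
    "bigdata_faas": {"provider_x": 900.0, "provider_y": 650.0},
    "storage": {"provider_x": 300.0, "provider_y": 320.0},
}


def place_services(prices: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each service, pick the provider with the lowest price."""
    return {service: min(offers, key=offers.__getitem__) for service, offers in prices.items()}


def total_cost(prices: dict[str, dict[str, float]], placement: dict[str, str]) -> float:
    """Sum the price of each service at the provider it was placed on."""
    return sum(prices[service][provider] for service, provider in placement.items())


def single_provider_cost(prices: dict[str, dict[str, float]], provider: str) -> float:
    """Cost of running every service on one provider only."""
    return sum(offers[provider] for offers in prices.values())


if __name__ == "__main__":
    placement = place_services(PRICES)
    print("placement:", placement)
    print("multi-cloud total:", total_cost(PRICES, placement))
    for name in ("provider_x", "provider_y"):
        print(f"all on {name}:", single_provider_cost(PRICES, name))
```

With these made-up numbers, splitting the services costs less than putting everything on either provider, which is the kind of trade-off a decision matrix should surface.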

Besides the big players, there are many niche PaaS and FaaS providers. Digital Ocean, Heroku, Umbler and Elastichosts, for instance, are useful for less demanding applications. These platforms offer high levels of abstraction, which shortens the learning curve for development and operations teams.


Not knowing the potential of the chosen cloud

The savings that can be achieved with cloud resources such as rollback, DR, backup (mentioned above), monitoring, alerts and automated actions are very high. Take them into account when executing your migration project!

Evaluate PaaS services to run applications, or FaaS for network tasks, machine learning, deployment, testing and more. For example, I’ve seen that every company running software on on-premises infrastructure ended up, at some point, building something to improve its deployment and testing tasks. If 10 companies each developed similar software at a cost of 500 hours, together they would have spent 5,000 hours creating nearly identical software. With services like CodeDeploy, Device Farm, Fastlane and Firebase, this kind of development is no longer needed. Quick access to these services saves many development hours, makes companies faster and lets them respond more quickly to their customers’ needs.


Poor tracking of investment

Do not treat the cloud budget as a black box. Have capable people track the investment and break it down by application, business unit or whatever makes sense in your context. This way, informed decisions can be made about contracting or refusing each evaluated service. It’s one more practice to keep the “marriage” with a cloud provider from becoming like those multi-year contracts with Oracle, IBM, etc., which every big company is trying to escape nowadays. We have already made these mistakes in the past. Let’s learn from them!
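Breaking the budget down per application can start as simply as grouping billing line items by a tag. The sketch below assumes each line item carries an application tag; the field names and amounts are hypothetical, not any provider’s billing export format. Reporting untagged spend separately keeps part of the budget from silently turning back into a black box.

```python
from collections import defaultdict

# Hypothetical billing line items. Real billing exports have many more fields;
# the keys used here ("service", "app", "cost") are assumptions for this sketch.
LINE_ITEMS = [
    {"service": "vm", "app": "checkout", "cost": 420.0},
    {"service": "database", "app": "checkout", "cost": 310.0},
    {"service": "vm", "app": "reports", "cost": 150.0},
    {"service": "ml", "app": None, "cost": 95.0},  # spend nobody tagged
]


def cost_by_app(items: list[dict]) -> dict[str, float]:
    """Group cost per application; untagged spend is reported as UNTAGGED."""
    totals: dict[str, float] = defaultdict(float)
    for item in items:
        totals[item["app"] or "UNTAGGED"] += item["cost"]
    return dict(totals)


def untagged_share(items: list[dict]) -> float:
    """Fraction of the total spend that cannot be attributed to any application."""
    total = sum(item["cost"] for item in items)
    untagged = sum(item["cost"] for item in items if not item["app"])
    return untagged / total if total else 0.0


if __name__ == "__main__":
    for app, cost in sorted(cost_by_app(LINE_ITEMS).items()):
        print(f"{app}: {cost:.2f}")
    print(f"untagged share: {untagged_share(LINE_ITEMS):.1%}")
```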


Not keeping up to date

For example, it’s still common for companies to spend large sums buying different mobile devices to test apps. Not knowing about services that automate testing on those devices means that money keeps being wasted, and the benefits of automation never reach the development process.

With more and more niche competitors emerging, a team that doesn’t keep itself up to date can be a big problem. Not knowing good alternatives to your current services keeps you from discovering and using new benefits.


The software buying process in the evolution of IT

February 14, 2018

The more views you have on the same subject, the more information you will have to draw conclusions and make decisions when needed. The first article about IT evolution took the developers’ point of view and a very superficial business view. This new article looks from the perspective of clients buying software, and at how they differ even within the same industry.


How software used to be bought

In past years, companies bought software like any other commodity: I want 5 cars with 4 wheels each, manual transmission and white paint. They didn’t care how their software was produced, or about any good practices applied in the process. Yes, the software requirement documents had sections on security, protocols and the like, but at the end of the day, the lowest price would win.


The old (not so old) decision matrix

Decision matrices (see the example below, based on a car buying process) had very simple criteria. They were also superficial. The high-level criteria were typically: security levels, performance, proposed schedule, price, scope coverage, success cases, knowledge of the selected technology, etc. Weights were assigned according to each project, but after all the criteria were evaluated, price and schedule would still determine who won.

In the example above, what exactly do comfort and styling mean to this buyer? Safety could be evaluated using public government data. This parallel shows how none of the criteria is detailed.

Anyone claiming they could cover the entire scope, within the requested schedule and for a fixed amount of money, could win the project. If they lacked the necessary knowledge, it wouldn’t make a huge difference. After all, software was treated like any other product: it didn’t matter how it was made, only that it worked.
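The weighted decision matrix described in this section can be sketched as a weighted sum of scores per criterion. The criteria names come from the list of high-level criteria; the suppliers, weights and 1 to 5 scores are invented for illustration. Note how heavy weights on price and schedule let the supplier with weaker security win, which is the pattern described above.

```python
import math

# Hypothetical weights per criterion; they must add up to 1.0.
WEIGHTS = {"security": 0.20, "performance": 0.15, "schedule": 0.25, "price": 0.30, "scope": 0.10}

# Hypothetical 1-5 scores for two made-up suppliers (5 is best).
SCORES = {
    "supplier_a": {"security": 4, "performance": 4, "schedule": 3, "price": 2, "scope": 5},
    "supplier_b": {"security": 2, "performance": 3, "schedule": 5, "price": 5, "scope": 3},
}


def weighted_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Sum of weight * score over every criterion in the matrix."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must add up to 1.0")
    return sum(weight * scores[criterion] for criterion, weight in weights.items())


def rank(all_scores: dict[str, dict[str, int]], weights: dict[str, float]) -> list[str]:
    """Suppliers ordered from best to worst weighted score."""
    return sorted(all_scores, key=lambda s: weighted_score(all_scores[s], weights), reverse=True)


if __name__ == "__main__":
    for supplier in rank(SCORES, WEIGHTS):
        print(supplier, round(weighted_score(SCORES[supplier], WEIGHTS), 2))
```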


Recent decision matrix

Apart from the government sector and a few others untouched by digital transformation, this reality has been changing.

What I’ve seen lately is more and more time spent detailing the criteria. For each criterion, buyers want to know what the supplier has already done and how it plans to deliver the solution, and they also want to evaluate alternatives for everything.

  • Security: how will you ensure information is exchanged in a controlled environment? Code obfuscation tactics. The platform’s inherent security concerns. Cloud provider certifications, etc.;
  • Schedule: from “a detailed schedule” (which is never achieved) to “a macro view of time and a mix of agile methodologies to be followed”;
  • Success cases: show where/when similar solutions were already created;
  • UX: from “the system has to have good usability” (completely subjective) to “it must follow UX guidelines from Google and be conducted by a specific professional, not by developers”;
  • Performance: from “the system has to load under 10 seconds” to “each endpoint must answer under 2 seconds”;
  • Scope: from “do everything desired” to “let’s do the most we can inside the schedule”;
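A criterion like “each endpoint must answer under 2 seconds” is precise enough to check automatically. This minimal sketch assumes you already have measured response times per endpoint (for example, from a load test); the endpoints and numbers are hypothetical. It uses the slowest sample, which is strict; a real contract might specify a percentile instead.

```python
# Hypothetical response times in seconds, e.g. collected during a load test.
LATENCIES = {
    "/login": [0.4, 0.6, 0.5],
    "/search": [1.2, 2.4, 1.9],
    "/checkout": [0.9, 1.1, 1.0],
}


def endpoints_over_budget(latencies: dict[str, list[float]], budget_s: float = 2.0) -> list[str]:
    """Endpoints whose slowest measured response exceeds the budget (empty lists are skipped)."""
    return sorted(
        endpoint for endpoint, samples in latencies.items() if samples and max(samples) > budget_s
    )


if __name__ == "__main__":
    print(endpoints_over_budget(LATENCIES))
```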

Companies want to know in detail how their project will be conducted, and they also want to give their opinion on it. Since digital transformation is making companies turn their core business into IT, IT is no longer just a support function. So now they are able to make suggestions and talk on the same level as the consultancies.

With this comparison and evolution, the MVP mindset is very clear, as are the goals of faster time-to-market and faster revenue.


Different requests inside the same industry

The picture above holds for 100% of the companies in the most mature industries in Brazil. But for the remaining industries, using retail as an example, it doesn’t:

There are stores already evolving their e-commerce sites for security, performance, scalability, stability and, most importantly, user experience. But there are also those who treat their online stores as a mere necessity. It’s common for them to copy layout practices from competitors. Also, security is expensive, so some companies set aside a budget to absorb losses from hacker attacks instead of investing in security.

The good news is that the market is evolving faster and faster. Soon no industry will be behind.


How to avoid suicide applications

January 29, 2018

Having managed many projects under many different methodologies and mixes of them, I recently noticed some attributes that, in all of them, turn a project into a suicide project to manage. By suicide I mean a project that faces issues and goes downhill until it reaches an unacceptable state, leading to the loss of a client or something equally extreme.

The items below are common to all management styles, and if any of them sounds familiar or is part of your routine, it’s time to change something. Some concern relationships between people and others concern technical aspects.


No QA environment

A project that has only a formal production environment. Development is done by coders on their own machines. There is no QA (Quality Assurance) environment.

It looks crazy, right? But it still happens. It usually happens when whoever is paying the bill doesn’t understand how catastrophic it can be not to have a Quality Assurance / Testing / Homologation environment. When this happens, you are relying on luck day by day, like the image above.

Ideally, the QA environment should be an exact copy of the production environment. Routines that copy data from PRD (production) to QA must be run periodically. That way, whenever a problem occurs in PRD, it will be faster to track down and reproduce in QA. If it’s faster to reproduce, it’s faster to find a solution and fix PRD. When production is fixed quickly, the client loses less money from the failure and its brand is less affected by the outage.

There can be different approaches, or minimum requirements, to at least have a trustworthy test before sending new code to PRD. The QA data can differ from current PRD data. The environments may have different hardware configurations, etc. But the more QA and PRD differ, the more time will be spent tracking down issues.
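One way to keep the differences between QA and PRD visible is to compare their configurations and list every key that differs. This is a minimal sketch, assuming each environment’s settings can be exported as a flat dictionary; the keys and values shown are hypothetical.

```python
def environment_diff(prd: dict, qa: dict) -> dict:
    """Map each differing key to its (PRD value, QA value); a missing key shows as None."""
    keys = prd.keys() | qa.keys()
    return {key: (prd.get(key), qa.get(key)) for key in sorted(keys) if prd.get(key) != qa.get(key)}


# Hypothetical settings exported from each environment.
PRD = {"db_version": "12.4", "app_servers": 4, "cache": "redis", "data_snapshot": "2018-02-01"}
QA = {"db_version": "11.2", "app_servers": 1, "cache": "redis"}

if __name__ == "__main__":
    for key, (prd_value, qa_value) in environment_diff(PRD, QA).items():
        print(f"{key}: PRD={prd_value!r} QA={qa_value!r}")
```

Every line this prints is a place where a bug might reproduce in PRD but not in QA, so a shorter list means faster troubleshooting.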

To draw a parallel with a different field, not having a QA environment is, with due proportion, like a doctor trying a new surgical method without first testing it on mice or monkeys. It can work, but there is a huge chance of it going wrong. And when it works, it’s nothing but luck.


Zero Automation

The more automation you have, the fewer human errors will happen. People will fail, and that’s normal! Let people think about what really matters instead of routine tasks. Sooner or later, something like this will happen:

  • Someone will commit the latest code to the wrong branch, and production will break;
  • Someone will forget to run a script before deploying a package, and production will break;
  • Someone will forget to run minimum tests before starting a new process or service, and production will break;

A good way to prevent these errors is to automate as much as you can.
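As a starting point, the three failures listed above can become checks that block a deployment instead of relying on memory. This is a hypothetical sketch: the `Release` fields, the `main` branch name and the script name are assumptions, and in practice checks like these would typically run inside a CI/CD pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """Hypothetical description of what is about to be deployed."""
    branch: str
    pending_scripts: list[str] = field(default_factory=list)  # scripts not yet run
    tests_passed: bool = False


def deploy_blockers(release: Release, allowed_branch: str = "main") -> list[str]:
    """Return the reasons this release must not go to production (empty list means go)."""
    blockers = []
    if release.branch != allowed_branch:
        blockers.append(f"wrong branch: {release.branch!r} (expected {allowed_branch!r})")
    if release.pending_scripts:
        blockers.append("scripts not run: " + ", ".join(release.pending_scripts))
    if not release.tests_passed:
        blockers.append("minimum tests have not passed")
    return blockers


if __name__ == "__main__":
    risky = Release(branch="feature/new-login", pending_scripts=["migrate_users.sql"])
    for reason in deploy_blockers(risky):
        print("BLOCKED:", reason)
    print(deploy_blockers(Release(branch="main", tests_passed=True)))  # []
```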

A good starting point may be automating deployment. Many companies I’ve seen still have problems with deployments. They plan a deployment 4 weeks in advance. The deployment is done, and it crashes. It takes one or two weeks to fix the code and get production running again. Then they plan the next one 2 months in advance, and it crashes again. If deployment is a problem, why not make it trivial? Try deploying every day. You will learn a lot.

Automating tests is also a great initiative. Automated tests are one of the best weapons for software stability: the more test coverage your app has, the more stable it tends to be. These two suggestions take time to implement, but their benefits will be felt in the software’s credibility throughout the whole company.

To draw a parallel with a different field, not having automation is like loading a warehouse box by box without a forklift. A lot of time will be wasted, of course. Some boxes will be dropped, and some will be shaken too much. It will work, but a lot of unnecessary risk is added to the project.


Key-user negligence

The person responsible for testing your project is the main actor in your routine. Their opinion is the one that will be asked about ongoing practices. This person must be committed to, even passionate about, your project, helping out and pointing out anything that goes wrong. The best projects I’ve ever been involved in had very picky people testing and signing off on delivered features.

If your project lacks this concern, many problems may come up:

  • Lack of perceived value: what you have been doing and saying is not what the customer wants to see and hear;
  • Lack of alignment between what is developed and what really adds value to the business;
  • The worst one: lack of knowledge about the system. This person should be the one who knows the whole system. If they don’t, many problems may arise from different understandings of why something was developed a certain way;

This is like a mother who doesn’t care about her baby’s upbringing. The child will grow, of course, but may turn into a time bomb.


Poor judgment in technology choices

Technology is serious and can’t be treated as a fashion trend. Fashion trends come and go and are renewed from time to time. No programming language, framework, feature, cloud provider or anything else related to development should be chosen because it’s fancy or because everybody is talking about it right now. All of these decisions must be questioned by the development team and, if possible, someone from outside the project should help the team think them through and reach the best choice.

When wrong technology choices are made, it may cause some problems like the following:

  • Difficulty finding people who know that technology to work on your project;
  • Future problems caused by the technology’s incompatibility with the project requirements;

To draw a parallel with other fields, using a new, unproven technology is like buying a hamburger with vegan bacon inside. It can be good, but it isn’t proven yet. It’s your bet.


Deficient communication/status

How is your communication routine with your client structured? Do you have formal moments to share information, or do you share it only when something triggers it? There is no single right answer here, as suggested in the article “The magical answer for software development project management”.

But you must have a process for sharing information and telling whoever is paying you what you are doing. It may sound strange, but sharing this information is a serious problem in many projects. You must ensure the activities you are conducting make sense to the client, and that they understand how what you are doing benefits the business. Will you have a weekly meeting? A daily one? A status report?

Not having a process to give feedback on what you are doing is like investing money with a firm that never tells you about your returns.


Why tackle all of this?

All of the points above refer to practices that must be in place in your project. Otherwise, they will cause problems in the relationship or in the application’s data. Whenever any of them happens, it means losing money or at least weakening the system owner’s brand. It may be expected that a bank’s main apps won’t have major issues, but users don’t like it when signing up for that cashback program crashes. It signals that the bank isn’t serious about its users or its marketing campaigns.


When and how to do a status report

January 15, 2018

Following a few articles already written here, one of the activities I consider most important, and enjoy the most, is creating the “status report” for the things that depend on me. (This practical example was created a few weeks ago for a real project.)


Status reports are NOT only for projects!

A status report is a bigger thing: it’s not only for projects, but for everything that depends on you. A few examples:

  • Telling your leader what progress was made last week on a key activity for the company;
  • Telling people what you learned during that important event;
  • Letting people know how your project is going (of course);
  • Managing your own team;
  • Letting yourself know how your life plans are going. It can be your tool for a personal retrospective;

Whether or not you send it to someone (your leader, for example), the status report is a moment to stop everything and look at what really matters from above.


Creating your model

What do you want to report? From the many status reports I’ve dealt with, the most basic information is probably: date (when it was generated), latest updates (relevant facts, in text) and KPIs (executive numbers). You must also keep in mind who will receive it (if you plan to send it). What language will the people on the other side understand? Should it be more executive? Can it be very detailed (probably not)? Below are some suggestions for the kinds of reports listed above:
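The three basic pieces of information (date, latest updates and KPIs) map naturally onto a small data structure that can be rendered as plain text, for example for an email body. This is a minimal sketch; the class and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StatusReport:
    """Illustrative minimal report: when, what changed, and the executive numbers."""
    generated_on: date
    updates: list[str]
    kpis: dict[str, str]  # KPI name -> status, e.g. "green", "yellow", "red"

    def render(self) -> str:
        """Plain-text version, suitable for an email body."""
        lines = [f"Date: {self.generated_on.isoformat()}", "", "Updates:"]
        lines += [f"  - {update}" for update in self.updates]
        lines += ["", "KPIs:"]
        lines += [f"  - {name}: {status}" for name, status in self.kpis.items()]
        return "\n".join(lines)


if __name__ == "__main__":
    report = StatusReport(
        generated_on=date(2018, 1, 15),
        updates=["Activities X and Y finished", "Activity Z expected in two weeks"],
        kpis={"Scope": "green", "Dates": "yellow"},
    )
    print(report.render())
```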


Key activity status report

Depending on how small and/or important your information is, you may consider sending it in the body of an email. It depends on how formal you want to be.

  • Date – is the information in an email? Then the email’s sent date can serve as the date. Otherwise, state when the snapshot was taken;
  • What is the activity? Whenever you look at it, it will help you stay focused on what really matters and drop anything not aligned with your main goal. What is the expected result?
  • Dependencies – supplier dependencies and schedules are very important to show. Anything you depend on must be listed here, because it may affect your progress.
  • Updates – general relevant information. Example: this week, activities X and Y were finished; the last one, Z, will be finished in two weeks. Also report changes in scope or important facts about a key activity.
  • News! – are you researching a new technology for the company? A good section could be news about that research, drawn from the top influencers in the field.
  • KPIs – how can you measure how close you are to reaching the activity’s goal?


Event results status report

So the company is investing in sending you to an important event in your field. It makes sense to let everyone know what you learned there and share the new information with whoever wants it. If the event lasts 4 days, one summary per day can be nice.

  • Date – which day of the event is it?
  • What were the main subjects? What subjects did you hear about during the talks? Did you learn about any important new technology? Who spoke? Is the speaker widely known? What is their LinkedIn profile?
  • Descriptions of everything – be brief, but cover everything relevant that was said. How can you and your company benefit from that talk? Can you suggest a strategy to move toward that idea or technology?
  • Opinions! Since you were there, give your opinion. The talk was nice but you have doubts about the subject’s potential? Say it!


A regular project status report


Team progress status report

If you are a team leader, a status report can be very useful for regularly reviewing your work as a leader. Let’s look at how your team is doing! It will often overlap with the regular project status report (above), but there is more to look at when we talk about a team rather than just projects.

  • Date – when this snapshot was taken;
  • Operation / regular activities – this is basic. How are your regular activities going? Create a quick KPI here: is everything OK or not? Are your regular deliveries on schedule? Extract this from the project status report suggested above;
  • Customer – how is your relationship with the customer? Do you know who makes the important decisions there? Who are the key people whose every word you must pay attention to? What is their honest opinion of your relationship right now? A satisfaction survey conducted by someone new can be very useful;
  • Team health – is everyone on your team satisfied with their job? How does it fit into their careers? It’s your job to ensure everybody is motivated and feeling good.
  • New stuff – what are your plans to improve? Why not show plans for a new process you may be considering to increase people’s knowledge? Or for reaching someone new at the customer with whom you don’t yet have an established relationship? It’s time to look at whatever is important across all your responsibilities and plan actions to improve it;


Next steps


When you must share your status report with people, it’s meant to be presented, not just sent. The most important thing after a big status report is getting feedback. Feedback will also come from yourself while you update your file, not only from the people who receive it. So whenever you find something not going as planned, plan and take action to put it back on track.


Status report – A practical model and example

January 1, 2018

A status report is a very powerful tool used for many different subjects and purposes. This practical example presents a real scenario I created a few weeks ago and shows one approach to getting the message across to the people who matter.

The model is presented below, followed by a description of each section, its intention and why it makes sense for this scenario. It’s AVAILABLE HERE (English and Portuguese) and is free to use in your projects.


The model


Identification, project manager, objective and date

These four sections are very important to ensure everybody who opens your file knows exactly which project you are talking about.

  • Client X – Which company is paying for and interested in your project? It’s important for you and everyone else in your company to understand this;
  • Project Y – What is the project name? It identifies the file so whoever receives it understands its purpose;
  • Project Manager – Who is responsible for this project, on every side that matters to you? In this case, only my company’s and the client’s were important. You may add other suppliers, for instance;
  • Objective – Your main goal. It’s important for staying focused. Since the file is accessible to everybody on the project, everybody must understand what they are contributing to and feel part of it;
  • Date – the period the file covers. Keep track of your status reports! This lets you see your progress, the things you have already tried and how they are going;


Status, logos and deadlines

  • The overall status is executive information and the place everyone will look first. It’s a simple message showing how your project is going. It’s not what you think is happening: you need numbers to set it. In this document we used 3 KPIs (explored below) and the rules are: if all of them are green, the overall status is green; if at least one is yellow, the project is yellow; if any is red, the project is red;
  • The logos (yours and your client’s) carry an important message about the partnership responsible for this project. They subtly show the whole team how the two companies are committed to the same objective;
  • The original deadline is the first planned date to finish the project according to the schedule;
  • The replanned deadline is filled in only if you had to replan for some reason. When replanning happens, the reasons must be shown in the updates (explored below);
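The overall-status rule above is simple enough to express in code. In this sketch, statuses are ordered so that the worst KPI wins (red over yellow over green), which matches the rules as stated; the KPI names are the three used in this example (people, scope and dates), and the function name is illustrative.

```python
from enum import IntEnum


class Status(IntEnum):
    """Ordered so that a higher value means a worse status."""
    GREEN = 0
    YELLOW = 1
    RED = 2


def overall_status(kpis: dict[str, Status]) -> Status:
    """All green -> green; any yellow -> yellow; any red -> red (the worst KPI wins)."""
    if not kpis:
        raise ValueError("at least one KPI is needed to set the overall status")
    return max(kpis.values())


if __name__ == "__main__":
    kpis = {"people": Status.GREEN, "scope": Status.YELLOW, "dates": Status.GREEN}
    print(overall_status(kpis).name)
```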


Risks, changes and dependencies

As the project manager in charge, you must check whatever happens to risks and changes to understand whether it will require any kind of replanning. It can affect your schedule, your costs, the people involved, external resources, suppliers, etc.

  • Risks is where you list everything that may affect your project and force a planned item to change. Risks are often outside your control, so have a plan for each of them to deal with problems when something goes wrong;
  • Changes must be filled in when something relevant that was agreed upon changes for some reason. Was a decision reconsidered? Say it loud;
  • The Dependencies section is used when you have tasks that depend on someone else’s effort. It’s very important because if they are delayed, you will have to deal with the impact;

The example has a dependency with a strikethrough deadline. It shows that a planned delivery was not completed and was replanned.

There is also a dependency with a deadline painted red. It refers to a task that must be performed by someone outside the original project team. It must have a person responsible and the date when the deliverable will be available. If that task is delayed, the whole project can be delayed.


Updates

This is the section whose information changes the most from week to week; all of its content can change from one week to the next. Here you list everything that happened: new people added, test results, deliveries from other suppliers, etc. I also like to keep two subsections: what was finished during the last week and what will be done in the next one.


Schedule

The schedule makes sense whether or not your project has a fixed scope. In agile projects, you can use it to give visibility to the next important delivery. You can show macro deliveries or 100% of your planning (as the example does).

It lists macro tasks, and gives them colors:

  • When green, it’s ready to be developed;
  • If it’s orange, it depends on something external and can’t be started now;
  • For red, it means it’s delayed for any reason;

It also shows the tasks planned for delivery during the current week (painted orange in the first row of the table), each marked with an X in the respective cell. Finally, it shows the completion percentage of each task.
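The color rules for schedule rows can be sketched as follows. The text doesn’t say which color wins when a task is both delayed and waiting on an external dependency; this sketch assumes red takes precedence. The `ScheduleTask` fields and the row layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScheduleTask:
    """Hypothetical schedule row."""
    name: str
    depends_on_external: bool  # waiting on someone outside the team
    delayed: bool
    percent_done: int  # 0-100
    due_this_week: bool = False  # shown as an X in the current week's column


def task_color(task: ScheduleTask) -> str:
    """Red (delayed) takes precedence over orange (external dependency); otherwise green."""
    if task.delayed:
        return "red"
    if task.depends_on_external:
        return "orange"
    return "green"


def schedule_row(task: ScheduleTask) -> str:
    """One text line: name, color, current-week mark and completion percentage."""
    mark = "X" if task.due_this_week else " "
    return f"{task.name:<20} {task_color(task):<6} [{mark}] {task.percent_done:>3}%"


if __name__ == "__main__":
    tasks = [
        ScheduleTask("Login screen", False, False, 40, due_this_week=True),
        ScheduleTask("Payment integration", True, False, 0),
        ScheduleTask("Reports", False, True, 70),
    ]
    for t in tasks:
        print(schedule_row(t))
```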


KPIs

The things you decide to measure during the project. For this one, people, scope and dates were the main ones. The table shows the metric and the status for each of them. Whenever something is not green, you have to write down the reason it isn’t going well. Do not forget to add costs to this KPI section.
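Since every KPI that is not green needs a written reason, that rule can be checked before the report goes out. A minimal sketch with hypothetical rows (people, scope and dates, plus costs as suggested above); the row keys are assumptions.

```python
# Hypothetical KPI table rows; "reason" is required whenever status is not green.
KPI_ROWS = [
    {"metric": "People", "status": "green", "reason": ""},
    {"metric": "Scope", "status": "yellow", "reason": "Two new features requested"},
    {"metric": "Dates", "status": "red", "reason": ""},
    {"metric": "Costs", "status": "green", "reason": ""},
]


def rows_missing_reason(rows: list[dict]) -> list[str]:
    """Metrics that are not green but have no written reason yet."""
    return [
        row["metric"]
        for row in rows
        if row["status"] != "green" and not row.get("reason", "").strip()
    ]


if __name__ == "__main__":
    missing = rows_missing_reason(KPI_ROWS)
    if missing:
        print("Explain before sending:", ", ".join(missing))
```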


Communication issues? Maybe it’s engagement issues

December 24, 2017

I started thinking about this article as my personal year-end retrospective, and I realized the biggest thing I learned this year is about engagement.

I’ve been hearing for years that “80% of the projects that fail, fail because of poor communication”. That’s a sad truth, but this year I realized it’s an easy yet messy abstraction. When you say communication is the problem, you are hiding many serious problems, not only in your project but also in your organization. And after that eureka moment came another: people do “communicate the proper way” when they are engaged and have support to develop themselves.


Some examples

  1. We have a contract to monitor an online app. It’s not very critical, but we still have SLAs to meet when it’s down. Then an issue comes up and the person from OPS forgets to tell the customer, within the first 15 minutes, that he’s working on a solution. Beyond that, he takes one hour more than the regularly expected 4 hours to solve the problem. This causes an argument in a meeting between the managers, and everybody ends the discussion saying they had a communication issue;
  2. We maintain a very critical solution that deals with money for a customer. The developer doesn’t create the automated test, and the app breaks when it goes to production. This causes an argument in a meeting between the managers, and everybody agrees that the QA person didn’t properly tell the developer about the missing test;
  3. The project manager says the project is delayed, but not by how much. When something goes wrong and someone asks about it, he defends himself by saying he reported the delay and that there was an interpretation issue, because those who read it didn’t understand the message he meant. After that meeting, everybody agrees that he didn’t communicate the right way;


Why these things happen

What I learned this year about engagement was how to identify why these examples happened:

  1. The person from the OPS team didn’t tell the customer within the first 15 minutes that he was working on it, and forgot to say he would take one more hour to solve the ticket, because he didn’t fully own his role. He doesn’t understand his position: if he forgets to tell the customer about the downtime, the customer won’t be able to tell the users. The users will start calling the call center, which creates overhead for the call center team. So at least 10 people will have more work to do, forcing the company to pay for those extra hours, all because of one missing notice. The person from OPS wouldn’t have forgotten to tell anyone who needed to know if he had truly understood the scenario.
  2. The second example has nothing to do with QA. The responsibility lies with the developer. He didn’t create the test because he doesn’t know that every hour that system is down costs the customer US$10,000.00. Nobody, when old, wants to tell their children stories like “hey, once I brought down the production environment and messed up many people’s lives”. So the developer was negligent about the test because he didn’t fully own his responsibilities and capabilities.
  3. The project manager’s intention is not to hide information; he is paid, essentially, to keep people informed. But when he waits one or two months to say the project is 2 months late, he isn’t considering that the project’s people will be needed on the next one, and this will cause a loss for the company. Since those people won’t be available, the original project’s budget won’t be enough, and the company will spend more money hiring new people for the next project.

So if you find yourself in any of these examples, think of the people on the other side as your true friends. Would you let any of these things happen if you knew how deeply they would be affected?


Solving the wrong way

It’s easy to put all these problems into one basket and label them as communication issues. Then our technical minds will think about processes to guarantee that:

  1. The ticket system automatically sends an email telling the customer we’re working to solve the issue;
  2. We create an integration between the task acceptance tool and the code repository to check that every task has at least one test designed;
  3. The project manager always presents the status report with the customer present, so no information is missed again;

The suggestions above are not wrong and shouldn’t be seen as something to avoid. But they must be the last alternative. They are all patches for a lack of people engagement.


Solving the right way: build together

People tend to be good. When you clearly share their responsibilities and how their activities affect each other, they will feel part of the group and will always try to help everybody. So why not ask each of the people in these examples how they would solve the problem? Present the problem and tell them everything that happened because of their behavior. How to solve it? This is the most difficult part. The first thing they come up with probably won’t be the perfect solution, or the solution you had in mind. Then it’s time to show a better solution and check whether it makes sense to those who will put it into practice. A mix of ideas can be a good start.

And now we talk about people’s maturity to solve problems:

  • If they are junior people with good potential, just letting them know their responsibilities can solve the problem. They have good intentions! Next time, they won’t forget to tell anybody what’s happening;
  • If they are overwhelmed, advise them to look for self-management tools, and help them focus on what is important so they don’t get overwhelmed again;
  • If you realize it’s needed, why not use one of the process solutions described above as the wrong way? Be careful, because it must make sense to everybody affected. It depends on people’s maturity not to feel pushed;


The main goal is to build a mindset of empowerment and networks, as shown in the image below:


Always have next steps

Once I was invited to join a retrospective. The goal was to give a team an external view so they could improve their processes. The first thing I asked was where the “next steps” from the previous retrospective were. To my surprise, there weren’t any.

Always keep track of goals and results. It gives you a sense of improvement, and when there is no improvement, it makes it easy to find where new mistakes were made. Hold a periodic checkpoint for each of the scenarios above until things settle into a track where the problems don’t happen again.

After that, when you have the whole solution, you will be able to share your good practice with other teams around you:

  • What happened, why it happened, and when it happened;
  • What did you do to solve it, what were the first attempts, and what actually solved it?
  • What did you learn from it? What are the next steps?
Avoiding problems, Digital transformation, IT is business, Practical examples

What is telemetry, why it’s important and how to start!

December 17, 2017

Application stability has become an increasingly frequent concern for companies, especially for high-value applications. Every time a core application stops working, a lot of money is lost or stops being made. Because of that, telemetry for applications is being discussed more and more often. But what actually is telemetry for software, and how can you benefit from this practice?


What is telemetry?

Telemetry is the act of measuring something remotely and automatically.

In software, telemetry is already very easy to find. Some simple examples are the Chrome platform; the Windows, OS X, Android, iOS and Sony PlayStation operating systems; and mobile and desktop apps such as Spotify and Microsoft Office. What this software does is operate and gather all the data that matters to its work. Then it sends this data, naturally only if you allow it, to its manufacturer (Google, Microsoft, Apple, etc.). Once the data is aggregated, the next step is to analyze it and change things where needed. The core intention is to improve the systems so they behave properly in many different environments.

So the essence of telemetry is to operate, gather data, analyze it, and then improve the system’s code to achieve better behavior.
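To make the “operate, gather, send only if allowed, analyze” cycle concrete, here is a minimal Python sketch of an opt-in telemetry client. The class TelemetryClient, its record/flush methods and the summarize helper are hypothetical names for illustration, not any vendor’s real API, and the transport is assumed to be any function that delivers a string (in practice, probably an HTTPS request to the manufacturer).

```python
import json
import time
from collections import Counter
from typing import Callable, Dict, List


class TelemetryClient:
    """Hypothetical opt-in client: buffers events while the app operates
    and ships them to the vendor only if the user has consented."""

    def __init__(self, send: Callable[[str], None], user_opted_in: bool) -> None:
        self._send = send                # assumed transport, e.g. an HTTPS POST
        self._opted_in = user_opted_in   # respect the user's choice
        self._buffer: List[Dict[str, object]] = []

    def record(self, name: str, **fields: object) -> None:
        """Gather: store one measurement with a timestamp."""
        if not self._opted_in:
            return  # nothing is collected without permission
        self._buffer.append({"event": name, "ts": time.time(), **fields})

    def flush(self) -> int:
        """Send buffered events as one JSON batch; return how many were sent."""
        if not self._opted_in or not self._buffer:
            return 0
        batch, self._buffer = self._buffer, []
        self._send(json.dumps(batch))
        return len(batch)


def summarize(batches: List[str]) -> Counter:
    """Analyze (vendor side): count events by name across received batches."""
    counts: Counter = Counter()
    for batch in batches:
        counts.update(event["event"] for event in json.loads(batch))
    return counts
```

Batching keeps the app from making a network call for every event; the counts from summarize are the kind of aggregate a team would look at before deciding what to change in the code.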


Why telemetry?

Telemetry can bring a lot of value to the business. Let’s explore an example. Imagine you have an app running with a non-interactive FAQ screen for your users. Once your users get there, they stop calling your call center because they have already found what they were looking for. This means savings for your company. Now imagine that one of the answers on this screen (a quick how-to video), for some reason, stops being shown (stops working properly). If nothing is checking this screen’s health, it will be hard for you to notice, because nobody browses the systems 24/7. You will then depend on some user’s goodwill to TELL you that the screen stopped working. That will happen eventually, but before it does, your call center will be called more and more. That’s money wasted because of software misbehavior.
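A synthetic check for the FAQ example could look like the Python sketch below. It assumes the how-to video is served from a URL (the example.com address is hypothetical) and that any 2xx HTTP status means the video is still available; the fetch and alert functions are injected so they can be swapped for a real HTTP client and a real notification channel.

```python
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical URL of the how-to video shown on the FAQ screen.
FAQ_VIDEO_URL = "https://example.com/faq/how-to.mp4"


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with 4xx/5xx
    except OSError:
        return 0  # DNS failure, refused connection or timeout


def check_faq_video(url: str, fetch: Callable[[str], int] = fetch_status) -> bool:
    """Synthetic check: is the video still being served?"""
    return 200 <= fetch(url) < 300


def run_check(
    url: str,
    fetch: Callable[[str], int],
    alert: Callable[[str], None],
) -> bool:
    """Run the check once and raise an alert if the video is unavailable."""
    healthy = check_faq_video(url, fetch)
    if not healthy:
        alert(f"FAQ video unavailable: {url}")
    return healthy
```

In practice a scheduler (cron, for instance) would call run_check every few minutes, so the team hears about the broken video before the call center does.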

Telemetry can be used to check many levels of service operation:

  • Low-level information, such as the machines’ CPU and memory usage. Are they green?
  • Does the number of machines that are up match your historical knowledge of how many should be up to support 1, 2 or 3 thousand simultaneous users?
  • Are the core web services your customers access all the time up?
  • Is the login screen your users see up?

Once a telemetry practice is watching the important parts of your application, you will be able to take action and prevent problems from happening.
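The four levels listed above can be evaluated together from a single reading. The Python sketch below assumes a hypothetical Snapshot structure and made-up thresholds (80% CPU, 85% memory, and 500 users per machine standing in for the “historic knowledge”); real numbers must come from your own history.

```python
from dataclasses import dataclass
from math import ceil
from typing import Dict, List

CPU_LIMIT = 80.0          # hypothetical threshold, in percent
MEMORY_LIMIT = 85.0       # hypothetical threshold, in percent
USERS_PER_MACHINE = 500   # hypothetical capacity learned from history


@dataclass
class Snapshot:
    """One telemetry reading; every field is an illustrative assumption."""
    cpu_percent: float
    memory_percent: float
    machines_up: int
    concurrent_users: int
    services_up: Dict[str, bool]  # core web service name -> reachable?
    login_screen_up: bool


def color(ok: bool) -> str:
    return "green" if ok else "red"


def evaluate(s: Snapshot) -> Dict[str, str]:
    """Return green/red for each level, from machines up to the login screen."""
    expected_machines = ceil(s.concurrent_users / USERS_PER_MACHINE)
    return {
        "infrastructure": color(s.cpu_percent < CPU_LIMIT
                                and s.memory_percent < MEMORY_LIMIT),
        "capacity": color(s.machines_up >= expected_machines),
        "web_services": color(all(s.services_up.values())),
        "login_screen": color(s.login_screen_up),
    }


def red_levels(status: Dict[str, str]) -> List[str]:
    """List the levels that need attention, in a stable order."""
    return sorted(level for level, c in status.items() if c == "red")
```

With 2,500 simultaneous users and only 4 machines up, for instance, the capacity level turns red even though CPU and memory still look healthy, which is exactly the kind of problem a single low-level metric would miss.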


How to start?

A good telemetry implementation depends on the amount of information you want to store and analyze. The more information you have, the more infrastructure and knowledge you will need to process it all. It can even mean using big data tools. But let’s take a new, simple example: a system that receives vehicle security data for a company that sells insurance policies. A good way to start could be as follows:

  • Identify why to measure: let’s measure because it is a core service that tells our customers how their loads are being transported over the roads;
  • Be clear about the related goals: the main goal is to keep the entire system running without downtime, because every time this application stops, the contract allows the customers not to pay for that downtime;
  • Identify what to measure: let’s check whether all of our many data inputs are up. Let’s also check whether the inputs are sending the amount of data they usually send;
  • Set a strategy for measuring: let’s call every endpoint of the many data inputs. If an endpoint is available, it’s healthy; if it’s not, it’s red. Then let’s read the amount of data received in the last minute. If it’s around the known number, it’s healthy;
  • Set an analysis strategy: can we automate anything? If one of the endpoints is down, is it useful to restart the operating system, the container, the application server, the load balancer or the microservices? Or will it have to be shown on a dashboard for a human to analyze?
  • Implement ways to gather the data: let’s write the code to gather data and take actions, or to display it;
  • Show it! Now it’s time to show it. Is it useful to create a chart? Let’s use colors, so people can easily identify when there is a problem. If we need results fast, a good first small step (an MVP) could be opening a ticket for the infrastructure team;
  • Analyze: this is the most important step. It’s time to be critical and identify the root cause of the problem. Do not focus on the problem itself, but on why it is happening. Why is it happening? Do we have problems with the parts that send data to us? Is the problem on our side? If so, is our application running the way it should? Do we have to change something in our development process?
  • Take action: through code or not, solve the things you found in all the steps above;
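The measuring and analysis strategy for the vehicle data inputs could start as small as the Python sketch below. Everything in it is an assumption for illustration: the InputReading structure, the input names, the ±30% tolerance around the usual volume, and the decision to restart automatically when an endpoint is down but to open a ticket (the MVP mentioned above) when the volume looks wrong, since an unusual volume may come from the senders and deserves a human look.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical tolerance: within +/-30% of the usual volume counts as healthy.
TOLERANCE = 0.3


@dataclass
class InputReading:
    name: str                  # hypothetical data input, e.g. "truck-gps"
    endpoint_up: bool          # did the endpoint answer our probe?
    records_last_minute: int   # volume received in the last minute
    usual_per_minute: int      # the "known number" for this input


def classify(reading: InputReading) -> str:
    """Measuring strategy: endpoint availability first, then volume."""
    if not reading.endpoint_up:
        return "red: endpoint down"
    low = reading.usual_per_minute * (1 - TOLERANCE)
    high = reading.usual_per_minute * (1 + TOLERANCE)
    if not low <= reading.records_last_minute <= high:
        return "red: unusual volume"
    return "green"


def act(
    readings: List[InputReading],
    restart: Callable[[str], None],
    open_ticket: Callable[[str], None],
) -> Dict[str, str]:
    """Analysis strategy: automate the safe action, escalate the rest."""
    results: Dict[str, str] = {}
    for reading in readings:
        status = classify(reading)
        results[reading.name] = status
        if status == "red: endpoint down":
            restart(reading.name)  # assumed safe, e.g. restart the container
        elif status != "green":
            open_ticket(f"{reading.name}: {status}")  # MVP: ticket for infra team
    return results
```

Keeping restart and open_ticket as parameters lets a team start with the ticket-only MVP and plug in automated actions later, once it trusts them.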


Telemetry can bring a lot of value to the business. It gives you the intelligence to act before things happen. If your business is critical, that can easily mean a lot of money. We already apply this kind of cycle to many things in our administrative areas, such as the PDCA (Plan-Do-Check-Act) mindset, so why not do it for our software?

Avoiding problems, Office, Practical examples

A wrong example of customer first

October 29, 2017

I’ve already written briefly about how Brazil is still not used to thinking customer first, and how companies at different maturity levels face digital transformation. Now, one step beyond the theory, I’m sharing a WRONG example of customer-first thinking.


Why we reached the problem

This short example is a very technical one, since it’s an IT issue. But the maturity of the software development team directly affects the problems they are facing right now.

To start, I want to rewind a bit and present an infographic about the evolution of software development maturity, from “traditional” methods to modern agile, where the customer-first mindset is very strong. I can say the team in this example is leaving waterfall and reaching by-the-book Scrum, as shown in the infographic “Software development maturity evolution brief story“.


The problem
  1. It’s an investment advisory company, and this is its most important system. It gathers all the data to be processed later. If this system stops working, the predictions stop being calculated and we have nothing to advise on (no information to sell).
  2. Right now there’s a huge load capacity issue that makes the system stop working from time to time, and the company stops predicting. It makes them lose a lot of money almost every week.
  3. For now, it’s impossible to optimize the databases and machine hardware any further. A deep architectural change is needed.
  4. So there’s no short-term solution to the problem.

So we have a crisis. The company loses a lot of money weekly because it’s receiving more data than it can handle. It’s a very clear IT planning problem. Nothing can be done in the short term, and all the solutions are long-term. The first steps toward solving the problem need 4 to 6 weeks.


Why it happened and why the company is not thinking customer first

But why did it reach this point? It’s related to the development team’s maturity. They lack practices in architecture, scalability, stability, resilience, failure prevention, failure tracking, log tracking, etc.

All of those listed above are just consequences. The real cause is the lack of investment in the software production process. As in the examples from the articles mentioned above, the mindset is set on the product, which is the investment advice. Most of the company’s investment goes straight to people who watch the market, compare it with the data this system generates, and make the advice better and better. Cool, huh?

But the real product here is the information this system processes and the insights it generates. The company’s investment planning needs a balance between the advice itself (business area) and IT modernization (IT area). In this case, the bigger the investment in IT, the lower the need for investment in the business area, since the system itself would already predict the data the analysts are checking. If the development team’s maturity were at the Lean/DevOps step, or at least XP, this problem probably wouldn’t have reached this point.


What to do now

Of course, an express contract with a company to help them make the architecture changes as fast as possible is the first thing to seek. But the long-term solution is to fix the investment target. The money must go somewhere better. The company board must stop seeing the IT area as support and make it the area that holds up the whole company.