Estimating Time Accurately in Software Development

2018-08-13

The ability to accurately estimate the time required to complete a task is a crucial skill for every software developer. As a rule, this skill is honed over time through trial and error. However, such mistakes can be costly both for the software development company and for the product's owner. As a result, more than 66% of all projects in the industry suffer cost and effort overruns. Still, there is a way out. By investing a little time and effort at the very beginning, we can avoid conflicts with customers and save funds in the future.

By using smarter approaches, we can predict more accurate timeframes for implementing particular features. This isn't some sort of magic - we mean the well-known and rich machinery of statistics and mathematics. Established mathematical methods already allow us to make accurate projections without independent research or reinventing the wheel. By summarizing the accumulated statistics and using them for future planning of sprints and tasks, we can avoid frequent estimation mistakes and missed deadlines.

And, of course, we don't intend to solve this task manually - after all, we want to make our lives easier, not harder. Nor is that required: any project management software could provide this feature. Such a tool already gathers and processes enough data about your team and projects to calculate precise timing and other useful parameters. The only challenge left is to use this data for something more than just drawing beautiful charts. Let's imagine how this could work. We will examine the problem using the Riter project management tool as an example, although any other software could take its place with a small adaptation and provide the same functionality.

Look at the standard set of features in a project management application. It has access to the entire history of your projects, user stories, estimated and actual time spent on tasks, the composition of every team, everybody's workload (involvement in several projects, the number of tasks assigned to each developer) and other information depending on your tool. For example: what is each developer's usual pace of work? How many hours does a certain employee work per week? How accurate are their time estimates? How productive are they? Many of these characteristics can be easily determined by project management software, since all the required data is available.

For example, in Riter, we distinguish dirty and clean hours, so a developer's efficiency can be calculated as the ratio of clean hours to dirty ones. The accuracy of their estimates, in turn, is defined as the ratio of the time estimated for tasks to the time spent. What to do with the resulting parameters is up to each team. For instance, we can use the accuracy and productivity factors to refine the expected time estimates, or apply them in more complex predictions.
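
To make the two ratios concrete, here is a minimal Python sketch. It is not Riter's actual implementation: the `DeveloperStats` container, the function names, and the idea of dividing a raw estimate by both factors are illustrative assumptions. It also assumes that "time spent" in the accuracy ratio means clean hours.

```python
from dataclasses import dataclass


@dataclass
class DeveloperStats:
    """Hypothetical per-developer totals over some period, in hours."""
    estimated_hours: float  # sum of the estimates given for finished tasks
    clean_hours: float      # clean time spent on those tasks
    dirty_hours: float      # dirty time spent on those tasks


def efficiency(stats: DeveloperStats) -> float:
    """Ratio of clean hours to dirty hours."""
    if stats.dirty_hours <= 0:
        raise ValueError("dirty_hours must be positive")
    return stats.clean_hours / stats.dirty_hours


def accuracy(stats: DeveloperStats) -> float:
    """Ratio of estimated time to spent time (assumed here: clean hours)."""
    if stats.clean_hours <= 0:
        raise ValueError("clean_hours must be positive")
    return stats.estimated_hours / stats.clean_hours


def refined_estimate(raw_estimate_hours: float, stats: DeveloperStats) -> float:
    """One possible refinement (an assumption, not a rule from the article):
    dividing by accuracy gives expected clean hours, and dividing that by
    efficiency gives expected dirty hours."""
    return raw_estimate_hours / accuracy(stats) / efficiency(stats)


stats = DeveloperStats(estimated_hours=40.0, clean_hours=50.0, dirty_hours=80.0)
print(efficiency(stats), accuracy(stats), refined_estimate(8.0, stats))
```

An accuracy below 1 means the developer tends to underestimate, so the refined figure grows; an accuracy above 1 shrinks it.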

The workload is also easy to calculate (for example, as the total of estimated hours). In Riter, it is already used to recognize dangerous situations when the load on a developer in a certain sprint is greater than he or she can usually cope with. If such a situation arises, Riter warns the team that tasks should be redistributed among other employees. We're sure that many other use cases can be found and implemented in project management tools to make them smarter. But let's go back to the problem of time estimation.
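
A workload check of this kind could be sketched as follows. This is only an illustration, not Riter's logic: treating the mean of past sprint loads as what a developer "can usually cope with", and the `tolerance` factor of 1.2, are assumptions made for the example.

```python
from statistics import mean
from typing import Sequence


def sprint_workload(task_estimates_hours: Sequence[float]) -> float:
    """Workload as the total of estimated hours assigned in a sprint."""
    return sum(task_estimates_hours)


def is_overloaded(
    current_load: float,
    past_loads: Sequence[float],
    tolerance: float = 1.2,  # hypothetical safety margin
) -> bool:
    """Return True if the current load exceeds the developer's typical load.

    The typical load is taken as the mean of past sprint loads (an assumption).
    With no history there is nothing to compare against, so no warning is given.
    """
    if not past_loads:
        return False
    return current_load > tolerance * mean(past_loads)


current = sprint_workload([8, 13, 5, 16, 8])  # hours assigned this sprint
if is_overloaded(current, past_loads=[30, 35, 40]):
    print("Warning: consider redistributing tasks among other team members")
```

A median or a percentile of past loads would be a reasonable alternative baseline if a few unusual sprints distort the mean.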

Let's take two sets of values - the estimated and the real (clean) time a developer spent on tasks over some period. We want to process this data to obtain a function that takes the estimated time as input and returns the real time the task will take. To this end, we need to build a function that approximates the number of hours worked by users, and then extrapolate from it to predict when the sprint will be completed. This is pure numerical analysis; no machine learning or anything else is needed here.
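
One simple way to build such a function is an ordinary least-squares straight line through the (estimated, real) pairs. The sketch below assumes a linear relationship, which the article does not prescribe; the function names and sample numbers are made up for illustration.

```python
from typing import Sequence, Tuple


def fit_line(estimated: Sequence[float], actual: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of actual = slope * estimated + intercept."""
    n = len(estimated)
    if n < 2 or n != len(actual):
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(estimated) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in estimated)
    if sxx == 0:
        raise ValueError("all estimates are identical; a line cannot be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimated, actual))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, estimated_hours: float) -> float:
    """Predicted real (clean) time for a single task."""
    return slope * estimated_hours + intercept


def predict_sprint(slope: float, intercept: float, estimates: Sequence[float]) -> float:
    """Predicted total clean time for all tasks planned in a sprint."""
    return sum(predict(slope, intercept, e) for e in estimates)


# Hypothetical history of one developer: estimated vs. real clean hours.
history_estimated = [2.0, 4.0, 6.0, 8.0]
history_actual = [3.0, 5.0, 8.0, 11.0]
slope, intercept = fit_line(history_estimated, history_actual)
print(predict_sprint(slope, intercept, [3.0, 5.0, 13.0]))
```

Predictions for estimates far outside the range seen in the history are extrapolations and should be treated with more caution than those inside it.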

But these are still clean time estimates - they take into account only the accuracy of developers, not their productivity. Besides, many other important criteria remain unaccounted for. That's not enough for long-term forecasting. Sometimes we also need to take into account joint work on tasks, the experience and possible problems of developers, delays in communication, vacations, the current workload across all projects, and so on. Of course, some factors can be ignored and some calculations left for manual processing, but the picture is still complex. It is simply impossible to describe this with one formula or a program with numerous if-else branches. It is precisely in such cases - when writing the program's algorithm is too difficult or impossible - that machine learning comes to the rescue.

For example, let's imagine a neural network as a black box that takes all known project parameters as inputs and transforms them into the set of unknown parameters you are interested in: the optimal size or even composition of teams, the completion time of tasks or sprints, the allocation of resources and funds among teams, and so on. We don't even need to worry about how it works under the hood. These can be more universal models built into your project management tool, or unique plugins written specifically for your company's goals and conditions. The architecture, characteristics, and use cases of such neural networks can be completely different.

At the moment, such an implementation doesn't exist. However, machine learning is now successfully applied in many fields, and we just need to adapt existing solutions from other fields (with similar goals and tasks) to project management and software development. For example, we could take a look at the DeepTravel solution - a neural network based time estimation model with auxiliary supervision. Or we could adapt another neural network architecture for process time estimation to our needs. Or invent something new.

What can such a neural network, or just a simple math function, give us? Of course, it will be some numerical parameters, but what do we do with them next? A project management tool could analyze the values obtained and make recommendations for the team. If the estimated time differs too much from the predicted time, the developers should perhaps re-evaluate the task. Or, perhaps, it is worth having several developers estimate each task. The software could also advise reorganizing the team, changing the deadlines, re-assigning a complex task to someone else, etc.
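
As a rough illustration of such recommendations, the sketch below flags a task for re-evaluation when the developer's estimate and the predicted time are too far apart, and combines several developers' estimates into one. The 50% threshold and the use of the median are arbitrary assumptions chosen for the example, not recommendations from any existing tool.

```python
from statistics import median
from typing import Sequence


def needs_reestimate(
    estimated_hours: float,
    predicted_hours: float,
    max_relative_gap: float = 0.5,  # hypothetical threshold: 50% of the estimate
) -> bool:
    """True if the prediction deviates from the estimate by more than the threshold."""
    if estimated_hours <= 0:
        raise ValueError("estimated_hours must be positive")
    gap = abs(predicted_hours - estimated_hours) / estimated_hours
    return gap > max_relative_gap


def combined_estimate(estimates_hours: Sequence[float]) -> float:
    """Combine several developers' estimates of the same task.

    The median is used so that a single extreme estimate does not dominate.
    """
    if not estimates_hours:
        raise ValueError("at least one estimate is required")
    return median(estimates_hours)


if needs_reestimate(estimated_hours=4.0, predicted_hours=10.0):
    print("Estimate and prediction differ a lot: consider re-evaluating the task")
print(combined_estimate([4.0, 6.0, 16.0]))
```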

The lack of training data will be one of the main problems in applying machine learning to project management. However, there are some ways to get the required datasets:

Your own projects' history. Just wait until your AI learns from your mistakes.
Public datasets. Some project management datasets are published at ISBSG, Openscience, and Hindawi. However, other people's examples are not always suitable for your workplace conditions.
Open source repositories. Do not limit yourself to the types of networks and tasks described here - neural networks can be trained on other available data to solve your problems. To this end, we could analyze the statistics of git repositories: for instance, how much time a certain task takes, how many developers it requires, and how many times it was reopened (in other words, how difficult it is).

As you can see, there are many ways to make our project management tools more intelligent. These can be either simple mathematical operations or more complex AI models for forecasting, evaluation, and optimization, and we are convinced that in the future any self-respecting project management tool will include a more or less simple implementation of them. That's why, in Riter, we're already working on it today, and so are many other companies.

Riter development team