Success factors and pitfalls in a Machine Learning Project

Luise, 11 February 2021, 6 min read

With an incredible range of applications and rapidly developing frameworks, machine learning is a fascinating topic. More and more companies are incorporating this technology into their business processes. To make even your first projects a success, we at NextLytics share proven success strategies and show you how to avoid common pitfalls in machine learning (ML) projects.

Specifics and challenges

Why do machine learning projects need a special approach? A significant difference from conventional IT projects usually results from the use case itself. In ML projects, the questions are embedded in a multidimensional, complex environment, and the technology is chosen precisely because conventional methods are not sufficient. Each ML project also brings its own typical challenges, such as the interpretability of the model and the selection of representative data.
The models themselves, especially in deep learning, are often difficult to interpret. Even if a model produces great results, the question of "why" is not easy to answer. Approaches for opening up the black box can be found under the buzzword Explainable AI, but this field is only just emerging. Partly for this reason, estimating the required resources and time is tricky, especially since in many cases there is no previous experience from similar projects to draw on.

Complexity as a challenge
Project-specific, often novel difficulties
Black box application
Precarious project planning

Typical pitfalls in Machine Learning projects

Even before the project is started, you can contribute to avoiding typical pitfalls:


Avoid unrealistic expectations. Machine learning is not a miracle worker. Even if the media and the project presentations of digital pioneers suggest otherwise, a lot of time and work lies behind the impressive results. A realistic level of expectation is therefore essential to keep enthusiasm and motivation high among those involved in the long term. Smaller projects are a good way to gain experience and to find out what is possible in practice. Alternatively, the assessment of an experienced Data Scientist can help. If nobody on the team has strong project experience, expertise can be sourced externally.
An informative kick-off meeting with all parties involved also helps to set expectations. There, the goals and the challenges should be discussed openly, and it is advisable to refer back to these goals in later interim presentations.

Is the data sufficient for the project purpose? Data is the basis of everything else, because the model is built from it. On the one hand, enough data must be accessible. On the other hand, the data should be of high quality and representative. The model has no built-in way to detect bad data. Such data is simply accepted and lowers the quality of the results.
As a countermeasure, the quality of the data must be checked before the project begins. Ideally, processes to ensure data quality are already established in the company. Important factors for the assessment are the share of missing or invalid values, correctness and timeliness. The exploratory data analysis at the start of the project can uncover less obvious problems such as outliers, distorted value ranges and redundancies. It is then necessary to decide case by case which corrective action is most appropriate for the use case (and the model used!).
It is also essential to consider any data privacy issues early on! If the data model has to be changed afterwards, for example because certain fields may not be used after all, the process of model generation starts all over again.
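The quality checks mentioned above (missing values, redundancies, outliers) can be sketched in a few lines. The following is a minimal illustration using only the Python standard library, not a complete data quality process: the function name `quality_report`, the sample records and the column names are hypothetical, and the 1.5 * IQR rule is just one common convention for flagging outliers.

```python
from statistics import quantiles


def quality_report(rows, numeric_column):
    """Summarise missing values, duplicate rows and IQR outliers for a list of dict records."""
    total = len(rows)
    columns = sorted({key for row in rows for key in row})

    # Share of missing (None or empty-string) values per column
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, "")) / total
        for col in columns
    }

    # Exact duplicates: records that are identical in every column
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        duplicates += key in seen
        seen.add(key)

    # Outliers by the 1.5 * IQR rule (assumption: one of several possible rules)
    values = [row[numeric_column] for row in rows
              if row.get(numeric_column) not in (None, "")]
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    outliers = [v for v in values if v < low or v > high]

    return {"missing": missing, "duplicates": duplicates, "outliers": outliers}


# Hypothetical sample records: one missing amount, one extreme value, one duplicate
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": 11.0},
    {"id": 4, "amount": 13.0},
    {"id": 5, "amount": None},
    {"id": 6, "amount": 500.0},
    {"id": 7, "amount": 12.0},
    {"id": 2, "amount": 12.0},
]
report = quality_report(rows, "amount")
```

Such a report only flags candidates. Whether a flagged value is corrected, removed or kept remains a decision for the specific use case and model.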

Finally, it must be clarified whether the full potential of the project is being exploited and real business value is being created. Enthusiasm for artificial intelligence is often very strong, and additional budget is set aside for showcase projects. Unfortunately, this can also lead to projects being launched quickly but then getting stuck at the proof-of-concept stage. While such projects boost the skills and confidence of the team, they do not add sustainable business value. The solution is to focus primarily on getting the model into productive use. During the project, you should always strive to produce a Minimum Viable Product (MVP) first, before initiating extensive optimization steps. As soon as the quality of the results is sufficient for the business context, you move on to implementation and automation. Optimization then takes place in a phase where the model already adds value and gradually gains acceptance.


Success factors for your project

In contrast to the pitfalls you should avoid, there are important strategies for success that will make your day-to-day project work easier:


Get everyone on board. Create an interdisciplinary team of IT experts and members of the business department. Make sure that project status reports are easy to understand and do not resemble a statistics lecture. To reach the goal effectively, the contextual knowledge of the business expert is just as important as the knowledge of how to implement the solution correctly in the productive system. By involving domain experts, you enrich the data analyses with known relationships from practice, and also with intuitive hunches.
In addition, the business expert knows exactly what added value the machine learning application offers from the user's point of view. For this reason, close cooperation during implementation is critical for success. This is where it is decided whether the project will grow beyond the pilot phase and can be used productively in everyday operations. ML Operations (MLOps) can be a promising solution for mastering this process.

MLOps background information

MLOps describes a set of processes, methods and tools for reproducible ML workflows and more effective collaboration between Data Scientists and operational staff. It improves the quality and speed of the machine learning lifecycle, from model generation to implementation.

Derived from DevOps, MLOps provides tools and best practices that help to integrate the machine learning model into daily business. Particular focus is placed on scalability, reproducibility of results, diagnostic capabilities and successful collaboration between the specialist teams.
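To make the idea of reproducible results more concrete, here is a minimal sketch using only the Python standard library. It is not taken from a specific MLOps tool: the functions `run_manifest` and `train` and the parameter names are hypothetical, and `train` is only a stand-in for real model training. The point is that a run becomes repeatable when the data fingerprint, the parameters and the random seed are recorded together.

```python
import hashlib
import json
import random


def run_manifest(data_bytes, params, seed):
    """Collect what is needed to repeat a training run: data fingerprint, parameters, seed."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "params": params,
        "seed": seed,
    }


def train(data_bytes, params, seed):
    """Hypothetical stand-in for model training; only the seeded randomness matters here."""
    rng = random.Random(seed)  # local generator, independent of global random state
    weights = [round(rng.uniform(-1, 1), 6) for _ in range(params["n_weights"])]
    return weights, run_manifest(data_bytes, params, seed)


# Hypothetical training data and parameters
data = b"id,amount\n1,10.0\n2,12.0\n"
params = {"n_weights": 3, "learning_rate": 0.1}

# Two runs with identical inputs and seed yield identical results
weights_a, manifest_a = train(data, params, seed=42)
weights_b, manifest_b = train(data, params, seed=42)

# The manifest can be stored next to the model artifact, e.g. as JSON
manifest_json = json.dumps(manifest_a, sort_keys=True)
```

Dedicated MLOps platforms typically automate this kind of bookkeeping, but the principle stays the same: if any of the three recorded inputs changes, the result is a different run.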

Get used to agile, iterative working - you can't avoid it in machine learning projects. Define intermediate goals. Particularly at the beginning, it is not yet clear what the data and the models can actually deliver. Many ideas will be tried out and discarded again. New data sources, features and model components may be introduced, and your project requirements may change over time. The closer the project moves towards implementation, the more it can be run with typical milestones. The focus should be on building a production system that can be optimized afterwards. This way, your project team does not get lost in the details of model optimization or data cleanup.

Finally, you should make sure to allocate sufficient resources. This applies to the budget, the required infrastructure, the appropriate time frame for sufficient result maturity, as well as people with the right know-how. A cost-effective alternative to procuring your own infrastructure is to use various cloud services and platform solutions.

In our machine learning whitepaper you will find some recommendations regarding hardware, software and for the team composition.

If you need support in the planning and implementation of machine learning projects, NextLytics consultants will be happy to assist you to the desired extent with valuable practical experience. Together, we exploit the success factors to the full degree!


Luise Wiesalla joined NextLytics AG in 2019 as a working student / student consultant in the field of data analytics and machine learning. She has experience with full-stack data science projects and using the open-source workflow management solution Apache Airflow. She likes to spend her free time exploring her surroundings and being on the move.