Scoring Sheet Wizardry: Transforming Data into Insightful Scores

Žan Mervič

Nov 21, 2023

Machine learning models are rapidly advancing, becoming increasingly powerful and complex tools. This can be very useful in areas such as finance, where accuracy is paramount. However, complexity can be a double-edged sword, especially when we need to explain a model's predictions, which is sometimes crucial.

Consider the field of medicine, where a model might predict the probability of cancer. In such a scenario, the stakes are incredibly high. It's not just about the model spitting out a number; it's about why it gave that number. Why should a doctor or the patient trust this result?

This is where model interpretation widgets like the new Scoring Sheet and its companion, the Scoring Sheet Viewer, come into play in the Orange data mining toolkit. These widgets promise to simplify complex data scoring tasks with a fresh perspective on model explanation, enabling professionals in critical fields like healthcare to make better-informed decisions.

Hopefully, by now, you understand the need for explainable and interpretable machine learning models, so let's dive into the Scoring Sheet and see how it works. Instead of me mindlessly rambling on about how the widget works and how to use it, let's try to understand it by using it in a real-world scenario in which we will try to predict the risk of heart disease using the Heart Disease dataset from the UCI Machine Learning Repository.

The workflow above shows the most straightforward way of using the Scoring Sheet widgets. Here, after training the Scoring Sheet model using our dataset, we input it into the Scoring Sheet Viewer widget, which presents us with a, you guessed it, scoring sheet. It shows each feature's contribution to the final score, where a higher score indicates a greater chance that an individual is classified with the target class. Each feature's contribution can be positive or negative, indicating whether it increases or decreases the probability of the target class.

Before we continue with the example, let's try to understand how to use the Scoring Sheet widget and what each parameter does.

Number of Attributes After Feature Selection - This widget requires all features to be binary, so a preprocessing pipeline discretizes continuous features and one-hot encodes categorical ones. This parameter helps to reduce the potentially large number of resulting features and ensures a faster learning process by selecting only the best ones for model training.
Maximum Number of Decision Parameters - Limits the number of decision parameters in the model, balancing complexity and explainability. More parameters can increase accuracy but make the model harder to explain.
Maximum Points per Decision Parameter - Controls the range of points each decision parameter can contribute. A wider range can increase model complexity and accuracy but may reduce explainability.
Number of Input Features Used - Specifies how many original features (before binarization) the decision parameters can originate from. This is useful for ensuring each parameter originates from a unique feature or when only a subset of features is desired.
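
To make the binarization step a bit more concrete, here is a minimal sketch in plain Python. It is not the widget's actual pipeline: the `binarize` helper, the thresholds, and the two example rows are all made up for illustration, and the real preprocessing chooses its own cut points.

```python
def binarize(rows, thresholds, categorical):
    """Turn each row (a dict) into a dict of 0/1 indicator features.

    Illustrative only: thresholds and categories are supplied by hand here.
    """
    out = []
    for row in rows:
        binary = {}
        # Continuous features: one indicator per threshold, "value <= t".
        for name, cuts in thresholds.items():
            for t in cuts:
                binary[f"{name} <= {t}"] = int(row[name] <= t)
        # Categorical features: one indicator per category (one-hot encoding).
        for name, categories in categorical.items():
            for c in categories:
                binary[f"{name} = {c}"] = int(row[name] == c)
        out.append(binary)
    return out


# Hypothetical rows; the feature names only mimic the heart disease data.
rows = [
    {"age": 45, "chest pain": "asymptomatic"},
    {"age": 63, "chest pain": "typical ang"},
]
binary_rows = binarize(
    rows,
    thresholds={"age": [50, 60]},
    categorical={"chest pain": ["asymptomatic", "typical ang"]},
)
for binary in binary_rows:
    print(binary)
```

Two original features already turn into four binary ones here, which is why the first parameter exists: with many features and thresholds, the number of candidates grows quickly.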

If you are wondering how the Scoring Sheet widget works, all you need to know is that its backbone is an algorithm called FasterRisk. If you want to learn more about it, you can read the paper and the documentation.

Let's return to the example and focus on the scoring sheet.

I've modified the workflow by dividing the data, with a portion routed to the Table widget. This setup allows us to select instances and observe how the scoring sheet performs with new, unseen data.

Let's analyze and learn to interpret the scoring sheet using the example. It features five decision parameters, with points ranging from -5 to 5. Given that the target class is '0,' indicating the absence of heart disease, positive-value decision parameters reduce the risk of heart disease, while those with negative values increase it.

Consider a selected instance from the Data Table widget. It has a 'major vessels colored' attribute value below 1, which reduces the heart disease risk by 5 points. However, it also has the 'chest pain' attribute set to 'asymptomatic,' increasing the risk by 5 points. This combination results in a total score of 0, corresponding to a 78.2% probability of not having heart disease.
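
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of how the points add up, not the widget's code: the `sheet` list contains just the two decision parameters mentioned in the text (the other three are not listed here), and the `instance` dictionary is a hypothetical stand-in for the selected row.

```python
# Two of the five decision parameters from the example sheet, stored as
# (description, test, points). The remaining three are not given in the text.
sheet = [
    ("major vessels colored < 1", lambda x: x["major vessels colored"] < 1, 5),
    ("chest pain = asymptomatic", lambda x: x["chest pain"] == "asymptomatic", -5),
]


def total_score(instance, sheet):
    """Sum the points of every decision parameter the instance satisfies."""
    return sum(points for _, test, points in sheet if test(instance))


# Hypothetical instance matching the one described above.
instance = {"major vessels colored": 0, "chest pain": "asymptomatic"}
score = total_score(instance, sheet)
print(score)  # 0: the +5 and -5 cancel out
```

The viewer then maps this total to a probability for the target class (78.2% in the example). That mapping comes from the trained model and is not reproduced in this sketch.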

And there you have it. Understanding and reading the scoring sheet is straightforward once you grasp these basics. I am sure you can't wait to try out the new Scoring Sheet widgets yourself by now. Still, before you do, I should tell you that the Explain add-on offers many other widgets (with visualizations much prettier than the scoring sheet) that can help you better understand your models and predictions, so let's compare them.

While the Scoring Sheet provides a simple, straightforward, and easily understandable risk assessment, a Nomogram offers a more user-friendly visual model for individual outcome predictions, ideal when personalized insights are critical. Feature Importance, on the other hand, highlights the most influential variables in your model, perfect for simplifying models by identifying crucial features. SHAP dives deeper, offering detailed, individual-level explanations of feature contributions essential for in-depth model analysis. Lastly, ICE Plots provide a granular view of how changes in a feature affect individual predictions, which is invaluable for models with significant interactions or non-linear relationships. While the other visualizations may be more complex and challenging to understand, in scenarios where understanding the nuanced impact of each feature is crucial, these widgets may offer more tailored insights than the Scoring Sheet.

While the main focus of this blog post was the Scoring Sheet widgets and their explainability, the model's accuracy is also important. We've compared the Scoring Sheet model with some other popular classification models using cross-validation on different datasets, and the results are shown in the Test and Score widget. As you can see, the Scoring Sheet model is not the most accurate, but it is not far behind. It is, however, much more explainable than most of the other models, which is the point of these widgets.

That is all for this blog post. If you want, you can download the workflow we used for the example and try the Scoring Sheet widgets for yourself.