Predict CVSS score for new CVEs

Rudra
5 min read · Nov 28, 2020


This article demonstrates the use of text analysis and linear regression to predict the score of a newly identified software vulnerability. The solution relies on historical CVE (Common Vulnerabilities and Exposures) data stored in the NVD (National Vulnerability Database) and on the scikit-learn libraries. It extracts features from the “summary” text of each vulnerability and then predicts the CVSS score (both V2 and V3 scores).

The RMSE (root mean squared error) for the current experiment, which measures how far the predictions are from the actual scores, is 1.33.

Introduction

CVSS (Common Vulnerability Scoring System) is a scoring system for software vulnerabilities, developed to help the software industry prioritize and remediate vulnerabilities promptly and reduce the number of attacks on systems. There are two versions in use: the V2.0 and V3.x standards. Scores range from 0.0 to 10.0, and the qualitative severity rankings differ between the two versions. For V2, Low is 0.0–3.9, Medium is 4.0–6.9, and High is 7.0–10.0. For V3, Low is 0.1–3.9, Medium is 4.0–6.9, High is 7.0–8.9, and Critical is 9.0–10.0.

After a vulnerability is disclosed, its CVSS score may not be available for a few days. In that window, the proposed solution can be used to estimate a likely score for the newly identified vulnerability.

The rest of the article describes the data and its source, the approach for extracting features, and the use of LinearRegression to predict scores for different sets of features. The complete code from the Jupyter Notebook is provided at the end of the article.

Data

Figure 1 gives a pictorial view of the dataset. The columns used for this experiment are “summary” and “cvss”.

Figure 1 CVE Data

This dataset was available on Kaggle. It contained 89,660 records, covering CVEs published between January 1999 and November 2019.

Feature Extraction and Selection

The summary column contains textual data giving a high-level description of a CVE when it is first disclosed. I broke this text down into a bag of words using the “CountVectorizer” class available in “sklearn”. Here is the code extract:

from sklearn.feature_extraction.text import CountVectorizer

# will convert text to a number vector
vectorizer = CountVectorizer(stop_words=stop_words)
# convert text into word features
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

For the dataset used, there were 82,934 distinct words after eliminating common English stop words. I used the “SelectKBest” function from “sklearn” to identify the most influential features, with “f_regression” as the score function, which uses the F-statistic to determine statistically significant features. Figure-2 shows the relationship between actual and predicted values for different numbers of selected features (words).
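As a minimal sketch of this selection step (variable names such as y_train are my assumptions, not necessarily the notebook’s):

from sklearn.feature_selection import SelectKBest, f_regression

# keep the k words whose F-statistic against the CVSS score is highest;
# X_train/X_test are the CountVectorizer matrices from the previous snippet
# and y_train holds the corresponding CVSS scores (an assumed name)
selector = SelectKBest(score_func=f_regression, k=300)
X_train_best = selector.fit_transform(X_train, y_train)
X_test_best = selector.transform(X_test)

# inspect which words survived (get_feature_names_out needs sklearn >= 1.0)
kept_words = vectorizer.get_feature_names_out()[selector.get_support()]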

Figure 2 Actual vs. Predicted CVSS Scores

As the figure shows, there was large variation between predicted and actual values when all features (words) were used. I then experimented with multiples of 100 features (words) at a time. The variation improved at 100, 200, and 300 words, then plateaued or worsened as the number of features (words) increased further.

Model Training

The goal was to predict a CVSS score, between 0 and 10, for a newly identified vulnerability. Since this is a continuous target, I used multivariate linear regression to build the model.

For model training, I split the dataset 70/30 into training and test data. I used “LinearRegression” from “sklearn” to create the model and evaluated it using all features as well as the top 100, 200, 300, 400, and 500 features. Figure-3 compares the RMSE¹ (Root Mean Squared Error) and R-squared² values across these experiments.
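A minimal sketch of this training and evaluation loop follows; “summaries” and “scores” are assumed names for the dataset columns, not the author’s exact notebook code:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# 70/30 split of the raw summary texts and their CVSS scores
X_train, X_test, y_train, y_test = train_test_split(
    summaries, scores, test_size=0.3, random_state=42)

# bag-of-words features, as in the earlier snippet
vectorizer = CountVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

for k in (100, 200, 300, 400, 500):
    selector = SelectKBest(score_func=f_regression, k=k)
    X_tr = selector.fit_transform(X_train, y_train)
    X_te = selector.transform(X_test)
    model = LinearRegression().fit(X_tr, y_train)
    y_pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"k={k}: RMSE={rmse:.2f}, R2={r2_score(y_test, y_pred):.2f}")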

Figure 3 Model Scores

Model Prediction

I gathered the 21 most recently scored CVEs from the NVD site as of November 18, 2020. I used the model to predict CVSS scores for these vulnerabilities and compared them against the actual scores given by NVD. Figure-4 shows the comparison for V2 scores, and Figure-5 shows it for V3.x scores. The x-axis represents the difference between predicted and actual scores, and the y-axis the frequency of that difference.
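As a sketch of scoring a fresh summary with the fitted pipeline (the summary text below is a made-up example, not one of the 21 NVD CVEs):

# vectorizer, selector, and model are the fitted objects from training
new_summary = ["A buffer overflow in the HTTP parser allows remote "
               "attackers to execute arbitrary code via a crafted request."]
features = selector.transform(vectorizer.transform(new_summary))
predicted = float(model.predict(features)[0])
# clamp to the valid CVSS range, since linear regression is unbounded
predicted = min(max(predicted, 0.0), 10.0)
print(f"predicted CVSS score: {predicted:.1f}")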

Figure 4 CVSS V2.0 (Prediction — Actual) Distribution

More data points close to zero on the x-axis is better, meaning the model predicts more CVEs close to their actual values. As the figure shows, the experiment with 300 features provides the best predictions. However, this is not the case for the CVSS V3.x predictions (Figure 5). A possible explanation is that CVSS V3.x scoring started in September 2019, whereas the dataset used for this experiment contains CVSS V2.0 scores only.

Figure 5 CVSS V3.x (Prediction — Actual) Distribution

Figure 6 shows a comparison between the predicted and actual values.

Figure 6 Prediction vs. Actuals

Conclusion and Future Work

The current model can be used to estimate a likely score for a vulnerability as soon as it has been disclosed. The model could be further improved using other techniques and additional features. Here are some ideas for future work:

  • Use combinations of words (n-grams) to predict the score (see the sketch after this list)
  • Use CPE (Common Platform Enumeration) data where available
  • Create separate models for CVSS V2 and V3 prediction
  • Use the mutual_info_regression score function to score and select features
  • Use a neural network to predict the score
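For instance, the first idea could be tried by changing only the vectorizer; ngram_range is a standard CountVectorizer parameter:

from sklearn.feature_extraction.text import CountVectorizer

# unigrams plus bigrams, so phrases such as "buffer overflow" or
# "remote attackers" become features in their own right
vectorizer = CountVectorizer(stop_words=stop_words, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)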

[1] Root Mean Squared Error (RMSE): measures the average distance between a model’s predictions and the observed values. Lower is better.

[2] Coefficient of Determination (R-squared): compares the variance of the residuals to the variance in the data, where a residual is the difference between a predicted and an observed value. The closer to 1, the better.
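In standard notation, with \hat{y}_i the predicted score, y_i the actual score, \bar{y} the mean of the actual scores, and n the number of samples:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}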

Code
