Microsoft: Our AI can spot security flaws from just the titles of developers’ bug reports

Microsoft has revealed how it’s applying machine learning to the challenge of correctly identifying which bug reports are actually security-related.

Its goal is to correctly identify security bugs at scale using a machine-learning model that analyzes just the titles of bug reports.

According to Microsoft, its 47,000 developers generate about 30,000 bugs a month, but only some of the flaws have security implications that need to be addressed during the development cycle.

Microsoft says its machine-learning model correctly distinguishes between security and non-security bugs 99% of the time. It can also distinguish critical from non-critical security bugs with 97% accuracy.

The model allows Microsoft to label and prioritize bugs without necessarily throwing more human resources at the challenge. Fortunately for Microsoft, it has a trove of 13 million work items and bugs it’s collected since 2001 to train its machine-learning model on.

Microsoft used a supervised learning approach: it trained a machine-learning model on pre-labeled data and then used that model to classify bug reports that hadn't yet been labeled.

Importantly, the classifier is able to classify bug reports just from the title of the bug report, allowing it to get around the problem of handling sensitive information within bug reports such as passwords or personal information.
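The idea of training a supervised classifier on titles alone can be illustrated with a minimal sketch. The example below is not Microsoft's pipeline (the paper's actual feature extraction and model choices differ); it is a toy naive Bayes text classifier over invented bug titles, showing how labeled titles alone can be enough signal to separate security from non-security reports.

```python
import math
from collections import Counter

# Toy labeled bug titles: 1 = security bug report (SBR), 0 = non-security.
# These examples are invented for illustration, not Microsoft's data.
TRAIN = [
    ("buffer overflow in packet parser", 1),
    ("sql injection in login form", 1),
    ("cross site scripting in comment field", 1),
    ("privilege escalation via service account", 1),
    ("button misaligned on settings page", 0),
    ("typo in welcome email template", 0),
    ("slow page load on dashboard", 0),
    ("crash when opening empty file", 0),
]

def tokenize(title):
    return title.lower().split()

# "Train" a naive Bayes classifier: per-class word counts, Laplace smoothing.
counts = {0: Counter(), 1: Counter()}
class_totals = Counter()
for title, label in TRAIN:
    class_totals[label] += 1
    counts[label].update(tokenize(title))

vocab = set(counts[0]) | set(counts[1])

def predict(title):
    """Return the class whose log-posterior is highest for this title."""
    scores = {}
    for label in (0, 1):
        # Log prior plus a smoothed log likelihood for each token.
        score = math.log(class_totals[label] / sum(class_totals.values()))
        total = sum(counts[label].values())
        for tok in tokenize(title):
            score += math.log((counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("possible sql injection in search form"))  # → 1 (security)
print(predict("typo on dashboard page"))                 # → 0 (non-security)
```

A production system would of course need far more data, a more robust tokenizer, and careful evaluation; the point here is only that the title alone, with no bug-report body, carries classifiable signal.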

“We train classifiers for the identification of security bug reports (SBRs) based solely on the title of the reports,” explain Mayana Pereira, a Microsoft data scientist, and Scott Christiansen from Microsoft’s Customer Security and Trust division in a new paper titled Identifying Security Bug Reports Based Solely on Report Titles and Noisy Data.

“To the best of our knowledge this is the first work to do so. Previous works either used the complete bug report or enhanced the bug report with additional complementary features,” they write.

“Classifying bugs based solely on the title is particularly relevant when the complete bug reports cannot be made available due to privacy concerns. For example, it is notorious the case of bug reports that contain passwords and other sensitive data.”

Microsoft still relies on security experts, who are involved in training, retraining, and evaluating the model, as well as approving the training data its data scientists feed into the machine-learning model.

“By applying machine learning to our data, we accurately classify which work items are security bugs 99% of the time. The model is also 97% accurate at labeling critical and non-critical security bugs. This level of accuracy gives us confidence that we are catching more security vulnerabilities before they are exploited,” Pereira and Christiansen said in a blogpost.

Microsoft plans to share its methodology on GitHub in the coming months.
