[CVEPRI] CVE accuracy, consistency, stability, and timeliness
While this email has arisen out of content decision (CD) discussions,
it touches on a number of critical issues that ultimately affect all
consumers of CVE. Therefore I've added a CVEPRI tag to this thread.
These discussions have illuminated a number of CVE content issues that
we have been wrestling with at MITRE over the last year, and especially
over the past six months.
Pascal Meunier asked:
>if we have to analyze the nature of a vulnerability to make a CVE
>entry, haven't we gone too far with respect to the stated goals of the
Bill Fithen added:
>We should guard against creating a situation where in-depth analysis
>is required... just decide if a thing is one thing or two things.
>...Pursuing perfection too early may mean the introduction of delays
>that make the eventual acceptance of an entry less valuable merely
>because of the delay
We've been doing deep analysis of CVE items at MITRE because I believe
it's important for CVE to be as consistent, accurate, and stable as
possible. And there have been delays as a result. Note
that there has been internal disagreement about this approach, and
we've been revisiting this issue in the past few weeks.
Various members of the CVE content team have conducted deep analysis
to try to resolve some issues, e.g. the lpr problem that L0pht
announced in February, which is still present several years after it
was first discovered. Linux problems, and Unix problems in
general, can also be troublesome because there are so many different
distributions that fix the same problem at different times.
This deep analysis has been a significant bottleneck with respect to
creating candidates and distinguishing between entries. In some
cases, the content team may spend several hours researching a single
issue that could be one candidate or several. The deep analysis may
involve poring through various information sources, patches, exploits,
software change logs, etc. - i.e. the type of research that I assume
people do for full-fledged vulnerability databases. With 10,000
legacy submissions for us to convert into candidates, I don't believe
we have the resources to do it all if we have to perform deep analysis
on 10% of them. And in the end, as people have pointed out, you will
never be completely sure of accuracy, because there's so much
My approach has been that if the CVE list is to be a "standard," it
should be both stable and reliable. (I say "CVE list" to distinguish
it from the candidates list, which we already accept as
unreliable.) Maintainers of proprietary vulnerability databases
generally have more flexibility to change their own entries. With
CVE, a change could have an effect on many different consumers.
So I've been careful to avoid creating candidates that might be
duplicates of other candidates or entries, careful to be consistent
with respect to level of abstraction, and much more careful not to add
any candidate to the CVE list if it looks like it's a duplicate of an
existing entry. If we have to change the level of abstraction of CVE
entries very often, that becomes a maintenance problem for people who
maintain CVE-compatible products - or, if the mappings aren't kept up
to date, CVE compatibility becomes less useful to the consumers of
those products. Changes in CVE will also have an impact on the
quality of any quantitative analysis that uses CVE names as a way of
normalizing the data.
This is one of the main reasons why the CDs are as detailed and
"strict" as they are. They attempt to make CVE as stable and
consistent as possible, as early in the process as possible. They
attempt to minimize the amount of modification to existing CVE
entries, and to minimize the amount of work for Editorial Board
members and maintainers of CVE-compatible products. On the other
hand, this approach is very labor-intensive and results in delays.
Perhaps a portion of the deep analysis can rely more heavily on the
expertise of Board members. If two candidates look similar, they could
be tagged to indicate that they need deeper analysis. Anybody who has
some good insight into the problem could provide feedback; if nobody
has enough information, maybe we move to a default position of
splitting or merging as appropriate. We could, as has been suggested,
annotate potentially related CVE entries (or candidates) and make that
information available to the few individuals who would need it.
Another way of minimizing the effects of poor information would be to
involve the software vendors as much as possible. This could be done
by bringing major software vendors onto the Editorial Board, and/or in
some consulting role; but with minor vendors, it could be an
especially labor-intensive job that could duplicate some of what
others in the community are already doing. And while insufficient
vendor confirmation of security problems may be a significant problem,
maybe CVE isn't the right place to solve this. (Note that we are
looking to add more software vendors to the Board, so if you have any
recommendations, let me know.)
As you've seen in the CDs proposed so far, the default action has
generally been to MERGE two issues when there's incomplete
information. But several people have expressed a preference to keep
the issues SPLIT if there's no good information available otherwise.
I agree with David LeBlanc - I think we'll pay a price regardless of
which default action we choose. My initial thinking is that a default
SPLIT action would make the CVE maintenance job a lot easier - but we
have to consider the impact on the users of CVE.
So we need some feedback from people who have CVE-compatible
products, to understand the potential impact of moving away from deep
analysis. I estimate that a maximum of 15% of all CVE entries could
ultimately require a change in the level of abstraction as new
information is discovered. Realistically, it may be more like 5%
(because most would be corrected in the candidate stage, and/or we may
decide to live with the "noise" in the absence of good information).
Note that I derived the 15% figure from the percentage of candidates
that are affected by content decisions related to abstraction, and of
course these figures can't really be measured anyway.
So to CVE-compatible database and tool vendors, and anyone who expects
to be conducting "CVE-based" analysis - is a 15% error rate tolerable?
How about 10% or 5%?
Perhaps we can minimize the number of serious modifications to CVE
entries (e.g. SPLITS, MERGES, or deprecations) by only performing them
a few times a year, say in each reference version, to minimize the
impact on maintainers and users.
The fundamental question is: how much effort should be put into making
sure that CVE entries are accurate and stable, and can we live with
the extended review process that it would entail (in other words,
business as usual)? Or are we willing to accept some inaccuracy and
additional mapping maintenance in order to allow CVE to remain