Issues with High Cardinality vulnerabilities
>how many default passwords are there? 100? 200? Is that high?
I guess that's the real question, and perhaps we need to answer it on
a case-by-case basis.
Let's consider a few examples that will come up during later
discussions (for other reasons). Let's suppose that the Editorial
Board decides that the CVE should have an entry that says something
like "a system has a Trojan horse installed." The question may
become, do we enumerate each possible Trojan horse that's out there,
and create a separate entry for each? Now we're talking about
something that's on the order of thousands of entries, not 100.
A different example is "a system-critical file has world write
permissions." How many different system-critical files are there on
any Unix or NT system? Hundreds, if not thousands. And that ignores
the fact that every machine has a slightly different combination of
A more radical example is CAN-1999-0501 from the PASS cluster, "A Unix
account has a guessable password." Consider identifying a particular
class of "instances" of this candidate. I doubt that anyone wants to
enumerate all the Unix account names out there that might have a
guessable password ;-) But some high cardinality vulnerabilities will
have features similar to this one.
Some of these examples will be considered later as part of other
content decision discussions. There are many different "high
cardinality" configuration problems. In some cases, we may decide
that it's important to enumerate each individual "instance," like we
might do with default passwords. But it may not make sense to do so
in other cases.
We also need to consider the different ways that the CVE may be used.
From a tool perspective, it may make sense to enumerate every instance
of every high cardinality vulnerability and have a separate entry in
the CVE for it. (And that's what I see in most tools and databases
with respect to default passwords). From an IDS perspective, it may
But from the perspective of someone who has to create a CVE mapping or
use it to help maintain their own database, every additional entry is
another item to link to, and too many entries means too much work.
Consider the amount of effort that is required just to validate these
original 650 vulnerabilities. Say that at this instant we have 2000
publicly known vulnerabilities, using the levels of abstraction that
the CVE itself is using right now. Depending on which content
decisions are made, that 2000 could turn into 10,000 or 20,000. It
makes the CVE less manageable for anyone who wants to use it.
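To make that arithmetic concrete, here is a rough sketch in Python. The
2000 baseline is the figure used above; the per-problem instance counts
are assumed, order-of-magnitude guesses based on the examples in this
message (a couple hundred default passwords, thousands of Trojan horses,
hundreds to thousands of system-critical files), not measured data. The
names BASE_ENTRIES, HIGH_CARDINALITY and total_entries are made up for
illustration.

```python
# Hypothetical illustration only: every count below is an assumed,
# order-of-magnitude figure, not real CVE data.

BASE_ENTRIES = 2000  # known vulnerabilities at the current abstraction level

# Assumed instance counts for a few high cardinality problems.
HIGH_CARDINALITY = {
    "default passwords": 200,
    "Trojan horses": 3000,
    "world-writable system-critical files": 1000,
}


def total_entries(enumerated):
    """Return the CVE size if each problem named in `enumerated` gets one
    entry per instance while every other problem stays a single entry."""
    total = BASE_ENTRIES
    for name, instances in HIGH_CARDINALITY.items():
        if name in enumerated:
            # The single abstract entry is already part of BASE_ENTRIES,
            # so enumerating swaps 1 entry for `instances` entries.
            total += instances - 1
    return total


if __name__ == "__main__":
    print(total_entries(set()))                  # nothing enumerated
    print(total_entries({"default passwords"}))  # only default passwords
    print(total_entries(set(HIGH_CARDINALITY)))  # everything enumerated
```

With these assumed counts, enumerating only default passwords adds about
10% to the CVE, while enumerating all three problems roughly triples it.
The result is driven almost entirely by which problems we choose to
enumerate, which is why the content decision matters more than the exact
counts.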
The impact on tools is that the CVE won't solve all the
interoperability problems. It may only provide a basis for a more
refined "language" that is capable of sharing information at such a
low level of detail.
I'm not saying that we should always adopt a higher level of
abstraction just because it means there will be fewer CVE entries.
But we need to make sure that we have good, solid reasons to justify
such choices to anybody who has to use the CVE. Creating a separate
entry for default passwords is justifiable. Enumerating other high
cardinality vulnerabilities may not be justifiable.
Perhaps "high cardinality" is an inappropriate term here. Maybe a
better term is "not possible to enumerate." We can list each of the
known default passwords. We probably can't list all of the
system-critical files. And the "end user" who wants to see this
information may not want us to. (Though Adam pointed out at Black Hat
that the problem of users swimming in too much data could be addressed
by better reports.)