RE: CVE ID Syntax Vote - results and next steps
It might be useful to frame some of this discussion in terms of the areas where CVE ID format/syntax is applicable:
- search: a human enters a CVE ID (or an approximation thereof), possibly including imprecise copy-and-paste from another source.
- extraction: a process extracts CVE IDs from another source, e.g. scraping CVEs from a Bugtraq mailing list post, vendor advisory, or even an arbitrary selection of free text. The IDs may have been entered by a human.
- automated communication: one tool communicates a CVE ID to another.
- presentation: the CVE ID is displayed or "published" to a user within a document such as a web page.
Search and extraction may regularly involve human-entered IDs, so they may receive slightly malformed IDs or shortcuts (e.g. not all leading zeros provided). Opinions may vary on how to handle this in a way that balances usability and consistency. Automated communication and presentation might not involve human-entered IDs, or at least, because they are automated, might provide opportunities for automatically detecting obviously malformed IDs that violate the syntax specification.
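To illustrate the automated-detection case, here is a minimal sketch of a strict check a tool might apply before passing an ID along. It assumes the syntax is "CVE-", a four-digit year, a hyphen, and exactly four digits; the function name is_well_formed is hypothetical, and the pattern would need to change under whichever syntax option is adopted.

```python
import re

# Assumed strict syntax: "CVE-", four-digit year, hyphen, exactly four digits.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
_STRICT_RE = re.compile(r"CVE-[0-9]{4}-[0-9]{4}")


def is_well_formed(cve_id: str) -> bool:
    """Return True only if the whole string matches the assumed strict syntax."""
    return _STRICT_RE.fullmatch(cve_id) is not None
```

A check like this rejects the human shortcuts mentioned above, such as "CVE-1999-67" or "1999-0067", which is reasonable for tool-to-tool communication but probably too strict for search.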
We have seen various shortcuts taken in presentation (e.g. omitting the "CVE-" prefix to fit a narrower column, as in "1999-0067"), and extraction strategies can vary (e.g. in the form of different regular expressions). As background, the CVE web site currently does some types of normalization for search: for example, "CVE-1999-67" and "1999-0067" both get automatically converted to "CVE-1999-0067", and we even allow for "abc1999-0067def". We use slightly different logic for extracting CVEs from references that we monitor; for example, we look for a prefix with "CVE" because there are many other types of strings or IDs that contain two sequences of digits separated by a hyphen.
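The difference between lenient search normalization and stricter extraction can be sketched as follows. This is not the CVE web site's actual code; the function names, the regular expressions, and the choice to pad the sequence number to at least four digits are all assumptions made for illustration.

```python
import re
from typing import List, Optional

# Lenient pattern for search input (assumption): optional "CVE-" prefix,
# four-digit year, hyphen, one or more sequence digits, anywhere in the string.
_SEARCH_RE = re.compile(r"(?:CVE-)?([0-9]{4})-([0-9]+)", re.IGNORECASE)

# Stricter pattern for extraction (assumption): the "CVE-" prefix is required,
# because many unrelated strings contain two digit runs joined by a hyphen.
_EXTRACT_RE = re.compile(r"\bCVE-([0-9]{4})-([0-9]+)", re.IGNORECASE)


def _canonical(year: str, seq: str) -> str:
    # Pad the sequence number with leading zeros to at least four digits.
    return f"CVE-{year}-{int(seq):04d}"


def normalize_search(text: str) -> Optional[str]:
    """Return a canonical ID for the first ID-like substring, or None."""
    m = _SEARCH_RE.search(text)
    return _canonical(m.group(1), m.group(2)) if m else None


def extract_ids(text: str) -> List[str]:
    """Return canonical IDs for every prefixed CVE ID found in free text."""
    return [_canonical(m.group(1), m.group(2)) for m in _EXTRACT_RE.finditer(text)]
```

With these assumed patterns, normalize_search maps "CVE-1999-67", "1999-0067", and "abc1999-0067def" to the same canonical ID, while extract_ids ignores an unprefixed string such as "2013-0042" in surrounding text.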