Re: Sources: Full and Partial Coverage
On 6/25/12 7:06 AM, Carsten Eiram wrote:
> Joining the discussion quite late after five weeks' vacation and a lot
> of catching up with my i [...]
>
> Today, we track an immense number of sources
> in order to provide as broad and deep a coverage as possible. I,
> however, estimate that an extremely small number of these sources
> likely provide us with 90-95% of the valid reports that we cover. The
> rest of the monitored sources (the vast majority) only provide the
> last 5-10% (where many are not for popular products), but I estimate
> that at the same time those sources provide ~80% of the noise we have
> to go through every single day.
> If you want to do full coverage of any popular product, you need to
> do it properly and that requires tracking a lot of very random
> sources and adding new ones regularly (including Chinese blogs where
> most of the time the quality of the local jiaozi is discussed with one
> or two blogs a year about new vulnerabilities in Windows). I'm sure
> everyone on this board representing a VDB is familiar with this
Do we really need to restrict the list of sources too heavily? I'd
guess that Secunia and other places don't do all this monitoring by
1. Get a bunch of sources, of course the A-list stuff, but just add new
sources as you come across them. wget, curl, Yahoo! Pipes, r2e, etc.
2. Normalize to text and possibly for language (I'm assuming English,
might be necessary to support others).
3. Index (Lucene, Solr, Google, etc).
4. Have set search filters for things on the product list.
5. Have set searches for phrases that indicate important vulnerabilities
("overflow", "XSS", etc).
6. Maybe have additional manual checks of the A-list sources.
7. Manual inspection of results of #4 and #5.
8. Marking of duplicates/similarity scoring.
9. Manual inspection of what's left?
Hits on #4 and #5 get priority manual effort to include in CVE. #6 and
#9 get remaining manual effort?
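
To make #2, #4 and #5 a bit more concrete, here is a rough Python sketch, not a tested design. The product list, the phrase list, and the function names (normalize, matches, triage) are all made up for illustration, and the tag stripping is deliberately crude; a real setup would sit on top of whatever index is chosen in #3.

```python
import html
import re

# Hypothetical watch lists; a real deployment would load these from config.
PRODUCTS = ["windows", "apache", "wordpress"]
VULN_PHRASES = ["overflow", "xss", "sql injection"]

# Crude tag stripper; an assumption that the fetched items are simple HTML.
TAG_RE = re.compile(r"<[^>]+>")


def normalize(raw):
    """Step 2: strip markup, unescape entities, collapse whitespace, lowercase."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return " ".join(text.split()).lower()


def matches(text, terms):
    """Return the terms that occur in text as whole words or phrases."""
    return [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", text)]


def triage(raw):
    """Steps 4 and 5: tag an item and pick the manual-review queue for it."""
    text = normalize(raw)
    products = matches(text, PRODUCTS)
    phrases = matches(text, VULN_PHRASES)
    # Any hit on the product list or the phrase list gets priority effort.
    queue = "priority" if products or phrases else "remaining"
    return {"products": products, "phrases": phrases, "queue": queue}


if __name__ == "__main__":
    print(triage("<p>Heap <b>overflow</b> in Windows kernel driver</p>"))
    print(triage("Review of the local jiaozi restaurant"))
```

Items that land in the "remaining" queue are the ones left for #9.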
This isn't a trivial solution, but it also isn't impossible. I guess I
don't quite see that one additional source requires an equivalent amount
of additional effort.
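
For #8, a minimal sketch of marking near-duplicates, assuming that reports repeated across sources mostly differ in small wording details. It uses difflib from the Python standard library; the 0.8 threshold and the mark_duplicates name are my own guesses, not tuned values, and pairwise comparison like this would only suit a modest daily volume.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def mark_duplicates(reports, threshold=0.8):
    """Pair each report with the index of an earlier similar report, or None.

    threshold is an assumed cutoff and would need tuning on real data.
    """
    kept = []    # indices of reports treated as originals so far
    marked = []
    for i, report in enumerate(reports):
        dup_of = None
        for j in kept:
            if similarity(report, reports[j]) >= threshold:
                dup_of = j
                break
        if dup_of is None:
            kept.append(i)
        marked.append((report, dup_of))
    return marked


if __name__ == "__main__":
    sample = [
        "Buffer overflow in FooServer 1.2 login handler",
        "Buffer overflow in FooServer 1.2 login handler.",
        "XSS in BarCMS search page",
    ]
    for report, dup_of in mark_duplicates(sample):
        print(dup_of, report)
```

Only the reports marked None would then need a full manual look.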