My title clearly indicated my lack of understanding of the core concept of pattern matching - specifically, using .htaccess to block Bad-Bots from accessing a site, which they do in order to crawl or mirror-copy it, consuming bandwidth in the process.
But my question isn't about .htaccess itself - it's about the (in my opinion) obvious ineffectiveness of searching and matching against a long list.
Isn't it far better to allow against a list of positives, as opposed to matching against a list of negatives?
1st explanation attempt:
If the User Agent does NOT match one of these Good-Bots, then block.
If the User Agent matches one of these Bad-Bots, then block.
I can't express this idea as a formula or algorithm because I don't know how to (assuming it even can be expressed that way), but if I could, I suppose it would be something like this:
2nd explanation attempt:
If THIS is NOT in 'A' ('A' being a list of positives), then deny.
If THIS matches ONE INSTANCE in 'A' ('A' being a list of negatives), then deny.
If THIS is not RED, then deny.
If this is BLUE, YELLOW, or GREEN (i.e. anything that is NOT RED), then deny.
If this makes sense, why would a web developer use the latter approach when the latter's list is longer than the former's? Presumably there are fewer Good-Bots (User Agents) than there are Bad-Bots (ignoring the fact that the UA can be forged).
Ultimately, wouldn't it be far better to create an index of all the common Good-Bots and use that to search and match, rather than maintain a seemingly infinite list of Bad-Bots? (Not forgetting the time it would take to keep such a list updated with new Bad-Bots.)
Why search and match against MORE negatives rather than FEWER positives?