It is because the level of attainable term sequences boosts, and the patterns that advise results turn into weaker. By weighting words in the nonlinear, dispersed way, this model can "discover" to approximate words and phrases and not be misled by any unfamiliar values. Its "understanding" of a presented phrase isn't really as tightly tethered on t