The self-attention scores form a long-tail distribution, where the "active" queries lie in the "head" scores and "lazy" queries lie in the "tail" area. We designed the ProbSparse Attention to select ...
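The split into "active" and "lazy" queries can be illustrated with a minimal sketch. It assumes a max-minus-mean sparsity measurement over the scaled dot products, keeps the top `u` queries as active, and gives every lazy query the mean of the values as its output. The function names, the parameter `u`, and the use of all keys (rather than a sampled subset) are assumptions made for illustration, not taken from the excerpt above.

```python
import math


def scaled_scores(q, K):
    """Scaled dot products of one query against every key."""
    d = len(q)
    return [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]


def sparsity_scores(Q, K):
    """Max-minus-mean of each query's scores (assumed measurement).

    A large value means the query's attention is peaked ("active");
    a value near zero means it is close to uniform ("lazy").
    """
    result = []
    for q in Q:
        s = scaled_scores(q, K)
        result.append(max(s) - sum(s) / len(s))
    return result


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def probsparse_attention(Q, K, V, u):
    """Full attention for the top-u queries, mean(V) for the rest.

    Hypothetical helper: returns (outputs, sorted active indices).
    """
    m = sparsity_scores(Q, K)
    ranked = sorted(range(len(Q)), key=lambda i: m[i], reverse=True)
    active = set(ranked[:u])

    # Default output for lazy queries: column-wise mean of V.
    mean_v = [sum(col) / len(V) for col in zip(*V)]

    out = []
    for i, q in enumerate(Q):
        if i in active:
            w = softmax(scaled_scores(q, K))
            out.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
        else:
            out.append(list(mean_v))
    return out, sorted(active)
```

With `Q = [[0.0, 0.0], [3.0, 0.0]]`, `K = [[1.0, 0.0], [-1.0, 0.0]]`, and `u = 1`, the zero query scores every key equally and is treated as lazy, while the second query is selected as active.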