Researchers find ‘inconsistent’ benchmarking across 3,867 AI research papers