The size of a provider’s catalog on a platform affects the exposure that session-based recommender systems give to that provider. Small providers that are as popular as big ones are likely to be under-exposed in the recommendations.
In an ECIR 2021 paper, with Alejandro Ariza, Francesco Fabbri, and Maria Salamó, we highlight side effects on providers caused by state-of-the-art session-based recommendation models. We focus on the music domain and study how artists’ exposure in the recommendation lists is affected by the structure of the input data, exploring different session lengths.
To assess these phenomena, we consider four session-based systems (namely, Association Rules, S-KNN, GRU4Rec, and NARM) on three types of datasets, with long, short, and mixed playlist lengths, extracted from Last.fm data. We provide measures that characterize disparate treatment among artists through a systematic analysis that compares (i) the exposure an artist receives in the recommendations with (ii) their representation in the input data.
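To make the comparison between exposure and input representation concrete, here is a minimal sketch under stated assumptions. It is not the measure defined in the paper. Representation is assumed to be an artist's share of all track plays in the input sessions. Exposure is assumed to be the artist's share of all top-k recommendation slots, with position ignored. The function names and the `track_artist` mapping are hypothetical.

```python
from collections import Counter

def share(counts):
    """Normalize raw counts into shares that sum to 1 (assumed definition)."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}

def artist_representation(sessions, track_artist):
    """Hypothetical: share of input interactions (track plays) per artist."""
    counts = Counter(track_artist[t] for session in sessions for t in session)
    return share(counts)

def artist_exposure(rec_lists, track_artist, k=10):
    """Hypothetical: share of top-k recommendation slots per artist, ignoring rank."""
    counts = Counter(track_artist[t] for recs in rec_lists for t in recs[:k])
    return share(counts)

def exposure_gap(representation, exposure):
    """Exposure minus representation per artist; negative values mean under-exposure."""
    artists = set(representation) | set(exposure)
    return {a: exposure.get(a, 0.0) - representation.get(a, 0.0) for a in artists}

# Toy example with made-up tracks and artists.
track_artist = {"t1": "A", "t2": "A", "t3": "B", "t4": "C"}
sessions = [["t1", "t3"], ["t2", "t4"]]
rec_lists = [["t1", "t2"], ["t2", "t3"]]
gap = exposure_gap(artist_representation(sessions, track_artist),
                   artist_exposure(rec_lists, track_artist, k=2))
print(gap)
```

In this toy data, artist C accounts for a quarter of the input plays but never appears in the recommendations. Its gap is therefore negative. A rank-aware variant would weight each slot by its position instead of counting all slots equally.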
Results
While the paper contains detailed results, here are the main take-home messages:
- S-KNN is the most effective approach on all datasets except the long-session one.
- Across datasets, the short-session one (LFM-S) yields the most effective predictions. Hence, as sessions get longer, algorithms fail to capture users’ interests and what might be relevant to them.
- NARM returns the distribution of provider exposure closest to that of the test set, which creates a trade-off between recommendation effectiveness and the distribution of providers.
- Longer-session data reveals that longer sequences of interactions increase unpredictability, leading to a precarious representation of artists.
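One way to check which model's provider-exposure distribution is "closest to the test" is to measure the distance between each model's distribution and the test set's. The sketch below uses total variation distance. That choice is an assumption, and the paper may use a different measure. The function names and the toy distributions are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def closest_to_test(model_exposure, test_distribution):
    """Return (model name, distance) for the model nearest the test distribution."""
    distances = {name: total_variation(dist, test_distribution)
                 for name, dist in model_exposure.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]

# Made-up artist-exposure shares, for illustration only.
test_distribution = {"A": 0.5, "B": 0.3, "C": 0.2}
model_exposure = {
    "model_x": {"A": 0.7, "B": 0.3},             # concentrates on artist A, drops C
    "model_y": {"A": 0.5, "B": 0.25, "C": 0.25},  # roughly mirrors the test set
}
best, distance = closest_to_test(model_exposure, test_distribution)
print(best, round(distance, 3))
```

A model can be closest to the test distribution without being the most accurate one. This is the kind of trade-off the NARM result points to.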