
Interplay between Upsampling and Regularization for Provider Fairness in Recommender Systems

When the data contain a minority group of item providers (characterized by a sensitive attribute, such as gender or age), the items of these providers tend to receive a lower predicted relevance and are recommended to users with a lower visibility (i.e., fewer times) and a lower exposure (i.e., in lower positions of the recommendation lists). Upsampling the interactions with the items of the minority group and regularizing the predicted relevance scores can mitigate these disparities, without affecting recommendation effectiveness.

In a paper published in the User Modeling and User-Adapted Interaction journal (Springer), with Gianni Fenu and Mirko Marras, we study provider unfairness in recommender systems. We do so by considering that (i) a provider can be associated with multiple items in a list suggested to a user, (ii) an item can be created jointly by more than one provider, and (iii) predicted user-item relevance scores can be estimated in a biased way for items of certain provider groups. Under this scenario, we assess disparities in relevance, visibility, and exposure, by simulating diverse representations of the minority group in the catalog and in the interactions. Based on the unfair outcomes that emerged, we devise a treatment that combines observation upsampling and loss regularization while learning user-item relevance scores.

Provider unfairness assessment

To assess provider unfairness, we consider three metrics, sketched in code right after this list:

  • Disparate relevance, which measures the absolute difference between the representation of the minority group in the catalog (intended as the percentage of items offered by that group) and the percentage of the total predicted relevance it receives;
  • Disparate visibility, which measures the difference between the share of recommendations for items of a demographic group and the representation of that group in the catalog;
  • Disparate exposure, which measures the difference between the exposure obtained by a demographic group in the recommendation lists (i.e., accounting for the positions in which the items of that group appear) and the representation of that group in the catalog.
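To make the metrics concrete, here is a minimal Python sketch of the three measures. It assumes a binary minority/majority split over items and a DCG-style logarithmic position discount for exposure; the function names and the exact discount are our illustration, not necessarily the formulation used in the paper.

```python
import numpy as np

def disparate_relevance(scores, minority_mask, catalog_share):
    """|minority share of the total predicted relevance - catalog share|."""
    scores = np.asarray(scores, dtype=float)
    minority_mask = np.asarray(minority_mask, dtype=bool)
    return abs(scores[minority_mask].sum() / scores.sum() - catalog_share)

def disparate_visibility(rec_lists, minority_items, catalog_share):
    """Share of recommended slots occupied by minority items,
    minus the minority share of the catalog (0 means parity)."""
    slots = sum(len(lst) for lst in rec_lists)
    hits = sum(item in minority_items for lst in rec_lists for item in lst)
    return hits / slots - catalog_share

def disparate_exposure(rec_lists, minority_items, catalog_share):
    """Position-discounted exposure of minority items, normalized,
    minus the minority share of the catalog."""
    minority_exp = total_exp = 0.0
    for lst in rec_lists:
        for pos, item in enumerate(lst):
            weight = 1.0 / np.log2(pos + 2)  # rank 1 -> weight 1.0
            total_exp += weight
            if item in minority_items:
                minority_exp += weight
    return minority_exp / total_exp - catalog_share
```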

We characterize the behavior of the BPR model on synthetic datasets, covering different percentages of catalog and observation imbalance. The following figure shows, on the left, the 15 datasets we created: the x-axis reports the percentage of observations and the y-axis the percentage of items in the catalog for the minority group, and each cell reports the percentage of relevance obtained by the minority group in that dataset. As the figure on the right shows, the bigger the gap between the amount of observations and the representation in the catalog for the minority group, the larger the disparity between the expected relevance and the observed one.
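As an illustration of how such controlled imbalances can be generated, the following sketch builds an implicit-feedback dataset with a chosen minority share of the catalog and of the observations. The helper and its parameters are hypothetical; the generation protocol in the paper may differ.

```python
import numpy as np

def make_synthetic(n_users, n_items, minority_catalog_pct,
                   minority_obs_pct, n_obs, seed=0):
    """Implicit-feedback dataset in which the minority group owns
    minority_catalog_pct of the items and collects minority_obs_pct
    of the observed interactions (both percentages in [0, 1])."""
    rng = np.random.default_rng(seed)
    n_min = int(round(n_items * minority_catalog_pct))
    minority = np.arange(n_min)              # items 0..n_min-1 are minority-owned
    majority = np.arange(n_min, n_items)
    n_min_obs = int(round(n_obs * minority_obs_pct))
    items = np.concatenate([
        rng.choice(minority, size=n_min_obs),          # minority interactions
        rng.choice(majority, size=n_obs - n_min_obs),  # majority interactions
    ])
    users = rng.integers(0, n_users, size=n_obs)
    order = rng.permutation(n_obs)                     # shuffle the pairs
    return list(zip(users[order], items[order])), set(minority.tolist())
```

Sweeping minority_catalog_pct and minority_obs_pct over a grid of values yields a family of datasets analogous to the 15 analyzed above.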

We observe the same patterns when measuring disparate visibility and exposure (see the paper for the detailed results). Indeed, in contexts with high catalog-interaction imbalance, the minority group suffers a larger disparate visibility (exposure), relative to its contribution to the catalog. The higher the disparate relevance, the higher the disparate visibility (exposure).

Reducing disparities via upsampling and regularization

The analysis of the synthetic data has shown that the share of relevance may depend on the representation of provider groups in the catalog and in the interactions. The more similar the two representations are for a group, the lower the resulting disparate relevance. Nevertheless, such balanced representations are unlikely to occur in real-world platforms.

For this reason, we devised an approach to mitigate provider unfairness that follows two strategies:

  • An upsampling of the interactions, to balance catalog and interaction representations. Specifically, we considered three setups, in which the upsampled interactions were (i) real and selected randomly, (ii) fake and picked randomly, or (iii) fake and picked according to item popularity (a sketch of the three setups follows this list);
  • A regularization, to account for the distribution of relevance across groups. Our loss function balances the original accuracy loss with a regularization term that minimizes the disparity in relevance (a sketch of such a loss is given further below).
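The following sketch illustrates the three upsampling setups over (user, item) pairs. The function name, the popularity smoothing, and the uniform choice of users for fake interactions are our assumptions, not the exact procedure of the paper.

```python
import random

def upsample(interactions, minority_items, n_extra, mode="real"):
    """Add n_extra interactions involving minority items.
    mode='real'     duplicates existing minority interactions at random;
    mode='fake'     pairs random users with random minority items;
    mode='fake_pop' pairs random users with minority items sampled
                    proportionally to their observed popularity."""
    users = list({u for u, _ in interactions})
    minority_obs = [(u, i) for u, i in interactions if i in minority_items]
    if mode == "real":
        extra = random.choices(minority_obs, k=n_extra)
    else:
        items = list(minority_items)
        if mode == "fake_pop":
            counts = {i: 1 for i in items}   # +1 smoothing for unseen items
            for _, i in minority_obs:
                counts[i] += 1
            weights = [counts[i] for i in items]
        else:
            weights = None                   # uniform sampling
        picked = random.choices(items, weights=weights, k=n_extra)
        extra = [(random.choice(users), i) for i in picked]
    return interactions + extra
```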

Please refer to the original paper for the technical details of our approach.
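For intuition only, here is a minimal PyTorch-style sketch of how an accuracy loss (here, BPR's pairwise loss) could be balanced with a relevance-disparity regularizer. The weight lam, the sigmoid as a relevance proxy, and the absolute-difference penalty are illustrative assumptions, not the exact term used in the paper.

```python
import torch

def fair_bpr_loss(pos_scores, neg_scores, minority_mask, catalog_share, lam=0.1):
    """BPR pairwise loss plus a term that pushes the minority group's
    share of predicted relevance toward its share of the catalog."""
    # Standard BPR: positive items should outscore sampled negatives.
    bpr = -torch.log(torch.sigmoid(pos_scores - neg_scores) + 1e-10).mean()
    # Relevance proxy in [0, 1] for the positive items in the batch;
    # minority_mask is a boolean tensor flagging minority-owned items.
    rel = torch.sigmoid(pos_scores)
    minority_share = rel[minority_mask].sum() / rel.sum()
    # Penalize the gap between the relevance share and the catalog share.
    reg = torch.abs(minority_share - catalog_share)
    return bpr + lam * reg
```

Setting lam to 0 recovers the plain accuracy loss; larger values trade recommendation utility for parity in relevance.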

Experiments and results

We validate our approach on two widely studied domains, namely movies and education, considering the MovieLens-10M and COCO datasets. Concretely, we assess the impact of each strategy (upsampling and regularization) and the effectiveness of each upsampling setup.

While the main paper reports detailed results for four research questions, the main outcomes can be summarized as follows:

  • The upsampling of minority-group interactions reduces disparate impacts, i.e., the inequality of exposure, visibility, and relevance with respect to the contribution of the minority group to the catalog. The loss in recommendation utility is negligible, or even absent in many cases. The amount of upsampling needed depends on the dataset and on the upsampling technique.
  • Upsampling real, existing interactions involving the minority group makes it possible to achieve a good trade-off among recommendation utility, disparate impacts, and coverage. Upsampling minority-group interactions via fake user-item interactions is suitable when the minority group is very small.
  • Combining regularization and upsampling is crucial to fine-tune the trade-offs achieved by the upsampling-only setup, especially when the upsampled user-item interactions are fake.
  • Upsampling minority items and regularizing relevance can often lead to higher recommendation utility and lower disparities than state-of-the-art treatments, regardless of the dataset. This benefit does not necessarily imply a higher coverage of minority items.