Assessing explanation quality in recommender systems, and shaping recommendation lists that account for it, allows us to produce more effective recommendations. These recommendations can also increase explanation quality according to the proposed properties, fairly across demographic groups.
In a SIGIR 2022 paper, with Giacomo Balloccu, Gianni Fenu, and Mirko Marras, we explore the space of relevant explanation properties through a mixed approach, combining a literature review (also covering psychological dimensions) and user studies (investigating whether, and which, properties users perceive as valuable). In this way, we identified properties recognized by users as important, such as recency, popularity, and diversity.
The recognized importance of these properties motivated us to operationalize three novel metrics for the recency, popularity, and diversity of explanations. We then proposed a suite of re-ranking approaches that optimize the top-k list of recommended products and the accompanying explanation paths in the Knowledge Graph (KG) for the proposed metrics. We assessed the impact of our approaches on recommendation utility and on the proposed explanation quality, investigating whether any trade-off arose. Finally, we investigated how these impacts affect demographic groups defined by a legally protected attribute (i.e., gender).
The source code for this study was made available in our Software Impacts paper.
Explanation property design
Our literature analysis covered prior work on the general definition of explainable RSs, as well as beyond-accuracy properties investigated in traditional RS research, e.g., time relevance, diversity, and novelty. This analysis led us to conceptualize three key properties for the produced explanations. We use an example path, namely user listened song1 featuring artist1 featuring song2, to showcase each property.
Recency of the Linking Interaction. The first explored property is the recency of the user interaction with the product included in the selected path, i.e., user listened song1 in the example path. Indeed, an explanation related to a recent interaction would be intrinsically easier for a user to grasp, while older interactions might be neither perceived as valuable nor remembered by users.
Popularity of the Shared Entity. The second explored property is the popularity of the shared entity, i.e., artist1 in the example path. We investigate the extent to which the popularity of the shared entity influences the perceived quality of the explanation. For instance, an artist featured in 20 songs might be considered more popular than one featured in 2 songs. Indeed, when a very unpopular product is recommended, an explanation that contains a popular entity can help users decide whether that product is interesting for them.
Diversity of the Explanation Type. Considering the explanations provided in a recommended list as a whole, a possible conceptualization of diversity is that the more explanation path types we present, the better the explanations are perceived. For example, in the music domain, we might consider explanation types including featured (as in the example path), written by, and composed by, and aim to cover them in the provided explanations in a reasonably balanced way.
Online assessment
To ensure the process was scalable, we prepared and sent out a five-minute questionnaire. We specifically investigated whether users prefer to receive explanations connected to (i) recent or old interactions, (ii) popular or unpopular shared entities, and (iii) a wide or narrow variety of explanation types.
In what follows, we report the main outcomes of our assessment:
- 64.6% of the participants preferred to see an explanation involving a product closely experienced in time, 6.8% opted for explanations involving older interactions, and the remaining 28.6% of the participants declared that this property would not be relevant to them;
- 40% of the participants preferred a popular shared entity, while 24.3% preferred an unpopular shared entity. 35.7% of the participants marked this property as not relevant;
- 70% of the participants were in favor of the recommended list being accompanied by highly diverse explanation types. Surprisingly, 25.7% of the participants expressed a preference for low diversity, aligning with prior work showing that the propensity for diversity depends on the user's personality. 4.3% of the participants declared that this property would not be relevant.
Explanation property operationalization
Linking Interaction Recency (LIR). This property quantifies the time elapsed since the linking interaction in the explanation path occurred. Given a user, we sort the products this user interacted with chronologically. We then apply an exponentially weighted moving average to the timestamps in the sorted list to obtain the LIR of each interaction performed by the user.
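The following is a minimal sketch of how such a recency score could be computed, not the exact formulation from the paper. The function name, the smoothing factor `beta`, and the min-max normalization of timestamps are all assumptions introduced here for illustration.

```python
def linking_interaction_recency(timestamps, beta=0.5):
    """Return one smoothed recency score per interaction, oldest first.

    Assumptions (not taken from the post): timestamps are min-max
    normalized to [0, 1], and `beta` is a hypothetical smoothing factor.
    """
    if not timestamps:
        return []
    ordered = sorted(timestamps)  # chronological order
    lo, hi = ordered[0], ordered[-1]
    span = hi - lo
    # If all interactions share one timestamp, treat them as equally recent.
    normalized = [(t - lo) / span if span else 1.0 for t in ordered]

    # Exponentially weighted moving average over the normalized timestamps.
    scores = [normalized[0]]
    for t in normalized[1:]:
        scores.append((1 - beta) * scores[-1] + beta * t)
    return scores


# Three interactions at times 100, 200, and 300 (arbitrary units).
print(linking_interaction_recency([100, 200, 300]))
```

Under these assumptions, more recent interactions receive higher scores, and a larger `beta` makes the score track the raw timestamp more closely.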
Shared Entity Popularity (SEP). This property quantifies the extent to which the shared entity included in an explanation path is popular. We assume that the number of relationships a shared entity is involved in within the KG is a proxy for its popularity. We sort the entities of a given type in the KG by popularity and apply an exponential decay to the popularity scores to obtain the SEP of each entity of that type.
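As a rough illustration, the popularity score could be sketched as below. This is an assumption-laden sketch, not the paper's definition: the decay is modeled here as an exponentially weighted smoothing over min-max normalized relationship counts, and the function name, the `beta` parameter, and the example entities are hypothetical.

```python
def shared_entity_popularity(degree_by_entity, beta=0.5):
    """Map each entity of one type to a smoothed popularity score.

    `degree_by_entity` holds the number of KG relationships per entity.
    Assumptions (not taken from the post): degrees are min-max normalized
    and smoothed with a hypothetical factor `beta`.
    """
    if not degree_by_entity:
        return {}
    # Sort entities from least to most popular.
    ordered = sorted(degree_by_entity.items(), key=lambda item: item[1])
    lo, hi = ordered[0][1], ordered[-1][1]
    span = hi - lo

    scores = {}
    previous = None
    for entity, degree in ordered:
        value = (degree - lo) / span if span else 1.0
        # Exponentially weighted smoothing along the sorted list.
        previous = value if previous is None else (1 - beta) * previous + beta * value
        scores[entity] = previous
    return scores


# Hypothetical artists with 2, 11, and 20 relationships in the KG.
sep = shared_entity_popularity({"artist_a": 2, "artist_b": 20, "artist_c": 11})
print(sep)
```

With this sketch, the artist involved in 20 relationships scores higher than the one involved in 2, matching the intuition behind the property.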
Explanation Type Diversity (ETD). This property quantifies how many different types of explanations accompany the recommended products. The ETD of user u is computed as the number of unique types in the selected explanations relative to the minimum between the size of the recommended list and the total number of possible explanation types.
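The ratio described above can be sketched in a few lines. The function and parameter names are illustrative choices, and the example explanation types are hypothetical.

```python
def explanation_type_diversity(selected_types, list_size, num_possible_types):
    """Unique explanation types divided by min(list size, possible types)."""
    if list_size <= 0 or num_possible_types <= 0:
        raise ValueError("list_size and num_possible_types must be positive")
    return len(set(selected_types)) / min(list_size, num_possible_types)


# A top-4 list whose explanations cover 2 of 3 possible path types.
types = ["featured", "featured", "composed by", "featured"]
print(explanation_type_diversity(types, list_size=4, num_possible_types=3))
```

Dividing by the minimum of the two quantities keeps the score in [0, 1]: a list of 4 products cannot show more than 4 types, and a KG with 3 path types cannot offer more than 3.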
Explanation property optimization
Given that it is generally hard to embed the proposed properties in the internal model learning process, we propose to re-arrange the recommended lists (and the explanations) returned by a recommendation model, a common practice known as re-ranking.
Specifically, we propose two classes of re-ranking approaches, whose details can be found in the paper:
- The first class, namely soft, includes approaches that re-rank the explanation paths for each recommended product according to one or more explanation properties, but not the originally recommended products;
- The second class, namely weighted, includes approaches that re-rank both the recommended products and the associated explanation paths.
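To make the difference between the two classes concrete, here is a simplified sketch. It is not the algorithm from the paper: the `Path` type, the single `quality` score per path, the linear combination of relevance and quality, and the parameter `alpha` are assumptions chosen for illustration.

```python
from typing import NamedTuple


class Path(NamedTuple):
    description: str
    quality: float  # hypothetical explanation property score in [0, 1]


def soft_rerank(ranked_products, paths_by_product):
    """Keep the product order; pick the best-scoring path per product."""
    return [
        (product, max(paths_by_product[product], key=lambda p: p.quality))
        for product in ranked_products
    ]


def weighted_rerank(relevance_by_product, paths_by_product, alpha=0.1, k=10):
    """Re-rank products by a mix of relevance and explanation quality."""
    scored = []
    for product, relevance in relevance_by_product.items():
        best = max(paths_by_product[product], key=lambda p: p.quality)
        # Assumed trade-off: alpha weights the explanation property.
        score = (1 - alpha) * relevance + alpha * best.quality
        scored.append((score, product, best))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(product, best) for _, product, best in scored[:k]]


paths = {
    "m1": [Path("via actor", 0.1), Path("via director", 0.2)],
    "m2": [Path("via genre", 0.9)],
}
relevance = {"m1": 0.90, "m2": 0.85}
print(soft_rerank(["m1", "m2"], paths))
print(weighted_rerank(relevance, paths, alpha=0.5, k=2))
```

In this sketch, the soft variant never changes which products are shown, while the weighted variant may promote a slightly less relevant product whose explanation scores better on the chosen property.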
Experimental evaluation
Our experiments were conducted on MovieLens-1M (ML1M) and LastFM-1B (LASTFM), two public data sets that vary in domain, extensiveness, and sparsity. The considered baselines included three traditional matrix factorization models (FM, NFM, BPR), three explainable recommendation models based on regularization terms (CFKG, CKE, KGAT), and one explainable recommendation model based on explanation paths (PGPR).
In what follows, we report the main outcomes emerging from our study:
- Optimizing for explanation property x not only causes gains in x for the resulting explanations, but positively affects other explanation properties too (e.g., optimizing for SEP leads to gains on both SEP and ETD and vice versa, and SEP benefits from optimizing on LIR), showing a positive interdependence. Even with α = 0.1 (the parameter that expresses the trade-off between relevance and the target property), our re-ranking leads to significant gains (≥ 50%) on the optimized property, without a significant loss (just ≤ 1%) in recommendation utility.
- Recommendations obtained through our re-ranking approaches achieved state-of-the-art NDCGs. In both data sets, all re-ranking approaches achieved an NDCG equal to, or at most 2 points lower than, that of the non-(path-)explainable baselines. This negligible loss is observed in the cases where diversity is included as a property to optimize. Interestingly, our study shows that accounting for beyond-accuracy aspects related to user-level explanation quality often does not lead to losses in recommendation utility (when observed, they are negligible).
- Gains in explanation quality are large and proportional to the baseline PGPR value, on both data sets. Higher gains are observed for ETD than other properties. Interestingly, considering all three properties jointly does not lead to the highest overall explanation quality. This highlights possibly diverging optimization patterns across properties that vary according to the domain and characteristics of the data set.
- In general, the (un)fairness in recommendation utility measured for the original model is not statistically impacted by our approaches. Similar observations hold for the explanation properties, except for the fact that our approaches mitigate unfairness in ETD in LASTFM. Both the original model and our approaches tend to lead to unfairness in SEP.