Practicing the right math: Enhancing trust in an e-learning platform using an explainable recommender system


In 2020-2021, I was the daily advisor for the master's thesis of Shotallo Kato, a Computer Science student at KU Leuven. Shotallo extended Wiski, my mathematics learning platform for high school students (i.e., adolescents), in two ways:

  • He tailored the difficulty level of exercises to students' level of mastery, and automatically recommended the most suitable exercises to students;
  • He provided an explanation interface so that students could understand why Wiski recommended a specific exercise to them.

Below is a summary of Shotallo's results. Find the full text here:

Tailoring exercises' difficulty level

To tailor the exercises' difficulty level to students’ level of mastery, Shotallo set up an Elo rating system for students and exercises: if a student correctly solves an exercise, their Elo score rises and the exercise’s Elo score drops, and vice versa. This Elo information was used in two ways:

  1. Students could manually pick exercises on Wiski and see which ones were relevant for them by looking at difficulty labels next to the exercise number (green = easy, orange = medium, red = hard).
  2. To also automate exercise selection, a recommender system combined Elo ratings and collaborative filtering. Broadly, the recommendation algorithm looked for candidate exercises based on a student’s Elo rating and recommended the three exercises that the student was most likely to answer correctly. These recommendations appeared when students correctly solved an exercise.
Figure: Exercise page on Wiski, showing exercises on a specific topic with indicated difficulty levels.
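To make the Elo mechanism concrete, here is a minimal Python sketch of the standard Elo update applied to a student and an exercise, together with a difficulty label and a top-three selection. It is an illustration under stated assumptions, not Shotallo's implementation: the step size `K`, the label thresholds, the candidate `window`, and all function names are hypothetical, and the collaborative filtering part of the recommender is left out entirely.

```python
K = 32  # assumed step size; the thesis may use a different value


def expected_success(student_elo: float, exercise_elo: float) -> float:
    """Standard Elo estimate of the probability that the student solves the exercise."""
    return 1.0 / (1.0 + 10 ** ((exercise_elo - student_elo) / 400))


def update(student_elo: float, exercise_elo: float, correct: bool, k: float = K):
    """Return the new (student, exercise) ratings after one attempt.

    A correct answer raises the student's rating and lowers the exercise's
    rating by the same amount; an incorrect answer does the opposite.
    """
    expected = expected_success(student_elo, exercise_elo)
    delta = k * ((1.0 if correct else 0.0) - expected)
    return student_elo + delta, exercise_elo - delta


def difficulty_label(student_elo: float, exercise_elo: float) -> str:
    """Map the expected success probability to a colour (thresholds are assumptions)."""
    p = expected_success(student_elo, exercise_elo)
    if p >= 0.7:
        return "green"   # easy for this student
    if p >= 0.4:
        return "orange"  # medium
    return "red"         # hard


def recommend(student_elo: float, exercises: dict, window: float = 200, n: int = 3):
    """Pick candidates near the student's rating, then return the n exercises
    with the highest expected success (Elo only, no collaborative filtering)."""
    candidates = {
        name: elo for name, elo in exercises.items()
        if abs(elo - student_elo) <= window
    }
    ranked = sorted(
        candidates,
        key=lambda name: expected_success(student_elo, candidates[name]),
        reverse=True,
    )
    return ranked[:n]
```

For example, a student and an exercise that both start at 1500 have an expected success of 0.5, so a correct answer moves the student to 1516 and the exercise to 1484 under the assumed `K = 32`.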

Explaining recommendations

The recommended exercises were accompanied by one of three explanation interfaces:

  1. The real explanation interface contained a histogram and a textual explanation, which clarified that recommendations are based on how many attempts similar students need to solve the exercise.
  2. The placebo explanation interface only stated that Wiski's algorithm computed that the exercise is suitable for the student. Obviously, this does not reveal anything about why it was recommended.
  3. The no explanation interface did not provide any feedback about why the exercise was recommended.

Shotallo investigated how adolescents react to these explanation interfaces. The results were mixed. Some students found the real explanations satisfactory, others wanted a more detailed explanation for the recommendations, and yet others did not really read the explanations at all. Students who saw placebo explanations were also divided about the non-revealing text: some requested a real explanation; others did not ask for more details. Finally, some students who saw no explanation requested explanations for the recommendations; others did not. This shows that adolescents perceive the utility and transparency of explanation interfaces differently. Thus, a single identical explanation for everyone is not the way to go. Instead, explanations need to be tailored to the students who see them.

Figures: Real explanation, placebo explanation, and no explanation interfaces.

Does explaining recommendations increase trust in Wiski?

Shotallo's main research question was how the explanation interfaces affect adolescents' trust in Wiski. After solving five exercises, students therefore filled out a questionnaire focused on their trust in Wiski. Surprisingly, the answer to the research question depends on how trust is measured:

  • If trust is measured one-dimensionally with a single question, there are no significant differences in trust between students who saw real, placebo, or no explanations for the recommendations.
  • However, if trust is measured multidimensionally as an average of trusting beliefs, intention to return, and perceived transparency, students who saw real explanations had significantly more trust in Wiski.
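The multidimensional measure described above can be sketched as a simple average of three subscale scores. The snippet below is only an illustration: the function names, the number of questionnaire items per subscale, and the 1-to-5 Likert scale in the example are assumptions, not details taken from the thesis.

```python
from statistics import mean


def subscale_score(item_responses):
    """Average the responses to the questionnaire items of one subscale."""
    return mean(item_responses)


def composite_trust(trusting_beliefs, intention_to_return, perceived_transparency):
    """Multidimensional trust: the mean of the three subscale scores.

    Each argument is a list of item responses (assumed here to be on a
    1-to-5 Likert scale; the actual scale is not stated in this summary).
    """
    return mean([
        subscale_score(trusting_beliefs),
        subscale_score(intention_to_return),
        subscale_score(perceived_transparency),
    ])


# Hypothetical responses of one student.
score = composite_trust([4, 5], [3], [5, 5, 2])
```

Averaging subscale scores rather than all items at once gives each dimension equal weight, regardless of how many questions it contains.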

This two-sided result suggests that a single question cannot capture the multi-faceted nature of trust, and that dynamically learned factors, such as the perceived accuracy of the recommendation algorithm and the website's appearance, may be the main drivers of trust in the platform.