Title: Evaluating music recommendation in a real-world setting: On data splitting and evaluation metrics
Authors: Chou, Szu-Yu; Yang, Yi-Hsuan; Lin, Yu-Ching
Type: conference paper
Date issued: 2015-08-04
Date added to repository: 2023-10-20
ISBN: 9781479970827
ISSN: 1945-7871
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/636334
DOI: 10.1109/ICME.2015.7177456
Scopus ID: 2-s2.0-84946047951 (https://api.elsevier.com/content/abstract/scopus_id/84946047951)
Keywords: cold-start | collaborative filtering | content-based recommendation | data splitting | evaluation metrics

Abstract: Evaluation is important for assessing how well a computer system fulfills a given user need. In the context of recommendation, researchers usually evaluate a recommender system by holding out a random subset of observed ratings and measuring the system's accuracy in reproducing them. This evaluation strategy, however, ignores the fact that in a real-world setting we are given the observed ratings of the past and must predict for the future. New songs may appear, creating the cold-start problem, and users' musical preferences may change over time. Moreover, user satisfaction with a recommender system may depend on factors other than accuracy. In light of these observations, we propose in this paper a novel evaluation framework that uses various time-based data splitting methods and evaluation metrics to assess the performance of recommender systems. Using millions of listening records collected from a commercial music streaming service, we compare the performance of collaborative filtering (CF) and content-based (CB) models built on low-level audio features and semantic audio descriptors. Our evaluation shows that the CB model with semantic descriptors achieves a better trade-off among accuracy, novelty, diversity, freshness, and popularity, and handles the cold-start problem of new songs well.
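To make the splitting strategy the abstract describes concrete, here is a minimal Python sketch of a time-based split, in contrast to a random hold-out: all records observed before a cutoff date form the training set, and later records form the test set, so songs first seen after the cutoff surface as cold-start items. The column names (user_id, song_id, timestamp) and the helper time_based_split are illustrative assumptions, not taken from the paper.

import pandas as pd

def time_based_split(records: pd.DataFrame, cutoff: pd.Timestamp):
    """Hold out the future instead of a random subset: train on records
    observed before `cutoff`, test on records observed at or after it."""
    train = records[records["timestamp"] < cutoff]
    test = records[records["timestamp"] >= cutoff]
    # Songs first seen in the test period are cold-start items: a pure
    # collaborative-filtering model has no past interactions for them,
    # while a content-based model can still score them from audio features.
    cold_start = set(test["song_id"]) - set(train["song_id"])
    return train, test, cold_start

# Toy listening log (values are made up for illustration).
log = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3],
    "song_id":   ["a", "b", "a", "c", "d"],
    "timestamp": pd.to_datetime([
        "2015-01-05", "2015-02-10", "2015-02-20",
        "2015-03-15", "2015-03-30",
    ]),
})

train, test, cold = time_based_split(log, pd.Timestamp("2015-03-01"))
print(cold)  # {'c', 'd'}: new songs a CF model never saw during training

Under this split, accuracy can be computed per segment (seen vs. cold-start songs), which is how a framework like the one described can expose the trade-offs among accuracy, novelty, diversity, freshness, and popularity that a random hold-out hides.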