Authors: Wu H.-H; SHU-KAI HSIEH
Dates: 2021-08-12; 2021-08-12; 2017
Scopus URL: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85085912649&partnerID=40&md5=e40f7ad95a29d6484adf34e75cf93da7
Handle: https://scholars.lib.ntu.edu.tw/handle/123456789/577594

Abstract: In work on gender and Natural Language Processing (NLP), most papers focus on gender-normative language spoken by biological males and females with opposite-sex desires. Taking the perspective of sexual orientation instead, this study presents the first work on the task of Chinese homosexual identification. First, we collect homosexual texts from social media; second, we examine the linguistic behavior found in gay and lesbian texts. We also provide sets of linguistic features to automatically predict homosexual language, using Support Vector Machine (SVM) and Naive Bayes (NB) models with 5-fold cross-validation. Training yielded a promising F-score of around 70% with a particular lexicon-based feature set. © The Association for Computational Linguistics and Chinese Language Processing

Keywords: Barium compounds; Computational linguistics; Natural language processing systems; Speech processing; Support vector machines; Cross validation; Feature sets; Lexicon-based; Linguistic features; Natural language processing; Sexual orientations; Social media; Training procedures; Social networking (online)

Title: Exploring lavender tongue from social media texts
Type: conference paper
Scopus ID: 2-s2.0-85085912649