Introduction
A knowledge graph (KG), such as Freebase (Tanon et al., 2016), Wikipedia (Arvola & Alamettälä, 2022), or DBpedia (Lehmann et al., 2015), provides structured representations of facts about real-world entities and relations (Yan et al., 2024). Given one or more query entities of a KG, the task of finding associative entities is to return a list of associative entities ranked by their association degrees with the query entities. This task has gained much attention (Zhou et al., 2020) and is indispensable in search engines (Yang et al., 2019) and recommendation systems (Yang et al., 2020). Figure 1 shows an overview of the association entity graph model (AEGM).
Figure 1. Overview of the association entity graph model (AEGM)
Note. AEGM = association entity graph model.
Accurately finding associative entities is challenging because queries are ambiguous (Gu et al., 2019) and because the scope of candidate answers cannot be narrowed effectively without information beyond the KG and the query entities. For example, given the query entity Avatar, a user may be searching for the actors starring in Avatar, for similar movies, or for other associative entities in a KG, as shown in Figure 1. To resolve this ambiguity, the query-by-example method returns associative entities by having the user provide a complete example of the kind of element they are interested in (Ding et al., 2023). However, the associations among entities evolve rapidly in the real world. Entities are not only linked by their relations in a KG but are also implicitly associated through user-entity interactions on online platforms, such as social media (Muppasani et al., 2024), wikis (Raganato et al., 2016), and ratings (Liu et al., 2023). In this paper, the collection of user-entity interactions is referred to as user-generated data (UGD). In KG applications, considering only the KG may find associative entities with limited precision and coverage. As shown in Figure 1, “Cameron” is more likely than Titanic to be the ideal answer for Avatar because Cameron and Avatar frequently co-occur in recent news. That is, users’ query intents can not only be inferred from a KG but are often also hidden within users’ behaviors. In this paper, we leverage the UGD to improve the accuracy of finding associative entities by making use of the timeliness and credibility of UGD.
Intuitively, a KG and UGD contain massive numbers of entities and considerable noise (Bu et al., 2021; Wu et al., 2022), but only a portion of the entities are strongly associated with each other. It is therefore necessary to select candidates from a KG and UGD as the basis for finding the associative entities. Previous works extracted candidates for the query entities from a KG and then calculated their association degrees from the UGD (J. Li et al., 2021a). These methods may yield inaccurate candidates because a KG can be incomplete, and some entities that are strongly associated with each other in the UGD may not be linked in the KG (J. Li et al., 2021b). This discrepancy may lead to the omission of association candidates during the extraction process.
Moreover, the probability of user behaviors on different entities typically reflects the strength of the implicit associations between those entities. In particular, the associations between entities are directed. For example, the probability that a user who likes Avatar also likes the Terminator is 80%, whereas the probability that a user who likes the Terminator also likes Avatar is only 60%. This motivates us to assign weights to the directed associations among entities in order to find the answers accurately.
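To make the notion of a directed association weight concrete, the following minimal Python sketch estimates the conditional probability that a user likes one entity given that the user likes another. It is an illustration only and not the weighting scheme of the AEGM: the function name `directed_association_weights`, the input format (a mapping from user to liked entities), and the toy data are all assumptions introduced here.

```python
from collections import defaultdict
from itertools import permutations


def directed_association_weights(user_likes):
    """Estimate P(user likes b | user likes a) for every ordered pair (a, b).

    `user_likes` maps a user id to an iterable of liked entities.
    Hypothetical helper for illustration; not taken from the paper.
    """
    like_count = defaultdict(int)   # number of users who like each entity
    pair_count = defaultdict(int)   # number of users who like both a and b

    for entities in user_likes.values():
        unique = set(entities)      # count each entity once per user
        for entity in unique:
            like_count[entity] += 1
        for a, b in permutations(unique, 2):
            pair_count[(a, b)] += 1

    # The weight of the edge a -> b is the co-like count divided by
    # the number of users who like a, so a -> b and b -> a can differ.
    return {(a, b): n / like_count[a] for (a, b), n in pair_count.items()}


# Toy interactions (made up): three users like both movies,
# one likes only Avatar, and two like only Terminator.
toy_likes = {
    "u1": ["Avatar", "Terminator"],
    "u2": ["Avatar", "Terminator"],
    "u3": ["Avatar", "Terminator"],
    "u4": ["Avatar"],
    "u5": ["Terminator"],
    "u6": ["Terminator"],
}

weights = directed_association_weights(toy_likes)
print(weights[("Avatar", "Terminator")])   # 3 of 4 Avatar fans
print(weights[("Terminator", "Avatar")])   # 3 of 5 Terminator fans
```

On this toy data the two directions receive different weights because the same co-like count is normalized by a different number of users in each direction, which is the asymmetry described above.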