Showing 1–30 of 30 results for author: Lim, K H

  1. arXiv:2405.06665  [pdf, other]

    cs.CL cs.IR cs.LG

    Enhancing Language Models for Financial Relation Extraction with Named Entities and Part-of-Speech

    Authors: Menglin Li, Kwan Hui Lim

    Abstract: The Financial Relation Extraction (FinRE) task involves identifying the entities and their relation, given a piece of financial statement/text. To solve this FinRE problem, we propose a simple but effective strategy that improves the performance of pre-trained language models by augmenting them with Named Entity Recognition (NER) and Part-Of-Speech (POS), as well as different approaches to combine…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024 Tiny Paper Track

  2. Towards Precise Observations of Neural Model Robustness in Classification

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: In deep learning applications, robustness measures the ability of neural models that handle slight changes in input data, which could lead to potential safety hazards, especially in safety-critical applications. Pre-deployment assessment of model robustness is essential, but existing methods often suffer from either high costs or imprecise results. To enhance safety in real-world scenarios, metric…

    Submitted 25 April, 2024; originally announced April 2024.

  3. arXiv:2404.16411  [pdf, other]

    cs.AI

    Label-Free Topic-Focused Summarization Using Query Augmentation

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: In today's data and information-rich world, summarization techniques are essential in harnessing vast text to extract key information and enhance decision-making and efficiency. In particular, topic-focused summarization is important due to its ability to tailor content to specific aspects of an extended text. However, this usually requires extensive labelled datasets and considerable computationa…

    Submitted 25 April, 2024; originally announced April 2024.

  4. arXiv:2404.08662  [pdf, other]

    cs.IR cs.LG cs.SI

    FewUser: Few-Shot Social User Geolocation via Contrastive Learning

    Authors: Menglin Li, Kwan Hui Lim

    Abstract: To address the challenges of scarcity in geotagged data for social user geolocation, we propose FewUser, a novel framework for Few-shot social User geolocation. We incorporate a contrastive learning strategy between users and locations to improve geolocation performance with no or limited training data. FewUser features a user representation module that harnesses a pre-trained language model (PLM)…

    Submitted 28 March, 2024; originally announced April 2024.

    Comments: 17 pages, 3 figures, 8 tables, submitted to ECML-PKDD 2024 for review

  5. arXiv:2403.00786  [pdf, other]

    cs.IR cs.SI

    Leveraging Contrastive Learning for Few-shot Geolocation of Social Posts

    Authors: Menglin Li, Kwan Hui Lim

    Abstract: Social geolocation is an important problem of predicting the originating locations of social media posts. However, this task is challenging due to the need for a substantial volume of training data, alongside well-annotated labels. These issues are further exacerbated by new or less popular locations with insufficient labels, further leading to an imbalanced dataset. In this paper, we propose \tex…

    Submitted 19 February, 2024; originally announced March 2024.

    Comments: This paper contains 7-page main content and 2-page references and was submitted to IJCAI2024 for review

  6. arXiv:2311.12355  [pdf, other]

    cs.IR cs.CL cs.LG

    Utilizing Language Models for Tour Itinerary Recommendation

    Authors: Ngai Lam Ho, Kwan Hui Lim

    Abstract: Tour itinerary recommendation involves planning a sequence of relevant Point-of-Interest (POIs), which combines challenges from the fields of both Operations Research (OR) and Recommendation Systems (RS). As an OR problem, there is the need to maximize a certain utility (e.g., popularity of POIs in the tour) while adhering to some constraints (e.g., maximum time for the tour). As a RS problem, it…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: PMAI23 @IJCAI 2023 2nd International Workshop on Process Management in the AI era

  7. arXiv:2311.11071  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI

    SBTRec- A Transformer Framework for Personalized Tour Recommendation Problem with Sentiment Analysis

    Authors: Ngai Lam Ho, Roy Ka-Wei Lee, Kwan Hui Lim

    Abstract: When traveling to an unfamiliar city for holidays, tourists often rely on guidebooks, travel websites, or recommendation systems to plan their daily itineraries and explore popular points of interest (POIs). However, these approaches may lack optimization in terms of time feasibility, localities, and user preferences. In this paper, we propose the SBTRec algorithm: a BERT-based Trajectory Recommen…

    Submitted 18 November, 2023; originally announced November 2023.

    Report number: 01

  8. arXiv:2310.19886  [pdf]

    cs.LG cs.IR cs.SI

    BTRec: BERT-Based Trajectory Recommendation for Personalized Tours

    Authors: Ngai Lam Ho, Roy Ka-Wei Lee, Kwan Hui Lim

    Abstract: An essential task for tourists having a pleasant holiday is to have a well-planned itinerary with relevant recommendations, especially when visiting unfamiliar cities. Many tour recommendation tools only take into account a limited number of factors, such as popular Points of Interest (POIs) and routing constraints. Consequently, the solutions they provide may not always align with the individual…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: RecSys 2023, Workshop on Recommenders in Tourism

  9. arXiv:2304.08495   

    cs.AI cs.MA

    Optimizing Group Utility in Itinerary Planning: A Strategic and Crowd-Aware Approach

    Authors: Junhua Liu, Kwan Hui Lim, Kristin L. Wood, Menglin Li

    Abstract: Itinerary recommendation is a complex sequence prediction problem with numerous real-world applications. This task becomes even more challenging when considering the optimization of multiple user queuing times and crowd levels, as well as numerous involved parameters, such as attraction popularity, queuing time, walking time, and operating hours. Existing solutions typically focus on single-person…

    Submitted 10 September, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: Will be going through major revision

  10. arXiv:2302.09938  [pdf, other]

    cs.AI cs.LG cs.SI

    SkillRec: A Data-Driven Approach to Job Skill Recommendation for Career Insights

    Authors: Xiang Qian Ong, Kwan Hui Lim

    Abstract: Understanding the skill sets and knowledge required for any career is of utmost importance, but it is increasingly challenging in today's dynamic world with rapid changes in terms of the tools and techniques used. Thus, it is especially important to be able to accurately identify the required skill sets for any job for better career insights and development. In this paper, we propose and develop t…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted to the 15th International Conference on Computer and Automation Engineering (ICCAE 2023)

  11. arXiv:2212.13900  [pdf, other]

    cs.IR cs.AI cs.LG

    POIBERT: A Transformer-based Model for the Tour Recommendation Problem

    Authors: Ngai Lam Ho, Kwan Hui Lim

    Abstract: Tour itinerary planning and recommendation are challenging problems for tourists visiting unfamiliar cities. Many tour recommendation algorithms only consider factors such as the location and popularity of Points of Interest (POIs) but their solutions may not align well with the user's own preferences and other location constraints. Additionally, these solutions do not take into consideration of t…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted to the 2022 IEEE International Conference on Big Data (BigData2022)

  12. arXiv:2211.01336  [pdf, other]

    cs.IR cs.AI

    A Transformer-based Framework for POI-level Social Post Geolocation

    Authors: Menglin Li, Kwan Hui Lim, Teng Guo, Junhua Liu

    Abstract: POI-level geo-information of social posts is critical to many location-based applications and services. However, the multi-modality, complexity and diverse nature of social media data and their platforms limit the performance of inferring such fine-grained locations and their subsequent applications. To address this issue, we present a transformer-based general framework, which builds upon pre-tra…

    Submitted 26 October, 2022; originally announced November 2022.

    Comments: Full papers are 12 pages in length plus additional 4 pages for references (turns to 18 pages in total after submitting to arxiv). One figure and 5 tables are contained. This paper was submitted to ECIR 2023 for review

  13. arXiv:2210.14260  [pdf, other]

    cs.CL

    Universal Evasion Attacks on Summarization Scoring

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: The automatic scoring of summaries is important as it guides the development of summarizers. Scoring is also complex, as it involves multiple aspects such as fluency, grammar, and even textual entailment with the source text. However, summary scoring has not been considered a machine learning task to study its accuracy and robustness. In this study, we place automatic scoring in the context of reg…

    Submitted 25 October, 2022; originally announced October 2022.

  14. arXiv:2210.14257  [pdf, other]

    cs.CL

    Revision for Concision: A Constrained Paraphrase Generation Task

    Authors: Wenchuan Mu, Kwan Hui Lim

    Abstract: Academic writing should be concise as concise sentences better keep the readers' attention and convey meaning clearly. Writing concisely is challenging, for writers often struggle to revise their drafts. We introduce and formulate revising for concision as a natural language processing task at the sentence level. Revising for concision requires algorithms to use only necessary words to rewrite a s…

    Submitted 25 October, 2022; originally announced October 2022.

  15. arXiv:2112.12940  [pdf, other]

    cs.CL cs.AI

    Analyzing Scientific Publications using Domain-Specific Word Embedding and Topic Modelling

    Authors: Trisha Singhal, Junhua Liu, Lucienne T. M. Blessing, Kwan Hui Lim

    Abstract: The scientific world is changing at a rapid pace, with new technology being developed and new trends being set at an increasing frequency. This paper presents a framework for conducting scientific analyses of academic publications, which is crucial to monitor research trends and identify potential innovations. This framework adopts and combines various techniques of Natural Language Processing, su…

    Submitted 23 December, 2021; originally announced December 2021.

    Comments: Accepted at the 2021 IEEE International Conference on Big Data (BigData2021)

  16. arXiv:2106.13121  [pdf, other]

    cs.SI cs.AI cs.LG

    Real-time Spatio-temporal Event Detection on Geotagged Social Media

    Authors: Yasmeen George, Shanika Karunasekera, Aaron Harwood, Kwan Hui Lim

    Abstract: A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning for accident, protest, election or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or known beforehand. In this work, we propose an online…

    Submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted to Journal of Big Data

  17. arXiv:2106.11815  [pdf, other]

    cs.LG cs.CY cs.SI

    User Identification across Social Networking Sites using User Profiles and Posting Patterns

    Authors: Prashant Solanki, Kwan Hui Lim, Aaron Harwood

    Abstract: With the prevalence of online social networking sites (OSNs) and mobile devices, people are increasingly reliant on a variety of OSNs for keeping in touch with family and friends, and using it as a source of information. For example, a user might utilise multiple OSNs for different purposes, such as using Flickr to share holiday pictures with family and friends, and Twitter to post short messages…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted at the 2021 International Joint Conference on Neural Networks (IJCNN'21)

  18. arXiv:2106.11359  [pdf, other]

    cs.CV cs.AI cs.LG

    Photozilla: A Large-Scale Photography Dataset and Visual Embedding for 20 Photography Styles

    Authors: Trisha Singhal, Junhua Liu, Lucienne T. M. Blessing, Kwan Hui Lim

    Abstract: The advent of social media platforms has been a catalyst for the development of digital photography that engendered a boom in vision applications. With this motivation, we introduce a large-scale dataset termed 'Photozilla', which includes over 990k images belonging to 10 different photographic styles. The dataset is then used to train 3 classification models to automatically classify the images i…

    Submitted 21 June, 2021; originally announced June 2021.

    Comments: In the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. (Poster)

  19. arXiv:2103.02464  [pdf, other]

    cs.AI cs.IR

    User Preferential Tour Recommendation Based on POI-Embedding Methods

    Authors: Ngai Lam Ho, Kwan Hui Lim

    Abstract: Tour itinerary planning and recommendation are challenging tasks for tourists in unfamiliar countries. Many tour recommenders only consider broad POI categories and do not align well with users' preferences and other locational constraints. We propose an algorithm to recommend personalized tours using POI-embedding methods, which provides a finer representation of POI types. Our recommendation alg…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: Accepted to the 26th International Conference on Intelligent User Interfaces (IUI'21), Poster Track

  20. arXiv:2103.01472  [pdf, other]

    cs.SI cs.CY cs.IR

    TweetCOVID: A System for Analyzing Public Sentiments and Discussions about COVID-19 via Twitter Activities

    Authors: Jolin Shaynn-Ly Kwan, Kwan Hui Lim

    Abstract: The COVID-19 pandemic has created widespread health and economical impacts, affecting millions around the world. To better understand these impacts, we present the TweetCOVID system that offers the capability to understand the public reactions to the COVID-19 pandemic in terms of their sentiments, emotions, topics of interest and controversial discussions, over a range of time periods and location…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: Accepted to the 26th International Conference on Intelligent User Interfaces (IUI'21), Demo Track

  21. arXiv:2012.03057  [pdf, other]

    cs.SI cs.AI

    Urban Crowdsensing using Social Media: An Empirical Study on Transformer and Recurrent Neural Networks

    Authors: Jerome Heng, Junhua Liu, Kwan Hui Lim

    Abstract: An important aspect of urban planning is understanding crowd levels at various locations, which typically require the use of physical sensors. Such sensors are potentially costly and time consuming to implement on a large scale. To address this issue, we utilize publicly available social media datasets and use them as the basis for two urban sensing problems, namely event detection and crowd level…

    Submitted 5 December, 2020; originally announced December 2020.

    Comments: Accepted at the 2020 IEEE International Conference on Big Data (BigData'20), Poster Track

  22. arXiv:2012.03049  [pdf, other]

    cs.SI cs.CY

    Urban Heat Islands: Beating the Heat with Multi-Modal Spatial Analysis

    Authors: Marcus Yong, Kwan Hui Lim

    Abstract: In today's highly urbanized environment, the Urban Heat Island (UHI) phenomenon is increasingly prevalent where surface temperatures in urbanized areas are found to be much higher than surrounding rural areas. Excessive levels of heat stress leads to problems at various levels, ranging from the individual to the world. At the individual level, UHI could lead to the human body being unable to cope…

    Submitted 5 December, 2020; originally announced December 2020.

    Comments: Accepted at the 2020 IEEE International Conference on Big Data (BigData'20)

  23. arXiv:2012.03039  [pdf, other]

    cs.SI

    Understanding Public Sentiments, Opinions and Topics about COVID-19 using Twitter

    Authors: Jolin Shaynn-Ly Kwan, Kwan Hui Lim

    Abstract: The COVID-19 pandemic has caused widespread devastation throughout the world. In addition to the health and economical impacts, there is an enormous emotional toll associated with the constant stress of daily life with the numerous restrictions in place to combat the pandemic. To better understand the impact of COVID-19, we proposed a framework that utilizes public tweets to derive the sentiments,…

    Submitted 5 December, 2020; originally announced December 2020.

    Comments: Accepted at the 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM'20)

  24. arXiv:2011.11010  [pdf, ps, other]

    cs.SI

    Mining Influentials and their Bot Activities on Twitter Campaigns

    Authors: Shanika Karunasekera, Kwan Hui Lim, Aaron Harwood

    Abstract: Twitter is increasingly used for political, advertising and marketing campaigns, where the main aim is to influence users to support specific causes, individuals or groups. We propose a novel methodology for mining and analyzing Twitter campaigns, which includes: (i) collecting tweets and detecting topics relating to a campaign; (ii) mining important campaign topics using scientometrics measures;…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: Presented at the 7th International Workshop on New Frontiers in Mining Complex Patterns (NFMCP'18)

  25. arXiv:2009.12061  [pdf, other]

    cs.CL cs.LG

    An Unsupervised Sentence Embedding Method by Mutual Information Maximization

    Authors: Yan Zhang, Ruidan He, Zuozhu Liu, Kwan Hui Lim, Lidong Bing

    Abstract: BERT is inefficient for sentence-pair tasks such as clustering or semantic search as it needs to evaluate combinatorially many sentence pairs which is very time-consuming. Sentence BERT (SBERT) attempted to solve this challenge by learning semantically meaningful representations of single sentences, such that similarity comparison can be easily accessed. However, SBERT is trained on corpus with hi…

    Submitted 4 February, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

    Comments: Accepted to EMNLP 2020, code is released

  26. arXiv:2006.08369  [pdf, other]

    cs.SI cs.CL cs.CY

    EPIC30M: An Epidemics Corpus Of Over 30 Million Relevant Tweets

    Authors: Junhua Liu, Trisha Singhal, Lucienne T. M. Blessing, Kristin L. Wood, Kwan Hui Lim

    Abstract: Since the start of COVID-19, several relevant corpora from various sources are presented in the literature that contain millions of data points. While these corpora are valuable in supporting many analyses on this specific pandemic, researchers require additional benchmark corpora that contain other epidemics to facilitate cross-epidemic pattern recognition and trend analysis tasks. During our oth…

    Submitted 22 June, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

  27. arXiv:2005.06627  [pdf, other]

    cs.CL cs.LG cs.SI

    CrisisBERT: a Robust Transformer for Crisis Classification and Contextual Crisis Embedding

    Authors: Junhua Liu, Trisha Singhal, Lucienne T. M. Blessing, Kristin L. Wood, Kwan Hui Lim

    Abstract: Classification of crisis events, such as natural disasters, terrorist attacks and pandemics, is a crucial task to create early signals and inform relevant parties for spontaneous actions to reduce overall damage. Despite crisis such as natural disasters can be predicted by professional institutions, certain events are first signaled by civilians, such as the recent COVID-19 pandemics. Social media…

    Submitted 18 May, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

  28. arXiv:2005.02780  [pdf, other]

    cs.SI cs.CL

    A Large-scale Industrial and Professional Occupation Dataset

    Authors: Junhua Liu, Yung Chuen Ng, Kwan Hui Lim

    Abstract: There has been growing interest in utilizing occupational data mining and analysis. In today's job market, occupational data mining and analysis is growing in importance as it enables companies to predict employee turnover, model career trajectories, screen through resumes and perform other human resource tasks. A key requirement to facilitate these tasks is the need for an occupation-related data…

    Submitted 25 April, 2020; originally announced May 2020.

  29. arXiv:1910.10495  [pdf, other]

    cs.CL cs.IR cs.LG

    IPOD: An Industrial and Professional Occupations Dataset and its Applications to Occupational Data Mining and Analysis

    Authors: Junhua Liu, Yung Chuen Ng, Kristin L. Wood, Kwan Hui Lim

    Abstract: Occupational data mining and analysis is an important task in understanding today's industry and job market. Various machine learning techniques are proposed and gradually deployed to improve companies' operations for upstream tasks, such as employee churn prediction, career trajectory modelling and automated interview. Job titles analysis and embedding, as the fundamental building blocks, are cru…

    Submitted 26 April, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

  30. arXiv:1909.07775  [pdf, other]

    cs.AI cs.GT cs.MA

    Strategic and Crowd-Aware Itinerary Recommendation

    Authors: Junhua Liu, Kristin L. Wood, Kwan Hui Lim

    Abstract: There is a rapidly growing demand for itinerary planning in tourism but this task remains complex and difficult, especially when considering the need to optimize for queuing time and crowd levels for multiple users. This difficulty is further complicated by the large amount of parameters involved, i.e., attraction popularity, queuing time, walking time, operating hours, etc. Many recent works prop…

    Submitted 9 June, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

    MSC Class: 68T20 ACM Class: I.2.8