Search | arXiv e-print repository

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Authors: Yuchen Zhou, Jiayuan Gu, Tung Yen Chiang, Fanbo Xiang, Hao Su

Abstract: The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D models remains a challenge due to issues such as non-unified data formats, lightweight models, and the scarcity of labeled data with diverse masks. To this end, we propose a 3D promptable segmentation model (Point-SAM) focusing… ▽ More The development of 2D foundation models for image segmentation has been significantly advanced by the Segment Anything Model (SAM). However, achieving similar success in 3D models remains a challenge due to issues such as non-unified data formats, lightweight models, and the scarcity of labeled data with diverse masks. To this end, we propose a 3D promptable segmentation model (Point-SAM) focusing on point clouds. Our approach utilizes a transformer-based method, extending SAM to the 3D domain. We leverage part-level and object-level annotations and introduce a data engine to generate pseudo labels from SAM, thereby distilling 2D knowledge into our 3D model. Our model outperforms state-of-the-art models on several indoor and outdoor benchmarks and demonstrates a variety of applications, such as 3D annotation. Codes and demo can be found at https://github.com/zyc00/Point-SAM. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.09568 [pdf, other]

doi 10.1007/978-981-97-2238-9_16

Dynamic GNNs for Precise Seizure Detection and Classification from EEG Data

Authors: Arash Hajisafi, Haowen Lin, Yao-Yi Chiang, Cyrus Shahabi

Abstract: Diagnosing epilepsy requires accurate seizure detection and classification, but traditional manual EEG signal analysis is resource-intensive. Meanwhile, automated algorithms often overlook EEG's geometric and semantic properties critical for interpreting brain activity. This paper introduces NeuroGNN, a dynamic Graph Neural Network (GNN) framework that captures the dynamic interplay between the EE… ▽ More Diagnosing epilepsy requires accurate seizure detection and classification, but traditional manual EEG signal analysis is resource-intensive. Meanwhile, automated algorithms often overlook EEG's geometric and semantic properties critical for interpreting brain activity. This paper introduces NeuroGNN, a dynamic Graph Neural Network (GNN) framework that captures the dynamic interplay between the EEG electrode locations and the semantics of their corresponding brain regions. The specific brain region where an electrode is placed critically shapes the nature of captured EEG signals. Each brain region governs distinct cognitive functions, emotions, and sensory processing, influencing both the semantic and spatial relationships within the EEG data. Understanding and modeling these intricate brain relationships are essential for accurate and meaningful insights into brain activity. This is precisely where the proposed NeuroGNN framework excels by dynamically constructing a graph that encapsulates these evolving spatial, temporal, semantic, and taxonomic correlations to improve precision in seizure detection and classification. Our extensive experiments with real-world data demonstrate that NeuroGNN significantly outperforms existing state-of-the-art models. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this contribution is published in the proceedings of the 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2024), Taipei, Taiwan, May 7-10, 2024, and is available online at https://doi.org/10.1007/978-981-97-2238-9_16

Journal ref: Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'24). Singapore: Springer Nature Singapore, 2024

arXiv:2404.01643 [pdf, other]

A Closer Look at Spatial-Slice Features Learning for COVID-19 Detection

Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

Abstract: Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT-scan contains large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions… ▽ More Conventional Computed Tomography (CT) imaging recognition faces two significant challenges: (1) There is often considerable variability in the resolution and size of each CT scan, necessitating strict requirements for the input size and adaptability of models. (2) CT-scan contains large number of out-of-distribution (OOD) slices. The crucial features may only be present in specific spatial regions and slices of the entire CT scan. How can we effectively figure out where these are located? To deal with this, we introduce an enhanced Spatial-Slice Feature Learning (SSFL++) framework specifically designed for CT scan. It aim to filter out a OOD data within whole CT scan, enabling our to select crucial spatial-slice for analysis by reducing 70% redundancy totally. Meanwhile, we proposed Kernel-Density-based slice Sampling (KDS) method to improve the stability when training and inference stage, therefore speeding up the rate of convergence and boosting performance. As a result, the experiments demonstrate the promising performance of our model using a simple EfficientNet-2D (E2D) model, even with only 1% of the training data. The efficacy of our approach has been validated on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop, in conjunction with CVPR 2024. Our source code is available at https://github.com/ming053l/E2D △ Less

Submitted 20 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Camera-ready version, accepted by DEF-AI-MIA workshop, in conjunted with CVPR2024

arXiv:2403.11230 [pdf, other]

Simple 2D Convolutional Neural Network-based Approach for COVID-19 Detection

Authors: Chih-Chung Hsu, Chia-Ming Lee, Yang Fan Chiang, Yi-Shiuan Chou, Chih-Yu Jiang, Shen-Chieh Tai, Chi-Han Tsai

Abstract: This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this me… ▽ More This study explores the use of deep learning techniques for analyzing lung Computed Tomography (CT) images. Classic deep learning approaches face challenges with varying slice counts and resolutions in CT images, a diversity arising from the utilization of assorted scanning equipment. Typically, predictions are made on single slices which are then combined for a comprehensive outcome. Yet, this method does not incorporate learning features specific to each slice, leading to a compromise in effectiveness. To address these challenges, we propose an advanced Spatial-Slice Feature Learning (SSFL++) framework specifically tailored for CT scans. It aims to filter out out-of-distribution (OOD) data within the entire CT scan, allowing us to select essential spatial-slice features for analysis by reducing data redundancy by 70\%. Additionally, we introduce a Kernel-Density-based slice Sampling (KDS) method to enhance stability during training and inference phases, thereby accelerating convergence and enhancing overall performance. Remarkably, our experiments reveal that our model achieves promising results with a simple EfficientNet-2D (E2D) model. The effectiveness of our approach is confirmed on the COVID-19-CT-DB datasets provided by the DEF-AI-MIA workshop. △ Less

Submitted 17 March, 2024; originally announced March 2024.

arXiv:2402.09658 [pdf]

Towards Precision Cardiovascular Analysis in Zebrafish: The ZACAF Paradigm

Authors: Amir Mohammad Naderi, Jennifer G. Casey, Mao-Hsiang Huang, Rachelle Victorio, David Y. Chiang, Calum MacRae, Hung Cao, Vandana A. Gupta

Abstract: Quantifying cardiovascular parameters like ejection fraction in zebrafish as a host of biological investigations has been extensively studied. Since current manual monitoring techniques are time-consuming and fallible, several image processing frameworks have been proposed to automate the process. Most of these works rely on supervised deep-learning architectures. However, supervised methods tend… ▽ More Quantifying cardiovascular parameters like ejection fraction in zebrafish as a host of biological investigations has been extensively studied. Since current manual monitoring techniques are time-consuming and fallible, several image processing frameworks have been proposed to automate the process. Most of these works rely on supervised deep-learning architectures. However, supervised methods tend to be overfitted on their training dataset. This means that applying the same framework to new data with different imaging setups and mutant types can severely decrease performance. We have developed a Zebrafish Automatic Cardiovascular Assessment Framework (ZACAF) to quantify the cardiac function in zebrafish. In this work, we further applied data augmentation, Transfer Learning (TL), and Test Time Augmentation (TTA) to ZACAF to improve the performance for the quantification of cardiovascular function quantification in zebrafish. This strategy can be integrated with the available frameworks to aid other researchers. We demonstrate that using TL, even with a constrained dataset, the model can be refined to accommodate a novel microscope setup, encompassing diverse mutant types and accommodating various video recording protocols. Additionally, as users engage in successive rounds of TL, the model is anticipated to undergo substantial enhancements in both generalizability and accuracy. Finally, we applied this approach to assess the cardiovascular function in nrap mutant zebrafish, a model of cardiomyopathy. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2401.17244 [pdf, other]

LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation

Authors: Yuan Chiang, Elvis Hsieh, Chia-Hong Chou, Janosh Riebesell

Abstract: Reducing hallucination of Large Language Models (LLMs) is imperative for use in the sciences, where reliability and reproducibility are crucial. However, LLMs inherently lack long-term memory, making it a nontrivial, ad hoc, and inevitably biased task to fine-tune them on domain-specific literature and data. Here we introduce LLaMP, a multimodal retrieval-augmented generation (RAG) framework of hi… ▽ More Reducing hallucination of Large Language Models (LLMs) is imperative for use in the sciences, where reliability and reproducibility are crucial. However, LLMs inherently lack long-term memory, making it a nontrivial, ad hoc, and inevitably biased task to fine-tune them on domain-specific literature and data. Here we introduce LLaMP, a multimodal retrieval-augmented generation (RAG) framework of hierarchical reasoning-and-acting (ReAct) agents that can dynamically and recursively interact with computational and experimental data on Materials Project (MP) and run atomistic simulations via high-throughput workflow interface. Without fine-tuning, LLaMP demonstrates strong tool usage ability to comprehend and integrate various modalities of materials science concepts, fetch relevant data stores on the fly, process higher-order data (such as crystal structure and elastic tensor), and streamline complex tasks in computational materials and chemistry. We propose a simple metric combining uncertainty and confidence estimates to evaluate the self-consistency of responses by LLaMP and vanilla LLMs. Our benchmark shows that LLaMP effectively mitigates the intrinsic bias in LLMs, counteracting the errors on bulk moduli, electronic bandgaps, and formation energies that seem to derive from mixed data sources. We also demonstrate LLaMP's capability to edit crystal structures and run annealing molecular dynamics simulations using pre-trained machine-learning force fields. The framework offers an intuitive and nearly hallucination-free approach to exploring and scaling materials informatics, and establishes a pathway for knowledge distillation and fine-tuning other language models. Code and live demo are available at https://github.com/chiang-yuan/llamp △ Less

Submitted 2 June, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 31 pages, 5 figures

arXiv:2310.17341 [pdf, other]

De-novo Chemical Reaction Generation by Means of Temporal Convolutional Neural Networks

Authors: Andrei Buin, Hung Yi Chiang, S. Andrew Gadsden, Faraz A. Alderson

Abstract: We present here a combination of two networks, Recurrent Neural Networks (RNN) and Temporarily Convolutional Neural Networks (TCN) in de novo reaction generation using the novel Reaction Smiles-like representation of reactions (CGRSmiles) with atom map** directly incorporated. Recurrent Neural Networks are known for their autoregressive properties and are frequently used in language modelling wi… ▽ More We present here a combination of two networks, Recurrent Neural Networks (RNN) and Temporarily Convolutional Neural Networks (TCN) in de novo reaction generation using the novel Reaction Smiles-like representation of reactions (CGRSmiles) with atom map** directly incorporated. Recurrent Neural Networks are known for their autoregressive properties and are frequently used in language modelling with direct application to SMILES generation. The relatively novel TCNs possess similar properties with wide receptive field while obeying the causality required for natural language processing (NLP). The combination of both latent representations expressed through TCN and RNN results in an overall better performance compared to RNN alone. Additionally, it is shown that different fine-tuning protocols have a profound impact on generative scope of the model when applied on a dataset of interest via transfer learning. △ Less

Submitted 1 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.14478 [pdf, other]

GeoLM: Empowering Language Models for Geospatially Grounded Language Understanding

Authors: Zekun Li, Wenxuan Zhou, Yao-Yi Chiang, Muhao Chen

Abstract: Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical database… ▽ More Humans subconsciously engage in geospatial reasoning when reading articles. We recognize place names and their spatial relations in text and mentally associate them with their physical locations on Earth. Although pretrained language models can mimic this cognitive process using linguistic context, they do not utilize valuable geospatial information in large, widely available geographical databases, e.g., OpenStreetMap. This paper introduces GeoLM, a geospatially grounded language model that enhances the understanding of geo-entities in natural language. GeoLM leverages geo-entity mentions as anchors to connect linguistic information in text corpora with geospatial information extracted from geographical databases. GeoLM connects the two types of context through contrastive learning and masked language modeling. It also incorporates a spatial coordinate embedding mechanism to encode distance and direction relations to capture geospatial context. In the experiment, we demonstrate that GeoLM exhibits promising capabilities in supporting toponym recognition, toponym linking, relation extraction, and geo-entity ty**, which bridge the gap between natural language processing and geospatial sciences. The code is publicly available at https://github.com/knowledge-computing/geolm. △ Less

Submitted 22 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP23 main

arXiv:2308.14920 [pdf, other]

Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions

Authors: Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang, Bowen Deng, Alpha A. Lee, Anubhav Jain, Kristin A. Persson

Abstract: Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard… ▽ More Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull where most materials are. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate. △ Less

Submitted 4 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 31 pages, 18 figures, 4 tables

arXiv:2306.17059 [pdf, other]

The mapKurator System: A Complete Pipeline for Extracting and Linking Text from Historical Maps

Authors: **a Kim, Zekun Li, Yijun Lin, Min Namgung, Leeje Jang, Yao-Yi Chiang

Abstract: Scanned historical maps in libraries and archives are valuable repositories of geographic data that often do not exist elsewhere. Despite the potential of machine learning tools like the Google Vision APIs for automatically transcribing text from these maps into machine-readable formats, they do not work well with large-sized images (e.g., high-resolution scanned documents), cannot infer the relat… ▽ More Scanned historical maps in libraries and archives are valuable repositories of geographic data that often do not exist elsewhere. Despite the potential of machine learning tools like the Google Vision APIs for automatically transcribing text from these maps into machine-readable formats, they do not work well with large-sized images (e.g., high-resolution scanned documents), cannot infer the relation between the recognized text and other datasets, and are challenging to integrate with post-processing tools. This paper introduces the mapKurator system, an end-to-end system integrating machine learning models with a comprehensive data processing pipeline. mapKurator empowers automated extraction, post-processing, and linkage of text labels from large numbers of large-dimension historical map scans. The output data, comprising bounding polygons and recognized text, is in the standard GeoJSON format, making it easily modifiable within Geographic Information Systems (GIS). The proposed system allows users to quickly generate valuable data from large numbers of historical maps for in-depth analysis of the map content and, in turn, encourages map findability, accessibility, interoperability, and reusability (FAIR principles). We deployed the mapKurator system and enabled the processing of over 60,000 maps and over 100 million text/place names in the David Rumsey Historical Map collection. We also demonstrated a seamless integration of mapKurator with a collaborative web platform to enable accessing automated approaches for extracting and linking text labels from historical map scans and collective work to improve the results. △ Less

Submitted 3 July, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 4 pages, 4 figures

arXiv:2306.15927 [pdf, other]

doi 10.1145/3589132.3625567

Learning Dynamic Graphs from All Contextual Information for Accurate Point-of-Interest Visit Forecasting

Authors: Arash Hajisafi, Haowen Lin, Sina Shaham, Haoji Hu, Maria Despoina Siampou, Yao-Yi Chiang, Cyrus Shahabi

Abstract: Forecasting the number of visits to Points-of-Interest (POI) in an urban area is critical for planning and decision-making for various application domains, from urban planning and transportation management to public health and social studies. Although this forecasting problem can be formulated as a multivariate time-series forecasting task, the current approaches cannot fully exploit the ever-chan… ▽ More Forecasting the number of visits to Points-of-Interest (POI) in an urban area is critical for planning and decision-making for various application domains, from urban planning and transportation management to public health and social studies. Although this forecasting problem can be formulated as a multivariate time-series forecasting task, the current approaches cannot fully exploit the ever-changing multi-context correlations among POIs. Therefore, we propose Busyness Graph Neural Network (BysGNN), a temporal graph neural network designed to learn and uncover the underlying multi-context correlations between POIs for accurate visit forecasting. Unlike other approaches where only time-series data is used to learn a dynamic graph, BysGNN utilizes all contextual information and time-series data to learn an accurate dynamic graph representation. By incorporating all contextual, temporal, and spatial signals, we observe a significant improvement in our forecasting accuracy over state-of-the-art forecasting models in our experiments with real-world datasets across the United States. △ Less

Submitted 28 September, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.01209 [pdf, other]

Counting Crowds in Bad Weather

Authors: Zhi-Kai Huang, Wei-Ting Chen, Yuan-Chun Chiang, Sy-Yen Kuo, Ming-Hsuan Yang

Abstract: Crowd counting has recently attracted significant attention in the field of computer vision due to its wide applications to image understanding. Numerous methods have been proposed and achieved state-of-the-art performance for real-world tasks. However, existing approaches do not perform well under adverse weather such as haze, rain, and snow since the visual appearances of crowds in such scenes a… ▽ More Crowd counting has recently attracted significant attention in the field of computer vision due to its wide applications to image understanding. Numerous methods have been proposed and achieved state-of-the-art performance for real-world tasks. However, existing approaches do not perform well under adverse weather such as haze, rain, and snow since the visual appearances of crowds in such scenes are drastically different from those images in clear weather of typical datasets. In this paper, we propose a method for robust crowd counting in adverse weather scenarios. Instead of using a two-stage approach that involves image restoration and crowd counting modules, our model learns effective features and adaptive queries to account for large appearance variations. With these weather queries, the proposed model can learn the weather information according to the degradation of the input image and optimize with the crowd counting module simultaneously. Experimental results show that the proposed algorithm is effective in counting crowds under different weather types on benchmark datasets. The source code and trained models will be made available to the public. △ Less

Submitted 1 June, 2023; originally announced June 2023.

Comments: including supplemental material

arXiv:2304.11732 [pdf, other]

Quantile Extreme Gradient Boosting for Uncertainty Quantification

Authors: Xiaozhe Yin, Masoud Fallah-Shorshani, Rob McConnell, Scott Fruin, Yao-Yi Chiang, Meredith Franklin

Abstract: As the availability, size and complexity of data have increased in recent years, machine learning (ML) techniques have become popular for modeling. Predictions resulting from applying ML models are often used for inference, decision-making, and downstream applications. A crucial yet often overlooked aspect of ML is uncertainty quantification, which can significantly impact how predictions from mod… ▽ More As the availability, size and complexity of data have increased in recent years, machine learning (ML) techniques have become popular for modeling. Predictions resulting from applying ML models are often used for inference, decision-making, and downstream applications. A crucial yet often overlooked aspect of ML is uncertainty quantification, which can significantly impact how predictions from models are used and interpreted. Extreme Gradient Boosting (XGBoost) is one of the most popular ML methods given its simple implementation, fast computation, and sequential learning, which make its predictions highly accurate compared to other methods. However, techniques for uncertainty determination in ML models such as XGBoost have not yet been universally agreed among its varying applications. We propose enhancements to XGBoost whereby a modified quantile regression is used as the objective function to estimate uncertainty (QXGBoost). Specifically, we included the Huber norm in the quantile regression model to construct a differentiable approximation to the quantile regression error function. This key step allows XGBoost, which uses a gradient-based optimization algorithm, to make probabilistic predictions efficiently. QXGBoost was applied to create 90\% prediction intervals for one simulated dataset and one real-world environmental dataset of measured traffic noise. Our proposed method had comparable or better performance than the uncertainty estimates generated for regular and quantile light gradient boosting. For both the simulated and traffic noise datasets, the overall performance of the prediction intervals from QXGBoost were better than other models based on coverage width-based criterion. △ Less

Submitted 23 April, 2023; originally announced April 2023.

Comments: 15 pages, 7 figures and 4 tables

arXiv:2302.02367 [pdf, other]

FastPillars: A Deployment-friendly Pillar-based 3D Detector

Authors: Sifan Zhou, Zhi Tian, Xiangxiang Chu, Xinyu Zhang, Bo Zhang, Xiaobo Lu, Chengjian Feng, Zequn Jie, Patrick Yin Chiang, Lin Ma

Abstract: The deployment of 3D detectors strikes one of the major challenges in real-world self-driving scenarios. Existing BEV-based (i.e., Bird Eye View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference, which puts a hard barrier for deployment, especially for on-device applications. In this paper, to tackle the challenge of efficient 3D object detection from an ind… ▽ More The deployment of 3D detectors strikes one of the major challenges in real-world self-driving scenarios. Existing BEV-based (i.e., Bird Eye View) detectors favor sparse convolutions (known as SPConv) to speed up training and inference, which puts a hard barrier for deployment, especially for on-device applications. In this paper, to tackle the challenge of efficient 3D object detection from an industry perspective, we devise a deployment-friendly pillar-based 3D detector, termed FastPillars. First, we introduce a novel lightweight Max-and-Attention Pillar Encoding (MAPE) module specially for enhancing small 3D objects. Second, we propose a simple yet effective principle for designing a backbone in pillar-based 3D detection. We construct FastPillars based on these designs, achieving high performance and low latency without SPConv. Extensive experiments on two large-scale datasets demonstrate the effectiveness and efficiency of FastPillars for on-device 3D detection regarding both performance and speed. Specifically, FastPillars delivers state-of-the-art accuracy on Waymo Open Dataset with 1.8X speed up and 3.8 mAPH/L2 improvement over CenterPoint (SPConv-based). Our code is publicly available at: https://github.com/StiphyJay/FastPillars. △ Less

Submitted 13 December, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

Comments: Submitted to AAAI2024

arXiv:2301.08524 [pdf, other]

Clustering Human Mobility with Multiple Spaces

Authors: Haoji Hu, Haowen Lin, Yao-Yi Chiang

Abstract: Human mobility clustering is an important problem for understanding human mobility behaviors (e.g., work and school commutes). Existing methods typically contain two steps: choosing or learning a mobility representation and applying a clustering algorithm to the representation. However, these methods rely on strict visiting orders in trajectories and cannot take advantage of multiple types of mobi… ▽ More Human mobility clustering is an important problem for understanding human mobility behaviors (e.g., work and school commutes). Existing methods typically contain two steps: choosing or learning a mobility representation and applying a clustering algorithm to the representation. However, these methods rely on strict visiting orders in trajectories and cannot take advantage of multiple types of mobility representations. This paper proposes a novel mobility clustering method for mobility behavior detection. First, the proposed method contains a permutation-equivalent operation to handle sub-trajectories that might have different visiting orders but similar impacts on mobility behaviors. Second, the proposed method utilizes a variational autoencoder architecture to simultaneously perform clustering in both latent and original spaces. Also, in order to handle the bias of a single latent space, our clustering assignment prediction considers multiple learned latent spaces at different epochs. This way, the proposed method produces accurate results and can provide reliability estimates of each trajectory's cluster assignment. The experiment shows that the proposed method outperformed state-of-the-art methods in mobility behavior detection from trajectories with better accuracy and more interpretability. △ Less

Submitted 20 January, 2023; originally announced January 2023.

arXiv:2210.12213 [pdf, other]

SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation

Authors: Zekun Li, **a Kim, Yao-Yi Chiang, Muhao Chen

Abstract: Named geographic entities (geo-entities for short) are the building blocks of many geographic datasets. Characterizing geo-entities is integral to various application domains, such as geo-intelligence and map comprehension, while a key challenge is to capture the spatial-varying context of an entity. We hypothesize that we shall know the characteristics of a geo-entity by its surrounding entities,… ▽ More Named geographic entities (geo-entities for short) are the building blocks of many geographic datasets. Characterizing geo-entities is integral to various application domains, such as geo-intelligence and map comprehension, while a key challenge is to capture the spatial-varying context of an entity. We hypothesize that we shall know the characteristics of a geo-entity by its surrounding entities, similar to knowing word meanings by their linguistic context. Accordingly, we propose a novel spatial language model, SpaBERT, which provides a general-purpose geo-entity representation based on neighboring entities in geospatial data. SpaBERT extends BERT to capture linearized spatial context, while incorporating a spatial coordinate embedding mechanism to preserve spatial relations of entities in the 2-dimensional space. SpaBERT is pretrained with masked language modeling and masked entity prediction tasks to learn spatial dependencies. We apply SpaBERT to two downstream tasks: geo-entity ty** and geo-entity linking. Compared with the existing language models that do not use spatial context, SpaBERT shows significant performance improvement on both tasks. We also analyze the entity representation from SpaBERT in various settings and the effect of spatial coordinate embedding. △ Less

Submitted 21 October, 2022; originally announced October 2022.

Comments: Accepted by EMNLP 2022

arXiv:2210.06664 [pdf]

Are Macula or Optic Nerve Head Structures better at Diagnosing Glaucoma? An Answer using AI and Wide-Field Optical Coherence Tomography

Authors: Charis Y. N. Chiang, Fabian Braeu, Thanadet Chuangsuwanich, Royston K. Y. Tan, Jacqueline Chua, Leopold Schmetterer, Alexandre Thiery, Martin Buist, Michaël J. A. Girard

Abstract: Purpose: (1) To develop a deep learning algorithm to automatically segment structures of the optic nerve head (ONH) and macula in 3D wide-field optical coherence tomography (OCT) scans; (2) To assess whether 3D macula or ONH structures (or the combination of both) provide the best diagnostic power for glaucoma. Methods: A cross-sectional comparative study was performed which included wide-field sw… ▽ More Purpose: (1) To develop a deep learning algorithm to automatically segment structures of the optic nerve head (ONH) and macula in 3D wide-field optical coherence tomography (OCT) scans; (2) To assess whether 3D macula or ONH structures (or the combination of both) provide the best diagnostic power for glaucoma. Methods: A cross-sectional comparative study was performed which included wide-field swept-source OCT scans from 319 glaucoma subjects and 298 non-glaucoma subjects. All scans were compensated to improve deep-tissue visibility. We developed a deep learning algorithm to automatically label all major ONH tissue structures by using 270 manually annotated B-scans for training. The performance of our algorithm was assessed using the Dice coefficient (DC). A glaucoma classification algorithm (3D CNN) was then designed using a combination of 500 OCT volumes and their corresponding automatically segmented masks. This algorithm was trained and tested on 3 datasets: OCT scans cropped to contain the macular tissues only, those to contain the ONH tissues only, and the full wide-field OCT scans. The classification performance for each dataset was reported using the AUC. Results: Our segmentation algorithm was able to segment ONH and macular tissues with a DC of 0.94 $\pm$ 0.003. The classification algorithm was best able to diagnose glaucoma using wide-field 3D-OCT volumes with an AUC of 0.99 $\pm$ 0.01, followed by ONH volumes with an AUC of 0.93 $\pm$ 0.06, and finally macular volumes with an AUC of 0.91 $\pm$ 0.11. Conclusions: this study showed that using wide-field OCT as compared to the typical OCT images containing just the ONH or macular may allow for a significantly improved glaucoma diagnosis. This may encourage the mainstream adoption of 3D wide-field OCT scans. For clinical AI studies that use traditional machines, we would recommend the use of ONH scans as opposed to macula scans. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: 23 pages, 5 figures

arXiv:2207.05631 [pdf, other]

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Authors: Wentse Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu

Abstract: Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or improve the robustness of a policy to an unexpected perturbance. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers… ▽ More Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or improve the robustness of a policy to an unexpected perturbance. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately constraints on the diversity of the strategies and on the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards, while discovering more diverse strategies, and often with better sample efficiency. △ Less

Submitted 5 January, 2024; v1 submitted 12 July, 2022; originally announced July 2022.

arXiv:2205.04665 [pdf, other]

doi 10.1109/TVLSI.2022.3172685

A 14uJ/Decision Keyword Spotting Accelerator with In-SRAM-Computing and On Chip Learning for Customization

Authors: Yu-Hsiang Chiang, Tian-Sheuan Chang, Shyh Jye Jou

Abstract: Keyword spotting has gained popularity as a natural way to interact with consumer devices in recent years. However, because of its always-on nature and the variety of speech, it necessitates a low-power design as well as user customization. This paper describes a low-power, energy-efficient keyword spotting accelerator with SRAM based in-memory computing (IMC) and on-chip learning for user customi… ▽ More Keyword spotting has gained popularity as a natural way to interact with consumer devices in recent years. However, because of its always-on nature and the variety of speech, it necessitates a low-power design as well as user customization. This paper describes a low-power, energy-efficient keyword spotting accelerator with SRAM based in-memory computing (IMC) and on-chip learning for user customization. However, IMC is constrained by macro size, limited precision, and non-ideal effects. To address the issues mentioned above, this paper proposes bias compensation and fine-tuning using an IMC-aware model design. Furthermore, because learning with low-precision edge devices results in zero error and gradient values due to quantization, this paper proposes error scaling and small gradient accumulation to achieve the same accuracy as ideal model training. The simulation results show that with user customization, we can recover the accuracy loss from 51.08\% to 89.76\% with compensation and fine-tuning and further improve to 96.71\% with customization. The chip implementation can successfully run the model with only 14$uJ$ per decision. When compared to the state-of-the-art works, the presented design has higher energy efficiency with additional on-chip model customization capabilities for higher accuracy. △ Less

Submitted 10 May, 2022; originally announced May 2022.

Comments: 10 pages, 18 figures, to be published in IEEE Transaction on VLSI, 2022

arXiv:2205.03996 [pdf, other]

doi 10.1109/JETCAS.2022.3171522

Hardware-Robust In-RRAM-Computing for Object Detection

Authors: Yu-Hsiang Chiang, Cheng En Ni, Yun Sung, Tuo-Hung Hou, Tian-Sheuan Chang, Shyh Jye Jou

Abstract: In-memory computing is becoming a popular architecture for deep-learning hardware accelerators recently due to its highly parallel computing, low power, and low area cost. However, in-RRAM computing (IRC) suffered from large device variation and numerous nonideal effects in hardware. Although previous approaches including these effects in model training successfully improved variation tolerance, t… ▽ More In-memory computing is becoming a popular architecture for deep-learning hardware accelerators recently due to its highly parallel computing, low power, and low area cost. However, in-RRAM computing (IRC) suffered from large device variation and numerous nonideal effects in hardware. Although previous approaches including these effects in model training successfully improved variation tolerance, they only considered part of the nonideal effects and relatively simple classification tasks. This paper proposes a joint hardware and software optimization strategy to design a hardware-robust IRC macro for object detection. We lower the cell current by using a low word-line voltage to enable a complete convolution calculation in one operation that minimizes the impact of nonlinear addition. We also implement ternary weight map** and remove batch normalization for better tolerance against device variation, sense amplifier variation, and IR drop problem. An extra bias is included to overcome the limitation of the current sensing range. The proposed approach has been successfully applied to a complex object detection task with only 3.85\% mAP drop, whereas a naive design suffers catastrophic failure under these nonideal effects. △ Less

Submitted 8 May, 2022; originally announced May 2022.

Comments: 10 pages, 18 figures

arXiv:2202.04883 [pdf]

doi 10.1016/j.compenvurbsys.2022.101794

Towards the automated large-scale reconstruction of past road networks from historical maps

Authors: Johannes H. Uhl, Stefan Leyk, Yao-Yi Chiang, Craig A. Knoblock

Abstract: Transportation infrastructure, such as road or railroad networks, represent a fundamental component of our civilization. For sustainable planning and informed decision making, a thorough understanding of the long-term evolution of transportation infrastructure such as road networks is crucial. However, spatially explicit, multi-temporal road network data covering large spatial extents are scarce a… ▽ More Transportation infrastructure, such as road or railroad networks, represent a fundamental component of our civilization. For sustainable planning and informed decision making, a thorough understanding of the long-term evolution of transportation infrastructure such as road networks is crucial. However, spatially explicit, multi-temporal road network data covering large spatial extents are scarce and rarely available prior to the 2000s. Herein, we propose a framework that employs increasingly available scanned and georeferenced historical map series to reconstruct past road networks, by integrating abundant, contemporary road network data and color information extracted from historical maps. Specifically, our method uses contemporary road segments as analytical units and extracts historical roads by inferring their existence in historical map series based on image processing and clustering techniques. We tested our method on over 300,000 road segments representing more than 50,000 km of the road network in the United States, extending across three study areas that cover 53 historical topographic map sheets dated between 1890 and 1950. We evaluated our approach by comparison to other historical datasets and against manually created reference data, achieving F-1 scores of up to 0.95, and showed that the extracted road network statistics are highly plausible over time, i.e., following general growth patterns. We demonstrated that contemporary geospatial data integrated with information extracted from historical map series open up new avenues for the quantitative analysis of long-term urbanization processes and landscape changes far beyond the era of operational remote sensing and digital cartography. △ Less

Submitted 11 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: 36 pages, 22 figures

arXiv:2201.04565 [pdf, other]

doi 10.3389/fmats.2021.803875

ImageMech: From image to particle spring network for mechanical characterization

Authors: Yuan Chiang, Ting-Wai Chiu, Shu-Wei Chang

Abstract: The emerging demand for advanced structural and biological materials calls for novel modeling tools that can rapidly yield high-fidelity estimation on materials properties in design cycles. Lattice spring model (LSM), a coarse-grained particle spring network, has gained attention in recent years for predicting the mechanical properties and giving insights into the fracture mechanism with high repr… ▽ More The emerging demand for advanced structural and biological materials calls for novel modeling tools that can rapidly yield high-fidelity estimation on materials properties in design cycles. Lattice spring model (LSM), a coarse-grained particle spring network, has gained attention in recent years for predicting the mechanical properties and giving insights into the fracture mechanism with high reproducibility and generalizability. However, to simulate the materials in sufficient detail for guaranteed numerical stability and convergence, most of the time a large number of particles are needed, greatly diminishing the potential for high-throughput computation and therewith data generation for machine learning frameworks. Here, we implement CuLSM, a GPU-accelerated CUDA C++ code realizing parallelism over the spring list instead of the commonly used spatial decomposition, which requires intermittent updates on the particle neighbor list. Along with the image-to-particle conversion tool Img2Particle, our toolkit offers a fast and flexible platform to characterize the elastic and fracture behaviors of materials, expediting the design process between additive manufacturing and computer-aided design. With the growing demand for new lightweight, adaptable, and multi-functional materials and structures, such tailored and optimized modeling platform has profound impacts, enabling faster exploration in design spaces, better quality control for 3D printing by digital twin techniques, and larger data generation pipelines for image-based generative machine learning models. △ Less

Submitted 24 February, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

Journal ref: Front. Mater. 8:803875 (2022)

arXiv:2112.12033 [pdf, other]

doi 10.1016/j.xcrp.2022.100975

Encoding protein dynamic information in graph representation for functional residue identification

Authors: Yuan Chiang, Wei-Han Hui, Shu-Wei Chang

Abstract: Recent advances in protein function prediction exploit graph-based deep learning approaches to correlate the structural and topological features of proteins with their molecular functions. However, proteins in vivo are not static but dynamic molecules that alter conformation for functional purposes. Here we apply normal mode analysis to native protein conformations and augment protein graphs by co… ▽ More Recent advances in protein function prediction exploit graph-based deep learning approaches to correlate the structural and topological features of proteins with their molecular functions. However, proteins in vivo are not static but dynamic molecules that alter conformation for functional purposes. Here we apply normal mode analysis to native protein conformations and augment protein graphs by connecting edges between dynamically correlated residue pairs. In the multilabel function classification task, our method demonstrates a remarkable performance gain based on this dynamics-informed representation. The proposed graph neural network, ProDAR, increases the interpretability and generalizability of residue-level annotations and robustly reflects structural nuance in proteins. We elucidate the importance of dynamic information in graph representation by comparing class activation maps for hMTH1, nitrophorin, and SARS-CoV-2 receptor binding domain. Our model successfully learns the dynamic fingerprints of proteins and pinpoints the residues of functional impacts, with vast untapped potential for broad biotechnology and pharmaceutical applications. △ Less

Submitted 10 June, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Journal ref: Cell Reports Physical Science, Volume 3, Issue 7, 2022, 100975, ISSN 2666-3864

arXiv:2112.06104 [pdf, other]

doi 10.1145/3486635.3491070

Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection

Authors: Zekun Li, Runyu Guan, Qianmu Yu, Yao-Yi Chiang, Craig A. Knoblock

Abstract: Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up the map interpretation and helps generate rich metadata describing the map content. Many text detection algorithms have b… ▽ More Many historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map images could greatly speed up the map interpretation and helps generate rich metadata describing the map content. Many text detection algorithms have been proposed to locate text regions in map images automatically, but most of the algorithms are trained on out-ofdomain datasets (e.g., scenic images). Training data determines the quality of machine learning models, and manually annotating text regions in map images is labor-extensive and time-consuming. On the other hand, existing geographic data sources, such as Open- StreetMap (OSM), contain machine-readable map layers, which allow us to separate out the text layer and obtain text label annotations easily. However, the cartographic styles between OSM map tiles and historical maps are significantly different. This paper proposes a method to automatically generate an unlimited amount of annotated historical map images for training text detection models. We use a style transfer model to convert contemporary map images into historical style and place text labels upon them. We show that the state-of-the-art text detection models (e.g., PSENet) can benefit from the synthetic historical maps and achieve significant improvement for historical map text detection. △ Less

Submitted 11 December, 2021; originally announced December 2021.

arXiv:2112.05794 [pdf, other]

A Label Correction Algorithm Using Prior Information for Automatic and Accurate Geospatial Object Recognition

Authors: Weiwei Duan, Yao-Yi Chiang, Stefan Leyk, Johannes H. Uhl, Craig A. Knoblock

Abstract: Thousands of scanned historical topographic maps contain valuable information covering long periods of time, such as how the hydrography of a region has changed over time. Efficiently unlocking the information in these maps requires training a geospatial objects recognition system, which needs a large amount of annotated data. Overlap** geo-referenced external vector data with topographic maps a… ▽ More Thousands of scanned historical topographic maps contain valuable information covering long periods of time, such as how the hydrography of a region has changed over time. Efficiently unlocking the information in these maps requires training a geospatial objects recognition system, which needs a large amount of annotated data. Overlap** geo-referenced external vector data with topographic maps according to their coordinates can annotate the desired objects' locations in the maps automatically. However, directly overlap** the two datasets causes misaligned and false annotations because the publication years and coordinate projection systems of topographic maps are different from the external vector data. We propose a label correction algorithm, which leverages the color information of maps and the prior shape information of the external vector data to reduce misaligned and false annotations. The experiments show that the precision of annotations from the proposed algorithm is 10% higher than the annotations from a state-of-the-art algorithm. Consequently, recognition results using the proposed algorithm's annotations achieve 9% higher correctness than using the annotations from the state-of-the-art algorithm. △ Less

Submitted 10 December, 2021; originally announced December 2021.

arXiv:2112.05786 [pdf, other]

Guided Generative Models using Weak Supervision for Detecting Object Spatial Arrangement in Overhead Images

Authors: Weiwei Duan, Yao-Yi Chiang, Stefan Leyk, Johannes H. Uhl, Craig A. Knoblock

Abstract: The increasing availability and accessibility of numerous overhead images allows us to estimate and assess the spatial arrangement of groups of geospatial target objects, which can benefit many applications, such as traffic monitoring and agricultural monitoring. Spatial arrangement estimation is the process of identifying the areas which contain the desired objects in overhead images. Traditional… ▽ More The increasing availability and accessibility of numerous overhead images allows us to estimate and assess the spatial arrangement of groups of geospatial target objects, which can benefit many applications, such as traffic monitoring and agricultural monitoring. Spatial arrangement estimation is the process of identifying the areas which contain the desired objects in overhead images. Traditional supervised object detection approaches can estimate accurate spatial arrangement but require large amounts of bounding box annotations. Recent semi-supervised clustering approaches can reduce manual labeling but still require annotations for all object categories in the image. This paper presents the target-guided generative model (TGGM), under the Variational Auto-encoder (VAE) framework, which uses Gaussian Mixture Models (GMM) to estimate the distributions of both hidden and decoder variables in VAE. Modeling both hidden and decoder variables by GMM reduces the required manual annotations significantly for spatial arrangement estimation. Unlike existing approaches that the training process can only update the GMM as a whole in the optimization iterations (e.g., a "minibatch"), TGGM allows the update of individual GMM components separately in the same optimization iteration. Optimizing GMM components separately allows TGGM to exploit the semantic relationships in spatial data and requires only a few labels to initiate and guide the generative process. Our experiments shows that TGGM achieves results comparable to the state-of-the-art semi-supervised methods and outperforms unsupervised methods by 10% based on the $F_{1}$ scores, while requiring significantly fewer labeled data. △ Less

Submitted 10 December, 2021; originally announced December 2021.

arXiv:2112.05313 [pdf, other]

Building Autocorrelation-Aware Representations for Fine-Scale Spatiotemporal Prediction

Authors: Yijun Lin, Yao-Yi Chiang, Meredith Franklin, Sandrah P. Eckel, José Luis Ambite

Abstract: Many scientific prediction problems have spatiotemporal data- and modeling-related challenges in handling complex variations in space and time using only sparse and unevenly distributed observations. This paper presents a novel deep learning architecture, Deep learning predictions for LocATion-dependent Time-sEries data (DeepLATTE), that explicitly incorporates theories of spatial statistics into… ▽ More Many scientific prediction problems have spatiotemporal data- and modeling-related challenges in handling complex variations in space and time using only sparse and unevenly distributed observations. This paper presents a novel deep learning architecture, Deep learning predictions for LocATion-dependent Time-sEries data (DeepLATTE), that explicitly incorporates theories of spatial statistics into neural networks to address these challenges. In addition to a feature selection module and a spatiotemporal learning module, DeepLATTE contains an autocorrelation-guided semi-supervised learning strategy to enforce both local autocorrelation patterns and global autocorrelation trends of the predictions in the learned spatiotemporal embedding space to be consistent with the observed data, overcoming the limitation of sparse and unevenly distributed observations. During the training process, both supervised and semi-supervised losses guide the updates of the entire network to: 1) prevent overfitting, 2) refine feature selection, 3) learn useful spatiotemporal representations, and 4) improve overall prediction. We conduct a demonstration of DeepLATTE using publicly available data for an important public health topic, air quality prediction, in a well-studied, complex physical environment - Los Angeles. The experiment demonstrates that the proposed approach provides accurate fine-spatial-scale air quality predictions and reveals the critical environmental factors affecting the results. △ Less

Submitted 9 December, 2021; originally announced December 2021.

Comments: Published in ICDM2020

arXiv:2112.01671 [pdf, other]

An Automatic Approach for Generating Rich, Linked Geo-Metadata from Historical Map Images

Authors: Zekun Li, Yao-Yi Chiang, Sasan Tavakkol, Basel Shbita, Johannes H. Uhl, Stefan Leyk, Craig A. Knoblock

Abstract: Historical maps contain detailed geographic information difficult to find elsewhere covering long-periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g… ▽ More Historical maps contain detailed geographic information difficult to find elsewhere covering long-periods of time (e.g., 125 years for the historical topographic maps in the US). However, these maps typically exist as scanned images without searchable metadata. Existing approaches making historical maps searchable rely on tedious manual work (including crowd-sourcing) to generate the metadata (e.g., geolocations and keywords). Optical character recognition (OCR) software could alleviate the required manual work, but the recognition results are individual words instead of location phrases (e.g., "Black" and "Mountain" vs. "Black Mountain"). This paper presents an end-to-end approach to address the real-world problem of finding and indexing historical map images. This approach automatically processes historical map images to extract their text content and generates a set of metadata that is linked to large external geospatial knowledge bases. The linked metadata in the RDF (Resource Description Framework) format support complex queries for finding and indexing historical maps, such as retrieving all historical maps covering mountain peaks higher than 1,000 meters in California. We have implemented the approach in a system called mapKurator. We have evaluated mapKurator using historical maps from several sources with various map styles, scales, and coverage. Our results show significant improvement over the state-of-the-art methods. The code has been made publicly available as modules of the Kartta Labs project at https://github.com/kartta-labs/Project. △ Less

Submitted 2 December, 2021; originally announced December 2021.

Comments: 10.1145/3394486.3403381

arXiv:2110.07660 [pdf, other]

A Semi-Supervised Approach for Abnormal Event Prediction on Large Operational Network Time-Series Data

Authors: Yijun Lin, Yao-Yi Chiang

Abstract: Large network logs, recording multivariate time series generated from heterogeneous devices and sensors in a network, can often reveal important information about abnormal activities, such as network intrusions and device malfunctions. Existing machine learning methods for anomaly detection on multivariate time series typically assume that 1) normal sequences would have consistent behavior for tra… ▽ More Large network logs, recording multivariate time series generated from heterogeneous devices and sensors in a network, can often reveal important information about abnormal activities, such as network intrusions and device malfunctions. Existing machine learning methods for anomaly detection on multivariate time series typically assume that 1) normal sequences would have consistent behavior for training unsupervised models, or 2) require a large set of labeled normal and abnormal sequences for supervised models. However, in practice, normal network activities can demonstrate significantly varying sequence patterns (e.g., before and after rerouting partial network traffic). Also, the recorded abnormal events can be sparse. This paper presents a novel semi-supervised method that efficiently captures dependencies between network time series and across time points to generate meaningful representations of network activities for predicting abnormal events. The method can use the limited labeled data to explicitly learn separable embedding space for normal and abnormal samples and effectively leverage unlabeled data to handle training data scarcity. The experiments demonstrate that our approach significantly outperformed state-of-the-art approaches for event detection on a large real-world network log. △ Less

Submitted 14 October, 2021; originally announced October 2021.

Comments: 9 pages + 3 supplementary pages; submitted to SDM2022

arXiv:2102.01397 [pdf, other]

Low-Rate Overuse Flow Tracer (LOFT): An Efficient and Scalable Algorithm for Detecting Overuse Flows

Authors: Simon Scherrer, Che-Yu Wu, Yu-Hsi Chiang, Benjamin Rothenberger, Daniele E. Asoni, Arish Sateesan, Jo Vliegen, Nele Mentens, Hsu-Chun Hsiao, Adrian Perrig

Abstract: Current probabilistic flow-size monitoring can only detect heavy hitters (e.g., flows utilizing 10 times their permitted bandwidth), but cannot detect smaller overuse (e.g., flows utilizing 50-100% more than their permitted bandwidth). Thus, these systems lack accuracy in the challenging environment of high-throughput packet processing, where fast-memory resources are scarce. Nevertheless, many ap… ▽ More Current probabilistic flow-size monitoring can only detect heavy hitters (e.g., flows utilizing 10 times their permitted bandwidth), but cannot detect smaller overuse (e.g., flows utilizing 50-100% more than their permitted bandwidth). Thus, these systems lack accuracy in the challenging environment of high-throughput packet processing, where fast-memory resources are scarce. Nevertheless, many applications rely on accurate flow-size estimation, e.g. for network monitoring, anomaly detection and Quality of Service. We design, analyze, implement, and evaluate LOFT, a new approach for efficiently detecting overuse flows that achieves dramatically better properties than prior work. LOFT can detect 1.5x overuse flows in one second, whereas prior approaches fail to detect 2x overuse flows within a timeout of 300 seconds. We demonstrate LOFT's suitability for high-speed packet processing with implementations in the DPDK framework and on an FPGA. △ Less

Submitted 2 February, 2021; originally announced February 2021.

arXiv:2010.06536 [pdf, other]

Kartta Labs: Collaborative Time Travel

Authors: Sasan Tavakkol, Feng Han, Brandon Mayer, Mark Phillips, Cyrus Shahabi, Yao-Yi Chiang, Raimondas Kiveris

Abstract: We introduce the modular and scalable design of Kartta Labs, an open source, open data, and scalable system for virtually reconstructing cities from historical maps and photos. Kartta Labs relies on crowdsourcing and artificial intelligence consisting of two major modules: Maps and 3D models. Each module, in turn, consists of sub-modules that enable the system to reconstruct a city from historical… ▽ More We introduce the modular and scalable design of Kartta Labs, an open source, open data, and scalable system for virtually reconstructing cities from historical maps and photos. Kartta Labs relies on crowdsourcing and artificial intelligence consisting of two major modules: Maps and 3D models. Each module, in turn, consists of sub-modules that enable the system to reconstruct a city from historical maps and photos. The result is a spatiotemporal reference that can be used to integrate various collected data (curated, sensed, or crowdsourced) for research, education, and entertainment purposes. The system empowers the users to experience collaborative time travel such that they work together to reconstruct the past and experience it on an open source and open data platform. △ Less

Submitted 6 October, 2020; originally announced October 2020.

arXiv:2003.01351 [pdf, other]

DETECT: Deep Trajectory Clustering for Mobility-Behavior Analysis

Authors: Mingxuan Yue, Yaguang Li, Haoze Yang, Ritesh Ahuja, Yao-Yi Chiang, Cyrus Shahabi

Abstract: Identifying mobility behaviors in rich trajectory data is of great economic and social interest to various applications including urban planning, marketing and intelligence. Existing work on trajectory clustering often relies on similarity measurements that utilize raw spatial and/or temporal information of trajectories. These measures are incapable of identifying similar moving behaviors that exh… ▽ More Identifying mobility behaviors in rich trajectory data is of great economic and social interest to various applications including urban planning, marketing and intelligence. Existing work on trajectory clustering often relies on similarity measurements that utilize raw spatial and/or temporal information of trajectories. These measures are incapable of identifying similar moving behaviors that exhibit varying spatio-temporal scales of movement. In addition, the expense of labeling massive trajectory data is a barrier to supervised learning models. To address these challenges, we propose an unsupervised neural approach for mobility behavior clustering, called the Deep Embedded TrajEctory ClusTering network (DETECT). DETECT operates in three parts: first it transforms the trajectories by summarizing their critical parts and augmenting them with context derived from their geographical locality (e.g., using POIs from gazetteers). In the second part, it learns a powerful representation of trajectories in the latent space of behaviors, thus enabling a clustering function (such as $k$-means) to be applied. Finally, a clustering oriented loss is directly built on the embedded features to jointly perform feature refinement and cluster assignment, thus improving separability between mobility behaviors. Exhaustive quantitative and qualitative experiments on two real-world datasets demonstrate the effectiveness of our approach for mobility behavior analyses. △ Less

Submitted 3 March, 2020; originally announced March 2020.

Comments: Published as a conference paper at BigData 2019

arXiv:1909.00955 [pdf, other]

doi 10.1145/3274895.3274911

Los Angeles Metro Bus Data Analysis Using GPS Trajectory and Schedule Data (Demo Paper)

Authors: Kien Nguyen, **gyun Yang, Yijun Lin, Jianfa Lin, Yao-Yi Chiang, Cyrus Shahabi

Abstract: With the widespread installation of location-enabled devices on public transportation, public vehicles are generating massive amounts of trajectory data in real time. However, using these trajectory data for meaningful analysis requires careful considerations in storing, managing, processing, and visualizing the data. Using the location data of the Los Angeles Metro bus system, along with publicly… ▽ More With the widespread installation of location-enabled devices on public transportation, public vehicles are generating massive amounts of trajectory data in real time. However, using these trajectory data for meaningful analysis requires careful considerations in storing, managing, processing, and visualizing the data. Using the location data of the Los Angeles Metro bus system, along with publicly available bus schedule data, we conduct a data processing and analyses study to measure the performance of the public transportation system in Los Angeles utilizing a number of metrics including travel-time reliability, on-time performance, bus bunching, and travel-time estimation. We demonstrate the visualization of the data analysis results through an interactive web-based application. The developed algorithms and system provide powerful tools to detect issues and improve the efficiency of public transportation systems. △ Less

Submitted 3 September, 2019; originally announced September 2019.

Comments: SIGSPATIAL'18, demo paper, 4 pages

Journal ref: 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL '18), 2018

arXiv:1906.06154 [pdf, other]

Soft Subdivision Motion Planning for Complex Planar Robots

Authors: Bo Zhou, Yi-Jen Chiang, Chee Yap

Abstract: The design and implementation of theoretically-sound robot motion planning algorithms is challenging. Within the framework of resolution-exact algorithms, it is possible to exploit soft predicates for collision detection. The design of soft predicates is a balancing act between easily implementable predicates and their accuracy/effectivity. In this paper, we focus on the class of planar polygona… ▽ More The design and implementation of theoretically-sound robot motion planning algorithms is challenging. Within the framework of resolution-exact algorithms, it is possible to exploit soft predicates for collision detection. The design of soft predicates is a balancing act between easily implementable predicates and their accuracy/effectivity. In this paper, we focus on the class of planar polygonal rigid robots with arbitrarily complex geometry. We exploit the remarkable decomposability property of soft collision-detection predicates of such robots. We introduce a general technique to produce such a decomposition. If the robot is an m-gon, the complexity of this approach scales linearly in m. This contrasts with the O(m^3) complexity known for exact planners. It follows that we can now routinely produce soft predicates for any rigid polygonal robot. This results in resolution-exact planners for such robots within the general Soft Subdivision Search (SSS) framework. This is a significant advancement in the theory of sound and complete planners for planar robots. We implemented such decomposed predicates in our open-source Core Library. The experiments show that our algorithms are effective, perform in real time on non-trivial environments, and can outperform many sampling-based methods. △ Less

Submitted 14 June, 2019; originally announced June 2019.

Comments: The conference version of this paper appeared in Proc. European Symposium on Algorithms (ESA 2018), pages 73:1-73:14, 2018

arXiv:1903.09416 [pdf, other]

Rods and Rings: Soft Subdivision Planner for R^3 x S^2

Authors: Ching-Hsiang Hsu, Yi-Jen Chiang, Chee Yap

Abstract: We consider path planning for a rigid spatial robot moving amidst polyhedral obstacles. Our robot is either a rod or a ring. Being axially-symmetric, their configuration space is R^3 x S^2 with 5 degrees of freedom (DOF). Correct, complete and practical path planning for such robots is a long standing challenge in robotics. While the rod is one of the most widely studied spatial robots in path pla… ▽ More We consider path planning for a rigid spatial robot moving amidst polyhedral obstacles. Our robot is either a rod or a ring. Being axially-symmetric, their configuration space is R^3 x S^2 with 5 degrees of freedom (DOF). Correct, complete and practical path planning for such robots is a long standing challenge in robotics. While the rod is one of the most widely studied spatial robots in path planning, the ring seems to be new, and a rare example of a non-simply-connected robot. This work provides rigorous and complete algorithms for these robots with theoretical guarantees. We implemented the algorithms in our open-source Core Library. Experiments show that they are practical, achieving near real-time performance. We compared our planner to state-of-the-art sampling planners in OMPL. Our subdivision path planner is based on the twin foundations of ε-exactness and soft predicates. Correct implementation is relatively easy. The technical innovations include subdivision atlases for S^2, introduction of Σ_2 representations for footprints, and extensions of our feature-based technique for "opening up the blackbox of collision detection". △ Less

Submitted 6 June, 2019; v1 submitted 22 March, 2019; originally announced March 2019.

Comments: Conference version to appear in Proc. Symposium on Computational Geometry (SoCG '19), June, 2019. This is the full version, 25 pages. Some typo regarding a reference to an appendix section was fixed. References were further revised/corrected

arXiv:1903.03760 [pdf, other]

doi 10.1109/TIFS.2019.2899758

SAFECHAIN: Securing Trigger-Action Programming from Attack Chains (Extended Technical Report)

Authors: Kai-Hsiang Hsu, Yu-Hsi Chiang, Hsu-Chun Hsiao

Abstract: The proliferation of Internet of Things (IoT) is resha** our lifestyle. With IoT sensors and devices communicating with each other via the Internet, people can customize automation rules to meet their needs. Unless carefully defined, however, such rules can easily become points of security failure as the number of devices and complexity of rules increase. Device owners may end up unintentionally… ▽ More The proliferation of Internet of Things (IoT) is resha** our lifestyle. With IoT sensors and devices communicating with each other via the Internet, people can customize automation rules to meet their needs. Unless carefully defined, however, such rules can easily become points of security failure as the number of devices and complexity of rules increase. Device owners may end up unintentionally providing access or revealing private information to unauthorized entities due to complex chain reactions among devices. Prior work on trigger-action programming either focuses on conflict resolution or usability issues, or fails to accurately and efficiently detect such attack chains. This paper explores security vulnerabilities when users have the freedom to customize automation rules using trigger-action programming. We define two broad classes of attack--privilege escalation and privacy leakage--and present a practical model-checking-based system called SAFECHAIN that detects hidden attack chains exploiting the combination of rules. Built upon existing model-checking techniques, SAFECHAIN identifies attack chains by modeling the IoT ecosystem as a Finite State Machine. To improve practicability, SAFECHAIN avoids the need to accurately model an environment by frequently re-checking the automation rules given the current states, and employs rule-aware optimizations to further reduce overhead. Our comparative analysis shows that SAFECHAIN can efficiently and accurately identify attack chains, and our prototype implementation of SAFECHAIN can verify 100 rules in less than one second with no false positives. △ Less

Submitted 9 March, 2019; originally announced March 2019.

Comments: This is the extended technical report of the 16-page version on IEEE TIFS. Manuscript received April 27, 2018; accepted February 9, 2019

arXiv:1809.01540 [pdf]

Fail-Stop Group Signature Scheme

Authors: Yi-Yuan Chiang, Wang-Hsin Hsu, Wen-Yen Lin, Jonathan Jen-Rong Chen

Abstract: In this paper, we propose a Fail-Stop Group Signature Scheme (FSGSS). FSGSS combines the features of the Group Signature and the Fail-Stop Signature to enhance the security level of the original Group Signature. Assuming that the FSGSS encounters an attack by a hacker armed with a supercomputer, this scheme can prove that the digital signature is indeed forged. Based on the above objectives, this… ▽ More In this paper, we propose a Fail-Stop Group Signature Scheme (FSGSS). FSGSS combines the features of the Group Signature and the Fail-Stop Signature to enhance the security level of the original Group Signature. Assuming that the FSGSS encounters an attack by a hacker armed with a supercomputer, this scheme can prove that the digital signature is indeed forged. Based on the above objectives, this paper proposes three lemmas and proves that they are indeed feasible. First, how does a recipient of a digitally signed document verify the authenticity of the signature? Second, when a digitally signed document is under dispute, how can the group's manager find out the identity of the original group member who signed the document, if necessary for an investigation? Third, how can we prove that the signature is indeed forged following an external attack from a supercomputer? Soon, in a future paper, we will extend this work to make the scheme even more effective. Following an attack, the signature could be proved to be forged without the need to expose the key. △ Less

Submitted 5 September, 2018; originally announced September 2018.

Comments: 11 pages, 2 figures, 1 table

arXiv:1801.10300 [pdf, other]

Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures

Authors: Wen Hua Lin, Kuan-Ting Chen, Hung Yueh Chiang, Winston Hsu

Abstract: Recently, deep neural network models have achieved promising results in image captioning task. Yet, "vanilla" sentences, only describing shallow appearances (e.g., types, colors), generated by current works are not satisfied netizen style resulting in lacking engagements, contexts, and user intentions. To tackle this problem, we propose Netizen Style Commenting (NSC), to automatically generate cha… ▽ More Recently, deep neural network models have achieved promising results in image captioning task. Yet, "vanilla" sentences, only describing shallow appearances (e.g., types, colors), generated by current works are not satisfied netizen style resulting in lacking engagements, contexts, and user intentions. To tackle this problem, we propose Netizen Style Commenting (NSC), to automatically generate characteristic comments to a user-contributed fashion photo. We are devoted to modulating the comments in a vivid "netizen" style which reflects the culture in a designated social community and hopes to facilitate more engagement with users. In this work, we design a novel framework that consists of three major components: (1) We construct a large-scale clothing dataset named NetiLook, which contains 300K posts (photos) with 5M comments to discover netizen-style comments. (2) We propose three unique measures to estimate the diversity of comments. (3) We bring diversity by marrying topic models with neural networks to make up the insufficiency of conventional image captioning works. Experimenting over Flickr30k and our NetiLook datasets, we demonstrate our proposed approaches benefit fashion photo commenting and improve image captioning tasks both in accuracy and diversity. △ Less

Submitted 31 January, 2018; originally announced January 2018.

Comments: The Web Conference (WWW) 2018

arXiv:1611.02512 [pdf, other]

Cognitive Discriminative Map**s for Rapid Learning

Authors: Wen-Chieh Fang, Yi-ting Chiang

Abstract: Humans can learn concepts or recognize items from just a handful of examples, while machines require many more samples to perform the same task. In this paper, we build a computational model to investigate the possibility of this kind of rapid learning. The proposed method aims to improve the learning task of input from sensory memory by leveraging the information retrieved from long-term memory.… ▽ More Humans can learn concepts or recognize items from just a handful of examples, while machines require many more samples to perform the same task. In this paper, we build a computational model to investigate the possibility of this kind of rapid learning. The proposed method aims to improve the learning task of input from sensory memory by leveraging the information retrieved from long-term memory. We present a simple and intuitive technique called cognitive discriminative map**s (CDM) to explore the cognitive problem. First, CDM separates and clusters the data instances retrieved from long-term memory into distinct classes with a discrimination method in working memory when a sensory input triggers the algorithm. CDM then maps each sensory data instance to be as close as possible to the median point of the data group with the same class. The experimental results demonstrate that the CDM approach is effective for learning the discriminative features of supervised classifications with few training sensory input instances. △ Less

Submitted 8 November, 2016; originally announced November 2016.

arXiv:cs/0102006 [pdf, ps, other]

doi 10.1137/S0097539702411381

Orderly Spanning Trees with Applications

Authors: Yi-Ting Chiang, Ching-Chi Lin, Hsueh-I Lu

Abstract: We introduce and study the {\em orderly spanning trees} of plane graphs. This algorithmic tool generalizes {\em canonical orderings}, which exist only for triconnected plane graphs. Although not every plane graph admits an orderly spanning tree, we provide an algorithm to compute an {\em orderly pair} for any connected planar graph $G$, consisting of a plane graph $H$ of $G$, and an orderly span… ▽ More We introduce and study the {\em orderly spanning trees} of plane graphs. This algorithmic tool generalizes {\em canonical orderings}, which exist only for triconnected plane graphs. Although not every plane graph admits an orderly spanning tree, we provide an algorithm to compute an {\em orderly pair} for any connected planar graph $G$, consisting of a plane graph $H$ of $G$, and an orderly spanning tree of $H$. We also present several applications of orderly spanning trees: (1) a new constructive proof for Schnyder's Realizer Theorem, (2) the first area-optimal 2-visibility drawing of $G$, and (3) the best known encodings of $G$ with O(1)-time query support. All algorithms in this paper run in linear time. △ Less

Submitted 13 July, 2002; v1 submitted 7 February, 2001; originally announced February 2001.

Comments: 25 pages, 7 figures, A preliminary version appeared in Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2001), Washington D.C., USA, January 7-9, 2001, pp. 506-515

ACM Class: F.2.2; G.2.2; E.4; E.1

Journal ref: SIAM Journal on Computing 34(4): 924-945 (2005)

Showing 1–40 of 40 results for author: Chiang, Y