Search | arXiv e-print repository

arXiv:2405.05355 [pdf, other]

Geometry-Informed Distance Candidate Selection for Adaptive Lightweight Omnidirectional Stereo Vision with Fisheye Images

Authors: Conner Pulling, Je Hon Tan, Yaoyu Hu, Sebastian Scherer

Abstract: Multi-view stereo omnidirectional distance estimation usually needs to build a cost volume with many hypothetical distance candidates. The cost volume building process is often computationally heavy considering the limited resources a mobile robot has. We propose a new geometry-informed distance candidate selection method which enables the use of a very small number of candidates and reduces the computational cost. We demonstrate the use of the geometry-informed candidates in a set of model variants. We find that by adjusting the candidates during robot deployment, our geometry-informed distance candidates also improve a pre-trained model's accuracy if the extrinsics or the number of cameras changes. Without any re-training or fine-tuning, our models outperform models trained with evenly distributed distance candidates. Models are also released as hardware-accelerated versions with a new dedicated large-scale dataset. The project page, code, and dataset can be found at https://theairlab.org/gicandidates/.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2402.03752 [pdf, other]

Pre-training of Lightweight Vision Transformers on Small Datasets with Minimally Scaled Images

Authors: Jen Hong Tan

Abstract: Can a lightweight Vision Transformer (ViT) match or exceed the performance of Convolutional Neural Networks (CNNs) like ResNet on small datasets with small image resolutions? This report demonstrates that a pure ViT can indeed achieve superior performance through pre-training, using a masked auto-encoder technique with minimal image scaling. Our experiments on the CIFAR-10 and CIFAR-100 datasets involved ViT models with fewer than 3.65 million parameters and a multiply-accumulate (MAC) count below 0.27G, qualifying them as 'lightweight' models. Unlike previous approaches, our method attains state-of-the-art performance among similar lightweight transformer-based architectures without significantly scaling up images from CIFAR-10 and CIFAR-100. This achievement underscores the efficiency of our model, not only in handling small datasets but also in effectively processing images close to their original scale.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 7 pages, 6 figures

arXiv:2202.05451 [pdf, other]

ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning

Authors: Jia Huei Tan, Ying Hua Tan, Chee Seng Chan, Joon Huang Chuah

Abstract: Recent research that applies Transformer-based architectures to image captioning has resulted in state-of-the-art image captioning performance, capitalising on the success of Transformers on natural language tasks. Unfortunately, though these models work well, one major flaw is their large model sizes. To this end, we present three parameter reduction methods for image captioning Transformers: Radix Encoding, cross-layer parameter sharing, and attention parameter sharing. By combining these methods, our proposed ACORT models have 3.7x to 21.6x fewer parameters than the baseline model without compromising test performance. Results on the MS-COCO dataset demonstrate that our ACORT models are competitive against baselines and SOTA approaches, with CIDEr score >=126. Finally, we present qualitative results and ablation studies to demonstrate the efficacy of the proposed changes further. Code and pre-trained models are publicly available at https://github.com/jiahuei/sparse-image-captioning.

Submitted 11 February, 2022; originally announced February 2022.

Comments: Neurocomputing; In Press

arXiv:2112.14574 [pdf]

Industry 4.0: Challenges and success factors for adopting digital technologies in airports

Authors: Jia Hao Tan, Tariq Masood

Abstract: With the advent of Industry 4.0 technologies in the last decade, airports have undergone digitalisation to capitalise on the purported benefits of these technologies such as improved operational efficiency and passenger experience. The ongoing COVID-19 pandemic with the emergence of its variants (e.g. Delta, Omicron) has exacerbated the need for airports to adopt new technologies such as contactless and robotic technologies to facilitate travel during this pandemic. However, there is limited knowledge of recent challenges and success factors for adoption of digital technologies in airports. Therefore, through an industry survey of airport operators and managers around the world (n=102, 0.754<Composite Reliability<0.892; conducted during COVID-19), this study identifies the challenges faced in adopting Industry 4.0 technologies (n=20) as well as enhances understanding of best practices or success factors that supported technology adoption in airports. The widely used technology, organisation, environment (TOE) framework is used as a theoretical basis for the quantitative part of the questionnaire. A complementary qualitative part is used to underpin and extend the findings. The industry survey is the first of its kind conducted to understand the implementation challenges that airport operators face in adopting Industry 4.0 technologies in the airport. The survey results have shown that the Industry 4.0 technologies were not implemented to a similar extent in airports despite the generic challenges that were faced in adopting the various Industry 4.0 technologies in the airport.

Submitted 29 December, 2021; originally announced December 2021.

Comments: 25 pages, 4 figures, 9 tables

arXiv:2112.14333 [pdf]

Adoption of Industry 4.0 technologies in airports -- A systematic literature review

Authors: Jia Hao Tan, Tariq Masood

Abstract: Airports have been constantly evolving and adopting digital technologies to improve operational efficiency, enhance passenger experience, generate ancillary revenues and boost capacity from existing infrastructure. The COVID-19 pandemic has also challenged airports and aviation stakeholders alike to adapt and manage new operational challenges such as facilitating a contactless travel experience and ensuring business continuity. Digitalisation using Industry 4.0 technologies offers opportunities for airports to address short-term challenges associated with the COVID-19 pandemic while also preparing for future long-term challenges that ensue from the crisis. Through a systematic literature review of 102 relevant articles, we discuss the current state of adoption of Industry 4.0 technologies in airports, the associated challenges as well as future research directions. The results of this review suggest that the implementation of Industry 4.0 technologies is slowly gaining traction within the airport environment, and shall continue to remain relevant in the digital transformation journeys in developing future airports.

Submitted 28 December, 2021; originally announced December 2021.

Comments: 25 pages, 2 figures, 2 tables, 106 references

arXiv:2112.13384 [pdf, other]

doi 10.1145/3487351.3488276

Will You Dance To The Challenge? Predicting User Participation of TikTok Challenges

Authors: Lynnette Hui Xian Ng, John Yeh Han Tan, Darryl Jing Heng Tan, Roy Ka-Wei Lee

Abstract: TikTok is a popular new social media, where users express themselves through short video clips. A common form of interaction on the platform is participating in "challenges", which are songs and dances for users to iterate upon. Challenge contagion can be measured through replication reach, i.e., users uploading videos of their participation in the challenges. The uniqueness of the TikTok platform where both challenge content and user preferences are evolving requires the combination of challenge and user representation. This paper investigates social contagion of TikTok challenges through predicting a user's participation. We propose a novel deep learning model, deepChallenger, to learn and combine latent user and challenge representations from past videos to perform this user-challenge prediction task. We collect a dataset of over 7,000 videos from 12 trending challenges on the ForYouPage, the app's landing page, and over 10,000 videos from 1303 users. Extensive experiments are conducted and the results show that our proposed deepChallenger (F1=0.494) outperforms baselines (F1=0.188) in the prediction task.

Submitted 26 December, 2021; originally announced December 2021.

Comments: Accepted at ASONAM 2021

arXiv:2110.03298 [pdf, other]

doi 10.1016/j.patcog.2021.108366

End-to-End Supermask Pruning: Learning to Prune Image Captioning Models

Authors: Jia Huei Tan, Chee Seng Chan, Joon Huang Chuah

Abstract: With the advancement of deep models, research work on image captioning has led to a remarkable gain in raw performance over the last decade, along with increasing model complexity and computational cost. However, surprisingly, work on compression of deep networks for the image captioning task has received little to no attention. For the first time in image captioning research, we provide an extensive comparison of various unstructured weight pruning methods on three different popular image captioning architectures, namely Soft-Attention, Up-Down and Object Relation Transformer. Following this, we propose a novel end-to-end weight pruning method that performs gradual sparsification based on weight sensitivity to the training loss. The pruning schemes are then extended with encoder pruning, where we show that conducting both decoder pruning and training simultaneously prior to the encoder pruning provides good overall performance. Empirically, we show that an 80% to 95% sparse network (up to 75% reduction in model size) can either match or outperform its dense counterpart. The code and pre-trained models for Up-Down and Object Relation Transformer that are capable of achieving CIDEr scores >120 on the MS-COCO dataset but with only 8.7 MB and 14.5 MB in model size (size reduction of 96% and 94% respectively against dense versions) are publicly available at https://github.com/jiahuei/sparse-image-captioning.

Submitted 7 October, 2021; originally announced October 2021.

Comments: Pattern Recognition; In Press

arXiv:1908.10797 [pdf, other]

Image Captioning with Sparse Recurrent Neural Network

Authors: Jia Huei Tan, Chee Seng Chan, Joon Huang Chuah

Abstract: Recurrent Neural Networks (RNNs) have been widely used to tackle a wide variety of language generation problems and are capable of attaining state-of-the-art (SOTA) performance. However, despite their impressive results, the large number of parameters in RNN models makes deployment to mobile and embedded devices infeasible. Driven by this problem, many works have proposed a number of pruning methods to reduce the size of RNN models. In this work, we propose an end-to-end pruning method for image captioning models equipped with visual attention. Our proposed method is able to achieve sparsity levels up to 97.5% without significant performance loss relative to the baseline (~ 2% loss at 40x compression after fine-tuning). Our method is also simple to use and tune, facilitating faster development times for neural network practitioners. We perform extensive experiments on the popular MS-COCO dataset in order to empirically validate the efficacy of our proposed method.

Submitted 28 October, 2019; v1 submitted 28 August, 2019; originally announced August 2019.

Comments: Corrected Eq 11, updated Table 5

arXiv:1903.01072 [pdf, other]

doi 10.1109/TMM.2019.2904878

COMIC: Towards A Compact Image Captioning Model with Attention

Authors: Jia Huei Tan, Chee Seng Chan, Joon Huang Chuah

Abstract: Recent works in image captioning have shown very promising raw performance. However, we realize that most of these encoder-decoder style networks with attention do not scale naturally to large vocabulary sizes, making them difficult to deploy on embedded systems with limited hardware resources. This is because the sizes of the word and output embedding matrices grow proportionally with the size of the vocabulary, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a brand new idea in the domain of image captioning. That is, we tackle the problem of compactness of image captioning models, which is hitherto unexplored. We show that our proposed model, named COMIC for COMpact Image Captioning, achieves comparable results in five common evaluation metrics with state-of-the-art approaches on both MS-COCO and InstaPIC-1.1M datasets despite having an embedding vocabulary size that is 39x - 99x smaller. The source code and models are available at: https://github.com/jiahuei/COMIC-Compact-Image-Captioning-with-Attention

Submitted 11 June, 2019; v1 submitted 4 March, 2019; originally announced March 2019.

Comments: Added source code link and new results in Table 3

arXiv:1702.00509 [pdf]

Segmentation of optic disc, fovea and retinal vasculature using a single convolutional neural network

Authors: Jen Hong Tan, U. Rajendra Acharya, Sulatha V. Bhandary, Kuang Chua Chua, Sobha Sivaprasad

Abstract: We have developed and trained a convolutional neural network to automatically and simultaneously segment optic disc, fovea and blood vessels. Fundus images were normalised before segmentation was performed to enforce consistency in background lighting and contrast. For every effective point in the fundus image, our algorithm extracted three channels of input from the neighbourhood of the point and forwarded the response across the 7-layer network. On average, our segmentation achieved an accuracy of 92.68 percent on the testing set from the Drive database.

Submitted 1 February, 2017; originally announced February 2017.

arXiv:1402.6387 [pdf]

Active spline model: A shape based model-interactive segmentation

Authors: Jen Hong Tan, U. Rajendra Acharya

Abstract: Rarely in the literature does a method of segmentation care for the edit after the algorithm delivers. Such methods provide no solution when segmentation goes wrong. We propose to formulate the point distribution model in terms of a centripetal-parameterized Catmull-Rom spline. Such fusion brings interactivity to model-based segmentation, so that edit is better handled. When the delivered segment is unsatisfactory, the user simply shifts points to vary the curve. We ran the method on three disparate imaging modalities and achieved an average overlap of 0.879 for automated lung segmentation on chest radiographs. The edit afterward improved the average overlap to 0.945, with a minimum of 0.925. The source code and the demo video are available at http://wp.me/p3vCKy-2S

Submitted 25 February, 2014; originally announced February 2014.

Comments: submitted to Computers in Biology and Medicine, second revision

Showing 1–11 of 11 results for author: Tan, J H