Improve ROI with Causal Learning and Conformal Prediction

Authors: Meng Ai, Zhuo Chen, Jibin Wang, Jing Shang, Tao Tao, Zhen Li

Abstract: In the commercial sphere, such as operations and maintenance, advertising, and marketing recommendations, intelligent decision-making utilizing data mining and neural network technologies is crucial, especially in resource allocation to optimize ROI. This study delves into the Cost-aware Binary Treatment Assignment Problem (C-BTAP) across different industries, with a focus on the state-of-the-art Direct ROI Prediction (DRP) method. However, the DRP model confronts issues like covariate shift and insufficient training data, hindering its real-world effectiveness. Addressing these challenges is essential for ensuring dependable and robust predictions in varied operational contexts. This paper presents a robust Direct ROI Prediction (rDRP) method, designed to address challenges in real-world deployment of neural network-based uplift models, particularly under conditions of covariate shift and insufficient training data. The rDRP method, enhancing the standard DRP model, does not alter the model's structure or require retraining. It utilizes conformal prediction and Monte Carlo dropout for interval estimation, adapting to model uncertainty and data distribution shifts. A heuristic calibration method, inspired by a Kaggle competition, combines point and interval estimates. The effectiveness of these approaches is validated through offline tests and online A/B tests in various settings, demonstrating significant improvements in target rewards compared to the state-of-the-art method.

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted by ICDE 2024; Link: https://icde2024.github.io/papers.html

arXiv:2406.20098 [pdf, other]

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-to-code dataset for instruction tuning and an evaluation framework for the webpage understanding and HTML code translation abilities of MLLMs. For dataset construction, we leverage pretrained LLMs to enhance existing webpage-to-code datasets as well as generate a diverse pool of new webpages rendered into images. Specifically, the inputs are webpage images and instructions, while the responses are the webpage's HTML code. We further include diverse natural language QA pairs about the webpage content in the responses to enable a more comprehensive understanding of the web content. To evaluate model performance in these tasks, we develop an evaluation framework for testing MLLMs' abilities in webpage understanding and web-to-code generation. Extensive experiments show that our proposed dataset is beneficial not only to our proposed tasks but also in the general visual domain, while previous datasets result in worse performance. We hope our work will contribute to the development of general MLLMs suitable for web-based content generation and task automation. Our data and code will be available at https://github.com/MBZUAI-LLM/web2code.

Submitted 28 June, 2024; originally announced June 2024.

Comments: Website at https://mbzuai-llm.github.io/webpage2code/

arXiv:2406.12923 [pdf, other]

Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction

Authors: Wenzhao Jiang, Jindong Han, Hao Liu, Tao Tao, Naiqiang Tan, Hui Xiong

Abstract: Rapid urbanization has significantly escalated traffic congestion, underscoring the need for advanced congestion prediction services to bolster intelligent transportation systems. As one of the world's largest ride-hailing platforms, DiDi places great emphasis on the accuracy of congestion prediction to enhance the effectiveness and reliability of its real-time services, such as travel time estimation and route planning. Although numerous efforts have been made on congestion prediction, most of them fall short in handling heterogeneous and dynamic spatio-temporal dependencies (e.g., periodic and non-periodic congestion), particularly in the presence of noisy and incomplete traffic data. In this paper, we introduce a Congestion Prediction Mixture-of-Experts, CP-MoE, to address the above challenges. We first propose a sparsely-gated Mixture of Adaptive Graph Learners (MAGLs) with congestion-aware inductive biases to improve the model capacity for efficiently capturing complex spatio-temporal dependencies in varying traffic scenarios. Then, we devise two specialized experts to help identify stable trends and periodic patterns within the traffic data, respectively. By cascading these experts with MAGLs, CP-MoE delivers congestion predictions in a more robust and interpretable manner. Furthermore, an ordinal regression strategy is adopted to facilitate effective collaboration among diverse experts. Extensive experiments on real-world datasets demonstrate the superiority of our proposed method compared with state-of-the-art spatio-temporal prediction models. More importantly, CP-MoE has been deployed in DiDi to improve the accuracy and reliability of the travel time estimation system.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09455 [pdf, other]

Pandora: Towards General World Model with Natural Language Actions and Video States

Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper makes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training-from-scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We illustrate extensive outputs by Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate great potential of building stronger general world models with larger-scale training.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Website: https://world-model.maitrix.org/

arXiv:2405.13899 [pdf, ps, other]

Symmetric Linear Bandits with Hidden Symmetry

Authors: Nam Phuong Tran, The Anh Ta, Debmalya Mandal, Long Tran-Thanh

Abstract: High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity. However, it may not be available in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the high-dimensional case that covers many standard structures, including sparsity. In this work, we study high-dimensional symmetric linear bandits where the symmetry is hidden from the learner, and the correct symmetry needs to be learned in an online setting. We examine the structure of a collection of hidden symmetries and provide a method based on model selection within the collection of low-dimensional subspaces. Our algorithm achieves a regret bound of $O(d_0^{1/3} T^{2/3} \log(d))$, where $d$ is the ambient dimension, which is potentially very large, and $d_0$ is the dimension of the true low-dimensional subspace such that $d_0 \ll d$. With an extra assumption on well-separated models, we can further improve the regret to $O(d_0\sqrt{T\log(d)})$.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.13349 [pdf, other]

Building a Verifiable Logical Clock for P2P Networks

Authors: Guangda Sun, Tianyang Tao, Yanpei Guo, Michael Yiqing Hu, Jialin Li

Abstract: Logical clocks are a fundamental tool to establish causal ordering of events in a distributed system. They have been applied in weakly consistent storage systems, causally ordered broadcast, distributed snapshots, deadlock detection, and distributed system debugging. However, prior logical clock constructs fail to work in an open network with Byzantine participants. In this work, we present Chrono, a novel logical clock system that targets such challenging environments. We first redefine causality properties among distributed processes under the Byzantine failure model. To enforce these properties, Chrono defines a new validator abstraction for building fault-tolerant logical clocks. Furthermore, our validator abstraction is customizable: Chrono includes multiple backend implementations for the abstraction, each with different security-performance trade-offs. We have applied Chrono to build two decentralized applications, a mutual exclusion service and a weakly consistent key-value store. Chrono adds only marginal overhead compared to systems that tolerate no Byzantine faults. It also outperforms state-of-the-art BFT total order protocols by significant margins.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.10691 [pdf, other]

LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

Abstract: The infant brain undergoes rapid development in the first few years after birth. Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infant brain development with higher accuracy, statistical power and flexibility. However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet, existing deep generative models are facing difficulties in missing brain image completion, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the developing brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion, which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness with consistent performance on missing infant brain MR completion even in big gap scenarios, aiding in better delineation of early developmental trajectories.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2403.14918 [pdf, other]

Deep learning-based method for weather forecasting: A case study in Itoshima

Authors: Yuzhong Cheng, Linh Thi Hoai Nguyen, Akinori Ozaki, Ton Viet Ta

Abstract: Accurate weather forecasting is of paramount importance for a wide range of practical applications, drawing substantial scientific and societal interest. However, the intricacies of weather systems pose substantial challenges to accurate predictions. This research introduces a multilayer perceptron model tailored for weather forecasting in Itoshima, Kyushu, Japan. Our meticulously designed architecture demonstrates superior performance compared to existing models, surpassing benchmarks such as Long Short-Term Memory and Recurrent Neural Networks.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2402.13776 [pdf, other]

Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, making longitudinal infant brain atlas construction and developmental trajectory delineation quite challenging. Thanks to the development of AI-based generative models, neuroimage completion has become a powerful technique to retain as much available data as possible. However, current image completion methods usually suffer from inconsistency within each individual subject in the time dimension, compromising the overall quality. To solve this problem, we propose a two-stage cascaded diffusion model, Cas-DiffCom, for dense and longitudinal 3D infant brain MRI completion and super-resolution. We applied our proposed method to the Baby Connectome Project (BCP) dataset. The experimental results validate that Cas-DiffCom achieves both individual consistency and high fidelity in longitudinal infant brain image completion. We further applied the generated infant brain images to two downstream tasks, brain tissue segmentation and developmental trajectory delineation, to demonstrate its task-oriented potential in the neuroscience field.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.07067 [pdf, other]

Learning the Expected Core of Strictly Convex Stochastic Cooperative Games

Authors: Nam Phuong Tran, The Anh Ta, Shuqing Shi, Debmalya Mandal, Yali Du, Long Tran-Thanh

Abstract: Reward allocation, also known as the credit assignment problem, has been an important topic in economics, engineering, and machine learning. An important concept in reward allocation is the core, which is the set of stable allocations where no agent has the motivation to deviate from the grand coalition. In previous works, computing the core requires either knowledge of the reward function in deterministic games or the reward distribution in stochastic games. However, this is unrealistic, as the reward function or distribution is often only partially known and may be subject to uncertainty. In this paper, we consider the core learning problem in stochastic cooperative games, where the reward distribution is unknown. Our goal is to learn the expected core, that is, the set of allocations that are stable in expectation, given an oracle that returns a stochastic reward for an enquired coalition each round. Within the class of strictly convex games, we present an algorithm named Common-Points-Picking that returns a point in the expected core given a polynomial number of samples, with high probability. To analyse the algorithm, we develop a new extension of the separation hyperplane theorem for multiple convex sets.

Submitted 22 May, 2024; v1 submitted 10 February, 2024; originally announced February 2024.

arXiv:2401.16820 [pdf, other]

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Authors: Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, Jiaheng Zhang

Abstract: Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they could be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to mitigate the misuse of LLMs, which embeds a watermark (e.g., a bit string) into a text generated by an LLM. Consequently, this enables the detection of texts generated by an LLM as well as the tracing of generated texts to a specific user. The major limitation of existing watermark techniques is that they cannot accurately or efficiently extract the watermark from a text, especially when the watermark is a long bit string. This key limitation impedes their deployment for real-world applications, e.g., tracing generated texts to a specific user. This work introduces a novel watermarking method for LLM-generated text grounded in error-correction codes to address this challenge. We provide strong theoretical analysis, demonstrating that under bounded adversarial word/token edits (insertion, deletion, and substitution), our method can correctly extract watermarks, offering a provable robustness guarantee. This breakthrough is also evidenced by our extensive experimental results. The experiments show that our method substantially outperforms existing baselines in both accuracy and robustness on benchmark datasets. For instance, when embedding a bit string of length 12 into a 200-token generated text, our approach attains an impressive match rate of $98.4\%$, surpassing the performance of Yoo et al. (state-of-the-art baseline) at $85.6\%$. When subjected to a copy-paste attack involving the injection of 50 tokens to generated texts with 200 words, our method maintains a substantial match rate of $90.8\%$, while the match rate of Yoo et al. diminishes to below $65\%$.

Submitted 15 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2312.10988 [pdf, other]

Graph Invariant Learning with Subgraph Co-mixup for Out-Of-Distribution Generalization

Authors: Tianrui Jia, Haoyang Li, Cheng Yang, Tao Tao, Chuan Shi

Abstract: Graph neural networks (GNNs) have been demonstrated to perform well in graph representation learning, but often lack generalization capability when tackling out-of-distribution (OOD) data. Graph invariant learning methods, backed by the invariance principle among defined multiple environments, have shown effectiveness in dealing with this issue. However, existing methods heavily rely on well-predefined or accurately generated environment partitions, which are hard to obtain in practice, leading to sub-optimal OOD generalization performance. In this paper, we propose a novel graph invariant learning method based on an invariant and variant patterns co-mixup strategy, which is capable of jointly generating mixed multiple environments and capturing invariant patterns from the mixed graph data. Specifically, we first adopt a subgraph extractor to identify invariant subgraphs. Subsequently, we design a novel co-mixup strategy, i.e., jointly conducting environment Mixup and invariant Mixup. For the environment Mixup, we mix the variant environment-related subgraphs so as to generate sufficiently diverse multiple environments, which is important to guarantee the quality of the graph invariant learning. For the invariant Mixup, we mix the invariant subgraphs, further encouraging the capture of invariant patterns behind graphs while getting rid of spurious correlations for OOD generalization. We demonstrate that the proposed environment Mixup and invariant Mixup can mutually promote each other. Extensive experiments on both synthetic and real-world datasets demonstrate that our method significantly outperforms state-of-the-art methods under various distribution shifts.

Submitted 18 December, 2023; originally announced December 2023.

Comments: Has been accepted at the 38th AAAI Conference on Artificial Intelligence (AAAI-24)

arXiv:2312.06550 [pdf, other]

LLM360: Towards Fully Transparent Open-Source LLMs

Authors: Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, et al. (3 additional authors not shown)

Abstract: The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder progress in the field by degrading transparency into the training of LLMs and forcing teams to rediscover many details in the training process. We present LLM360, an initiative to fully open-source LLMs, which advocates for all training code and data, model checkpoints, and intermediate results to be made available to the community. The goal of LLM360 is to support open and collaborative AI research by making the end-to-end LLM training process transparent and reproducible by everyone. As a first step of LLM360, we release two 7B parameter LLMs pre-trained from scratch, Amber and CrystalCoder, including their training code, data, intermediate checkpoints, and analyses (at https://www.llm360.ai). We are committed to continually pushing the boundaries of LLMs through this open-source effort. More large-scale and stronger models are underway and will be released in the future.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2310.17189 [pdf, other]

Exploring Iterative Refinement with Diffusion Models for Video Grounding

Authors: Xiao Liang, Tao Shi, Yaoyuan Liang, Te Tao, Shao-Lun Huang

Abstract: Video grounding aims to localize the target moment in an untrimmed video corresponding to a given sentence query. Existing methods typically select the best prediction from a set of predefined proposals or directly regress the target span in a single-shot manner, resulting in the absence of a systematic prediction refinement process. In this paper, we propose DiffusionVG, a novel framework with diffusion models that formulates video grounding as a conditional generation task, where the target span is generated from Gaussian noise inputs and iteratively refined in the reverse diffusion process. During training, DiffusionVG progressively adds noise to the target span with a fixed forward diffusion process and learns to recover the target span in the reverse diffusion process. In inference, DiffusionVG can generate the target span from Gaussian noise inputs by the learned reverse diffusion process conditioned on the video-sentence representations. Without bells and whistles, our DiffusionVG demonstrates superior performance compared to existing well-crafted models on mainstream Charades-STA, ActivityNet Captions and TACoS benchmarks.

Submitted 29 December, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2309.10818 [pdf, other]

SlimPajama-DC: Understanding Data Combinations for LLM Training

Authors: Zhiqiang Shen, Tianhua Tao, Liqun Ma, Willie Neiswanger, Zhengzhong Liu, Hongyi Wang, Bowen Tan, Joel Hestness, Natalia Vassilieva, Daria Soboleva, Eric Xing

Abstract: This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the pretraining of large language models using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source dataset, which has been refined and further deduplicated to 627B tokens from the extensive 1.2T token RedPajama dataset contributed by Together. We term our research SlimPajama-DC, an empirical analysis designed to uncover fundamental characteristics and best practices associated with employing SlimPajama in the training of large language models. During our research with SlimPajama, two pivotal observations emerged: (1) Global deduplication vs. local deduplication. We analyze and discuss how global (across different sources of datasets) and local (within the single source of dataset) deduplications affect the performance of trained models. (2) Proportions of highly-deduplicated multi-source datasets in the combination. To study this, we construct six configurations on the SlimPajama dataset and train individual ones using the 1.3B Cerebras-GPT model with Alibi and SwiGLU. Our best configuration outperforms the 1.3B model trained on RedPajama using the same number of training tokens by a significant margin. All our 1.3B models are trained on a Cerebras 16$\times$ CS-2 cluster with a total of 80 PFLOP/s in bf16 mixed precision. We further extend our discoveries (such as increasing data diversity is crucial after global deduplication) on a 7B model with large batch-size training. Our SlimPajama-DC models are available at: https://huggingface.co/MBZUAI-LLM/SlimPajama-DC and the separate SlimPajama-DC datasets are available at: https://huggingface.co/datasets/MBZUAI-LLM/SlimPajama-627B-DC.

Submitted 9 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: Technical report. Models at: https://huggingface.co/MBZUAI-LLM/SlimPajama-DC and dataset at: https://huggingface.co/datasets/MBZUAI-LLM/SlimPajama-627B-DC

arXiv:2307.04514 [pdf, other]

Improving Heterogeneous Graph Learning with Weighted Mixed-Curvature Product Manifold

Authors: Tuc Nguyen-Van, Dung D. Le, The-Anh Ta

Abstract: In graph representation learning, it is important that the complex geometric structure of the input graph, e.g. hidden relations among nodes, is well captured in embedding space. However, standard Euclidean embedding spaces have a limited capacity in representing graphs of varying structures. A promising candidate for the faithful embedding of data with varying structure is product manifolds of component spaces of different geometries (spherical, hyperbolic, or Euclidean). In this paper, we take a closer look at the structure of product manifold embedding spaces and argue that each component space in a product contributes differently to expressing structures in the input graph, hence should be weighted accordingly. This is different from previous works, which weight the roles of different components equally. We then propose WEIGHTED-PM, a data-driven method for learning embeddings of heterogeneous graphs in weighted product manifolds. Our method utilizes the topological information of the input graph to automatically determine the weight of each component in product spaces. Extensive experiments on synthetic and real-world graph datasets demonstrate that WEIGHTED-PM is capable of learning better graph representations with lower geometric distortion from input data, and performs better on multiple downstream tasks, such as word similarity learning, top-$k$ recommendation, and knowledge graph embedding.

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2307.00885 [pdf, other]

An Explainable Deep Framework: Towards Task-Specific Fusion for Multi-to-One MRI Synthesis

Authors: Luyi Han, Tianyu Zhang, Yunzhi Huang, Haoran Dou, Xin Wang, Yuan Gao, Chunyao Lu, Tan Tao, Ritse Mann

Abstract: Multi-sequence MRI is valuable in clinical settings for reliable diagnosis and treatment prognosis, but some sequences may be unusable or missing for various reasons. To address this issue, MRI synthesis is a potential solution. Recent deep learning-based methods have achieved good performance in combining multiple available sequences for missing sequence synthesis. Despite their success, these methods lack the ability to quantify the contributions of different input sequences and estimate the quality of generated images, which limits their practical use. Hence, we propose an explainable task-specific synthesis network, which adapts weights automatically for specific sequence generation tasks and provides interpretability and reliability from two sides: (1) visualize the contribution of each input sequence in the fusion stage by a trainable task-specific weighted average module; (2) highlight the area the network tried to refine during synthesizing by a task-specific attention module. We conduct experiments on the BraTS2021 dataset of 1251 subjects, and results on arbitrary sequence synthesis indicate that the proposed method achieves better performance than the state-of-the-art methods. Our code is available at https://github.com/fiy2W/mri_seq2seq.

Submitted 3 July, 2023; originally announced July 2023.

arXiv:2306.14920 [pdf, other]

A Cosine Similarity-based Method for Out-of-Distribution Detection

Authors: Nguyen Ngoc-Hieu, Nguyen Hung-Quang, The-Anh Ta, Thanh Nguyen-Tang, Khoa D Doan, Hoang Thanh-Tung

Abstract: The ability to detect OOD data is a crucial aspect of practical machine learning applications. In this work, we show that cosine similarity between the test feature and the typical ID feature is a good indicator of OOD data. We propose Class Typical Matching (CTM), a post hoc OOD detection algorithm that uses a cosine similarity scoring function. Extensive experiments on multiple benchmarks show that CTM outperforms existing post hoc OOD detection methods.

Submitted 23 June, 2023; originally announced June 2023.

Comments: Accepted paper at ICML 2023 Workshop on Spurious Correlations, Invariance, and Stability. 10 pages (4 main + appendix)
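
As a rough illustration of the scoring idea in the abstract above (cosine similarity between a test feature and a typical in-distribution feature), here is a minimal sketch. It assumes the "typical" feature of a class is simply the mean of that class's training features, which may differ from what the paper does; the function names and toy data are hypothetical and not taken from the authors' code.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def class_prototypes(features, labels):
    """Assumption: one 'typical' feature per class, taken as the class mean."""
    by_class = {}
    for f, y in zip(features, labels):
        by_class.setdefault(y, []).append(f)
    return {y: mean_vector(fs) for y, fs in by_class.items()}

def ood_score(x, prototypes):
    """Highest cosine similarity to any class prototype; low values suggest OOD."""
    return max(cosine(x, p) for p in prototypes.values())

# Toy in-distribution features for two classes (hypothetical data).
train = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
protos = class_prototypes(train, labels)

in_dist = ood_score([2.0, 0.1], protos)     # points along class 0's direction
out_dist = ood_score([-1.0, -1.0], protos)  # points away from both classes
```

A detector built this way would flag a sample as OOD when its score falls below a threshold chosen on held-out data; the threshold choice is outside the scope of this sketch.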

arXiv:2306.08798 [pdf, other]

MPSA-DenseNet: A novel deep learning model for English accent classification

Authors: Tianyu Song, Linh Thi Hoai Nguyen, Ton Viet Ta

Abstract: This paper presents three innovative deep learning models for English accent classification: Multi-DenseNet, PSA-DenseNet, and MPSA-DenseNet, that combine multi-task learning and the PSA module attention mechanism with DenseNet. We applied these models to data collected from six dialects of English across native English speaking regions (Britain, the United States, Scotland) and nonnative English speaking regions (China, Germany, India). Our experimental results show a significant improvement in classification accuracy, particularly with MPSA-DenseNet, which outperforms all other models, including DenseNet and EPSA models previously used for accent identification. Our findings indicate that MPSA-DenseNet is a highly promising model for accurately identifying English accents.

Submitted 14 June, 2023; originally announced June 2023.

arXiv:2305.10626 [pdf, other]

Language Models Meet World Models: Embodied Experiences Enhance Language Models

Authors: Jiannan Xiang, Tianhua Tao, Yi Gu, Tianmin Shu, Zirui Wang, Zichao Yang, Zhiting Hu

Abstract: While large language models (LMs) have shown remarkable capabilities across numerous tasks, they often struggle with simple reasoning and planning in physical environments, such as understanding object permanence or planning household activities. The limitation arises from the fact that LMs are trained only on written text and miss essential embodied knowledge and skills. In this paper, we propose a new paradigm of enhancing LMs by finetuning them with world models, to gain diverse embodied knowledge while retaining their general language capabilities. Our approach deploys an embodied agent in a world model, particularly a simulator of the physical world (VirtualHome), and acquires a diverse set of embodied experiences through both goal-oriented planning and random exploration. These experiences are then used to finetune LMs to teach diverse abilities of reasoning and acting in the physical world, e.g., planning and completing goals, object permanence and tracking, etc. Moreover, it is desirable to preserve the generality of LMs during finetuning, which facilitates generalizing the embodied knowledge across tasks rather than being tied to specific simulations. We thus further introduce the classical elastic weight consolidation (EWC) for selective weight updates, combined with low-rank adapters (LoRA) for training efficiency. Extensive experiments show our approach substantially improves base LMs on 18 downstream tasks by 64.28% on average. In particular, the small LMs (1.3B, 6B, and 13B) enhanced by our approach match or even outperform much larger LMs (e.g., ChatGPT).

Submitted 28 October, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

arXiv:2305.08492 [pdf]

On the conformance of Android applications with children's data protection regulations and safeguarding guidelines

Authors: Ricardo Lopes, Vinh Thong Ta, Ioannis Korkontzelos

Abstract: With the rapid development of online technologies and the widespread usage of mobile phones among children, it is crucial to protect their online safety. Some studies reported that online abuse and incidents negatively affect children's mental health and development. In this paper, we examine how Android applications follow the rules related to children's data protection in the EU General Data Protection Regulation (GDPR) and the UK and EU children's online safeguarding guidelines. Our findings show that the number of non-compliant apps is still significant. Even the apps designed for children do not always comply with legislation or guidance. This lack of compliance could contribute to creating a path to causing physical or mental harm to children. We then discuss the relevance of automating the compliance verification and online safety risk assessment, including open questions, challenges, possible approaches, and directions.

Submitted 17 May, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 24 pages (changed the abstract and updated the related works.)

arXiv:2304.10406 [pdf, other]

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

Authors: Tang Tao, Longfei Gao, Guangrun Wang, Yixing Lao, Peng Chen, Hengshuang Zhao, Dayang Hao, Xiaodan Liang, Mathieu Salzmann, Kaicheng Yu

Abstract: We introduce a new task, novel view synthesis for LiDAR sensors. While traditional model-based LiDAR simulators with style-transfer neural networks can be applied to render novel views, they fall short of producing accurate and realistic LiDAR patterns because the renderers rely on explicit 3D reconstruction and exploit game engines that ignore important attributes of LiDAR points. We address this challenge by formulating, to the best of our knowledge, the first differentiable end-to-end LiDAR rendering framework, LiDAR-NeRF, leveraging a neural radiance field (NeRF) to facilitate the joint learning of geometry and the attributes of 3D points. However, simply employing NeRF cannot achieve satisfactory results, as it only focuses on learning individual pixels while ignoring local information, especially in low-texture areas, resulting in poor geometry. We address this issue by introducing a structural regularization method to preserve local structural details. To evaluate the effectiveness of our approach, we establish an object-centric multi-view LiDAR dataset, dubbed NeRF-MVL. It contains observations of objects from 9 categories seen from 360-degree viewpoints captured with multiple LiDAR sensors. Our extensive experiments on the scene-level KITTI-360 dataset, and on our object-level NeRF-MVL show that our LiDAR-NeRF surpasses the model-based algorithms significantly.

Submitted 14 July, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: This paper introduces a new task of novel LiDAR view synthesis, and proposes a differentiable framework called LiDAR-NeRF with a structural regularization, as well as an object-centric multi-view LiDAR dataset called NeRF-MVL

arXiv:2303.07292

Transformer-based approaches to Sentiment Detection

Authors: Olumide Ebenezer Ojo, Hoang Thang Ta, Alexander Gelbukh, Hiram Calvo, Olaronke Oluwayemisi Adebanji, Grigori Sidorov

Abstract: The use of transfer learning methods is largely responsible for the present breakthrough in Natural Language Processing (NLP) tasks across multiple domains. In order to solve the problem of sentiment detection, we examined the performance of four different types of well-known state-of-the-art transformer models for text classification. Models such as Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pre-training Approach (RoBERTa), a distilled version of BERT (DistilBERT), and a large bidirectional neural network architecture (XLNet) were proposed. The performance of the four models that were used to detect disaster in the text was compared. All the models performed well enough, indicating that transformer-based models are suitable for the detection of disaster in text. The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions. Furthermore, we discovered that the learning algorithms' performance was influenced by the pre-processing techniques, the nature of words in the vocabulary, unbalanced labeling, and the model parameters.

Submitted 13 March, 2023; originally announced March 2023.

Comments: This submission has been removed from arXiv because the submitter did not have the authority to grant the license at the time of submission

arXiv:2210.12659 [pdf]

Mapping Process for the Task: Wikidata Statements to Text as Wikipedia Sentences

Authors: Hoang Thang Ta, Alexander Gelbukha, Grigori Sidorov

Abstract: Acknowledged as one of the most successful online cooperative projects in human society, Wikipedia has obtained rapid growth in recent years and desires continuously to expand content and disseminate knowledge values for everyone globally. The shortage of volunteers brings to Wikipedia many issues, including developing content for over 300 languages at the present. Therefore, the benefit that machines can automatically generate content to reduce human efforts on Wikipedia language projects could be considerable. In this paper, we propose our mapping process for the task of converting Wikidata statements to natural language text (WS2T) for Wikipedia projects at the sentence level. The main step is to organize statements, represented as a group of quadruples and triples, and then to map them to corresponding sentences in English Wikipedia. We evaluate the output corpus in various aspects: sentence structure analysis, noise filtering, and relationships between sentence components based on word embedding models. The results are helpful not only for the data-to-text generation task but also for other relevant works in the field.

Submitted 23 October, 2022; originally announced October 2022.

Comments: 29 pages

arXiv:2209.13101 [pdf, other]

WikiDes: A Wikipedia-Based Dataset for Generating Short Descriptions from Paragraphs

Authors: Hoang Thang Ta, Abu Bakar Siddiqur Rahman, Navonil Majumder, Amir Hussain, Lotfollah Najjar, Newton Howard, Soujanya Poria, Alexander Gelbukh

Abstract: As free online encyclopedias with massive volumes of content, Wikipedia and Wikidata are key to many Natural Language Processing (NLP) tasks, such as information retrieval, knowledge base building, machine translation, text classification, and text summarization. In this paper, we introduce WikiDes, a novel dataset to generate short descriptions of Wikipedia articles for the problem of text summarization. The dataset consists of over 80k English samples on 6987 topics. We set up a two-phase summarization method - description generation (Phase I) and candidate ranking (Phase II) - as a strong approach that relies on transfer and contrastive learning. For description generation, T5 and BART show their superiority compared to other small-scale pre-trained models. By applying contrastive learning with the diverse input from beam search, the metric fusion-based ranking models outperform the direct description generation models significantly up to 22 ROUGE in topic-exclusive split and topic-independent split. Furthermore, the outcome descriptions in Phase II are supported by human evaluation in over 45.33% chosen compared to 23.66% in Phase I against the gold descriptions. In the aspect of sentiment analysis, the generated descriptions cannot effectively capture all sentiment polarities from paragraphs while doing this task better from the gold descriptions. The automatic generation of new descriptions reduces the human efforts in creating them and enriches Wikidata-based knowledge graphs. Our paper shows a practical impact on Wikipedia and Wikidata since there are thousands of missing descriptions. Finally, we expect WikiDes to be a useful dataset for related works in capturing salient information from short paragraphs. The curated dataset is publicly available at: https://github.com/declare-lab/WikiDes.

Submitted 26 September, 2022; originally announced September 2022.

Comments: 27 pages, 8 figures, 15 tables

arXiv:2206.13144 [pdf]

An Indirect Social Trust Model for Vehicular Social Networks Using Evolving Graph Theory

Authors: Max Hashem Eiza, Vinh Thong Ta

Abstract: The increasing importance and consequent challenges of establishing indirect trusted relationships in highly dynamic social networks such as vehicular social networks (VSNs), are investigated in this paper. VSNs are mobile social networks that aim to create social links among travellers on the roads. Besides matching interests between two users, social trust is essential to successfully establish and nurture a social relationship. However, the unique characteristics of VSNs pose many challenges such as uncertainty, subjectivity and intransitivity to indirect social trust modelling. Furthermore, the current trust models in the literature inadequately address trust propagation in VSNs. We propose a novel indirect social trust model for VSNs using evolving graph theory and the Paillier cryptosystem. We consider the VSN as a highly dynamic social evolving graph where social ties among vehicles hold a trustworthiness factor that evolves over time. This factor is estimated based on the behaviours, opinions, distances, and communication metrics of the parties involved. Employing the homomorphic property of the Paillier cryptosystem, the proposed model targets the subjectivity problem when combining multiple opinions to establish an indirect trusted relationship. Through analysis of computational and communication complexities, we show the viability of the proposed model and the efficiency of its indirect trust computation algorithm.

Submitted 27 June, 2022; originally announced June 2022.
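
The abstract above relies on the additive homomorphic property of the Paillier cryptosystem to combine opinions without revealing them individually. The sketch below is a toy illustration of that property only, not the paper's trust model: it uses tiny hard-coded primes (insecure, chosen purely for readability), the common simplification g = n + 1, and hypothetical function names and opinion values.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key material from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for the choice g = n + 1
    return n, lam, mu

def encrypt(n, m, r=None):
    """Encrypt integer m (0 <= m < n) as c = (n+1)^m * r^n mod n^2."""
    n2 = n * n
    while r is None or math.gcd(r, n) != 1:
        r = random.randrange(1, n)  # random blinding factor coprime to n
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, lam, mu, c):
    """Recover m via L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

# Two hypothetical opinion scores are combined while still encrypted.
n, lam, mu = keygen(47, 59)
c1 = encrypt(n, 12)
c2 = encrypt(n, 30)
combined = decrypt(n, lam, mu, add_encrypted(n, c1, c2))
```

The party holding the private values `lam` and `mu` only ever decrypts the aggregate, which is the general reason an additively homomorphic scheme suits opinion aggregation. A real deployment would use large random primes and a vetted library.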

arXiv:2206.05975 [pdf, other]

On the Learning of Non-Autoregressive Transformers

Authors: Fei Huang, Tianhua Tao, Hao Zhou, Lei Li, Minlie Huang

Abstract: Non-autoregressive Transformer (NAT) is a family of text generation models, which aims to reduce the decoding latency by predicting the whole sentences in parallel. However, such latency reduction sacrifices the ability to capture left-to-right dependencies, thereby making NAT learning very challenging. In this paper, we present theoretical and empirical analyses to reveal the challenges of NAT learning and propose a unified perspective to understand existing successes. First, we show that simply training NAT by maximizing the likelihood can lead to an approximation of marginal distributions but drops all dependencies between tokens, where the dropped information can be measured by the dataset's conditional total correlation. Second, we formalize many previous objectives in a unified framework and show that their success can be concluded as maximizing the likelihood on a proxy distribution, leading to a reduced information loss. Empirical studies show that our perspective can explain the phenomena in NAT learning and guide the design of new training methods.

Submitted 13 June, 2022; originally announced June 2022.

Comments: accepted at ICML2022
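
For readers unfamiliar with the quantity named in the abstract above, the standard information-theoretic definition of conditional total correlation can be written as follows. The notation is an assumption made here for illustration (target sequence $Y = (y_1, \dots, y_T)$, source $X$, data distribution $p$, fully factorized model $q$) and may not match the paper's.

```latex
% Conditional total correlation of the target tokens given the source:
C(Y \mid X) \;=\; \sum_{i=1}^{T} H(y_i \mid X) \;-\; H(Y \mid X)

% It equals the smallest KL divergence achievable by any model that
% factorizes over positions, attained when each q_i is the true marginal:
\min_{q_1, \dots, q_T}\;
\mathbb{E}_{X}\!\left[
  \mathrm{KL}\!\left( p(Y \mid X) \,\Big\|\, \prod_{i=1}^{T} q_i(y_i \mid X) \right)
\right]
\;=\; C(Y \mid X)
```

Under these assumptions, a position-wise factorized model can at best match the per-token marginals, and $C(Y \mid X)$ is the information about inter-token dependencies that it cannot represent.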

arXiv:2206.04596 [pdf, other]

doi 10.1109/ICRA48891.2023.10160578

Linear Delta Arrays for Compliant Dexterous Distributed Manipulation

Authors: Sarvesh Patil, Tony Tao, Tess Hellebrekers, Oliver Kroemer, F. Zeynep Temel

Abstract: This paper presents a new type of distributed dexterous manipulator: delta arrays. Our delta array setup consists of 64 linearly-actuated delta robots with 3D-printed compliant linkages. Through the design of the individual delta robots, the modular array structure, and distributed communication and control, we study a wide range of in-plane and out-of-plane manipulations, as well as prehensile manipulations among subsets of neighboring delta robots. We also demonstrate dexterous manipulation capabilities of the delta array using reinforcement learning while leveraging the compliance to not break the end-effectors. Our evaluations show that the resulting 192 DoF compliant robot is capable of performing various coordinated distributed manipulations of a variety of objects, including translation, alignment, prehensile squeezing, lifting, and grasping.

Submitted 14 August, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: ICRA 2023

arXiv:2205.14951 [pdf, other]

Benchmarking the Robustness of LiDAR-Camera Fusion for 3D Object Detection

Authors: Kaicheng Yu, Tang Tao, Hongwei Xie, Zhiwei Lin, Zhongwei Wu, Zhongyu Xia, Tingting Liang, Haiyang Sun, Jiong Deng, Dayang Hao, Yongtao Wang, Xiaodan Liang, Bing Wang

Abstract: There are two critical sensors for 3D perception in autonomous driving, the camera and the LiDAR. The camera provides rich semantic information such as color and texture, while the LiDAR reflects the 3D shape and locations of surrounding objects. Fusing these two modalities has been found to significantly boost the performance of 3D perception models, as each modality has complementary information to the other. However, we observe that current datasets are captured from expensive vehicles that are explicitly designed for data collection purposes, and cannot truly reflect the realistic data distribution due to various reasons. To this end, we collect a series of real-world cases with noisy data distribution, and systematically formulate a robustness benchmark toolkit that simulates these cases on any clean autonomous driving dataset. We showcase the effectiveness of our toolkit by establishing the robustness benchmark on two widely-adopted autonomous driving datasets, nuScenes and Waymo, and then, to the best of our knowledge, holistically benchmark the state-of-the-art fusion methods for the first time. We observe that: i) most fusion methods, when solely developed on these data, tend to fail inevitably when there is a disruption to the LiDAR input; ii) the improvement of the camera input is significantly inferior to the LiDAR one. We further propose an efficient robust training strategy to improve the robustness of the current fusion method. The benchmark and code are available at https://github.com/kcyu2014/lidar-camera-robust-benchmark

Submitted 30 May, 2022; originally announced May 2022.

Comments: Technical report. The first three authors contribute equally

arXiv:2205.05036 [pdf, other]

Multi-agent Reinforcement Learning for Dynamic Resource Management in 6G in-X Subnetworks

Authors: Xiao Du, Ting Wang, Qiang Feng, Chenhui Ye, Tao Tao, Yuanming Shi, Mingsong Chen

Abstract: The 6G network enables a subnetwork-wide evolution, resulting in a "network of subnetworks". However, due to the dynamic mobility of wireless subnetworks, intra-subnetwork and inter-subnetwork data transmissions will inevitably interfere with each other, which poses a great challenge to radio resource management. Moreover, most of the existing approaches require the instantaneous channel gain between subnetworks, which is usually difficult to collect. To tackle these issues, in this paper we propose a novel and effective intelligent radio resource management method using multi-agent deep reinforcement learning (MARL), which only needs the sum of received power, named received signal strength indicator (RSSI), on each channel instead of channel gains. However, directly separating individual interference from RSSI is nearly impossible. To this end, we further propose a novel MARL architecture, named GA-Net, which integrates a hard attention layer to model the importance distribution of inter-subnetwork relationships based on RSSI and exclude the impact of unrelated subnetworks, and employs a graph attention network with a multi-head attention layer to extract the features and calculate their weights that will impact individual throughput. Experimental results show that our proposed framework significantly outperforms both traditional and MARL-based methods in various aspects.

Submitted 10 May, 2022; originally announced May 2022.

arXiv:2205.00307 [pdf, other]

Learning to Get Up

Authors: Tianxin Tao, Matthew Wilson, Ruiyu Gou, Michiel van de Panne

Abstract: Getting up from an arbitrary fallen state is a basic human skill. Existing methods for learning this skill often generate highly dynamic and erratic get-up motions, which do not resemble human get-up strategies, or are based on tracking recorded human get-up motions. In this paper, we present a staged approach using reinforcement learning, without recourse to motion capture data. The method first takes advantage of a strong character model, which facilitates the discovery of solution modes. A second stage then learns to adapt the control policy to work with progressively weaker versions of the character. Finally, a third stage learns control policies that can reproduce the weaker get-up motions at much slower speeds. We show that across multiple runs, the method can discover a diverse variety of get-up strategies, and execute them at a variety of speeds. The results usually produce policies that use a final stand-up strategy that is common to the recovery motions seen from all initial states. However, we also find policies for which different strategies are seen for prone and supine initial fallen states. The learned get-up control strategies often have significant static stability, i.e., they can be paused at a variety of points during the get-up motion. We further test our method on novel constrained scenarios, such as having a leg and an arm in a cast.

Submitted 27 August, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

Comments: SIGGRAPH 2022. Project page: https://tianxintao.github.io/get_up_control/

arXiv:2204.04905 [pdf, other]

Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

Authors: Tianxin Tao, Daniele Reda, Michiel van de Panne

Abstract: Vision Transformers (ViT) have recently demonstrated the significant potential of transformer architectures for computer vision. To what extent can image-based deep reinforcement learning also benefit from ViT architectures, as compared to standard convolutional neural network (CNN) architectures? To answer this question, we evaluate ViT training methods for image-based reinforcement learning (RL) control tasks and compare these results to a leading convolutional-network architecture method, RAD. For training the ViT encoder, we consider several recently-proposed self-supervised losses that are treated as auxiliary tasks, as well as a baseline with no additional loss terms. We find that the CNN architectures trained using RAD still generally provide superior performance. For the ViT methods, all three types of auxiliary tasks that we consider provide a benefit over plain ViT training. Furthermore, ViT reconstruction-based tasks are found to significantly outperform ViT contrastive-learning.

Submitted 15 May, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

arXiv:2203.02574 [pdf, other]

Style-ERD: Responsive and Coherent Online Motion Style Transfer

Authors: Tianxin Tao, Xiaohang Zhan, Zhongquan Chen, Michiel van de Panne

Abstract: Motion style transfer is a common method for enriching character animation. Motion style transfer algorithms are often designed for offline settings where motions are processed in segments. However, for online animation applications, such as realtime avatar animation from motion capture, motions need to be processed as a stream with minimal latency. In this work, we realize a flexible, high-quality motion style transfer method for this setting. We propose a novel style transfer model, Style-ERD, to stylize motions in an online manner with an Encoder-Recurrent-Decoder structure, along with a novel discriminator that combines feature attention and temporal attention. Our method stylizes motions into multiple target styles with a unified model. Although our method targets online settings, it outperforms previous offline methods in motion realism and style expressiveness and provides significant gains in runtime efficiency.

Submitted 28 March, 2022; v1 submitted 4 March, 2022; originally announced March 2022.

Comments: CVPR 2022, project page: https://tianxintao.github.io/Online-Motion-Style-Transfer

arXiv:2112.11661 [pdf]

New metal-plastic hybrid additive manufacturing strategy: Fabrication of arbitrary metal-patterns on external and even internal surfaces of 3D plastic structures

Authors: Kewei Song, Yue Cui, Tiannan Tao, Xiangyi Meng, Michinari Sone, Masahiro Yoshino, Shinjiro Umezu, Hirotaka Sato

Abstract: Constructing precise micro-nano metal patterns on complex three-dimensional (3D) plastic parts allows the fabrication of functional devices for advanced applications. However, this patterning is currently expensive and requires complex processes with long manufacturing lead time. The present work demonstrates a process for the fabrication of micro-nano 3D metal-plastic composite structures with arbitrarily complex shapes. In this approach, a light-cured resin is modified to prepare an active precursor capable of allowing subsequent electroless plating (ELP). A multi-material digital light processing 3D printer was newly developed to enable the fabrication of parts containing regions made of either standard resin or active precursor resin nested within each other. Selective 3D ELP processing of such parts provided various metal-plastic composite parts having complicated hollow micro-nano structures with specific topological relationships on a size scale as small as 40 μm. Using this technique, 3D metal topologies that cannot be manufactured by traditional methods are possible, and metal patterns can be produced inside plastic parts as a means of further miniaturizing electronic devices. The proposed method can also generate metal coatings exhibiting improved adhesion of metal to plastic substrate. Based on this technique, several sensors composed of different functional nonmetallic materials and specific metal patterns were designed and fabricated. The present results demonstrate the viability of the proposed method and suggest potential applications in the fields of smart 3D micro-nano electronics, 3D wearable devices, micro/nano-sensors, and health care.

Submitted 21 December, 2021; originally announced December 2021.

arXiv:2110.02525 [pdf, other]

User Scheduling and Power Allocation for Precoded Multi-Beam High Throughput Satellite Systems with Individual Quality of Service Constraints

Authors: Trinh Van Chien, Eva Lagunas, Tung Hai Ta, Symeon Chatzinotas, Björn Ottersten

Abstract: For extensive coverage areas, multi-beam high throughput satellite (MB-HTS) communication is a promising technology that plays a crucial role in delivering broadband services to many users with diverse Quality of Service (QoS) requirements. This paper focuses on MB-HTS systems where all beams reuse the same spectrum. In particular, we propose a novel user scheduling and power allocation design capable of providing guarantees in terms of the individual QoS requirements while maximizing the system throughput under a limited power budget. Precoding is employed in the forward link to mitigate mutual interference at the users in multiple-access scenarios over different coherence time intervals. The combinatorial optimization structure from user scheduling requires an extremely high cost to obtain the global optimum even when a reduced number of users fit into a time slot. Therefore, we propose a heuristic algorithm yielding a good trade-off between performance and computational complexity, applicable to a static operation framework of geostationary (GEO) satellite networks. Although the power allocation optimization is signomial programming, non-convex on a standard form, the solution can be lower bounded by the global optimum of a geometric program with a hidden convex structure. A local solution to the joint user scheduling and power allocation problem is consequently obtained by a successive optimization approach. Numerical results demonstrate the effectiveness of our algorithms on large-scale systems by providing better QoS satisfaction combined with outstanding overall system throughput.

Submitted 6 October, 2021; originally announced October 2021.

Comments: 14 pages, 8 figures, and 1 table. Submitted to the IEEE for publication. arXiv admin note: substantial text overlap with arXiv:2106.12873

arXiv:2108.13599 [pdf, other]

Through the Looking Glass: Diminishing Occlusions in Robot Vision Systems with Mirror Reflections

Authors: Kentaro Yoshioka, Hidenori Okuni, Tuan Thanh Ta, Akihide Sai

Abstract: The quality of robot vision greatly affects the performance of automation systems, where occlusions stand as one of the biggest challenges. If the target is occluded from the sensor, detecting and grasping such objects becomes very challenging. For example, when multiple robot arms cooperate in a single workplace, occlusions will be created under the robot arm itself and hide objects underneath. While occlusions can be greatly reduced by installing multiple sensors, the increase in sensor costs cannot be ignored. Moreover, the sensor placements must be rearranged every time the robot operation routine and layout change. To diminish occlusions, we propose the first robot vision system with tilt-type mirror reflection sensing. By instantly tilting the sensor itself, we obtain two sensing results with different views: conventional direct line-of-sight sensing and non-line-of-sight sensing via mirror reflections. Our proposed system removes occlusions adaptively by detecting the occlusions in the scene and dynamically configuring the sensor tilt angle to sense the detected occluded area. Thus, sensor rearrangements are not required even after changes in robot operation or layout. Since the required hardware is the tilt-unit and a commercially available mirror, the cost increase is marginal. Through experiments, we show that our system can achieve a detection accuracy similar to that of systems with multiple sensors, despite the single-sensor implementation.

Submitted 30 August, 2021; originally announced August 2021.

Comments: Accepted to IROS 2021

arXiv:2106.15078 [pdf, other]

Don't Take It Literally: An Edit-Invariant Sequence Loss for Text Generation

Authors: Guangyi Liu, Zichao Yang, Tianhua Tao, Xiaodan Liang, Junwei Bao, Zhen Li, Xiaodong He, Shuguang Cui, Zhiting Hu

Abstract: Neural text generation models are typically trained by maximizing log-likelihood with the sequence cross entropy (CE) loss, which encourages an exact token-by-token match between a target sequence and a generated sequence. Such a training objective is sub-optimal when the target sequence is not perfect, e.g., when the target sequence is corrupted with noise, or when only weak sequence supervision is available. To address the challenge, we propose a novel Edit-Invariant Sequence Loss (EISL), which computes the matching loss of a target n-gram with all n-grams in the generated sequence. EISL is designed to be robust to various noises and edits in the target sequences. Moreover, the EISL computation is essentially an approximate convolution operation with target n-grams as kernels, which is easy to implement and efficient to compute with existing libraries. To demonstrate the effectiveness of EISL, we conduct experiments on a wide range of tasks, including machine translation with noisy target sequences, unsupervised text style transfer with only weak training signals, and non-autoregressive generation with non-predefined generation order. Experimental results show our method significantly outperforms the common CE loss and other strong baselines on all the tasks. EISL has a simple API that can be used as a drop-in replacement of the CE loss: https://github.com/guangyliu/EISL.

Submitted 7 May, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: Camera ready, 2022 NAACL main conference

arXiv:2106.12873 [pdf, other]

User Scheduling for Precoded Satellite Systems with Individual Quality of Service Constraints

Authors: Trinh Van Chien, Eva Lagunas, Tung Hai Ta, Symeon Chatzinotas, Björn Ottersten

Abstract: Multibeam high throughput satellite (MB-HTS) systems will play a key role in delivering broadband services to a large number of users with diverse Quality of Service (QoS) requirements. This paper focuses on MB-HTS where the same spectrum is re-used by all user links and, in particular, we propose a novel user scheduling design capable of providing guarantees in terms of individual QoS requirements while maximizing the system throughput. This is achieved by precoding to mitigate mutual interference. The combinatorial optimization structure requires an extremely high cost to obtain the global optimum even with a reduced number of users. We, therefore, propose a heuristic algorithm yielding a good local solution and tolerable computational complexity, applicable for large-scale networks. Numerical results demonstrate the effectiveness of our proposed algorithm on scheduling many users with better sum throughput than the other benchmarks. Besides, the QoS requirements for all scheduled users are guaranteed.

Submitted 24 June, 2021; originally announced June 2021.

Comments: 6 pages, 2 figures, Accepted to present at PIMRC 2021

arXiv:2104.00460 [pdf]

doi 10.1002/spy2.191

Augmenting Zero Trust Architecture to Endpoints Using Blockchain: A State-of-The-Art Review

Authors: Lampis Alevizos, Vinh Thong Ta, Max Hashem Eiza

Abstract: With the purpose of defending against lateral movement in today's borderless networks, Zero Trust Architecture (ZTA) adoption is gaining momentum. With a full scale ZTA implementation, it is unlikely that adversaries will be able to spread through the network starting from a compromised endpoint. However, the already authenticated and authorised session of a compromised endpoint can be leveraged to perform limited, though malicious, activities ultimately rendering the endpoints the Achilles heel of ZTA. To effectively detect such attacks, distributed collaborative intrusion detection systems with an attack scenario-based approach have been developed. Nonetheless, Advanced Persistent Threats (APTs) have demonstrated their ability to bypass this approach with a high success ratio. As a result, adversaries can pass undetected or potentially alter the detection logging mechanisms to achieve a stealthy presence. Recently, blockchain technology has demonstrated solid use cases in the cyber security domain. In this paper, motivated by the convergence of ZTA and blockchain-based intrusion detection and prevention, we examine how ZTA can be augmented onto endpoints. Namely, we perform a state-of-the-art review of ZTA models, real-world architectures with a focus on endpoints, and blockchain-based intrusion detection systems. We discuss the potential of blockchain's immutability fortifying the detection process and identify open challenges as well as potential solutions and future directions.

Submitted 15 November, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

Comments: (1) Fixed the reference numbering (2) Fixed syntax errors, improvements (3) document re-structured

Journal ref: Security and Privacy. 2021;e191

arXiv:2012.14142 [pdf, other]

Perception Consistency Ultrasound Image Super-resolution via Self-supervised CycleGAN

Authors: Heng Liu, Jianyong Liu, Tao Tao, Shudong Hou, Jungong Han

Abstract: Due to the limitations of sensors, the transmission medium and the intrinsic properties of ultrasound, the quality of ultrasound imaging is always not ideal, especially its low spatial resolution. To remedy this situation, deep learning networks have been recently developed for ultrasound image super-resolution (SR) because of the powerful approximation capability. However, most current supervised SR methods are not suitable for ultrasound medical images because the medical image samples are always rare, and usually, there are no low-resolution (LR) and high-resolution (HR) training pairs in reality. In this work, based on self-supervision and cycle generative adversarial network (CycleGAN), we propose a new perception consistency ultrasound image super-resolution (SR) method, which only requires the LR ultrasound data and can ensure the re-degenerated image of the generated SR one to be consistent with the original LR image, and vice versa. We first generate the HR fathers and the LR sons of the test ultrasound LR image through image enhancement, and then make full use of the cycle loss of LR-SR-LR and HR-LR-SR and the adversarial characteristics of the discriminator to promote the generator to produce better perceptually consistent SR results. The evaluation of PSNR/IFC/SSIM, inference efficiency and visual effects under the benchmark CCA-US and CCA-US datasets illustrates that our proposed approach is effective and superior to other state-of-the-art methods.

Submitted 28 December, 2020; originally announced December 2020.

arXiv:2010.04304 [pdf, other]

doi 10.1145/3424636.3426907

Learning to Locomote: Understanding How Environment Design Matters for Deep Reinforcement Learning

Authors: Daniele Reda, Tianxin Tao, Michiel van de Panne

Abstract: Learning to locomote is one of the most common tasks in physics-based animation and deep reinforcement learning (RL). A learned policy is the product of the problem to be solved, as embodied by the RL environment, and the RL algorithm. While enormous attention has been devoted to RL algorithms, much less is known about the impact of design choices for the RL environment. In this paper, we show that environment design matters in significant ways and document how it can contribute to the brittle nature of many RL results. Specifically, we examine choices related to state representations, initial state distributions, reward structure, control frequency, episode termination procedures, curriculum usage, the action space, and the torque limits. We aim to stimulate discussion around such choices, which in practice strongly impact the success of RL when applied to continuous-action control problems of interest to animation, such as learning to locomote.

Submitted 8 October, 2020; originally announced October 2020.

Comments: Presented at The 13th Annual ACM SIGGRAPH Conference on Motion, Interaction and Games

arXiv:2008.08936 [pdf, other]

DataProVe: A Data Protection Policy and System Architecture Verification Tool

Authors: Vinh Thong Ta

Abstract: In this paper, we propose a tool, called DataProVe, for specifying high-level data protection policies and system architectures, as well as verifying the conformance between them in a fully automated way. The syntax of the policies and the architectures is based on semi-formal languages, and the automated verification engine relies on logic and resolution based proofs. The functionality and operation of the tool are presented using different examples.

Submitted 17 December, 2020; v1 submitted 20 August, 2020; originally announced August 2020.

Comments: 65 pages. Improved algorithm description and explanation. Semantics of policy language added. More complete list of properties, and inference rules added. More figures and discussion section added. Finally, we refer to this version in our (shorter) paper under review

arXiv:2007.03152 [pdf, other]

The gem5 Simulator: Version 20.0+

Authors: Jason Lowe-Power, Abdul Mutaal Ahmad, Ayaz Akram, Mohammad Alian, Rico Amslinger, Matteo Andreozzi, Adrià Armejach, Nils Asmussen, Brad Beckmann, Srikant Bharadwaj, Gabe Black, Gedare Bloom, Bobby R. Bruce, Daniel Rodrigues Carvalho, Jeronimo Castrillon, Lizhong Chen, Nicolas Derumigny, Stephan Diestelhorst, Wendy Elsasser, Carlos Escuin, Marjan Fariborz, Amin Farmahini-Farahani, Pouya Fotouhi, Ryan Gambord, Jayneel Gandhi, et al. (53 additional authors not shown)

Abstract: The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.

Submitted 29 September, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: Source, comments, and feedback: https://github.com/darchr/gem5-20-paper

arXiv:1911.04293 [pdf, ps, other]

Error bound of critical points and KL property of exponent $1/2$ for squared F-norm regularized factorization

Authors: Ting Tao, Shaohua Pan, Shujun Bi

Abstract: This paper is concerned with the squared F(robenius)-norm regularized factorization form for noisy low-rank matrix recovery problems. Under a suitable assumption on the restricted condition number of the Hessian for the loss function, we derive an error bound to the true matrix for the non-strict critical points with rank not more than that of the true matrix. Then, for the squared F-norm regularized factorized least squares loss function, under the noisy and full sample setting we establish its KL property of exponent $1/2$ on its global minimizer set, and under the noisy and partial sample setting achieve this property for a class of critical points. These theoretical findings are also confirmed by solving the squared F-norm regularized factorization problem with an accelerated alternating minimization method.

Submitted 26 June, 2021; v1 submitted 11 November, 2019; originally announced November 2019.

arXiv:1908.09078 [pdf, other]

KL property of exponent $1/2$ of $\ell_{2,0}$-norm and DC regularized factorizations for low-rank matrix recovery

Authors: Shujun Bi, Ting Tao, Shaohua Pan

Abstract: This paper is concerned with the factorization form of the rank regularized loss minimization problem. To cater for the scenario in which only a coarse estimation is available for the rank of the true matrix, an $\ell_{2,0}$-norm regularized term is added to the factored loss function to reduce the rank adaptively; and to account for the ambiguities in the factorization, a balanced term is then introduced. For the least squares loss, under a restricted condition number assumption on the sampling operator, we establish the KL property of exponent $1/2$ of the nonsmooth factored composite function and its equivalent DC reformulations in the set of their global minimizers. We also confirm the theoretical findings by applying a proximal linearized alternating minimization method to the regularized factorizations.

Submitted 23 August, 2019; originally announced August 2019.

Comments: 29 pages, 3 figures

arXiv:1705.10930 [pdf, other]

doi 10.1016/j.optlaseng.2017.10.013

Micro Fourier Transform Profilometry ($μ$FTP): 3D shape measurement at 10,000 frames per second

Authors: Chao Zuo, Tianyang Tao, Shijie Feng, Lei Huang, Anand Asundi, Qian Chen

Abstract: Recent advances in imaging sensors and digital light projection technology have facilitated rapid progress in 3D optical sensing, enabling 3D surfaces of complex-shaped objects to be captured with improved resolution and accuracy. However, due to the large number of projection patterns required for phase recovery and disambiguation, the maximum frame rates of current 3D shape measurement techniques are still limited to the range of hundreds of frames per second (fps). Here, we demonstrate a new 3D dynamic imaging technique, Micro Fourier Transform Profilometry ($μ$FTP), which can capture 3D surfaces of transient events at up to 10,000 fps based on our newly developed high-speed fringe projection system. Compared with existing techniques, $μ$FTP has the prominent advantage of recovering an accurate, unambiguous, and dense 3D point cloud with only two projected patterns. Furthermore, the phase information is encoded within a single high-frequency fringe image, thereby allowing motion-artifact-free reconstruction of transient events with temporal resolution of 50 microseconds. To show $μ$FTP's broad utility, we use it to reconstruct 3D videos of 4 transient scenes: vibrating cantilevers, rotating fan blades, a bullet fired from a toy gun, and a balloon's explosion triggered by a flying dart, which were previously difficult or even impossible to capture with conventional approaches.

Submitted 30 May, 2017; originally announced May 2017.

Comments: This manuscript was originally submitted on 30th January 17

arXiv:1606.01010 [pdf, other]

Automated Road Traffic Congestion Detection and Alarm Systems: Incorporating V2I communications into ATCSs

Authors: Vinh Thong Ta

Abstract: In this position paper, we address the problems of automated road congestion detection and alerting systems and their security properties. We review different theoretical adaptive road traffic control approaches, and three widely deployed adaptive traffic control systems (ATCSs), namely, SCATS, SCOOT and InSync. We then discuss some related research questions, and the corresponding possible approaches, as well as the adversary model and potential attack scenarios. Two theoretical concepts of automated road congestion alarm systems (including system architecture, communication protocol, and algorithms) are proposed on top of ATCSs, such as SCATS, SCOOT and InSync, by incorporating secure wireless vehicle-to-infrastructure (V2I) communications. Finally, the security properties of the proposed system have been discussed and analysed using the ProVerif protocol verification tool.

Submitted 3 June, 2016; originally announced June 2016.

Comments: 31 pages

MSC Class: 68-02

arXiv:1511.01249 [pdf, other]

Privacy by Design: On the Formal Design and Conformance Check of Personal Data Protection Policies and Architectures

Authors: Vinh Thong Ta

Abstract: The new General Data Protection Regulation (GDPR) will take effect in May 2018, and hence, designing compliant data protection policies and system architectures has become crucial for organizations to avoid penalties. Unfortunately, the regulations given in a textual format can be easily misinterpreted by the policy and system designers, which also makes the conformance check error-prone for auditors. In this paper, we apply a formal approach to facilitate systematic design of policies and architectures in an unambiguous way, and provide a framework for mathematically sound conformance checks against the current data protection regulations. We propose a (semi-)formal approach for specifying and reasoning about data protection policies and architectures as well as defining conformance relations between architectures and policies. The usability of our proposed approach is demonstrated on a smart metering service case study.

Submitted 14 May, 2018; v1 submitted 4 November, 2015; originally announced November 2015.

Comments: 41 pages, 2 figures

arXiv:1501.02155 [pdf, ps, other]

A formal proof of the Kepler conjecture

Authors: Thomas Hales, Mark Adams, Gertrud Bauer, Dat Tat Dang, John Harrison, Truong Le Hoang, Cezary Kaliszyk, Victor Magron, Sean McLaughlin, Thang Tat Nguyen, Truong Quang Nguyen, Tobias Nipkow, Steven Obua, Joseph Pleso, Jason Rute, Alexey Solovyev, An Hoai Thi Ta, Trung Nam Tran, Diep Thi Trieu, Josef Urban, Ky Khac Vu, Roland Zumkeller

Abstract: This article describes a formal proof of the Kepler conjecture on dense sphere packings in a combination of the HOL Light and Isabelle proof assistants. This paper constitutes the official published account of the now completed Flyspeck project.

Submitted 9 January, 2015; originally announced January 2015.

Comments: 21 pages

arXiv:1401.5826 [pdf, other]

doi 10.1109/ICC.2014.6884157

Improving Smartphone Battery Life Utilizing Device-to-device Cooperative Relays Underlaying LTE Networks

Authors: Tuan Ta, John S. Baras, Chenxi Zhu

Abstract: The utility of smartphones has been limited to a great extent by their short battery life. In this work, we propose a new approach to prolonging smartphone battery life. We introduce the notions of "valueless" and "valued battery", as being the available battery when the user does or does not have access to a power source, respectively. We propose a cooperative system where users with high battery level help carry the traffic of users with low battery level. Our scheme helps increase the amount of valued battery in the network, thus it reduces the chance of users running out of battery early. Our system can be realized in the form of a proximity service (ProSe) which utilizes a device-to-device (D2D) communication architecture underlaying LTE. We show through simulations that our system reduces the probability of cellular users running out of battery before their target usage time (probability of outage). Our simulator source code is made available to the public.

Submitted 22 January, 2014; originally announced January 2014.

Comments: Accepted at IEEE ICC 2014

Showing 1–50 of 53 results for author: Ta, T