-
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Authors:
**cheng Dong,
Yonghao Tan,
Dong Zhang,
Tianwei Ni,
Xuejiao Liu,
Yu Liu,
Peng Luo,
Luhong Liang,
Shih-Yang Liu,
Xijie Huang,
Huaiyu Zhu,
Yun Pan,
Fengwei An,
Kwang-Ting Cheng
Abstract:
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs. Previous state-of-the-art works optimize these operations by piece-wise linear approximation and store the parameters in look-up tables (LUT), but most of them require hardware-unfriendly high-precision arithmetic such as FP/INT 32 and lack consideration of integer-only INT quantization. This paper proposes a genetic LUT-Approximation algorithm, namely GQA-LUT, that can automatically determine the parameters with quantization awareness. The results demonstrate that GQA-LUT achieves negligible degradation on the challenging semantic segmentation task for both vanilla and linear Transformer models. Besides, the proposed GQA-LUT enables the employment of INT8-based LUT-Approximation that achieves area savings of 81.3~81.7% and a power reduction of 79.3~80.2% compared to the high-precision FP/INT 32 alternatives. Code is available at https:// github.com/**chengDong/GQA-LUT.
Submitted 29 March, 2024; v1 submitted 28 March, 2024;
originally announced March 2024.
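To make the piece-wise linear LUT idea in the abstract above concrete, here is a minimal sketch in plain floating point. It is not GQA-LUT: the breakpoints are uniform rather than found by a genetic search, nothing is quantized to INT8, and the names `build_pwl_lut` and `pwl_eval`, the choice of GELU, and the range [-4, 4] are all assumptions made for illustration.

```python
import math


def gelu(x):
    """Exact GELU, used here only as an example non-linear function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def build_pwl_lut(fn, lo, hi, num_segments):
    """Build uniform breakpoints and one (slope, intercept) pair per segment."""
    step = (hi - lo) / num_segments
    breakpoints = [lo + i * step for i in range(num_segments + 1)]
    table = []
    for left, right in zip(breakpoints[:-1], breakpoints[1:]):
        # Each segment is the chord through the function values at its ends.
        slope = (fn(right) - fn(left)) / (right - left)
        intercept = fn(left) - slope * left
        table.append((slope, intercept))
    return breakpoints, table


def pwl_eval(x, breakpoints, table):
    """Evaluate the approximation: clamp, look up the segment, apply a*x + b."""
    lo, hi = breakpoints[0], breakpoints[-1]
    x_clamped = min(max(x, lo), hi)
    step = (hi - lo) / len(table)
    index = min(int((x_clamped - lo) / step), len(table) - 1)
    slope, intercept = table[index]
    return slope * x_clamped + intercept


# Hypothetical configuration: 8 segments over [-4, 4].
breakpoints, table = build_pwl_lut(gelu, -4.0, 4.0, 8)
max_err = max(
    abs(pwl_eval(i / 100.0, breakpoints, table) - gelu(i / 100.0))
    for i in range(-400, 401)
)
print(f"max abs error on [-4, 4]: {max_err:.4f}")
```

With uniform breakpoints the error concentrates where the function curves most, which is the kind of inefficiency a search over breakpoint positions would aim to reduce.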
-
Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm
Authors:
Titouan Renard,
Andreas Schlaginhaufen,
Tingting Ni,
Maryam Kamgarpour
Abstract:
Given a dataset of expert demonstrations, inverse reinforcement learning (IRL) aims to recover a reward for which the expert is optimal. This work proposes a model-free algorithm to solve the entropy-regularized IRL problem. In particular, we employ a stochastic gradient descent update for the reward and a stochastic soft policy iteration update for the policy. Assuming access to a generative model, we prove that our algorithm is guaranteed to recover a reward for which the expert is $\varepsilon$-optimal using $\mathcal{O}(1/\varepsilon^{2})$ samples of the Markov decision process (MDP). Furthermore, with $\mathcal{O}(1/\varepsilon^{4})$ samples we prove that the optimal policy corresponding to the recovered reward is $\varepsilon$-close to the expert policy in total variation distance.
Submitted 23 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Do Transformer World Models Give Better Policy Gradients?
Authors:
Michel Ma,
Tianwei Ni,
Clement Gehring,
Pierluca D'Oro,
Pierre-Luc Bacon
Abstract:
A natural approach for reinforcement learning is to predict future rewards by unrolling a neural network world model, and to backpropagate through the resulting computational graph to learn a policy. However, this method often becomes impractical for long horizons since typical world models induce hard-to-optimize loss landscapes. Transformers are known to efficiently propagate gradients over long horizons: could they be the solution to this problem? Surprisingly, we show that commonly-used transformer world models produce circuitous gradient paths, which can be detrimental to long-range policy gradients. To tackle this challenge, we propose a class of world models called Actions World Models (AWMs), designed to provide more direct routes for gradient propagation. We integrate such AWMs into a policy gradient framework that underscores the relationship between network architectures and the policy gradient updates they inherently represent. We demonstrate that AWMs can generate optimization landscapes that are easier to navigate even when compared to those from the simulator itself. This property allows transformer AWMs to produce better policies than competitive baselines in realistic long-horizon tasks.
Submitted 10 February, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Bridging State and History Representations: Understanding Self-Predictive RL
Authors:
Tianwei Ni,
Benjamin Eysenbach,
Erfan Seyedsalehi,
Michel Ma,
Clement Gehring,
Aditya Mahajan,
Pierre-Luc Bacon
Abstract:
Representations are at the core of all deep reinforcement learning (RL) methods for both Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs). Many representation learning methods and theoretical frameworks have been developed to understand what constitutes an effective representation. However, the relationships between these methods and the shared properties among them remain unclear. In this paper, we show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction. Furthermore, we provide theoretical insights into the widely adopted objectives and optimization, such as the stop-gradient technique, in learning self-predictive representations. These findings together yield a minimalist algorithm to learn self-predictive representations for states and histories. We validate our theories by applying our algorithm to standard MDPs, MDPs with distractors, and POMDPs with sparse rewards. These findings culminate in a set of preliminary guidelines for RL practitioners.
Submitted 21 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
A safe exploration approach to constrained Markov decision processes
Authors:
Tingting Ni,
Maryam Kamgarpour
Abstract:
We consider discounted infinite horizon constrained Markov decision processes (CMDPs) where the goal is to find an optimal policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Motivated by the application of CMDPs in online learning of safety-critical systems, we focus on developing a model-free and simulator-free algorithm that ensures constraint satisfaction during learning. To this end, we develop an interior point approach based on the log barrier function of the CMDP. Under the commonly assumed conditions of Fisher non-degeneracy and bounded transfer error of the policy parameterization, we establish the theoretical properties of the algorithm. In particular, in contrast to existing CMDP approaches that ensure policy feasibility only upon convergence, our algorithm guarantees the feasibility of the policies during the learning process and converges to the $\varepsilon$-optimal policy with a sample complexity of $\tilde{\mathcal{O}}(\varepsilon^{-6})$. In comparison to the state-of-the-art policy gradient-based algorithm, C-NPG-PDA, our algorithm requires an additional $\mathcal{O}(\varepsilon^{-2})$ samples to ensure policy feasibility during learning with the same Fisher non-degenerate parameterization.
Submitted 23 May, 2024; v1 submitted 1 December, 2023;
originally announced December 2023.
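As a rough illustration of the log-barrier idea mentioned in the abstract above, one common way to write such a surrogate objective is sketched below. The notation is assumed here and is not taken from the abstract: $\theta$ denotes the policy parameters, $V_0$ the expected cumulative reward, $V_i$ and $b_i$ the $i$-th expected cumulative constraint value and its threshold, $m$ the number of constraints, and $\eta$ the barrier weight.

```latex
B_\eta(\theta) \;=\; V_0(\theta) \;+\; \eta \sum_{i=1}^{m} \log\bigl(V_i(\theta) - b_i\bigr),
\qquad \eta > 0 .
```

The barrier term is finite only while every constraint holds strictly, so ascent on such an objective from a strictly feasible starting point, with suitably small steps, tends to keep the iterates feasible.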
-
When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment
Authors:
Tianwei Ni,
Michel Ma,
Benjamin Eysenbach,
Pierre-Luc Bacon
Abstract:
Reinforcement learning (RL) algorithms face two distinct challenges: learning effective representations of past and present observations, and determining how actions influence future returns. Both challenges involve modeling long-term dependencies. The Transformer architecture has been very successful at solving problems that involve long-term dependencies, including in the RL domain. However, the underlying reason for the strong performance of Transformer-based RL methods remains unclear: is it because they learn effective memory, or because they perform effective credit assignment? After introducing formal definitions of memory length and credit assignment length, we design simple configurable tasks to measure these distinct quantities. Our empirical results reveal that Transformers can enhance the memory capability of RL algorithms, scaling up to tasks that require memorizing observations $1500$ steps ago. However, Transformers do not improve long-term credit assignment. In summary, our results provide an explanation for the success of Transformers in RL, while also highlighting an important area for future research and benchmark design. Our code is open-sourced at https://github.com/twni2016/Memory-RL
Submitted 3 November, 2023; v1 submitted 7 July, 2023;
originally announced July 2023.
-
FastATDC: Fast Anomalous Trajectory Detection and Classification
Authors:
Tianle Ni,
**gwei Wang,
Yunlong Ma,
Shuang Wang,
Min Liu,
Weiming Shen
Abstract:
Automated detection of anomalous trajectories is an important problem with considerable applications in intelligent transportation systems. Many existing studies have focused on distinguishing anomalous trajectories from normal trajectories, ignoring the large differences between anomalous trajectories. A recent study has made great progress in identifying abnormal trajectory patterns and proposed a two-stage algorithm for anomalous trajectory detection and classification (ATDC). This algorithm has excellent performance but suffers from a few limitations, such as high time complexity and poor interpretation. Here, we present a careful theoretical and empirical analysis of the ATDC algorithm, showing that the calculation of anomaly scores in both stages can be simplified, and that the second stage of the algorithm is much more important than the first stage. Hence, we develop a FastATDC algorithm that introduces a random sampling strategy in both stages. Experimental results show that FastATDC is 10 to 20 times faster than ATDC on real datasets. Moreover, FastATDC outperforms the baseline algorithms and is comparable to the ATDC algorithm.
Submitted 23 July, 2022;
originally announced July 2022.
-
Towards Disturbance-Free Visual Mobile Manipulation
Authors:
Tianwei Ni,
Kiana Ehsani,
Luca Weihs,
Jordi Salvador
Abstract:
Deep reinforcement learning has shown promising results on an abundance of robotic tasks in simulation, including visual navigation and manipulation. Prior work generally aims to build embodied agents that solve their assigned tasks as quickly as possible, while largely ignoring the problems caused by collision with objects during interaction. This lack of prioritization is understandable: there is no inherent cost in breaking virtual objects. As a result, "well-trained" agents frequently collide with objects before achieving their primary goals, a behavior that would be catastrophic in the real world. In this paper, we study the problem of training agents to complete the task of visual mobile manipulation in the ManipulaTHOR environment while avoiding unnecessary collision (disturbance) with objects. We formulate disturbance avoidance as a penalty term in the reward function, but find that directly training with such penalized rewards often results in agents being unable to escape poor local optima. Instead, we propose a two-stage training curriculum where an agent is first allowed to freely explore and build basic competencies without penalization, after which a disturbance penalty is introduced to refine the agent's behavior. Results on testing scenes show that our curriculum not only avoids these poor local optima, but also leads to 10% absolute gains in success rate without disturbance, compared to our state-of-the-art baselines. Moreover, our curriculum is significantly more performant than a safe RL algorithm that casts collision avoidance as a constraint. Finally, we propose a novel disturbance-prediction auxiliary task that accelerates learning.
Submitted 21 October, 2022; v1 submitted 17 December, 2021;
originally announced December 2021.
-
Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs
Authors:
Tianwei Ni,
Benjamin Eysenbach,
Ruslan Salakhutdinov
Abstract:
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with memory-based architectures, such as recurrent neural networks, provides a general approach to solving all types of POMDPs. However, prior work has found that such recurrent model-free RL methods tend to perform worse than more specialized algorithms that are designed for specific types of POMDPs. This paper revisits this claim. We find that careful architecture and hyperparameter decisions can often yield a recurrent model-free implementation that performs on par with (and occasionally substantially better than) more sophisticated recent techniques. We compare to 21 environments from 6 prior specialized methods and find that our implementation achieves greater sample efficiency and asymptotic performance than these methods on 18/21 environments. We also release a simple and efficient implementation of recurrent model-free RL for future work to use as a baseline for POMDPs.
Submitted 4 June, 2022; v1 submitted 11 October, 2021;
originally announced October 2021.
-
Explore BiLSTM-CRF-Based Models for Open Relation Extraction
Authors:
Tao Ni,
Qing Wang,
Gabriela Ferraro
Abstract:
Extracting multiple relations from text sentences is still a challenge for current Open Relation Extraction (Open RE) tasks. In this paper, we develop several Open RE models based on the bidirectional LSTM-CRF (BiLSTM-CRF) neural network and different contextualized word embedding methods. We also propose a new tagging scheme to solve overlapping problems and enhance models' performance. From the evaluation results and comparisons between models, we select the best combination of tagging scheme, word embedder, and BiLSTM-CRF network to achieve an Open RE model with a remarkable extracting ability on multiple-relation sentences.
Submitted 25 April, 2021;
originally announced April 2021.
-
Adaptive Agent Architecture for Real-time Human-Agent Teaming
Authors:
Tianwei Ni,
Huao Li,
Siddharth Agrawal,
Suhas Raja,
Fan Jia,
Yikang Gui,
Dana Hughes,
Michael Lewis,
Katia Sycara
Abstract:
Teamwork is a set of interrelated reasoning, actions and behaviors of team members that facilitate common objectives. Teamwork theory and experiments have resulted in a set of states and processes for team effectiveness in both human-human and agent-agent teams. However, human-agent teaming is less well studied because it is so new and involves asymmetry in policy and intent not present in human teams. To optimize team performance in human-agent teaming, it is critical that agents infer human intent and adapt their policies for smooth coordination. Most literature in human-agent teaming builds agents referencing a learned human model. Though these agents are guaranteed to perform well with the learned model, they lay heavy assumptions on human policy such as optimality and consistency, which are unlikely to hold in many real-world scenarios. In this paper, we propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game, namely Team Space Fortress (TSF). Previous human-human team research has shown complementary policies in the TSF game and diversity in human players' skill, which encourages us to relax the assumptions on human policy. Therefore, we discard learning human models from human data, and instead use an adaptation strategy on a pre-trained library of exemplar policies composed of RL algorithms or rule-based methods with minimal assumptions of human behavior. The adaptation strategy relies on a novel similarity metric to infer human policy and then selects the most complementary policy in our library to maximize the team performance. The adaptive agent architecture can be deployed in real-time and generalize to any off-the-shelf static agents. We conducted human-agent experiments to evaluate the proposed adaptive agent framework, and demonstrated the suboptimality, diversity, and adaptability of human policies in human-agent teams.
Submitted 7 March, 2021;
originally announced March 2021.
-
f-IRL: Inverse Reinforcement Learning via State Marginal Matching
Authors:
Tianwei Ni,
Harshit Sikchi,
Yufei Wang,
Tejus Gupta,
Lisa Lee,
Benjamin Eysenbach
Abstract:
Imitation learning is well-suited for robotic tasks where it is difficult to directly program the behavior or specify a cost for optimal control. In this work, we propose a method for learning the reward function (and the corresponding policy) to match the expert state density. Our main result is the analytic gradient of any f-divergence between the agent and expert state distribution w.r.t. reward parameters. Based on the derived gradient, we present an algorithm, f-IRL, that recovers a stationary reward function from the expert density by gradient descent. We show that f-IRL can learn behaviors from a hand-designed target state density or implicitly through expert observations. Our method outperforms adversarial imitation learning methods in terms of sample efficiency and the required number of expert trajectories on IRL benchmarks. Moreover, we show that the recovered reward function can be used to quickly solve downstream tasks, and empirically demonstrate its utility on hard-to-explore tasks and for behavior transfer across changes in dynamics.
Submitted 29 December, 2020; v1 submitted 9 November, 2020;
originally announced November 2020.
-
DCDChain: A Credible Architecture of Digital Copyright Detection Based on Blockchain
Authors:
Zhili Chen,
Yuting Wang,
Tianjiao Ni
Abstract:
Copyright detection is an effective method to prevent piracy. However, untrustworthy detection parties may lead to falsified detection results. Due to its credibility and tamper resistance, blockchain has been applied to copyright protection. Previous works mainly utilized blockchain for reliable copyright information storage or copyrighted digital media trading. As far as we know, the problem of credible copyright detection has not been addressed. In this paper, we propose a credible copyright detection architecture based on the blockchain, called DCDChain. In this architecture, the detection agency first detects copyrights off the chain, then uploads the detection records to the blockchain. Since data on the blockchain are publicly accessible, media providers can verify the correctness of the copyright detection, and appeal to a smart contract if there is any dissent. The smart contract then arbitrates the disputes by verifying the correctness of detection on the chain. The detect-verify-and-arbitrate mechanism guarantees the credibility of copyright detection. Security analysis and experimental simulations show that the digital copyright detection architecture is credible, secure and efficient. The proposed credible copyright detection scheme is highly important for copyright protection. The future work is to improve the scheme by designing more effective locality sensitive hash algorithms for various digital media.
Submitted 24 August, 2022; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Utility-efficient Differentially Private K-means Clustering based on Cluster Merging
Authors:
Tianjiao Ni,
Minghao Qiao,
Zhili Chen,
Shun Zhang,
Hong Zhong
Abstract:
Differential privacy is widely used in data analysis. State-of-the-art $k$-means clustering algorithms with differential privacy typically add an equal amount of noise to centroids for each iterative computation. In this paper, we propose a novel differentially private $k$-means clustering algorithm, DP-KCCM, that significantly improves the utility of clustering by adding adaptive noise and merging clusters. Specifically, to obtain $k$ clusters with differential privacy, the algorithm first generates $n \times k$ initial centroids, adds adaptive noise for each iteration to get $n \times k$ clusters, and finally merges these clusters into $k$ ones. We theoretically prove the differential privacy of the proposed algorithm. Surprisingly, extensive experimental results show that: 1) cluster merging with equal amounts of noise improves the utility somewhat; 2) although adding adaptive noise only does not improve the utility, combining both cluster merging and adaptive noise further improves the utility significantly.
Submitted 2 October, 2020;
originally announced October 2020.
-
Meta-SAC: Auto-tune the Entropy Temperature of Soft Actor-Critic via Metagradient
Authors:
Yufei Wang,
Tianwei Ni
Abstract:
The exploration-exploitation dilemma has long been a crucial issue in reinforcement learning. In this paper, we propose a new approach to automatically balance between these two. Our method is built upon the Soft Actor-Critic (SAC) algorithm, which uses an "entropy temperature" that balances the original task reward and the policy entropy, and hence controls the trade-off between exploitation and exploration. It is empirically shown that SAC is very sensitive to this hyperparameter, and the follow-up work (SAC-v2), which uses constrained optimization for automatic adjustment, has some limitations. The core of our method, namely Meta-SAC, is to use metagradient along with a novel meta objective to automatically tune the entropy temperature in SAC. We show that Meta-SAC achieves promising performance on several of the Mujoco benchmarking tasks, and outperforms SAC-v2 by over 10% in one of the most challenging tasks, humanoid-v2.
Submitted 31 July, 2020; v1 submitted 3 July, 2020;
originally announced July 2020.
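For readers unfamiliar with the "entropy temperature" named in the abstract above, the standard maximum-entropy objective that SAC optimizes can be written as follows. This is the usual textbook form rather than the Meta-SAC meta objective, and the symbols are assumed notation: $\alpha$ is the temperature, $\rho_\pi$ the state-action distribution induced by policy $\pi$, and $\mathcal{H}$ the entropy.

```latex
J(\pi) \;=\; \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\Bigl[\, r(s_t, a_t) \;+\; \alpha \, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \Bigr]
```

A larger $\alpha$ weights the entropy bonus more heavily and so favors exploration, while $\alpha \to 0$ recovers the ordinary expected-return objective.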
-
Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics
Authors:
Tianyi Ni
Abstract:
More and more of the world's languages are under study nowadays; as a result, the traditional approach to historical linguistics is facing some challenges. For example, comparative research across languages requires manual annotation, which becomes increasingly impractical as the amount of language data grows worldwide. Although they can hardly replace linguists' work, automatic computational methods have been taken into consideration because they can help reduce the workload. One of the most important tasks in historical linguistics is comparing words from different languages and finding their cognates, in order to figure out whether two languages are related to each other. In this paper, I use a computational method to cluster the languages and a Markov Chain Monte Carlo (MCMC) method to build the language typology relationship tree based on the clusters.
Submitted 13 March, 2020; v1 submitted 22 February, 2020;
originally announced February 2020.
-
Differentially Private Combinatorial Cloud Auction
Authors:
Tianjiao Ni,
Zhili Chen,
Lin Chen,
Hong Zhong,
Shun Zhang,
Yan Xu
Abstract:
Cloud service providers typically provide different types of virtual machines (VMs) to cloud users with various requirements. Thanks to its effectiveness and fairness, auction has been widely applied in this heterogeneous resource allocation. Recently, several strategy-proof combinatorial cloud auction mechanisms have been proposed. However, they fail to protect the bid privacy of users from being inferred from the auction results. In this paper, we design a differentially private combinatorial cloud auction mechanism (DPCA) to address this privacy issue. Technically, we employ the exponential mechanism to compute a clearing unit price vector with a probability proportional to the corresponding revenue. We further improve the mechanism to reduce the running time while maintaining high revenues, by computing a single clearing unit price, or a subgroup of clearing unit prices at a time, resulting in the improved mechanisms DPCA-S and its generalized version DPCA-M, respectively. We theoretically prove that our mechanisms can guarantee differential privacy, approximate truthfulness and high revenue. Extensive experimental results demonstrate that DPCA can generate near-optimal revenues at the price of relatively high time complexity, while the improved mechanisms achieve a tunable trade-off between auction revenue and running time.
Submitted 2 January, 2020;
originally announced January 2020.
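The exponential mechanism referred to in the abstract above can be sketched generically as follows. In its standard form it selects a candidate with probability proportional to $\exp(\epsilon u / (2\Delta))$, where $u$ is the candidate's utility (revenue, in the abstract's setting) and $\Delta$ is the utility's sensitivity. This is not the DPCA mechanism itself: the candidate prices, revenues, `epsilon`, `sensitivity`, and function names below are hypothetical values chosen for illustration.

```python
import math
import random


def exponential_mechanism_probs(utilities, epsilon, sensitivity):
    """Return selection probabilities proportional to exp(eps * u / (2 * sensitivity))."""
    top = max(utilities)
    # Subtracting the maximum utility avoids overflow and leaves the ratios unchanged.
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, rng):
    """Draw one index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical candidates: three unit prices and the revenue each would yield.
candidate_prices = [1.0, 2.0, 3.0]
revenues = [4.0, 6.0, 3.0]

probs = exponential_mechanism_probs(revenues, epsilon=1.0, sensitivity=1.0)
chosen = candidate_prices[sample_index(probs, random.Random(0))]
print(probs, chosen)
```

Higher-revenue candidates are more likely to be chosen but never certain, and a smaller `epsilon` flattens the distribution toward uniform, trading revenue for stronger privacy.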
-
Elastic Boundary Projection for 3D Medical Image Segmentation
Authors:
Tianwei Ni,
Lingxi Xie,
Huangjie Zheng,
Elliot K. Fishman,
Alan L. Yuille
Abstract:
We focus on an important yet challenging problem: using a 2D deep network to deal with 3D segmentation for medical image analysis. Existing approaches either applied multi-view planar (2D) networks or directly used volumetric (3D) networks for this purpose, but both of them are not ideal: 2D networks cannot capture 3D contexts effectively, and 3D networks are both memory-consuming and less stable…
▽ More
We focus on an important yet challenging problem: using a 2D deep network to deal with 3D segmentation for medical image analysis. Existing approaches either applied multi-view planar (2D) networks or directly used volumetric (3D) networks for this purpose, but neither is ideal: 2D networks cannot capture 3D contexts effectively, and 3D networks are both memory-consuming and less stable, arguably due to the lack of pre-trained models.
In this paper, we bridge the gap between 2D and 3D using a novel approach named Elastic Boundary Projection (EBP). The key observation is that, although the object is a 3D volume, what we really need in segmentation is to find its boundary, which is a 2D surface. Therefore, we place a number of pivot points in the 3D space, and for each pivot, we determine its distance to the object boundary along a dense set of directions. This creates an elastic shell around each pivot, which is initialized as a perfect sphere. We train a 2D deep network to determine whether each ending point falls within the object, and gradually adjust the shell so that it converges to the actual shape of the boundary and thus achieves the goal of segmentation. EBP allows boundary-based segmentation without cutting a 3D volume into slices or patches, which sets it apart from conventional 2D and 3D approaches. EBP achieves promising accuracy in abdominal organ segmentation. Our code has been open-sourced at https://github.com/twni2016/Elastic-Boundary-Projection.
Submitted 6 June, 2020; v1 submitted 2 December, 2018;
originally announced December 2018.
-
Phase Collaborative Network for Two-Phase Medical Image Segmentation
Authors:
Huangjie Zheng,
Lingxi Xie,
Tianwei Ni,
Ya Zhang,
Yan-Feng Wang,
Qi Tian,
Elliot K. Fishman,
Alan L. Yuille
Abstract:
In real-world practice, medical images acquired in different phases possess complementary information, e.g., radiologists often refer to both arterial and venous scans in order to make the diagnosis. However, in medical image analysis, fusing predictions from two phases is often difficult, because (i) there is a domain gap between the two phases, and (ii) the semantic labels do not correspond pixel-wise even for images scanned from the same patient. This paper studies organ segmentation in two-phase CT scans. We propose Phase Collaborative Network (PCN), an end-to-end framework that contains both generative and discriminative modules. PCN can be mathematically explained to formulate phase-to-phase and data-to-label relations jointly. Experiments are performed on a two-phase CT dataset, on which PCN outperforms the baselines working with one-phase data by a large margin, and we empirically verify that the gain comes from inter-phase collaboration. Besides, PCN transfers well to two public single-phase datasets, demonstrating its potential applications.
Submitted 12 September, 2019; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Differentially Private Double Spectrum Auction with Approximate Social Welfare Maximization
Authors:
Zhili Chen,
Tianjiao Ni,
Hong Zhong,
Shun Zhang,
Jie Cui
Abstract:
Spectrum auction is an effective approach to improving spectrum utilization, by leasing idle spectrum from primary users to secondary users. Recently, a few differentially private spectrum auction mechanisms have been proposed, but, as far as we know, none of them addressed differential privacy in the setting of double spectrum auctions. In this paper, we combine the concept of differential privacy with double spectrum auction design, and present a Differentially private Double spectrum auction mechanism with approximate Social welfare Maximization (DDSM). Specifically, we design the mechanism by employing the exponential mechanism to select clearing prices for the double spectrum auction with probabilities exponentially proportional to the related social welfare values, and then improve the mechanism in several aspects, such as the design of the auction algorithm, the utility function and the buyer grouping algorithm. Through theoretical analysis, we prove that DDSM achieves differential privacy, approximate truthfulness, and approximate social welfare maximization. Extensive experimental evaluations show that DDSM achieves good performance in terms of social welfare.
Submitted 17 October, 2018;
originally announced October 2018.
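The price-selection step named in the abstract above, choosing a clearing price with probability exponentially proportional to its social welfare, can be illustrated with the textbook exponential mechanism. This is only a minimal sketch under stated assumptions, not the DDSM algorithm itself: the candidate prices, the welfare values, and the names `exponential_mechanism_probs` and `sample_price` are hypothetical, and the paper's auction algorithm, utility function and buyer grouping are not modeled.

```python
import math
import random


def exponential_mechanism_probs(scores, epsilon, sensitivity):
    """Return selection probabilities proportional to exp(eps * score / (2 * sensitivity)).

    `scores` holds one quality value per candidate (here: an assumed social
    welfare per candidate clearing price). `sensitivity` is the maximum change
    in any score caused by altering a single bid; it must be positive.
    """
    top = max(scores)  # shift by the max so the exponentials cannot overflow
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def sample_price(prices, scores, epsilon, sensitivity, rng):
    """Draw one candidate price according to the exponential mechanism."""
    probs = exponential_mechanism_probs(scores, epsilon, sensitivity)
    return rng.choices(prices, weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical toy data: three candidate clearing prices and the welfare
    # each would yield. These numbers are illustrative only.
    prices = [1.0, 2.0, 3.0]
    welfare = [4.0, 9.0, 6.0]
    rng = random.Random(0)
    print(exponential_mechanism_probs(welfare, epsilon=1.0, sensitivity=1.0))
    print(sample_price(prices, welfare, epsilon=1.0, sensitivity=1.0, rng=rng))
```

A smaller `epsilon` flattens the distribution toward uniform (stronger privacy, lower expected welfare), while a larger `epsilon` concentrates probability on the welfare-maximizing price.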
-
On The Robustness of Price-Anticipating Kelly Mechanism
Authors:
Yuedong Xu,
Zhujun Xiao,
Tianyu Ni,
Jessie Hui Wang,
Xin Wang,
Eitan Altman
Abstract:
The price-anticipating Kelly mechanism (PAKM) is one of the most extensively used strategies to allocate divisible resources for strategic users in communication networks and computing systems. The users are deemed selfish but benign: each maximizes his individual utility of the allocated resources minus his payment to the network operator. However, in many applications a user can use his payment to reduce the utilities of his opponents, thus playing a misbehaving role. It remains unclear to what extent the misbehaving user can damage or influence the performance of benign users and the network operator. In this work, we formulate a non-cooperative game consisting of a finite number of benign users and one misbehaving user. The maliciousness of this misbehaving user is captured by his willingness to pay to trade for unit degradation in the utilities of benign users. The network operator allocates resources to all the users via the price-anticipating Kelly mechanism. We present six important performance metrics with regard to the total utility and the total net utility of benign users, and the revenue of the network operator under three different scenarios: with and without the misbehaving user, and the maximum. We quantify the robustness of PAKM against the misbehaving actions by deriving the upper and lower bounds of these metrics. With new approaches, all the theoretical bounds are applicable to an arbitrary population of benign users. Our study reveals two important insights: i) the performance bounds are very sensitive to the misbehaving user's willingness to pay at certain ranges; ii) the network operator acquires more revenue in the presence of the misbehaving user, which might disincentivize his countermeasures against the misbehaving actions.
Submitted 4 October, 2021; v1 submitted 5 February, 2016;
originally announced February 2016.
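For readers unfamiliar with the Kelly mechanism referred to in the abstract above, the standard proportional-share rule can be sketched as follows: each user submits a bid, receives a share of the resource proportional to that bid, and pays the bid. This is a minimal illustration under stated assumptions, not the paper's model: the function names, the capacity value and the logarithmic utility are hypothetical, and the misbehaving user's objective is not represented.

```python
import math


def kelly_allocation(bids, capacity=1.0):
    """Split `capacity` among users in proportion to their bids.

    User i receives capacity * b_i / sum_j b_j. If nobody bids, nobody
    receives anything.
    """
    total = sum(bids)
    if total <= 0:
        return [0.0 for _ in bids]
    return [capacity * b / total for b in bids]


def net_utilities(bids, utility, capacity=1.0):
    """Net utility of each benign user: utility of his share minus his bid."""
    shares = kelly_allocation(bids, capacity)
    return [utility(x) - b for x, b in zip(shares, bids)]


if __name__ == "__main__":
    # Hypothetical example: three users bidding for 4 units of a resource,
    # each with an assumed concave utility log(1 + x).
    bids = [1.0, 1.0, 2.0]
    print(kelly_allocation(bids, capacity=4.0))
    print(net_utilities(bids, math.log1p, capacity=4.0))
```

Because each share depends on the sum of all bids, a price-anticipating user accounts for the effect of his own bid on the implied unit price (total bids divided by capacity) when choosing how much to bid.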