
The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Authors: Ryosuke Takahashi, Go Kamoda, Benjamin Heinzerling, Keisuke Sakaguchi, Kentaro Inui

Abstract: Language models (LMs) encode world knowledge in their internal parameters through training. However, LMs may learn personal and confidential information from the training data, leading to privacy concerns such as data leakage. Therefore, research on knowledge deletion from LMs is essential. This study focuses on the knowledge stored in LMs and analyzes the relationship between the side effects of knowledge deletion and the entities related to the knowledge. Our findings reveal that deleting knowledge related to popular entities can have catastrophic side effects. Furthermore, this research is the first to analyze knowledge deletion in models trained on synthetic knowledge graphs, indicating a new direction for controlled experiments.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2304.03177 [pdf, other]

Mutual Interference Mitigation for MIMO-FMCW Automotive Radar

Authors: Sian Jin, Pu Perry Wang, Petros Boufounos, Philip V. Orlik, Ryuhei Takahashi, Sumit Roy

Abstract: This paper considers mutual interference mitigation among automotive radars using frequency-modulated continuous wave (FMCW) signal and multiple-input multiple-output (MIMO) virtual arrays. For the first time, we derive a general interference signal model that fully accounts for not only the time-frequency incoherence, e.g., different FMCW configuration parameters and time offsets, but also the slow-time code MIMO incoherence and array configuration differences between the victim and interfering radars. Along with a standard MIMO-FMCW object signal model, we turn the interference mitigation into a spatial-domain object detection under incoherent MIMO-FMCW interference described by the explicit interference signal model, and propose a constant false alarm rate (CFAR) detector. More specifically, the proposed detector exploits the structural property of the derived interference model at both \emph{transmit} and \emph{receive} steering vector space. We also derive analytical closed-form expressions for probabilities of detection and false alarm. Performance evaluation using both synthetic-level and phased array system-level simulation confirms the effectiveness of our proposed detector over selected baseline methods.

Submitted 6 April, 2023; originally announced April 2023.

Comments: 15 pages, 10 figures

arXiv:2303.13465 [pdf, other]

Deep RL with Hierarchical Action Exploration for Dialogue Generation

Authors: Itsugun Cho, Ryota Takahashi, Yusaku Yanase, Hiroaki Saito

Abstract: Traditionally, approximate dynamic programming is employed in dialogue generation with greedy policy improvement through action sampling, as the natural language action space is vast. However, this practice is inefficient for reinforcement learning (RL) due to the sparsity of eligible responses with high action values, which leads to weak improvement sustained by random sampling. This paper presents theoretical analysis and experiments that reveal the performance of the dialogue policy is positively correlated with the sampling size. To overcome this limitation, we introduce a novel dual-granularity Q-function that explores the most promising response category to intervene in the sampling process. Our approach extracts actions based on a grained hierarchy, thereby achieving the optimum with fewer policy iterations. Additionally, we use offline RL and learn from multiple reward functions designed to capture emotional nuances in human interactions. Empirical studies demonstrate that our algorithm outperforms baselines across automatic metrics and human evaluations. Further testing reveals that our algorithm exhibits both explainability and controllability and generates responses with higher expected rewards.

Submitted 15 May, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2205.09295 [pdf, other]

Are Prompt-based Models Clueless?

Authors: Pride Kavumba, Ryo Takahashi, Yusuke Oda

Abstract: Finetuning large pre-trained language models with a task-specific head has advanced the state-of-the-art on many natural language understanding benchmarks. However, models with a task-specific head require a lot of training data, making them susceptible to learning and exploiting dataset-specific superficial cues that do not generalize to other datasets. Prompting has reduced the data requirement by reusing the language model head and formatting the task input to match the pre-training objective. Therefore, it is expected that few-shot prompt-based models do not exploit superficial cues. This paper presents an empirical examination of whether few-shot prompt-based models also exploit superficial cues. Analyzing few-shot prompt-based models on MNLI, SNLI, HANS, and COPA has revealed that prompt-based models also exploit superficial cues. While the models perform well on instances with superficial cues, they often underperform or only marginally outperform random accuracy on instances without superficial cues.

Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

arXiv:2204.07372 [pdf, other]

A Personalized Dialogue Generator with Implicit User Persona Detection

Authors: Itsugun Cho, Dongyang Wang, Ryota Takahashi, Hiroaki Saito

Abstract: Current works in the generation of personalized dialogue primarily contribute to the agent presenting a consistent personality and driving a more informative response. However, we found that the generated responses from most previous models tend to be self-centered, with little care for the user in the dialogue. Moreover, we consider that human-like conversation is essentially built based on inferring information about the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator by detecting an implicit user persona. Because it is hard to collect a large number of detailed personas for each user, we attempted to model the user's potential persona and its representation from dialogue history, with no external knowledge. The perception and fader variables were conceived using conditional variational inference. The two latent variables simulate the process of people being aware of each other's persona and producing a corresponding expression in conversation. Finally, posterior-discriminated regularization was presented to enhance the training procedure. Empirical studies demonstrate that, compared to state-of-the-art methods, our approach is more concerned with the user's persona and achieves a considerable boost across the evaluations.

Submitted 21 August, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: 9 pages, 7 figures, Accepted by Coling2022

arXiv:2105.11919 [pdf, other]

Minmax-optimal list searching with $O(\log_2\log_2 n)$ average cost

Authors: I. F. D. Oliveira, R. H. C. Takahashi

Abstract: We find a searching method on ordered lists that surprisingly outperforms binary searching with respect to average query complexity while retaining minmax optimality. The method is shown to require $O(\log_2\log_2 n)$ queries on average while never exceeding $\lceil \log_2 n \rceil$ queries in the worst case, i.e. the minmax bound of binary searching. Our average results assume a uniform distribution hypothesis similar to those of previous authors under which the expected query complexity of interpolation search of $O(\log_2\log_2 n)$ is known to be optimal. Hence our method turns out to be optimal with respect to both minmax and average performance. We further provide robustness guarantees and perform several numerical experiments with both artificial and real data. Our results suggest that time savings range roughly from a constant factor of 10% to 50% to a logarithmic factor spanning orders of magnitude when different metrics are considered.

Submitted 25 May, 2021; originally announced May 2021.

Comments: under consideration by the Journal of Computer and System Sciences

MSC Class: 68P10; 68W40; 68Q25 ACM Class: F.2.2; F.2.3; H.3.3
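
Editorial sketch: for context on the abstract above, the following is a minimal version of classical interpolation search, the baseline whose $O(\log_2\log_2 n)$ average cost under uniformly distributed keys the abstract refers to. It is the textbook algorithm, not the authors' method, and it does not provide the $\lceil \log_2 n \rceil$ worst-case guarantee they describe. The function name, the probe counter, and the restriction to integer keys are assumptions made for illustration.

```python
def interpolation_search(values, target):
    """Search a sorted list of integers for target.

    Returns (index, probes): index is -1 if target is absent, and
    probes counts how many interior positions were inspected.
    Textbook baseline only; integer keys are assumed.
    """
    lo, hi = 0, len(values) - 1
    probes = 0
    # The loop guard keeps the interpolated position inside [lo, hi].
    while lo <= hi and values[lo] <= target <= values[hi]:
        if values[hi] == values[lo]:
            # All remaining keys are equal, so target equals values[lo].
            pos = lo
        else:
            # Linear interpolation of the expected position of target.
            pos = lo + (target - values[lo]) * (hi - lo) // (values[hi] - values[lo])
        probes += 1
        if values[pos] == target:
            return pos, probes
        if values[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1, probes
```

On evenly spaced keys the first interpolated position is already exact, which gives some intuition for why the average cost is so low under a uniform distribution; on skewed data the same routine can degrade to a linear number of probes, which is the gap a worst-case bound addresses.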

arXiv:2105.11845 [pdf, other]

An incremental descent method for multi-objective optimization

Authors: I. F. D. Oliveira, R. H. C. Takahashi

Abstract: Current state-of-the-art multi-objective optimization solvers, by computing gradients of all $m$ objective functions per iteration, produce after $k$ iterations a measure of proximity to critical conditions that is upper-bounded by $O(1/\sqrt{k})$ when the objective functions are assumed to have $L$-Lipschitz continuous gradients; i.e. they require $O(m/ε^2)$ gradient and function computations to produce a measure of proximity to critical conditions below some target $ε$. We reduce this to $O(1/ε^2)$ with a method that requires only a constant number of gradient and function computations per iteration; and thus, we obtain for the first time a multi-objective descent-type method with a query complexity cost that is unaffected by increasing values of $m$. For this, a brand new multi-objective descent direction is identified, which we name the \emph{central descent direction}, and an incremental approach is proposed. Robustness properties of the central descent direction are established, measures of proximity to critical conditions are derived, and the incremental strategy for finding solutions to the multi-objective problem is shown to attain convergence properties unattained by previous methods. To the best of our knowledge, this is the first method to achieve this with no additional a-priori information on the structure of the problem, such as done by scalarizing techniques, and with no pre-known information on the regularity of the objective functions other than Lipschitz continuity of the gradients.

Submitted 25 May, 2021; originally announced May 2021.

Comments: paper pre-submission

MSC Class: 90C26 90C29 ACM Class: G.1.6

arXiv:2103.00535 [pdf, other]

A multi-objective time series analysis of community mobility reduction comparing first and second COVID-19 waves

Authors: Gabriela Cavalcante da Silva, Fernanda Monteiro de Almeida, Sabrina Oliveira, Leonardo C. T. Bezerra, Elizabeth F. Wanner, Ricardo H. C. Takahashi

Abstract: With the logistic challenges faced by most countries for the production, distribution, and application of vaccines for the novel coronavirus disease (COVID-19), social distancing (SD) remains the most tangible approach to mitigate the spread of the virus. To assist SD monitoring, several tech companies have made publicly available anonymized mobility data. In this work, we conduct a multi-objective mobility reduction rate comparison between the first and second COVID-19 waves in several localities from America and Europe using Google community mobility reports (CMR) data. Through multi-dimensional visualization, we are able to compare in a Pareto-compliant way the reduction in mobility from the different lockdown periods for each locality selected, simultaneously considering all place categories provided in CMR. In addition, our analysis comprises a 56-day lockdown period for each locality and COVID-19 wave, which we analyze both as 56-day periods and as 14-day consecutive windows. Results vary considerably as a function of the locality considered, particularly when the temporal evolution of the mobility reduction is considered. We thus discuss each locality individually, relating social distancing measures and the reduction observed.

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2102.06540 [pdf, other]

Two Training Strategies for Improving Relation Extraction over Universal Graph

Authors: Qin Dai, Naoya Inoue, Ryo Takahashi, Kentaro Inui

Abstract: This paper explores how the Distantly Supervised Relation Extraction (DS-RE) can benefit from the use of a Universal Graph (UG), the combination of a Knowledge Graph (KG) and a large-scale text collection. A straightforward extension of a current state-of-the-art neural model for DS-RE with a UG may lead to degradation in performance. We first report that this degradation is associated with the difficulty in learning a UG and then propose two training strategies: (1) Path Type Adaptive Pretraining, which sequentially trains the model with different types of UG paths so as to prevent the reliance on a single type of UG path; and (2) Complexity Ranking Guided Attention mechanism, which restricts the attention span according to the complexity of a UG path so as to force the model to extract features not only from simple UG paths but also from complex ones. Experimental results on both biomedical and NYT10 datasets prove the robustness of our methods and achieve a new state-of-the-art result on the NYT10 dataset. The code and datasets used in this paper are available at https://github.com/baodaiqin/UGDSRE.

Submitted 6 May, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

arXiv:2101.00133 [pdf, other]

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing retrieval corpora or the parameters of learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.

Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

arXiv:2011.01785 [pdf, other]

Modeling Event Salience in Narratives via Barthes' Cardinal Functions

Authors: Takaki Otake, Sho Yokoi, Naoya Inoue, Ryo Takahashi, Tatsuki Kuribayashi, Kentaro Inui

Abstract: Events in a narrative differ in salience: some are more important to the story than others. Estimating event salience is useful for tasks such as story generation, and as a tool for text analysis in narratology and folkloristics. To compute event salience without any annotations, we adopt Barthes' definition of event salience and propose several unsupervised methods that require only a pre-trained language model. Evaluating the proposed methods on folktales with event salience annotation, we show that the proposed methods outperform baseline methods and find fine-tuning a language model on narrative texts is a key factor in improving the proposed methods.

Submitted 3 November, 2020; originally announced November 2020.

Comments: accepted to COLING 2020

arXiv:2011.00948 [pdf, other]

An Empirical Study of Contextual Data Augmentation for Japanese Zero Anaphora Resolution

Authors: Ryuto Konno, Yuichiroh Matsubayashi, Shun Kiyono, Hiroki Ouchi, Ryo Takahashi, Kentaro Inui

Abstract: One critical issue of zero anaphora resolution (ZAR) is the scarcity of labeled data. This study explores how effectively this problem can be alleviated by data augmentation. We adopt a state-of-the-art data augmentation method, called the contextual data augmentation (CDA), that generates labeled training instances using a pretrained language model. The CDA has been reported to work well for several other natural language processing tasks, including text classification and machine translation. This study addresses two underexplored issues on CDA, that is, how to reduce the computational cost of data augmentation and how to ensure the quality of the generated data. We also propose two methods to adapt CDA to ZAR: [MASK]-based augmentation and linguistically-controlled masking. Consequently, the experimental results on Japanese ZAR show that our methods contribute to both the accuracy gain and the computation cost reduction. Our closer analysis reveals that the proposed method can improve the quality of the augmented training data when compared to the conventional CDA.

Submitted 4 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: 13 pages, accepted by COLING 2020

arXiv:2004.15003 [pdf, other]

Word Rotator's Distance

Authors: Sho Yokoi, Ryo Takahashi, Reina Akama, Jun Suzuki, Kentaro Inui

Abstract: A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering the word alignment. Such alignment-based approaches are intuitive and interpretable; however, they are empirically inferior to the simple cosine similarity between general-purpose sentence vectors. To address this issue, we focus on and demonstrate the fact that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity. Alignment-based approaches do not distinguish them, whereas sentence-vector approaches automatically use the norm as the word importance. Accordingly, we propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity using earth mover's distance (i.e., optimal transport cost), which we refer to as word rotator's distance. Besides, we find how to grow the norm and direction of word vectors (vector converter), which is a new systematic approach derived from sentence-vector estimation methods. On several textual similarity datasets, the combination of these simple proposed methods outperformed not only alignment-based approaches but also strong baselines. The source code is available at https://github.com/eumesy/wrd

Submitted 16 November, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

Comments: 17 pages, accepted at EMNLP 2020

Journal ref: EMNLP 2020

arXiv:2001.07895 [pdf, other]

Partially-Shared Variational Auto-encoders for Unsupervised Domain Adaptation with Target Shift

Authors: Ryuhei Takahashi, Atsushi Hashimoto, Motoharu Sonogashira, Masaaki Iiyama

Abstract: This paper proposes a novel approach for unsupervised domain adaptation (UDA) with target shift. Target shift is a problem of mismatch in label distribution between source and target domains. Typically it appears as class-imbalance in target domain. In practice, this is an important problem in UDA; as we do not know labels in target domain datasets, we do not know whether or not its distribution is identical to that in the source domain dataset. Many traditional approaches achieve UDA with distribution matching by minimizing maximum mean discrepancy or adversarial training; however these approaches implicitly assume a coincidence in the distributions and do not work under situations with target shift. Some recent UDA approaches focus on class boundary and some of them are robust to target shift, but they are only applicable to classification and not to regression. To overcome the target shift problem in UDA, the proposed method, partially shared variational autoencoders (PS-VAEs), uses pair-wise feature alignment instead of feature distribution matching. PS-VAEs inter-convert domain of each sample by a CycleGAN-based architecture while preserving its label-related content. To evaluate the performance of PS-VAEs, we carried out two experiments: UDA with class-unbalanced digits datasets (classification), and UDA from synthesized data to real observation in human-pose-estimation (regression). The proposed method presented its robustness against the class-imbalance in the classification task, and outperformed the other methods in the regression task with a large margin.

Submitted 25 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

arXiv:1811.09030 [pdf, other]

doi 10.1109/TCSVT.2019.2935128

Data Augmentation using Random Image Cropping and Patching for Deep CNNs

Authors: Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara

Abstract: Deep convolutional neural networks (CNNs) have achieved remarkable results in image processing tasks. However, their high expression ability risks overfitting. Consequently, data augmentation techniques have been proposed to prevent overfitting while enriching datasets. Recent CNN architectures with more parameters are rendering traditional data augmentation techniques insufficient. In this study, we propose a new data augmentation technique called random image cropping and patching (RICAP) which randomly crops four images and patches them to create a new training image. Moreover, RICAP mixes the class labels of the four images, resulting in an advantage similar to label smoothing. We evaluated RICAP with current state-of-the-art CNNs (e.g., the shake-shake regularization model) by comparison with competitive data augmentation techniques such as cutout and mixup. RICAP achieves a new state-of-the-art test error of $2.19\%$ on CIFAR-10. We also confirmed that deep CNNs with RICAP achieve better results on classification tasks using CIFAR-100 and ImageNet and an image-caption retrieval task using Microsoft COCO.

Submitted 27 August, 2019; v1 submitted 22 November, 2018; originally announced November 2018.

Comments: accepted version, 16 pages

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, 2019
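
Editorial sketch: the crop-and-patch idea in the abstract above (four random crops tiled into one image, with the four class labels mixed) can be illustrated in a few lines of standard-library Python. This is a hedged illustration, not the authors' implementation. The function name `ricap`, the hypothetical `beta` parameter, drawing the boundary position from a Beta(beta, beta) distribution, and weighting each label by its patch's area fraction are all assumptions that the abstract itself does not state. Images are represented as plain lists of rows to keep the example self-contained.

```python
import random


def ricap(images, labels, num_classes, beta=0.3, rng=random):
    """Patch random crops of four equally sized images into one image.

    images: four H x W images, each a list of H rows of W pixel values.
    labels: four integer class ids, one per image.
    Returns (patched_image, soft_label), where soft_label sums to 1.
    Assumptions (not from the abstract): Beta(beta, beta) boundary
    sampling and area-proportional label mixing.
    """
    height, width = len(images[0]), len(images[0][0])
    # Boundary position splitting the output into four rectangles.
    w = int(round(width * rng.betavariate(beta, beta)))
    h = int(round(height * rng.betavariate(beta, beta)))
    crop_widths = [w, width - w, w, width - w]
    crop_heights = [h, h, height - h, height - h]
    # Top-left (row, col) of each rectangle in the output image.
    offsets = [(0, 0), (0, w), (h, 0), (h, w)]

    patched = [[None] * width for _ in range(height)]
    soft_label = [0.0] * num_classes
    for img, label, cw, ch, (r0, c0) in zip(
        images, labels, crop_widths, crop_heights, offsets
    ):
        # Random top-left corner of the crop inside the source image.
        x = rng.randint(0, width - cw)
        y = rng.randint(0, height - ch)
        for dy in range(ch):
            patched[r0 + dy][c0:c0 + cw] = img[y + dy][x:x + cw]
        # Each label is weighted by the area its patch occupies.
        soft_label[label] += (cw * ch) / (width * height)
    return patched, soft_label
```

Because the four rectangles tile the output exactly, the mixed label always sums to one, which is what makes it usable as a soft target in a cross-entropy loss.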

arXiv:1805.09547 [pdf, other]

Interpretable and Compositional Relation Learning by Joint Training with an Autoencoder

Authors: Ryo Takahashi, Ran Tian, Kentaro Inui

Abstract: Embedding models for entities and relations are extremely useful for recovering missing facts in a knowledge base. Intuitively, a relation can be modeled by a matrix mapping entity vectors. However, relations reside on low dimension sub-manifolds in the parameter space of arbitrary matrices; for one reason, composition of two relations $\boldsymbol{M}_1,\boldsymbol{M}_2$ may match a third $\boldsymbol{M}_3$ (e.g. composition of relations currency_of_country and country_of_film usually matches currency_of_film_budget), which imposes compositional constraints to be satisfied by the parameters (i.e. $\boldsymbol{M}_1\cdot \boldsymbol{M}_2\approx \boldsymbol{M}_3$). In this paper we investigate a dimension reduction technique by training relations jointly with an autoencoder, which is expected to better capture compositional constraints. We achieve state-of-the-art on Knowledge Base Completion tasks with strongly improved Mean Rank, and show that joint training with an autoencoder leads to interpretable sparse codings of relations, helps discovering compositional constraints and benefits from compositional training. Our source code is released at github.com/tianran/glimvec.

Submitted 24 May, 2018; originally announced May 2018.

Comments: Equal contribution from first two authors. Accepted for publication in the ACL 2018

arXiv:1702.03505 [pdf, other]

doi 10.1109/TCSVT.2018.2822773

A Novel Weight-Shared Multi-Stage CNN for Scale Robustness

Authors: Ryo Takahashi, Takashi Matsubara, Kuniaki Uehara

Abstract: Convolutional neural networks (CNNs) have demonstrated remarkable results in image classification for benchmark tasks and practical applications. The CNNs with deeper architectures have achieved even higher performance recently thanks to their robustness to the parallel shift of objects in images as well as their numerous parameters and the resulting high expression ability. However, CNNs have a limited robustness to other geometric transformations such as scaling and rotation. This limits the performance improvement of the deep CNNs, but there is no established solution. This study focuses on scale transformation and proposes a network architecture called the weight-shared multi-stage network (WSMS-Net), which consists of multiple stages of CNNs. The proposed WSMS-Net is easily combined with existing deep CNNs such as ResNet and DenseNet and enables them to acquire robustness to object scaling. Experimental results on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate that existing deep CNNs combined with the proposed WSMS-Net achieve higher accuracies for image classification tasks with only a minor increase in the number of parameters and computation time.

Submitted 11 April, 2019; v1 submitted 12 February, 2017; originally announced February 2017.

Comments: accepted version, 13 pages

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 4, 2019, pp. 1090-1101

arXiv:1503.02642 [pdf, ps, other]

Security of Power Packet Dispatching Using Differential Chaos Shift Keying

Authors: Yanzi Zhou, Ryo Takahashi, Takashi Hikihara

Abstract: This paper investigates and confirms one advantageous function of a power packet dispatching system, which the authors' group has proposed as an alternative to the conventional power distribution system. The focus is on securing power packet dispatching so that neither the information nor the power carried by a power packet can be stolen by attackers. To protect power packets, we introduce a simple encryption that is applied before sending them. An encryption scheme based on chaotic signals is one possibility for this purpose. This paper adopts the Differential Chaos Shift Keying (DCSK) scheme for the encryption, in two variants: partial power packet encryption and whole power packet encryption.

Submitted 19 February, 2015; originally announced March 2015.

Comments: 9 pages, 19 figures

Showing 1–18 of 18 results for author: Takahashi, R