Search | arXiv e-print repository

arXiv:2406.19370 [pdf, other]

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Authors: Core Francisco Park, Maya Okawa, Andrew Lee, Ekdeep Singh Lubana, Hidenori Tanaka

Abstract: Modern generative models demonstrate impressive capabilities, likely stemming from an ability to identify and manipulate abstract concepts underlying their training data. However, fundamental questions remain: what determines the concepts a model learns, the order in which it learns them, and its ability to manipulate those concepts? To address these questions, we propose analyzing a model's learning dynamics via a framework we call the concept space, where each axis represents an independent concept underlying the data generating process. By characterizing learning dynamics in this space, we identify how the speed at which a concept is learned, and hence the order of concept learning, is controlled by properties of the data we term concept signal. Further, we observe moments of sudden turns in the direction of a model's learning dynamics in concept space. Surprisingly, these points precisely correspond to the emergence of hidden capabilities, i.e., where latent interventions show the model possesses the capability to manipulate a concept, but these capabilities cannot yet be elicited via naive input prompting. While our results focus on synthetically defined toy datasets, we hypothesize a general claim on emergence of hidden capabilities may hold: generative models possess latent capabilities that emerge suddenly and consistently during training, though a model might not exhibit these capabilities under naive input prompting.

Submitted 27 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.03958 [pdf]

Haptic in-sensor computing device made of carbon nanotube-polydimethylsiloxane nanocomposites

Authors: Kouki Kimizuka, Saman Azhari, Shoshi Tokuno, Ahmet Karacali, Yuki Usami, Shuhei Ikemoto, Hakaru Tamukoh, Hirofumi Tanaka

Abstract: The importance of haptic in-sensor computing devices has been increasing. In this study, we successfully fabricated a haptic sensor with a hierarchical structure via the sacrificial template method, using carbon nanotubes-polydimethylsiloxane (CNTs-PDMS) nanocomposites for in-sensor computing applications. The CNTs-PDMS nanocomposite sensors, with different sensitivities, were obtained by varying the amount of CNTs. We transformed the input stimuli into higher-dimensional information, enabling a new path for the CNTs-PDMS nanocomposite application, which was implemented on a robotic hand as an in-sensor computing device by applying a reservoir computing paradigm. The nonlinear output data obtained from the sensors were trained using linear regression and used to classify nine different objects used in everyday life with an object recognition accuracy of >80 % for each object. This approach could enable tactile sensation in robots while reducing the computational cost.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 24 pages, 12 figures

arXiv:2403.03641 [pdf, other]

Online Photon Guiding with 3D Gaussians for Caustics Rendering

Authors: Jiawei Huang, Hajime Tanaka, Taku Komura, Yoshifumi Kitamura

Abstract: In production rendering systems, caustics are typically rendered via photon mapping and gathering, a process often hindered by insufficient photon density. In this paper, we propose a novel photon guiding method to improve the photon density and overall quality for caustic rendering. The key insight of our approach is the application of a global 3D Gaussian mixture model, used in conjunction with an adaptive light sampler. This combination effectively guides photon emission in expansive 3D scenes with multiple light sources. By employing a global 3D Gaussian mixture, our method precisely models the distribution of the points of interest. To sample emission directions from the distribution at any observation point, we introduce a novel directional transform of the 3D Gaussian, which ensures accurate photon emission guiding. Furthermore, our method integrates a global light cluster tree, which models the contribution distribution of light sources to the image, facilitating effective light source selection. We conduct experiments demonstrating that our approach robustly outperforms existing photon guiding techniques across a variety of scenarios, significantly advancing the quality of caustic rendering.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.11197 [pdf, other]

Centroid-Based Efficient Minimum Bayes Risk Decoding

Authors: Hiroyuki Deguchi, Yusuke Sakai, Hidetaka Kamigaito, Taro Watanabe, Hideki Tanaka, Masao Utiyama

Abstract: Minimum Bayes risk (MBR) decoding achieved state-of-the-art translation performance by using COMET, a neural metric that has a high correlation with human evaluation. However, MBR decoding requires quadratic time since it computes the expected score between a translation hypothesis and all reference translations. We propose centroid-based MBR (CBMBR) decoding to improve the speed of MBR decoding. Our method clusters the reference translations in the feature space, and then calculates the score using the centroids of each cluster. The experimental results show that our CBMBR not only improved the decoding speed of the expected score calculation 5.7 times, but also outperformed vanilla MBR decoding in translation quality by up to 0.5 COMET in the WMT'22 En$\leftrightarrow$Ja, En$\leftrightarrow$De, En$\leftrightarrow$Zh, and WMT'23 En$\leftrightarrow$Ja translation tasks.

Submitted 11 June, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

Comments: Accepted at Findings of ACL 2024
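
The contrast drawn in the abstract above, between scoring each hypothesis against every reference and scoring it against a handful of cluster centroids, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: sentences are plain lists of floats standing in for feature vectors, the utility is negative squared Euclidean distance rather than COMET, clustering is a tiny deterministic k-means, and centroids are averaged without weighting by cluster size. The function names (`mbr_vanilla`, `mbr_centroid`, `kmeans`) are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points, k, iters=10):
    """Tiny k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def utility(hyp, ref):
    """Stand-in utility (assumption): higher when the vectors are closer."""
    return -sq_dist(hyp, ref)


def mbr_vanilla(hyps, refs):
    """Index of the hypothesis with the best average utility over all references."""
    scores = [sum(utility(h, r) for r in refs) / len(refs) for h in hyps]
    return max(range(len(hyps)), key=scores.__getitem__)


def mbr_centroid(hyps, refs, k):
    """Same selection, but each hypothesis is scored against k centroids only."""
    centroids = kmeans(refs, k)
    scores = [sum(utility(h, c) for c in centroids) / k for h in hyps]
    return max(range(len(hyps)), key=scores.__getitem__)
```

With N hypotheses and M references, `mbr_vanilla` makes N × M utility calls, while `mbr_centroid` makes N × k after clustering, which is where a speed-up of the kind the abstract reports would come from when k is much smaller than M.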

arXiv:2402.07757 [pdf, other]

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

Authors: Mikail Khona, Maya Okawa, Jan Hula, Rahul Ramesh, Kento Nishi, Robert Dick, Ekdeep Singh Lubana, Hidenori Tanaka

Abstract: Stepwise inference protocols, such as scratchpads and chain-of-thought, help language models solve complex problems by decomposing them into a sequence of simpler subproblems. Despite the significant gain in performance achieved via these protocols, the underlying mechanisms of stepwise inference have remained elusive. To address this, we propose to study autoregressive Transformer models on a synthetic task that embodies the multi-step nature of problems where stepwise inference is generally most useful. Specifically, we define a graph navigation problem wherein a model is tasked with traversing a path from a start to a goal node on the graph. Despite its simplicity, we find we can empirically reproduce and analyze several phenomena observed at scale: (i) the stepwise inference reasoning gap, the cause of which we find in the structure of the training data; (ii) a diversity-accuracy tradeoff in model generations as sampling temperature varies; (iii) a simplicity bias in the model's output; and (iv) compositional generalization and a primacy bias with in-context exemplars. Overall, our work introduces a grounded, synthetic framework for studying stepwise inference and offers mechanistic hypotheses that can lay the foundation for a deeper understanding of this phenomenon.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.15966 [pdf, ps, other]

Response Generation for Cognitive Behavioral Therapy with Large Language Models: Comparative Study with Socratic Questioning

Authors: Kenta Izumi, Hiroki Tanaka, Kazuhiro Shidara, Hiroyoshi Adachi, Daisuke Kanayama, Takashi Kudo, Satoshi Nakamura

Abstract: Dialogue systems controlled by predefined or rule-based scenarios derived from counseling techniques, such as cognitive behavioral therapy (CBT), play an important role in mental health apps. Despite the need for responsible responses, it is conceivable that using the newly emerging LLMs to generate contextually relevant utterances will enhance these apps. In this study, we construct dialogue modules based on a CBT scenario focused on conventional Socratic questioning using two kinds of LLMs: a Transformer-based dialogue model further trained with a social media empathetic counseling dataset, provided by Osaka Prefecture (OsakaED), and GPT-4, a state-of-the-art LLM created by OpenAI. By comparing systems that use LLM-generated responses with those that do not, we investigate the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality (e.g., empathy). As a result, no notable improvements are observed when using the OsakaED model. When using GPT-4, the amount of mood change, empathy, and other dialogue qualities improve significantly. Results suggest that GPT-4 possesses a high counseling ability. However, they also indicate that even when using a dialogue model trained with a human counseling dataset, it does not necessarily yield better outcomes compared to scenario-based dialogues.
While presenting LLM-generated responses, including GPT-4, and having them interact directly with users in real-life mental health care services may raise ethical issues, it is still possible for human professionals to produce example responses or response templates using LLMs in advance in systems that use rules, scenarios, or example responses.

Submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted by IWSDS2024

arXiv:2312.00246 [pdf, other]

Directions of Curvature as an Explanation for Loss of Plasticity

Authors: Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado

Abstract: Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience. Despite being empirically observed in several problem settings, little is understood about the mechanisms that lead to loss of plasticity. In this paper, we offer a consistent explanation for loss of plasticity: Neural networks lose directions of curvature during training and that loss of plasticity can be attributed to this reduction in curvature. To support such a claim, we provide a systematic investigation of loss of plasticity across continual learning tasks using MNIST, CIFAR-10 and ImageNet. Our findings illustrate that loss of curvature directions coincides with loss of plasticity, while also showing that previous explanations are insufficient to explain loss of plasticity in all settings. Lastly, we show that regularizers which mitigate loss of plasticity also preserve curvature, motivating a simple distributional regularizer that proves to be effective across the problem settings we considered.

Submitted 27 June, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

arXiv:2311.12997 [pdf, other]

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

Authors: Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka

Abstract: Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing basic arithmetic. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we train autoregressive Transformer models on a synthetic data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from small amounts of training data and generalize to exponentially or even combinatorially many functions; (2) generating intermediate outputs when composing functions is more effective for generalizing to new, unseen compositions than not generating any intermediate outputs; (3) biases in the order of the compositions in the training data result in Transformers that fail to compose some combinations of functions; and (4) the attention layers select which capability to apply while the feed-forward layers execute the selected capability.

Submitted 5 February, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.12786 [pdf, other]

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

Authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

Abstract: Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely novel capabilities or does it just modulate existing ones? We address this question empirically in synthetic, controlled settings where we can use mechanistic interpretability tools (e.g., network pruning and probing) to understand how the model's underlying capabilities are changing. We perform an extensive analysis of the effects of fine-tuning in these settings, and show that: (i) fine-tuning rarely alters the underlying model capabilities; (ii) a minimal transformation, which we call a 'wrapper', is typically learned on top of the underlying model capabilities, creating the illusion that they have been modified; and (iii) further fine-tuning on a task where such hidden capabilities are relevant leads to sample-efficient 'revival' of the capability, i.e., the model begins reusing these capabilities after only a few gradient steps. This indicates that practitioners can unintentionally remove a model's safety wrapper merely by fine-tuning it on a, e.g., superficially unrelated, downstream task. We additionally perform analysis on language models trained on the TinyStories dataset to support our claims in a more realistic setup.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2310.17639 [pdf, other]

In-Context Learning Dynamics with Random Binary Sequences

Authors: Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman

Abstract: Large language models (LLMs) trained on huge corpora of text datasets demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for. The precise nature of LLM capabilities is often mysterious, and different prompts can elicit different capabilities through in-context learning. We propose a framework that enables us to analyze in-context learning dynamics to understand latent concepts underlying LLMs' behavioral patterns. This provides a more nuanced understanding than success-or-failure evaluation benchmarks, but does not require observing internal activations as a mechanistic interpretation of circuits would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study dynamics of in-context learning by manipulating properties of context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate seemingly random numbers and learn basic formal languages, with striking in-context learning dynamics where model outputs transition sharply from seemingly random behaviors to deterministic repetition.

Submitted 15 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.09494 [pdf, other]

doi 10.1145/3577190.3614132

Computational analyses of linguistic features with schizophrenic and autistic traits along with formal thought disorders

Authors: Takeshi Saga, Hiroki Tanaka, Satoshi Nakamura

Abstract: [See full abstract in the pdf] Formal Thought Disorder (FTD), which is a group of symptoms in cognition that affects language and thought, can be observed through language. FTD is seen across such developmental or psychiatric disorders as Autism Spectrum Disorder (ASD) or Schizophrenia, and its related Schizotypal Personality Disorder (SPD). This paper collected a Japanese audio-report dataset with score labels related to ASD and SPD through a crowd-sourcing service from the general population. We measured language characteristics with the 2nd edition of the Social Responsiveness Scale (SRS2) and the Schizotypal Personality Questionnaire (SPQ), including an odd speech subscale from SPQ to quantify the FTD symptoms. We investigated the following four research questions through machine-learning-based score predictions: (RQ1) How are schizotypal and autistic measures correlated? (RQ2) What is the most suitable task to elicit FTD symptoms? (RQ3) Does the length of speech affect the elicitation of FTD symptoms? (RQ4) Which features are critical for capturing FTD symptoms? We confirmed that an FTD-related subscale, odd speech, was significantly correlated with both the total SPQ and SRS scores, although they themselves were not correlated significantly. Our regression analysis indicated that longer speech about a negative memory elicited more FTD symptoms. The ablation study confirmed the importance of function words and both the abstract and temporal features for FTD-related odd speech estimation.
In contrast, content words were effective only in the SRS predictions, and content words were effective only in the SPQ predictions, a result that implies the differences between SPD-like and ASD-like symptoms. Data and programs used in this paper can be found here: https://sites.google.com/view/sagatake/resource.

Submitted 14 October, 2023; originally announced October 2023.

Comments: This is a revised version of the ICMI2023 paper with the same title

ACM Class: J.4; J.3; I.2.1; I.2.7

Journal ref: Proceedings of the 25th International Conference on Multimodal Interaction (2023)

arXiv:2310.09336 [pdf, other]

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

Authors: Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

Abstract: Modern generative models exhibit unprecedented capabilities to generate extremely realistic data. However, given the inherent compositionality of the real world, reliable use of these models in practical applications requires that they exhibit the capability to compose a novel set of concepts to generate outputs not seen in the training data set. Prior work demonstrates that recent diffusion models do exhibit intriguing compositional generalization abilities, but also fail unpredictably. Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution. Our results show: (i) the order in which the ability to generate samples from a concept and compose them emerges is governed by the structure of the underlying data-generating process; (ii) performance on compositional tasks exhibits a sudden "emergence" due to multiplicative reliance on the performance of constituent tasks, partially explaining emergent phenomena seen in generative models; and (iii) composing concepts with lower frequency in the training data to generate out-of-distribution samples requires considerably more optimization steps compared to generating in-distribution samples. Overall, our study lays a foundation for understanding capabilities and compositionality in generative models from a data-centric perspective.

Submitted 16 February, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 37th Conference on Neural Information Processing Systems (NeurIPS)
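
Finding (ii) in the abstract above, that compositional performance depends multiplicatively on constituent performance, can be made concrete with a toy calculation. This is only a sketch under stated assumptions and not the paper's model: each constituent accuracy is assumed to follow a smooth logistic curve over training steps, the constituents are treated as independent, and the names `constituent_accuracy`, `compositional_accuracy`, `midpoint`, and `scale` are hypothetical.

```python
import math


def constituent_accuracy(step, midpoint=100.0, scale=20.0):
    """Assumed smooth (logistic) learning curve for a single concept."""
    return 1.0 / (1.0 + math.exp(-(step - midpoint) / scale))


def compositional_accuracy(constituent_accuracies):
    """Product of constituent accuracies (assumes they succeed independently)."""
    return math.prod(constituent_accuracies)


def toy_curve(steps, n_concepts):
    """Compositional accuracy at each step when n_concepts share one curve."""
    return [
        compositional_accuracy([constituent_accuracy(s)] * n_concepts)
        for s in steps
    ]
```

Because a product of values below 1 stays near zero until every factor is fairly high, the composed curve remains flat for longer and then rises more steeply than any single constituent curve, which is one way to read the "sudden emergence" described in the abstract.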

arXiv:2306.03491 [pdf, other]

SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

Authors: Zhishen Yang, Raj Dabre, Hideki Tanaka, Naoaki Okazaki

Abstract: In scholarly documents, figures provide a straightforward way of communicating scientific findings to readers. Automating figure caption generation helps move model understandings of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings. Unlike previous studies, we reframe scientific figure captioning as a knowledge-augmented image captioning task in which models need to utilize knowledge embedded across modalities for caption generation. To this end, we extended the large-scale SciCap dataset~\cite{hsu-etal-2021-scicap-generating} to SciCap+ which includes mention-paragraphs (paragraphs mentioning figures) and OCR tokens. Then, we conduct experiments with the M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline for our study. Our results indicate that mention-paragraphs serve as additional context knowledge, which significantly boosts the automatic standard image caption evaluation scores compared to the figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and SciCap+ dataset will be publicly available at https://github.com/ZhishenYang/scientific_figure_captioning_dataset

Submitted 6 June, 2023; originally announced June 2023.

Comments: Published in SDU workshop at AAAI23

arXiv:2303.08064 [pdf, other]

doi 10.1145/3649310

Online Neural Path Guiding with Normalized Anisotropic Spherical Gaussians

Authors: Jiawei Huang, Akito Iizuka, Hajime Tanaka, Taku Komura, Yoshifumi Kitamura

Abstract: The variance reduction speed of physically-based rendering is heavily affected by the adopted importance sampling technique. In this paper we propose a novel online framework to learn the spatial-varying density model with a single small neural network using stochastic ray samples. To achieve this task, we propose a novel closed-form density model called the normalized anisotropic spherical Gaussian mixture, that can express complex irradiance fields with a small number of parameters. Our framework learns the distribution in a progressive manner and does not need any warm-up phases. Due to the compact and expressive representation of our density model, our framework can be implemented entirely on the GPU, allowing it to produce high quality images with limited computational resources.

Submitted 27 February, 2024; v1 submitted 11 March, 2023; originally announced March 2023.

ACM Class: I.3

arXiv:2211.08422 [pdf, other]

Mechanistic Mode Connectivity

Authors: Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

Abstract: We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss. Specifically, we ask the following question: are minimizers that rely on different mechanisms for making their predictions connected via simple paths of low loss? We provide a definition of mechanistic similarity as shared invariances to input transformations and demonstrate that lack of linear connectivity between two models implies they use dissimilar mechanisms for making their predictions. Relevant to practice, this result helps us demonstrate that naive fine-tuning on a downstream dataset can fail to alter a model's mechanisms, e.g., fine-tuning can fail to eliminate a model's reliance on spurious attributes. Our analysis also motivates a method for targeted alteration of a model's mechanisms, named connectivity-based fine-tuning (CBFT), which we analyze using several synthetic datasets for the task of reducing a model's reliance on spurious attributes.

Submitted 1 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: ICML, 2023

arXiv:2210.00638 [pdf, other]

What shapes the loss landscape of self-supervised learning?

Authors: Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka

Abstract: Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL). However, questions remain in our theoretical understanding: When do those collapses occur? What are the mechanisms and causes? We answer these questions by deriving and thoroughly analyzing an analytically tractable theory of SSL loss landscapes. In this theory, we identify the causes of the dimensional collapse and study the effect of normalization and bias. Finally, we leverage the interpretability afforded by the analytical theory to understand how dimensional collapse can be beneficial and what affects the robustness of SSL against data imbalance.

Submitted 11 March, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

Comments: Published at ICLR 2023

arXiv:2207.00158 [pdf, other]

doi 10.1109/ACCESS.2022.3203997

Experimental Demonstration of Delay-Bounded Wireless Network Based on Precise Time Synchronization

Authors: Haruaki Tanaka, Yusuke Yamasaki, Satoshi Yasuda, Nobuyasu Shiga, Kenichi Takizawa, Nicolas Chauvet, Ryoichi Horisaki, Makoto Naruse

Abstract: Low latency and reliable information transfer are highly demanded in fifth generation (5G) and beyond 5G wireless communications. A novel delay-bounded wireless media access control (MAC) protocol called Carrier Sense Multiple Access with Arbitration Point (CSMA/AP) was established to strictly ensure the upper boundary of communication delay. CSMA/AP enables collision-free and delay-bounded communications with a simple arbitration mechanism exploiting the precise time synchronization achieved by Wireless Two-Way Interferometry (Wi-Wi). Experimental demonstration and proving the feasibility in wireless environments are among the most critical steps before any further discussion of CSMA/AP and extension to various applications can take place. In this work, we experimentally demonstrated the fundamental principles of CSMA/AP by constructing a star-topology wireless network using software-defined radio terminals combined with precise time synchronization devices. We show that CSMA/AP was successfully operated, even with dynamic changes of the spatial position of the terminal or the capability to accommodate mobility, thanks to the real-time adaption to the dynamically changing environment by Wi-Wi. We also experimentally confirmed that the proposed CSMA/AP principle cannot be executed without Wi-Wi, which validates the importance of precise time synchronization. This study paves the way toward realizing delay-bounded wireless communications for future low-latency and highly reliable critical applications.

Submitted 10 November, 2022; v1 submitted 20 June, 2022; originally announced July 2022.

Comments: 13 pages, 12 figures, LaTeX; revised for journal submission, results not changed

Journal ref: IEEE Access 10 (2022) 94285 - 94297

arXiv:2206.00635 [pdf, other]

doi 10.1109/ICASSP.2019.8682414

Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition

Authors: Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura

Abstract: Research about brain activities involving spoken word production is considerably underdeveloped because of the undiscovered characteristics of speech artifacts, which contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes. To fuel further EEG research with speech production, a method using three-mode tensor decomposition (time x space x frequency) is proposed to perform speech artifact removal. Tensor decomposition enables simultaneous inspection of multiple modes, which suits the multi-way nature of EEG data. In a picture-naming task, we collected raw data with speech artifacts by placing two electrodes near the mouth to record lip EMG. Based on our evaluation, which calculated the correlation values between grand-averaged speech artifacts and the lip EMG, tensor decomposition outperformed the former methods that were based on independent component analysis (ICA) and blind source separation (BSS), both in detecting speech artifact (0.985) and producing clean data (0.101). Our proposed method correctly preserved the components unrelated to speech, which was validated by computing the correlation value between the grand-averaged raw data without EOG and cleaned data before the speech onset (0.92-0.94).

Submitted 1 June, 2022; originally announced June 2022.

Journal ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2205.03549 [pdf]

doi 10.1021/acsphotonics.2c00572

Deep Learning-enabled Detection and Classification of Bacterial Colonies using a Thin Film Transistor (TFT) Image Sensor

Authors: Yuzhu Li, Tairan Liu, Hatice Ceylan Koydemir, Hongda Wang, Keelan O'Riordan, Bijie Bai, Yuta Haga, Junji Kobashi, Hitoshi Tanaka, Takaya Tamaru, Kazunori Yamaguchi, Aydogan Ozcan

Abstract: Early detection and identification of pathogenic bacteria such as Escherichia coli (E. coli) is an essential task for public health. The conventional culture-based methods for bacterial colony detection usually take >24 hours to get the final read-out. Here, we demonstrate a bacterial colony-forming-unit (CFU) detection system exploiting a thin-film-transistor (TFT)-based image sensor array that saves ~12 hours compared to the Environmental Protection Agency (EPA)-approved methods. To demonstrate the efficacy of this CFU detection system, a lensfree imaging modality was built using the TFT image sensor with a sample field-of-view of ~10 cm^2. Time-lapse images of bacterial colonies cultured on chromogenic agar plates were automatically collected at 5-minute intervals. Two deep neural networks were used to detect and count the growing colonies and identify their species. When blindly tested with 265 colonies of E. coli and other coliform bacteria (i.e., Citrobacter and Klebsiella pneumoniae), our system reached an average CFU detection rate of 97.3% at 9 hours of incubation and an average recovery rate of 91.6% at ~12 hours. This TFT-based sensor can be applied to various microbiological detection methods. Due to the large scalability, ultra-large field-of-view, and low cost of the TFT-based image sensors, this platform can be integrated with each agar plate to be tested and disposed of after the automated CFU count. The imaging field-of-view of this platform can be cost-effectively increased to >100 cm^2 to provide a massive throughput for CFU detection using, e.g., roll-to-roll manufacturing of TFTs as used in the flexible display industry.

Submitted 7 May, 2022; originally announced May 2022.

Comments: 18 Pages, 6 Figures

Journal ref: ACS Photonics (2022)

arXiv:2203.14384 [pdf, ps, other]

Multimarked Spatial Search by Continuous-Time Quantum Walk

Authors: Pedro H. G. Lugão, Renato Portugal, Mohamed Sabri, Hajime Tanaka

Abstract: The quantum-walk-based spatial search problem aims to find a marked vertex using a quantum walk on a graph with marked vertices. We describe a framework for determining the computational complexity of spatial search by continuous-time quantum walk on arbitrary graphs by providing a recipe for finding the optimal running time and the success probability of the algorithm. The quantum walk is driven by a Hamiltonian derived from the adjacency matrix of the graph modified by the presence of the marked vertices. The success of our framework depends on the knowledge of the eigenvalues and eigenvectors of the adjacency matrix. The spectrum of the Hamiltonian is subsequently obtained from the roots of the determinant of a real symmetric matrix $M$, the dimensions of which depend on the number of marked vertices. The eigenvectors are determined from a basis of the kernel of $M$. We show each step of the framework by solving the spatial searching problem on the Johnson graphs with a fixed diameter and with two marked vertices. Our calculations show that the optimal running time is $O(\sqrt{N})$ with an asymptotic probability of $1+o(1)$, where $N$ is the number of vertices.

Submitted 17 January, 2024; v1 submitted 27 March, 2022; originally announced March 2022.

Comments: 23 pages

arXiv:2203.03119 [pdf, other]

Fabchain: Managing Audit-able 3D Print Job over Blockchain

Authors: Ryosuke Abe, Shigeya Suzuki, Kenji Saito, Hiroya Tanaka, Osamu Nakamura, Jun Murai

Abstract: Improvements in fabrication devices such as 3D printers are making personal fabrication possible, allowing individuals to freely fabricate any product. To clarify who is liable for the product, the fabricator should keep the fabrication history in an immutable and sustainably accessible manner. In this paper, we propose a new scheme, "Fabchain," that can record the fabrication history in such a manner. By utilizing a scheme that employs a blockchain as an audit-able communication channel, Fabchain manages print jobs for the fabricator's 3D printer over the blockchain, while maintaining a history of a print job. We implemented Fabchain on Ethereum and evaluated the performance for recording a print job. Our results demonstrate that Fabchain can complete communication of a print job sequence in less than 1 minute on the Ethereum test network. We conclude that Fabchain can manage a print job in a reasonable duration for 3D printing, while satisfying the requirements for immutability and sustainability.

Submitted 6 March, 2022; originally announced March 2022.

arXiv:2201.02496 [pdf, other]

Tower-Complete Problems in Contraction-Free Substructural Logics

Authors: Hiromi Tanaka

Abstract: We investigate the non-elementary computational complexity of a family of substructural logics without contraction. With the aid of the technique pioneered by Lazić and Schmitz (2015), we show that the deducibility problem for full Lambek calculus with exchange and weakening ($\mathbf{FL}_{\mathbf{ew}}$) is not in Elementary (i.e., the class of decision problems that can be decided in time bounded by an elementary recursive function), but is in PR (i.e., the class of decision problems that can be decided in time bounded by a primitive recursive function). More precisely, we show that this problem is complete for Tower, which is a non-elementary complexity class forming a part of the fast-growing complexity hierarchy introduced by Schmitz (2016). The same complexity result holds even for deducibility in BCK-logic, i.e., the implicational fragment of $\mathbf{FL}_{\mathbf{ew}}$. We furthermore show the Tower-completeness of the provability problem for elementary affine logic, which was proved to be decidable by Dal Lago and Martini (2004).

Submitted 20 November, 2022; v1 submitted 7 January, 2022; originally announced January 2022.

Comments: The full version of the paper accepted to CSL 2023

arXiv:2112.03744 [pdf, ps, other]

doi 10.1088/1751-8121/ac6f30

Spatial Search on Johnson Graphs by Discrete-Time Quantum Walk

Authors: Hajime Tanaka, Mohamed Sabri, Renato Portugal

Abstract: The spatial search problem aims to find a marked vertex of a finite graph using a dynamic with two constraints: (1) The walker has no compass and (2) the walker can check whether a vertex is marked only after reaching it. This problem is a generalization of unsorted database search and has many applications to algorithms. Classical algorithms that solve the spatial search problem are based on random walks and the computational complexity is determined by the hitting time. On the other hand, quantum algorithms are based on quantum walks and the computational complexity is determined not only by the number of steps to reach a marked vertex, but also by the success probability, since we need to perform a measurement at the end of the algorithm to determine the walker's position. In this work, we address the spatial search problem on Johnson graphs using the coined quantum walk model. Since Johnson graphs are vertex- and distance-transitive, we have found an invariant subspace of the Hilbert space, which aids in the calculation of the computational complexity. We have shown that, for every fixed diameter, the asymptotic success probability is $1/2$ after taking $\pi\sqrt{N}/(2\sqrt{2})$ steps, where $N$ is the number of vertices of the Johnson graph.

Submitted 7 December, 2021; originally announced December 2021.

Comments: 15 pages

Journal ref: Journal of Physics A: Mathematical and Theoretical, Vol.55, 255304, 2022

arXiv:2108.01992 [pdf, ps, other]

doi 10.1007/s11128-022-03417-9

Spatial Search on Johnson Graphs by Continuous-Time Quantum Walk

Authors: Hajime Tanaka, Mohamed Sabri, Renato Portugal

Abstract: Spatial search on graphs is one of the most important algorithmic applications of quantum walks. To show that a quantum-walk-based search is more efficient than a random-walk-based search is a difficult problem, which has been addressed in several ways. Usually, graph symmetries aid in the calculation of the algorithm's computational complexity, and Johnson graphs are an interesting class regarding symmetries because they are regular, Hamilton-connected, vertex- and distance-transitive. In this work, we show that spatial search on Johnson graphs by continuous-time quantum walk achieves the Grover lower bound $\pi\sqrt{N}/2$ with success probability $1$ asymptotically for every fixed diameter, where $N$ is the number of vertices. The proof is mathematically rigorous and can be used for other graph classes.

Submitted 4 August, 2021; originally announced August 2021.

Comments: 12 pages

Journal ref: Quantum Inf Process 21, 74 (2022)

arXiv:2107.09133 [pdf, other]

doi 10.1162/neco_a_01626

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

Authors: Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins

Abstract: In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD). As observed previously, long after performance has converged, networks continue to move through parameter space by a process of anomalous diffusion in which distance travelled grows as a power law in the number of gradient updates with a nontrivial exponent. We reveal an intricate interaction between the hyperparameters of optimization, the structure in the gradient noise, and the Hessian matrix at the end of training that explains this anomalous diffusion. To build this understanding, we first derive a continuous-time model for SGD with finite learning rates and batch sizes as an underdamped Langevin equation. We study this equation in the setting of linear regression, where we can derive exact, analytic expressions for the phase space dynamics of the parameters and their instantaneous velocities from initialization to stationarity. Using the Fokker-Planck equation, we show that the key ingredient driving these dynamics is not the original training loss, but rather the combination of a modified loss, which implicitly regularizes the velocity, and probability currents, which cause oscillations in phase space. We identify qualitative and quantitative predictions of this theory in the dynamics of a ResNet-18 model trained on ImageNet. Through the lens of statistical physics, we uncover a mechanistic origin for the anomalous limiting dynamics of deep neural networks trained with SGD.

Submitted 28 December, 2023; v1 submitted 19 July, 2021; originally announced July 2021.

Comments: 78 pages, 9 figures, Neural Computation 2024

Journal ref: Neural Computation (2024) 36 (1) 151-174

arXiv:2106.05956 [pdf, other]

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

Authors: Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

Abstract: Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning. Recent works have identified a multitude of beneficial properties in BatchNorm to explain its success. However, given the pursuit of alternative normalization layers, these properties need to be generalized so that any given layer's success/failure can be accurately predicted. In this work, we take a first step towards this goal by extending known properties of BatchNorm in randomly initialized deep neural networks (DNNs) to several recently proposed normalization layers. Our primary findings follow: (i) similar to BatchNorm, activations-based normalization layers can prevent exponential growth of activations in ResNets, but parametric techniques require explicit remedies; (ii) use of GroupNorm can ensure an informative forward propagation, with different samples being assigned dissimilar activations, but increasing group size results in increasingly indistinguishable activations for different samples, explaining slow convergence speed in models with LayerNorm; and (iii) small group sizes result in large gradient norm in earlier layers, hence explaining training instability issues in Instance Normalization and illustrating a speed-stability tradeoff in GroupNorm. Overall, our analysis reveals a unified set of mechanisms that underpin the success of normalization methods in deep learning, providing us with a compass to systematically explore the vast design space of DNN normalization layers.

Submitted 26 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2105.02716 [pdf, other]

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

Authors: Hidenori Tanaka, Daniel Kunin

Abstract: In nature, symmetry governs regularities, while symmetry breaking brings texture. In artificial neural networks, symmetry has been a central design principle to efficiently capture regularities in the world, but the role of symmetry breaking is not well understood. Here, we develop a theoretical framework to study the "geometry of learning dynamics" in neural networks, and reveal a key mechanism of explicit symmetry breaking behind the efficiency and stability of modern neural networks. To build this understanding, we model the discrete learning dynamics of gradient descent using a continuous-time Lagrangian formulation, in which the learning rule corresponds to the kinetic energy and the loss function corresponds to the potential energy. Then, we identify "kinetic symmetry breaking" (KSB), the condition when the kinetic energy explicitly breaks the symmetry of the potential function. We generalize Noether's theorem known in physics to take into account KSB and derive the resulting motion of the Noether charge: "Noether's Learning Dynamics" (NLD). Finally, we apply NLD to neural networks with normalization layers and reveal how KSB introduces a mechanism of "implicit adaptive optimization", establishing an analogy between learning dynamics induced by normalization layers and RMSProp. Overall, through the lens of Lagrangian mechanics, we have established a theoretical foundation to discover geometric design principles for the learning dynamics of neural networks.

Submitted 2 November, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Journal ref: NeurIPS (Advances in Neural Information Processing Systems), 2021

arXiv:2012.04728 [pdf, other]

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Authors: Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka

Abstract: Understanding the dynamics of neural network parameters during training is one of the key challenges in building a theoretical foundation for deep learning. A central obstacle is that the motion of a network in high-dimensional parameter space undergoes discrete finite steps along complex stochastic gradients derived from real-world datasets. We circumvent this obstacle through a unifying theoretical framework based on intrinsic symmetries embedded in a network's architecture that are present for any dataset. We show that any such symmetry imposes stringent geometric constraints on gradients and Hessians, leading to an associated conservation law in the continuous-time limit of stochastic gradient descent (SGD), akin to Noether's theorem in physics. We further show that finite learning rates used in practice can actually break these symmetry induced conservation laws. We apply tools from finite difference methods to derive modified gradient flow, a differential equation that better approximates the numerical trajectory taken by SGD at finite learning rates. We combine modified gradient flow with our framework of symmetries to derive exact integral expressions for the dynamics of certain parameter combinations. We empirically validate our analytic expressions for learning dynamics on VGG-16 trained on Tiny ImageNet. Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state of the art architectures trained on any dataset.

Submitted 29 March, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: 30 pages, 17 figures, ICLR 2021

arXiv:2006.05467 [pdf, other]

Pruning neural networks without any data by iteratively conserving synaptic flow

Authors: Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer rendering a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization subject to a sparsity constraint. Notably, this algorithm makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.

Submitted 18 November, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

Comments: NeurIPS 2020, 18 pages, 10 figures

Journal ref: Advances in Neural Information Processing Systems 2020

arXiv:2003.01501 [pdf, other]

Decision Problems for Propositional Non-associative Linear Logic and Extensions

Authors: Hiromi Tanaka

Abstract: In our previous work, we proposed the logic obtained from full non-associative Lambek calculus by adding a sort of linear-logical modality. We call this logic non-associative non-commutative intuitionistic linear logic ($\mathbf{NACILL}$, for short). In this paper, we establish the decidability and undecidability results for various extensions of $\mathbf{NACILL}$. Regarding the decidability results, we show that the deducibility problems for several extensions of $\mathbf{NACILL}$ with the rule of left-weakening are decidable. Regarding the undecidability results, we show that the provability problems for all the extensions of non-associative non-commutative classical linear logic by the rules of contraction and exchange are undecidable.

Submitted 3 March, 2020; originally announced March 2020.

arXiv:1912.06207 [pdf, other]

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

Authors: Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Abstract: Recently, deep feedforward neural networks have achieved considerable success in modeling biological sensory processing, in terms of reproducing the input-output map of sensory neurons. However, such models raise profound questions about the very nature of explanation in neuroscience. Are we simply replacing one complex system (a biological circuit) with another (a deep network), without understanding either? Moreover, beyond neural representations, are the deep network's computational mechanisms for generating neural responses the same as those in the brain? Without a systematic approach to extracting and understanding computational mechanisms from deep neural network models, it can be difficult both to assess the degree of utility of deep learning approaches in neuroscience, and to extract experimentally testable hypotheses from deep networks. We develop such a systematic approach by combining dimensionality reduction and modern attribution methods for determining the relative importance of interneurons for specific visual computations. We apply this approach to deep network models of the retina, revealing a conceptual understanding of how the retina acts as a predictive feature extractor that signals deviations from expectations for diverse spatiotemporal stimuli. For each stimulus, our extracted computational mechanisms are consistent with prior scientific literature, and in one case yield a new mechanistic hypothesis. Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations to extracting and understanding computational mechanisms.

Submitted 12 December, 2019; originally announced December 2019.

Journal ref: Neural Information Processing Systems (NeurIPS), 2019

arXiv:1909.13444 [pdf, ps, other]

A note on undecidability of propositional non-associative linear logics

Authors: Hiromi Tanaka

Abstract: We introduce a non-associative and non-commutative version of propositional intuitionistic linear logic, called propositional non-associative non-commutative intuitionistic linear logic (NACILL for short). We prove that NACILL and any of its extensions by the rules of exchange and/or contraction are undecidable. Furthermore, we introduce two types of classical versions of NACILL, i.e., an involutive version of NACILL and a cyclic and involutive version of NACILL. We show that both of these logics are also undecidable.

Submitted 29 September, 2019; originally announced September 2019.

MSC Class: 03F52

arXiv:1907.04446 [pdf, other]

Let's Keep It Safe: Designing User Interfaces that Allow Everyone to Contribute to AI Safety

Authors: Travis Mandel, Jahnu Best, Randall H. Tanaka, Hiram Temple, Chansen Haili, Kayla Schlectinger, Roy Szeto

Abstract: When AI systems are granted the agency to take impactful actions in the real world, there is an inherent risk that these systems behave in ways that are harmful. Typically, humans specify constraints on the AI system to prevent harmful behavior; however, very little work has studied how best to facilitate this difficult constraint specification process. In this paper, we study how to design user interfaces that make this process more effective and accessible, allowing people with a diversity of backgrounds and levels of expertise to contribute to this task. We first present a task design in which workers evaluate the safety of individual state-action pairs, and propose several variants of this task with improved task design and filtering mechanisms. Although this first design is easy to understand, it scales poorly to large state spaces. Therefore, we develop a new user interface that allows workers to write constraint rules without any programming. Despite its simplicity, we show that our rule construction interface retains full expressiveness. We present experiments utilizing crowdworkers to help address an important real-world AI safety problem in the domain of education. Our results indicate that our novel worker filtering and explanation methods outperform baseline approaches, and our rule-based interface allows workers to be much more efficient while improving data quality.

Submitted 7 November, 2022; v1 submitted 9 July, 2019; originally announced July 2019.

Comments: The full journal version of this article (published in Proceedings of the ACM on Human-Computer Interaction 4, CSCW2) can be found at https://dl.acm.org/doi/10.1145/3415168. The article is public access

arXiv:1906.06037 [pdf, other]

What is Stablecoin?: A Survey on Its Mechanism and Potential as Decentralized Payment Systems

Authors: Makiko Mita, Kensuke Ito, Shohei Ohsawa, Hideyuki Tanaka

Abstract: Our study provides a survey on how existing stablecoins (cryptocurrencies aiming at price stabilization) peg their value to other assets, from the perspective of Decentralized Payment Systems (DPSs). This attempt is important because there have been no preceding surveys focusing on the stablecoin as DPSs, i.e., the one aiming at not only price stabilization but also decentralization. Specifically, we first classified existing stablecoins into four types according to their collaterals (fiat, commodity, crypto, and non-collateralized) and pointed out the high potential of non-collateralized stablecoins as DPSs; then, we further classified existing non-collateralized stablecoins into two types according to their intervention layers (protocol, application) and confirmed details of their representative mechanisms. Utilizing concepts such as Quantity Theory of Money (QTM), Tobin tax, and speculative attack, our survey revealed the status quo where, despite the high potential of non-collateralized stablecoins, they have no standard mechanism to achieve the stablecoin for practical DPSs.

Submitted 22 June, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

Comments: 15 pages, IIAI AAI 2019 accepted

arXiv:1906.03300 [pdf, other]

doi 10.5195/ledger.2019.182

Token-Curated Registry with Citation Graph

Authors: Kensuke Ito, Hideyuki Tanaka

Abstract: In this study, we aim to incorporate the expertise of anonymous curators into a token-curated registry (TCR), a decentralized recommender system for collecting a list of high-quality content. This registry is important, because previous studies on TCRs have not specifically focused on technical content, such as academic papers and patents, whose effective curation requires expertise in relevant fields. To measure expertise, curation in our model focuses on both the content and its citation relationships, for which curator assignment uses the Personalized PageRank (PPR) algorithm while reward computation uses a multi-task peer-prediction mechanism. Our proposed CitedTCR bridges the literature on network-based and token-based recommender systems and contributes to the autonomous development of an evolving citation graph for high-quality content. Moreover, we experimentally confirm the incentive for registration and curation in CitedTCR using the simplification of a one-to-one correspondence between users and content (nodes).

Submitted 5 June, 2019; originally announced June 2019.

Comments: 16 pages, 5 figures

Journal ref: LEDGER, Vol.4, (2019) 191-209

arXiv:1904.09135 [pdf, other]

Data Augmentation Using GANs

Authors: Fabio Henrique Kiyoiti dos Santos Tanaka, Claus Aranha

Abstract: In this paper we propose the use of Generative Adversarial Networks (GAN) to generate artificial training data for machine learning tasks. The generation of artificial training data can be extremely useful in situations such as imbalanced data sets, performing a role similar to SMOTE or ADASYN. It is also useful when the data contains sensitive information, and it is desirable to avoid using the original data set as much as possible (example: medical data). We test our proposal on benchmark data sets using different network architectures, and show that a Decision Tree (DT) classifier trained using the training data generated by the GAN reached the same (and, surprisingly, sometimes better) accuracy and recall as a DT trained on the original data set.

Submitted 19 April, 2019; originally announced April 2019.

Comments: Submitted for ACML 2019

arXiv:1801.07004 [pdf]

Public Sentiment and Demand for Used Cars after A Large-Scale Disaster: Social Media Sentiment Analysis with Facebook Pages

Authors: Yuya Shibuya, Hideyuki Tanaka

Abstract: There have been various studies analyzing public sentiment after a large-scale disaster. However, few studies have focused on the relationship between public sentiment on social media and its results on people's activities in the real world. In this paper, we conduct a long-term sentiment analysis after the Great East Japan Earthquake and Tsunami of 2011 using Facebook Pages with the aim of investigating the correlation between public sentiment and people's actual needs in areas damaged by water disasters. In addition, we try to analyze whether different types of disaster-related communication created different kinds of relationships on people's activities in the physical world. Our analysis reveals that sentiment of geo-info-related communication, which might be affected by sentiment inside a damaged area, had a positive correlation with the prices of used cars in the damaged area. On the other hand, the sentiment of disaster-interest-based-communication, which might be affected more by people who were interested in the disaster, but were outside the damaged area, had a negative correlation with the prices of used cars. The result could be interpreted to mean that when people begin to recover, used-car prices rise because they become more positive in their sentiment. This study suggests that, for long-term disaster-recovery analysis, we need to consider the different characteristics of online communication posted by locals directly affected by the disaster and non-locals not directly affected by the disaster.

Submitted 22 January, 2018; originally announced January 2018.

arXiv:1801.01449 [pdf, other]

3D Surface-to-Structure Translation using Deep Convolutional Networks

Authors: Takumi Moriya, Kazuyuki Saito, Hiroya Tanaka

Abstract: Our demonstration shows a system that estimates internal body structures from 3D surface models using deep convolutional neural networks trained on CT (computed tomography) images of the human body. To take pictures of structures inside the body, we need to use a CT scanner or an MRI (Magnetic Resonance Imaging) scanner. However, assuming that the mutual information between outer shape of the body and its inner structure is not zero, we can obtain an approximate internal structure from a 3D surface model based on MRI and CT image database. This suggests that we could know where and what kind of disease a person is likely to have in his/her body simply by 3D scanning surface of the body. As a first prototype, we developed a system for estimating internal body structures from surface models based on Visible Human Project DICOM CT Datasets from the University of Iowa Magnetic Resonance Research Facility.

Submitted 8 December, 2017; originally announced January 2018.

Comments: 2 pages, 3 figures

arXiv:1706.04302 [pdf, ps, other]

Network Simplex Algorithm associated with the Maximum Flow Problem

Authors: Sennosuke Watanabe, Hodaka Tanaka, Yoshihide Watanabe

Abstract: In the present paper, we apply the network simplex algorithm for solving the minimum cost flow problem, to the maximum flow problem. Then we prove that the cycling phenomenon which causes the infinite loop in the algorithm, does not occur in the network simplex algorithm associated with the maximum flow problem.

Submitted 13 June, 2017; originally announced June 2017.

Comments: 13 pages

arXiv:0911.1826 [pdf, other]

doi 10.46298/dmtcs.2150

Arithmetic completely regular codes

Authors: J. H. Koolen, W. S. Lee, W. J. Martin, H. Tanaka

Abstract: In this paper, we explore completely regular codes in the Hamming graphs and related graphs. Experimental evidence suggests that many completely regular codes have the property that the eigenvalues of the code are in arithmetic progression. In order to better understand these "arithmetic completely regular codes", we focus on cartesian products of completely regular codes and products of their corresponding coset graphs in the additive case. Employing earlier results, we are then able to prove a theorem which nearly classifies these codes in the case where the graph admits a completely regular partition into such codes (e.g., the cosets of some additive completely regular code). Connections to the theory of distance-regular graphs are explored and several open questions are posed.

Submitted 10 February, 2016; v1 submitted 9 November, 2009; originally announced November 2009.

Comments: 26 pages, 1 figure

MSC Class: 05E30

Journal ref: Discrete Math. Theor. Comput. Sci. 17 (2016) 59-76

arXiv:0905.4325 [pdf, other]

Updating Quantum Cryptography Report ver. 1

Authors: Donna Dodson, Mikio Fujiwara, Philippe Grangier, Masahito Hayashi, Kentaro Imafuku, Ken-ichi Kitayama, Prem Kumar, Christian Kurtsiefer, Gaby Lenhart, Norbert Luetkenhaus, Tsutomu Matsumoto, William J. Munro, Tsuyoshi Nishioka, Momtchil Peev, Masahide Sasaki, Yutaka Sata, Atsushi Takada, Masahiro Takeoka, Kiyoshi Tamaki, Hidema Tanaka, Yasuhiro Tokura, Akihisa Tomita, Morio Toyoshima, Rodney van Meter, Atsuhiro Yamagishi, et al. (2 additional authors not shown)

Abstract: Quantum cryptographic technology (QCT) is expected to be a fundamental technology for realizing long-term information security even against as-yet-unknown future technologies. More advanced security could be achieved using QCT together with contemporary cryptographic technologies. To develop and spread the use of QCT, it is necessary to standardize devices, protocols, and security requirements and thus enable interoperability in a multi-vendor, multi-network, and multi-service environment. This report is a technical summary of QCT and related topics from the viewpoints of 1) consensual establishment of specifications and requirements of QCT for standardization and commercialization and 2) the promotion of research and design to realize New-Generation Quantum Cryptography.

Submitted 27 May, 2009; originally announced May 2009.

Comments: 74 pages

arXiv:0710.3185 [pdf, other]

Fuzzy Modeling of Electrical Impedance Tomography Image of the Lungs

Authors: Harki Tanaka, Neli Regina Siqueira Ortega, Mauricio Stanzione Galizia, Joao Batista Borges Sobrinho, Marcelo Britto Passos Amato

Abstract: Electrical Impedance Tomography (EIT) is a functional imaging method that is being developed for bedside use in critical care medicine. Aiming at improving the chest anatomical resolution of EIT images we developed a fuzzy model based on EIT high temporal resolution and the functional information contained in the pulmonary perfusion and ventilation signals. EIT data from an experimental animal model were collected during normal ventilation and apnea while an injection of hypertonic saline was used as a reference. The fuzzy model was elaborated in three parts: a modeling of the heart, a pulmonary map from ventilation images, and a pulmonary map from perfusion images. Image segmentation was performed using a threshold method and a ventilation/perfusion map was generated. EIT images treated by the fuzzy model were compared with the hypertonic saline injection method and CT-scan images, presenting good results in both qualitative (the image obtained by the model was very similar to that of the CT-scan) and quantitative (the ROC curve provided an area equal to 0.93) point of view. Undoubtedly, these results represent an important step in the EIT images area, since they open the possibility of developing EIT-based bedside clinical methods, which are not available nowadays. These achievements could serve as the base to develop EIT diagnosis system for some life-threatening diseases commonly found in critical care medicine.

Submitted 16 October, 2007; originally announced October 2007.

Comments: 10 pages, 6 figures

arXiv:cs/9910020 [pdf, ps, other]

Selective Sampling for Example-based Word Sense Disambiguation

Authors: Atsushi Fujii, Kentaro Inui, Takenobu Tokunaga, Hozumi Tanaka

Abstract: This paper proposes an efficient example sampling method for example-based word sense disambiguation systems. To construct a database of practical size, a considerable overhead for manual sense disambiguation (overhead for supervision) is required. In addition, the time complexity of searching a large-sized database poses a considerable problem (overhead for search). To counter these problems, our method selectively samples a smaller-sized effective subset from a given example set for use in word sense disambiguation. Our method is characterized by the reliance on the notion of training utility: the degree to which each example is informative for future example sampling when used for the training of the system. The system progressively collects examples by selecting those with greatest utility. The paper reports the effectiveness of our method through experiments on about one thousand sentences. Compared to experiments with other example sampling methods, our method reduced both the overhead for supervision and the overhead for search, without the degeneration of the performance of the system.

Submitted 23 October, 1999; originally announced October 1999.

Comments: 25 pages, 14 Postscript figures

ACM Class: I.2.7

Journal ref: Computational Linguistics, Vol.24, No.4, pp.573-597, 1998

arXiv:cmp-lg/9702010 [pdf, ps, other]

Selective Sampling of Effective Example Sentence Sets for Word Sense Disambiguation

Authors: Atsushi Fujii, Kentaro Inui, Takenobu Tokunaga, Hozumi Tanaka

Abstract: This paper proposes an efficient example selection method for example-based word sense disambiguation systems. To construct a practical size database, a considerable overhead for manual sense disambiguation is required. Our method is characterized by the reliance on the notion of the training utility: the degree to which each example is informative for future example selection when used for the training of the system. The system progressively collects examples by selecting those with greatest utility. The paper reports the effectivity of our method through experiments on about one thousand sentences. Compared to experiments with random example selection, our method reduced the overhead without the degeneration of the performance of the system.

Submitted 17 February, 1997; originally announced February 1997.

Comments: 14 pages, uses epsbox.sty

Journal ref: Proceedings of the Fourth Workshop on Very Large Corpora WVLC-4, pp. 56-69, 1996

Showing 1–44 of 44 results for author: Tanaka, H