-
Biomedical and Clinical English Model Packages in the Stanza Python NLP Library
Authors:
Yuhao Zhang,
Yuhui Zhang,
Peng Qi,
Christopher D. Manning,
Curtis P. Langlotz
Abstract:
We introduce biomedical and clinical English model packages for the Stanza Python NLP library. These packages offer accurate syntactic analysis and named entity recognition capabilities for biomedical and clinical text, by combining Stanza's fully neural architecture with a wide variety of open datasets as well as large-scale unsupervised biomedical and clinical text data. We show via extensive experiments that our packages achieve syntactic analysis and named entity recognition performance that is on par with or surpasses state-of-the-art results. We further show that these models do not compromise speed compared to existing toolkits when GPU acceleration is available, and are made easy to download and use with Stanza's Python interface. A demonstration of our packages is available at: http://stanza.run/bio.
Submitted 29 July, 2020;
originally announced July 2020.
-
Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction
Authors:
Pi Qi,
Xiaoqiang Zhu,
Guorui Zhou,
Yujing Zhang,
Zhe Wang,
Lejian Ren,
Ying Fan,
Kun Gai
Abstract:
Rich user behavior data has been proven to be of great value for click-through rate prediction tasks, especially in industrial applications such as recommender systems and online advertising. Both industry and academia have paid much attention to this topic and proposed different approaches to modeling with long sequential user behavior data. Among them, the memory-network-based model MIMN, proposed by Alibaba, achieves SOTA with the co-design of both learning algorithm and serving system. MIMN is the first industrial solution that can model sequential user behavior data with length scaling up to 1000. However, MIMN fails to precisely capture user interests given a specific candidate item when the length of the user behavior sequence increases further, say, by 10 times or more. This challenge exists widely in previously proposed approaches. In this paper, we tackle this problem by designing a new modeling paradigm, which we name the Search-based Interest Model (SIM). SIM extracts user interests with two cascaded search units: (i) the General Search Unit performs a general search over the raw and arbitrarily long sequential behavior data, with query information from the candidate item, and returns a Sub user Behavior Sequence (SBS) that is relevant to the candidate item; (ii) the Exact Search Unit models the precise relationship between the candidate item and the SBS. This cascaded search paradigm gives SIM a better ability to model lifelong sequential behavior data in both scalability and accuracy. Apart from the learning algorithm, we also introduce our hands-on experience on how to implement SIM in large-scale industrial systems. Since 2019, SIM has been deployed in the display advertising system at Alibaba, bringing a 7.1% CTR and 4.4% RPM lift, which is significant to the business. Serving the main traffic in our real system now, SIM models user behavior data with maximum length reaching up to 54000, pushing SOTA to 54x.
Submitted 28 June, 2020; v1 submitted 9 June, 2020;
originally announced June 2020.
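The cascaded search described in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the category-match rule used for the first stage, the softmax dot-product attention used for the second stage, and the names `general_search` and `exact_search` are all assumptions made for illustration only.

```python
import math
from typing import List, Tuple

# A behavior is (category_id, embedding). Both fields are illustrative assumptions.
Behavior = Tuple[int, List[float]]


def general_search(behaviors: List[Behavior], candidate_category: int,
                   max_len: int) -> List[Behavior]:
    """Stage 1 (assumed rule): keep behaviors whose category matches the
    candidate item, truncated to the most recent `max_len` entries."""
    matched = [b for b in behaviors if b[0] == candidate_category]
    return matched[-max_len:]


def exact_search(sub_sequence: List[Behavior],
                 candidate_embedding: List[float]) -> List[float]:
    """Stage 2 (assumed form): softmax dot-product attention of the candidate
    over the sub-sequence, returning the weighted sum of behavior embeddings."""
    dim = len(candidate_embedding)
    if not sub_sequence:
        return [0.0] * dim
    scores = [sum(x * y for x, y in zip(emb, candidate_embedding))
              for _, emb in sub_sequence]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * emb[i] for w, (_, emb) in zip(weights, sub_sequence)) / total
            for i in range(dim)]


if __name__ == "__main__":
    history = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (1, [0.0, 1.0])]
    sub = general_search(history, candidate_category=1, max_len=10)
    print(exact_search(sub, [1.0, 0.0]))
```

The point of the split is that the first stage is cheap enough to run over a very long history, while the more expensive second stage only sees the short, relevant sub-sequence.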
-
Stay Hungry, Stay Focused: Generating Informative and Specific Questions in Information-Seeking Conversations
Authors:
Peng Qi,
Yuhao Zhang,
Christopher D. Manning
Abstract:
We investigate the problem of generating informative questions in information-asymmetric conversations. Unlike previous work on question generation which largely assumes knowledge of what the answer might be, we are interested in the scenario where the questioner is not given the context from which answers are drawn, but must reason pragmatically about how to acquire new information, given the shared conversation history. We identify two core challenges: (1) formally defining the informativeness of potential questions, and (2) exploring the prohibitively large space of potential questions to find the good candidates. To generate pragmatic questions, we use reinforcement learning to optimize an informativeness metric we propose, combined with a reward function designed to promote more specific questions. We demonstrate that the resulting pragmatic questioner substantially improves the informativeness and specificity of questions generated over a baseline model, as evaluated by our metrics as well as humans.
Submitted 20 October, 2020; v1 submitted 29 April, 2020;
originally announced April 2020.
-
Spectral Domain Z-scan Technique
Authors:
Xi Zeng,
Pengfei Qi,
Pin Chen,
Lie Lin,
Weiwei Liu
Abstract:
Characterizing the nonlinear optical properties of materials is a prerequisite for work in nonlinear optics. Among different methods, the well-known Z-scan technique and its modified versions have been recognized as simple and accurate methods for measuring both the real and imaginary parts of the nonlinear refractive index. However, all Z-scan methods based on detecting small beam variations place a severe restriction on the roughness of materials. Therefore, measuring the nonlinear optical properties of highly scattering media still remains challenging. Inspired by the key idea of the conventional Z-scan method, which converts the wavefront phase shift into an easily measurable spatial pattern in the far field, an alternative spectral domain Z-scan technique is presented in this paper. It has great potential for highly scattering media, because the scattering efficiency is insensitive to the wavelength for Mie scattering when the wavelengths are far smaller than the roughness. Moreover, to demonstrate the advantages of the spectral domain Z-scan technique, the nonlinear refraction of polished slides and frosted slides was measured, and the results agree well with previous reports.
Submitted 9 April, 2020;
originally announced April 2020.
-
Focusing and Energy Dispersion Properties of a Cylindrically Bent Asymmetric Laue Crystal
Authors:
Peng Qi,
Xianbo Shi,
Nazanin Samadi,
Dean Chapman
Abstract:
Elastically bent single-crystal Laue case diffraction crystals provide interesting new opportunities for imaging and spectroscopy applications. The diffraction properties are well understood; however, the lack of an easy way to model the diffracted beams hinders assessment of the focal, phase and energy dispersive properties needed for many applications. This work begins to collect the elements needed to ray trace diffracted beams within bent Laue crystals for the purpose of incorporation into other powerful ray tracing applications such as SHADOW. Specifically, we address the condition in which a cylindrically bent Laue crystal will focus all the polychromatic diffracted beams at a single location when a specific asymmetry angle condition is met for a target x-ray energy - the so-called "magic condition". The focal size of the beam can be minimized, but this condition also results in excellent energy-dispersive properties. The conceptual and mathematical aspects of this interesting focusing and energy dispersive phenomenon are discussed.
Submitted 22 March, 2020;
originally announced March 2020.
-
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Authors:
Peng Qi,
Yuhao Zhang,
Yuhui Zhang,
Jason Bolton,
Christopher D. Manning
Abstract:
We introduce Stanza, an open-source Python natural language processing toolkit supporting 66 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition. We have trained Stanza on a total of 112 datasets, including the Universal Dependencies treebanks and other multilingual corpora, and show that the same neural architecture generalizes well and achieves competitive performance on all languages tested. Additionally, Stanza includes a native Python interface to the widely used Java Stanford CoreNLP software, which further extends its functionality to cover other tasks such as coreference resolution and relation extraction. Source code, documentation, and pretrained models for 66 languages are available at https://stanfordnlp.github.io/stanza.
Submitted 23 April, 2020; v1 submitted 16 March, 2020;
originally announced March 2020.
-
Exploring the Role of Visual Content in Fake News Detection
Authors:
Juan Cao,
Peng Qi,
Qiang Sheng,
Tianyun Yang,
Junbo Guo,
Jintao Li
Abstract:
The increasing popularity of social media promotes the proliferation of fake news, which has caused significant negative societal effects. Therefore, fake news detection on social media has recently become an emerging research area of great concern. With the development of multimedia technology, fake news attempts to utilize multimedia content with images or videos to attract and mislead consumers for rapid dissemination, which makes visual content an important part of fake news. Despite the importance of visual content, our understanding of the role of visual content in fake news detection is still limited. This chapter presents a comprehensive review of the visual content in fake news, including the basic concepts, effective visual features, representative detection methods and challenging issues of multimedia fake news detection. This chapter can help readers to understand the role of visual content in fake news detection, and effectively utilize visual content to assist in detecting multimedia fake news.
Submitted 10 March, 2020;
originally announced March 2020.
-
Answering Complex Open-domain Questions Through Iterative Query Generation
Authors:
Peng Qi,
Xiaowen Lin,
Leo Mehr,
Zijian Wang,
Christopher D. Manning
Abstract:
It is challenging for current one-step retrieve-and-read question answering (QA) systems to answer questions like "Which novel by the author of 'Armada' will be adapted as a feature film by Steven Spielberg?" because the question seldom contains retrievable clues about the missing entity (here, the author). Answering such a question requires multi-hop reasoning where one must gather information about the missing entity (or facts) to proceed with further reasoning. We present GoldEn (Gold Entity) Retriever, which iterates between reading context and retrieving more supporting documents to answer open-domain multi-hop questions. Instead of using opaque and computationally expensive neural retrieval models, GoldEn Retriever generates natural language search queries given the question and available context, and leverages off-the-shelf information retrieval systems to query for missing entities. This allows GoldEn Retriever to scale up efficiently for open-domain multi-hop reasoning while maintaining interpretability. We evaluate GoldEn Retriever on the recently proposed open-domain multi-hop QA dataset, HotpotQA, and demonstrate that it outperforms the best previously published model despite not using pretrained language models such as BERT.
Submitted 15 October, 2019;
originally announced October 2019.
-
Spectrum Sensing Based on Deep Learning Classification for Cognitive Radios
Authors:
Shilian Zheng,
Shichuan Chen,
Peihan Qi,
Huaji Zhou,
Xiaoniu Yang
Abstract:
Spectrum sensing is a key technology for cognitive radios. We present spectrum sensing as a classification problem and propose a sensing method based on deep learning classification. We normalize the received signal power to overcome the effects of noise power uncertainty. We train the model with as many types of signals as possible, as well as noise data, to enable the trained network model to adapt to untrained new signals. We also use transfer learning strategies to improve the performance for real-world signals. Extensive experiments are conducted to evaluate the performance of this method. The simulation results show that the proposed method performs better than two traditional spectrum sensing methods, i.e., the maximum-minimum eigenvalue ratio-based method and the frequency domain entropy-based method. In addition, the experimental results on new untrained signal types show that our method can adapt to the detection of these new signals. Furthermore, the real-world signal detection experiment results show that the detection performance can be further improved by transfer learning. Finally, experiments under colored noise show that our proposed method has superior detection performance, while the traditional methods suffer significant performance degradation, which further validates the superiority of our method.
Submitted 12 September, 2019;
originally announced September 2019.
-
False News Detection on Social Media
Authors:
Juan Cao,
Qiang Sheng,
Peng Qi,
Lei Zhong,
Yanyan Wang,
Xueyao Zhang
Abstract:
Social media has become a major information platform where people consume and share news. However, it has also enabled the wide dissemination of false news, i.e., news posts published on social media that are verifiably false, causing significant negative effects on society. In order to help prevent further propagation of false news on social media, we set up this competition to motivate the development of automated real-time false news detection approaches. Specifically, this competition includes three sub-tasks: false-news text detection, false-news image detection and false-news multi-modal detection, which aim to motivate participants to further explore the efficiency of multiple modalities in detecting false news and reasonable fusion approaches for multi-modal content. To better support this competition, we also construct and publicize a multi-modal data repository about False News on the Weibo Social platform (MCG-FNeWS) to help evaluate the performance of different approaches from participants.
Submitted 28 August, 2019;
originally announced August 2019.
-
Exploiting Multi-domain Visual Information for Fake News Detection
Authors:
Peng Qi,
Juan Cao,
Tianyun Yang,
Junbo Guo,
Jintao Li
Abstract:
The increasing popularity of social media promotes the proliferation of fake news. With the development of multimedia technology, fake news attempts to utilize multimedia content with images or videos to attract and mislead readers for rapid dissemination, which makes visual content an important part of fake news. Fake-news images, i.e., images attached to fake news posts, include not only fake images that have been maliciously tampered with but also real images that are wrongly used to represent irrelevant events. Hence, how to fully exploit the inherent characteristics of fake-news images is an important but challenging problem for fake news detection. In the real world, fake-news images may have significantly different characteristics from real-news images at both physical and semantic levels, which can be clearly reflected in the frequency and pixel domains, respectively. Therefore, we propose a novel framework, the Multi-domain Visual Neural Network (MVNN), to fuse the visual information of the frequency and pixel domains for detecting fake news. Specifically, we design a CNN-based network to automatically capture the complex patterns of fake-news images in the frequency domain, and utilize a multi-branch CNN-RNN model to extract visual features from different semantic levels in the pixel domain. An attention mechanism is utilized to fuse the feature representations of the frequency and pixel domains dynamically. Extensive experiments conducted on a real-world dataset demonstrate that MVNN outperforms existing methods by at least 9.2% in accuracy, and can help improve the performance of multimodal fake news detection by over 5.2%.
Submitted 12 August, 2019;
originally announced August 2019.
-
Dynamic Malware Analysis with Feature Engineering and Feature Learning
Authors:
Zhaoqi Zhang,
Panpan Qi,
Wei Wang
Abstract:
Dynamic malware analysis executes the program in an isolated environment and monitors its run-time behaviour (e.g. system API calls) for malware detection. This technique has been proven to be effective against various code obfuscation techniques and newly released ("zero-day") malware. However, existing works typically only consider the API name while ignoring the arguments, or require complex feature engineering operations and expert knowledge to process the arguments. In this paper, we propose a novel and low-cost feature extraction approach, and an effective deep neural network architecture for accurate and fast malware detection. Specifically, the feature representation approach utilizes a feature hashing trick to encode the API call arguments associated with the API name. The deep neural network architecture applies multiple Gated-CNNs (convolutional neural networks) to transform the extracted features of each API call. The outputs are further processed through a bidirectional LSTM (long short-term memory network) to learn the sequential correlation among API calls. Experiments show that our solution outperforms baselines significantly on a large real dataset. Valuable insights about feature engineering and architecture design are derived from the ablation study.
Submitted 23 January, 2020; v1 submitted 17 July, 2019;
originally announced July 2019.
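The feature hashing trick mentioned in the abstract above can be sketched in general terms as follows. This is a generic illustration, not the paper's feature extractor: the token format, the use of MD5 as the hash, the bin count, and the names `hash_bucket` and `encode_api_call` are all assumptions.

```python
import hashlib
from typing import Dict, List


def hash_bucket(token: str, num_bins: int) -> int:
    """Map a string to a stable bucket index (MD5 is an arbitrary choice here;
    Python's built-in hash() is avoided because it is salted per process)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_bins


def encode_api_call(api_name: str, arguments: Dict[str, str],
                    num_bins: int = 16) -> List[float]:
    """Encode one API call as a fixed-length count vector: each
    (api name, argument name, argument value) triple is hashed into a bin."""
    vector = [0.0] * num_bins
    for arg_name, arg_value in arguments.items():
        token = f"{api_name}:{arg_name}={arg_value}"
        vector[hash_bucket(token, num_bins)] += 1.0
    return vector


if __name__ == "__main__":
    call = encode_api_call("CreateFileW",
                           {"path": "C:\\tmp\\a.txt", "access": "GENERIC_WRITE"})
    print(call)
```

The vector length stays fixed no matter how many distinct argument values appear, at the cost of occasional collisions between unrelated tokens.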
-
Artificial Intelligence for Prosthetics - challenge solutions
Authors:
Łukasz Kidziński,
Carmichael Ong,
Sharada Prasanna Mohanty,
Jennifer Hicks,
Sean F. Carroll,
Bo Zhou,
Hongsheng Zeng,
Fan Wang,
Rongzhong Lian,
Hao Tian,
Wojciech Jaśkowski,
Garrett Andersen,
Odd Rune Lykkebø,
Nihat Engin Toklu,
Pranav Shyam,
Rupesh Kumar Srivastava,
Sergey Kolesnikov,
Oleksii Hrinchuk,
Anton Pechenko,
Mattias Ljungström,
Zhen Wang,
Xu Hu,
Zehong Hu,
Minghui Qiu,
Jun Huang
, et al. (25 additional authors not shown)
Abstract:
In the NeurIPS 2018 Artificial Intelligence for Prosthetics challenge, participants were tasked with building a controller for a musculoskeletal model with a goal of matching a given time-varying velocity vector. Top participants were invited to describe their algorithms. In this work, we describe the challenge and present thirteen solutions that used deep reinforcement learning approaches. Many solutions use similar relaxations and heuristics, such as reward shaping, frame skipping, discretization of the action space, symmetry, and policy blending. However, each team implemented different modifications of the known algorithms by, for example, dividing the task into subtasks, learning low-level control, or by incorporating expert knowledge and using imitation learning.
Submitted 6 February, 2019;
originally announced February 2019.
-
Universal Dependency Parsing from Scratch
Authors:
Peng Qi,
Timothy Dozat,
Yuhao Zhang,
Christopher D. Manning
Abstract:
This paper describes Stanford's system at the CoNLL 2018 UD Shared Task. We introduce a complete neural pipeline system that takes raw text as input, and performs all tasks required by the shared task, ranging from tokenization and sentence segmentation, to POS tagging and dependency parsing. Our single system submission achieved very competitive performance on big treebanks. Moreover, after fixing an unfortunate bug, our corrected system would have placed 2nd, 1st, and 3rd on the official evaluation metrics LAS, MLAS, and BLEX, and would have outperformed all submission systems on low-resource treebank categories on all metrics by a large margin. We further show the effectiveness of different model components through extensive ablation studies.
Submitted 29 January, 2019;
originally announced January 2019.
-
Graph Convolution over Pruned Dependency Trees Improves Relation Extraction
Authors:
Yuhao Zhang,
Peng Qi,
Christopher D. Manning
Abstract:
Dependency trees help relation extraction models capture long-range relations between words. However, existing dependency-based models either neglect crucial information (e.g., negation) by pruning the dependency trees too aggressively, or are computationally inefficient because it is difficult to parallelize over different tree structures. We propose an extension of graph convolutional networks that is tailored for relation extraction, which pools information over arbitrary dependency structures efficiently in parallel. To incorporate relevant information while maximally removing irrelevant content, we further apply a novel pruning strategy to the input trees by keeping words immediately around the shortest path between the two entities among which a relation might hold. The resulting model achieves state-of-the-art performance on the large-scale TACRED dataset, outperforming existing sequence and dependency-based neural models. We also show through detailed analysis that this model has complementary strengths to sequence models, and combining them further improves the state of the art.
Submitted 26 September, 2018;
originally announced September 2018.
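The pruning idea in the abstract above, keeping words close to the shortest dependency path between two entities, can be sketched on a toy tree. This is an illustrative reading, not the authors' code: the head-array tree encoding, the single-token entities, the distance threshold `k`, and the function names are all assumptions.

```python
from collections import deque
from typing import List, Set


def path_to_root(heads: List[int], node: int) -> List[int]:
    """Return the nodes from `node` up to the root (the root has head -1)."""
    path = [node]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def shortest_path(heads: List[int], a: int, b: int) -> Set[int]:
    """Nodes on the tree path between tokens a and b, via their lowest common ancestor."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    ancestors_of_b = set(up_b)
    lca = next(n for n in up_a if n in ancestors_of_b)
    return set(up_a[:up_a.index(lca) + 1]) | set(up_b[:up_b.index(lca) + 1])


def prune(heads: List[int], a: int, b: int, k: int) -> Set[int]:
    """Keep tokens within k edges of the shortest path between a and b."""
    neighbors: List[List[int]] = [[] for _ in heads]
    for child, head in enumerate(heads):
        if head != -1:
            neighbors[child].append(head)
            neighbors[head].append(child)
    # Multi-source breadth-first search starting from every node on the path.
    distance = {n: 0 for n in shortest_path(heads, a, b)}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return set(distance)


if __name__ == "__main__":
    # Token 1 is the root; tokens 0 and 2 attach to it; 3 -> 2, 4 -> 3, 5 -> 4.
    heads = [1, -1, 1, 2, 3, 4]
    print(sorted(prune(heads, a=0, b=2, k=1)))
```

With `k = 0` only the path itself is kept; larger values of `k` retain more of the off-path context, such as a negation word attached to a verb on the path.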
-
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Authors:
Zhilin Yang,
Peng Qi,
Saizheng Zhang,
Yoshua Bengio,
William W. Cohen,
Ruslan Salakhutdinov,
Christopher D. Manning
Abstract:
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers. We introduce HotpotQA, a new dataset with 113k Wikipedia-based question-answer pairs with four key features: (1) the questions require finding and reasoning over multiple supporting documents to answer; (2) the questions are diverse and not constrained to any pre-existing knowledge bases or knowledge schemas; (3) we provide sentence-level supporting facts required for reasoning, allowing QA systems to reason with strong supervision and explain the predictions; (4) we offer a new type of factoid comparison questions to test QA systems' ability to extract relevant facts and perform necessary comparison. We show that HotpotQA is challenging for the latest QA systems, and the supporting facts enable models to improve performance and make explainable predictions.
Submitted 25 September, 2018;
originally announced September 2018.
-
Observation of Landau quantization and standing waves in HfSiS
Authors:
L. Jiao,
Q. N. Xu,
Y. P. Qi,
S. -C. Wu,
Y. Sun,
C. Felser,
S. Wirth
Abstract:
Recently, HfSiS was found to be a new type of Dirac semimetal with a line of Dirac nodes in the band structure. Meanwhile, Rashba-split surface states are also pronounced in this compound. Here we report a systematic study of HfSiS by scanning tunneling microscopy/spectroscopy at low temperature and high magnetic field. The Rashba-split surface states are characterized by measuring Landau quantization and standing waves, which reveal a quasi-linear dispersive band structure. First-principles calculations based on density-functional theory are conducted and compared with the experimental results. Based on these investigations, the properties of the Rashba-split surface states and their interplay with defects and collective modes are discussed.
Submitted 24 May, 2018;
originally announced May 2018.
-
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
Authors:
Urvashi Khandelwal,
He He,
Peng Qi,
Dan Jurafsky
Abstract:
We know very little about how neural language models (LM) use prior linguistic context. In this paper, we investigate the role of context in an LSTM LM, through ablation studies. Specifically, we analyze the increase in perplexity when prior context words are shuffled, replaced, or dropped. On two standard datasets, Penn Treebank and WikiText-2, we find that the model is capable of using about 200 tokens of context on average, but sharply distinguishes nearby context (recent 50 tokens) from the distant history. The model is highly sensitive to the order of words within the most recent sentence, but ignores word order in the long-range context (beyond 50 tokens), suggesting the distant past is modeled only as a rough semantic field or topic. We further find that the neural caching model (Grave et al., 2017b) especially helps the LSTM to copy words from within this distant context. Overall, our analysis not only provides a better understanding of how neural LMs use their context, but also sheds light on recent success from cache-based models.
Submitted 11 May, 2018;
originally announced May 2018.
-
Universally uniformly continuous metric spaces
Authors:
Katrina Gensterblum,
Peikai Qi,
Willie Wong
Abstract:
We answer the question: "on which metric spaces $(M,d)$ are all continuous functions uniformly continuous?" Our characterization theorem improves and generalizes a previous result due to Levine and Saunders, and in particular is applicable to metric spaces which are "infinite dimensional."
We answer the question: "on which metric spaces $(M,d)$ are all continuous functions uniformly continuous?" Our characterization theorem improves and generalizes a previous result due to Levine and Saunders, and in particular is applicable to metric spaces which are "infinite dimensional."
Submitted 1 January, 2018; v1 submitted 25 December, 2017;
originally announced December 2017.
-
Arc-swift: A Novel Transition System for Dependency Parsing
Authors:
Peng Qi,
Christopher D. Manning
Abstract:
Transition-based dependency parsers often need sequences of local shift and reduce operations to produce certain attachments. Correct individual decisions hence require global information about the sentence context and mistakes cause error propagation. This paper proposes a novel transition system, arc-swift, that enables direct attachments between tokens farther apart with a single transition. This allows the parser to leverage lexical information more directly in transition decisions. Hence, arc-swift can achieve significantly better performance with a very small beam size. Our parsers reduce error by 3.7--7.6% relative to those using existing transition systems on the Penn Treebank dependency parsing task and English Universal Dependencies.
Submitted 11 May, 2017;
originally announced May 2017.
-
Dirac Line-nodes and Effect of Spin-orbit Coupling in Non-symmorphic Critical Semimetal MSiS (M=Hf, Zr)
Authors:
C. Chen,
X. Xu,
J. Jiang,
S. -C. Wu,
Y. P. Qi,
L. X. Yang,
M. X. Wang,
Y. Sun,
N. B. M. Schröter,
H. F. Yang,
L. M. Schoop,
Y. Y. Lv,
J. Zhou,
Y. B. Chen,
S. H. Yao,
M. H. Lu,
Y. F. Chen,
C. Felser,
B. H. Yan,
Z. K. Liu,
Y. L. Chen
Abstract:
Topological Dirac semimetals (TDSs) represent a new state of quantum matter recently discovered that offers a platform for realizing many exotic physical phenomena. A TDS is characterized by the linear touching of bulk (conduction and valence) bands at discrete points in the momentum space (i.e. 3D Dirac points), such as in Na3Bi and Cd3As2. More recently, new types of Dirac semimetals with robust Dirac line-nodes (with non-trivial topology or near the critical point between topological phase transitions) have been proposed that extend the bulk linear touching from discrete points to 1D lines. In this work, using angle-resolved photoemission spectroscopy (ARPES), we explored the electronic structure of the non-symmorphic crystals MSiS (M=Hf, Zr). Remarkably, by mapping out the band structure in the full 3D Brillouin Zone (BZ), we observed two sets of Dirac line-nodes in parallel with the kz-axis and their dispersions. Interestingly, along directions other than the line-nodes in the 3D BZ, the bulk degeneracy is lifted by spin-orbit coupling (SOC) in both compounds with larger magnitude in HfSiS. Our work not only experimentally confirms a new Dirac line-node semimetal family protected by non-symmorphic symmetry, but also helps in understanding and further exploring the exotic properties as well as practical applications of the MSiS family of compounds.
Submitted 26 January, 2017; v1 submitted 24 January, 2017;
originally announced January 2017.
-
Observation of the Type-II Weyl Semimetal Phase in MoTe2
Authors:
J. Jiang,
Z. K. Liu,
Y. Sun,
H. F. Yang,
R. Rajamathi,
Y. P. Qi,
L. X. Yang,
C. Chen,
H. Peng,
C. -C. Hwang,
S. Z. Sun,
S. -K. Mo,
I. Vobornik,
J. Fujii,
S. S. P. Parkin,
C. Felser,
B. H. Yan,
Y. L. Chen
Abstract:
Topological Weyl semimetal (TWS), a new state of quantum matter, has sparked enormous research interest recently. Possessing unique Weyl fermions in the bulk and Fermi arcs on the surface, TWSs offer a rare platform for realizing many exotic physical phenomena. TWSs can be classified into type-I that respect Lorentz symmetry and type-II that do not. Here, we directly visualize the electronic structure of MoTe2, a recently proposed type-II TWS. Using angle-resolved photoemission spectroscopy (ARPES), we unravel the unique surface Fermi arcs, in good agreement with our ab-initio calculations. From spin-resolved ARPES measurements, we demonstrate the non-degenerate spin-texture of surface Fermi-arcs, thereby proving their non-trivial topological nature. Our work not only leads to new understandings of the unusual properties discovered in this family of compounds, but also allows for the further exploration of exotic properties and practical applications of type-II TWSs, as well as the interplay between superconductivity (MoTe2 was discovered to be superconducting recently) and their topological order.
Submitted 1 April, 2016;
originally announced April 2016.
-
Building DNN Acoustic Models for Large Vocabulary Speech Recognition
Authors:
Andrew L. Maas,
Peng Qi,
Ziang Xie,
Awni Y. Hannun,
Christopher T. Lengerich,
Daniel Jurafsky,
Andrew Y. Ng
Abstract:
Deep neural networks (DNNs) are now a central component of nearly all state-of-the-art speech recognition systems. Building neural network acoustic models requires several design decisions including network architecture, size, and training loss function. This paper offers an empirical investigation on which aspects of DNN acoustic model design are most important for speech recognition system performance. We report DNN classifier performance and final speech recognizer word error rates, and compare DNNs using several metrics to quantify factors influencing differences in task performance. Our first set of experiments use the standard Switchboard benchmark corpus, which contains approximately 300 hours of conversational telephone speech. We compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic modeling. We additionally build systems on a corpus of 2,100 hours of training data by combining the Switchboard and Fisher corpora. This larger corpus allows us to more thoroughly examine performance of large DNN models -- with up to ten times more parameters than those typically used in speech recognition systems. Our results suggest that a relatively simple DNN architecture and optimization technique produces strong results. These findings, along with previous work, help establish a set of best practices for building DNN hybrid speech recognition systems with maximum likelihood training. Our experiments in DNN optimization additionally serve as a case study for training DNNs with discriminative loss functions for speech tasks, as well as DNN classifiers more generally.
Submitted 20 January, 2015; v1 submitted 30 June, 2014;
originally announced June 2014.
-
Hydrogenation and Hydro-Carbonation and Etching of Single-Walled Carbon Nanotubes
Authors:
Guangyu Zhang,
Pengfei Qi,
Xinran Wang,
Yuerui Lu,
David Mann,
Xiaolin Li,
Hongjie Dai
Abstract:
We present a systematic experimental investigation of the reactions between hydrogen plasma and single-walled carbon nanotubes (SWNTs) at various temperatures. Microscopy, infrared (IR) and Raman spectroscopy and electrical transport measurements are carried out to investigate the properties of SWNTs after hydrogenation. Structural deformations, drastically reduced electrical conductance and increased semiconducting nature of SWNTs upon sidewall hydrogenation are observed. These changes are reversible upon thermal annealing at 500 °C via dehydrogenation. Harsh plasma or high temperature reactions lead to etching of nanotubes, likely via hydro-carbonation. Smaller SWNTs are markedly less stable against hydro-carbonation than larger tubes. The results are fundamental and may have implications for basic and practical applications including hydrogen storage, sensing, band-gap engineering for novel electronics and new methods of manipulation, functionalization and etching of nanotubes.
Submitted 3 November, 2006;
originally announced November 2006.
-
The Role of Doppler Broadening in Electromagnetically Induced Transparency and Autler-Townes Splitting in Open Molecular Systems
Authors:
A. Lazoudis,
E. Ahmed,
L. Li,
T. Kirova,
P. Qi,
A. Hansson,
J. Magnes,
F. C. Spano,
A. M. Lyyra
Abstract:
We describe in this Letter how inhomogeneous line broadening affects the Autler-Townes (AT) splitting in a three level open molecular cascade system. For moderate Rabi frequencies in the range of 300 to 500 MHz the fluorescence line shape from the uppermost level |3> in this system depends strongly on the frequency ratio of the two laser fields. However, the fluorescence spectrum of the intermediate level |2> appears as expected. We provide a description of the conditions for optimally resolved AT splitting in terms of the probe laser/coupling field frequency ratio and laser propagation geometry based on our theoretical analysis of the Doppler integral. This is important for applications such as molecular angular momentum alignment as well as for the measurement of the transition dipole moment matrix element.
Submitted 15 August, 2005;
originally announced August 2005.
-
10 to 50 nm Long Quasi Ballistic Carbon Nanotube Devices Obtained Without Complex Lithography
Authors:
Ali Javey,
Pengfei Qi,
Qian Wang,
Hongjie Dai
Abstract:
A simple method combining photolithography and shadow (or angle) evaporation is developed to fabricate single-walled carbon nanotube (SWCNT) devices with tube lengths L ~ 10-50 nm between metal contacts. Large numbers of such short devices are obtained without the need of complex tools such as electron beam lithography. Metallic SWCNTs with lengths ~ 10 nm, near the mean free path (mfp) of l_op ~ 15 nm for optical phonon scattering, exhibit near-ballistic transport at high biases and can carry unprecedented 100 mA currents per tube. Semiconducting SWCNT field-effect transistors (FETs) with ~ 50 nm channel lengths are routinely produced to achieve quasi-ballistic operations for molecular transistors. The results demonstrate highly length-scaled and high-performance interconnects and transistors realized with SWCNTs.
Submitted 7 September, 2004;
originally announced September 2004.
-
Miniature Organic Transistors With Carbon Nanotubes as Quasi-One Dimensional Electrodes
Authors:
Pengfei Qi,
Ali Javey,
Marco Rolandi,
Qian Wang,
Erhan Yenilmez,
Hongjie Dai
Abstract:
As the dimensions of electronic devices approach those of molecules, the size, geometry and chemical composition of the contact electrodes play increasingly dominant roles in device functions. It is shown here that single-walled carbon nanotubes (SWNT) can be used as quasi one-dimensional (1D) electrodes to construct organic field effect transistors (FET) with molecular scale width (~2 nm) and channel length (1-3 nm). An important feature owing to the quasi 1D electrode geometry is the favorable gate electrostatics that allows for efficient switching of ultra-short organic channels. This affords room temperature conductance modulation by orders of magnitude for organic transistors that are only several-molecules in length, with switching characteristics superior to similar devices with lithographically patterned metal electrodes. With nanotubes, covalent carbon-carbon bonds could be utilized to form contacts to molecular materials. The unique geometrical, physical and chemical properties of carbon nanotube electrodes may lead to various interesting molecular devices.
Submitted 19 August, 2004;
originally announced August 2004.