-
Matryoshka Query Transformer for Large Vision-Language Models
Authors:
Wenbo Hu,
Zi-Yi Dou,
Liunian Harold Li,
Amita Kamath,
Nanyun Peng,
Kai-Wei Chang
Abstract:
Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources? We answer this with an emphatic yes. Inspired by Matryoshka Representation Learning, we introduce the Matryoshka Query Transformer (MQT), capable of encoding an image into m visual tokens during inference, where m can be any number up to a predefined maximum. This is achieved by employing a query transformer with M latent query tokens to compress the visual embeddings. During each training step, we randomly select m <= M latent query tokens and train the model using only these first m tokens, discarding the rest. Combining MQT with LLaVA, we train a single model once, and flexibly and drastically reduce the number of inference-time visual tokens while maintaining similar or better performance compared to training independent models for each number of tokens. Our model, MQT-LLAVA, matches LLaVA-1.5 performance across 11 benchmarks using a maximum of 256 tokens instead of LLaVA's fixed 576. Reducing to 16 tokens (8x fewer TFLOPs) sacrifices only 2.4 points of performance on MMBench. On certain tasks such as ScienceQA and MMMU, we can even go down to only 2 visual tokens with performance drops of just 3% and 6%, respectively. Our exploration of the trade-off between the accuracy and computational cost brought about by the number of visual tokens facilitates future research to achieve the best of both worlds.
Submitted 6 June, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
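The training-time token selection described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_query_prefix`, the uniform sampling of m, and the string placeholders standing in for latent query embeddings are all assumptions made for the example. The abstract only states that m <= M is chosen at random and that the first m tokens are kept.

```python
import random


def select_query_prefix(latent_queries, rng):
    """Keep only the first m latent queries and discard the rest.

    m is drawn uniformly from 1..M here; the sampling distribution is an
    assumption, since the abstract only says m <= M is chosen at random.
    """
    max_tokens = len(latent_queries)   # M in the abstract
    m = rng.randint(1, max_tokens)     # randint is inclusive on both ends
    return latent_queries[:m]          # prefix of length m


# Hypothetical stand-ins for M = 256 learned latent query embeddings.
M = 256
latent_queries = [f"q{i}" for i in range(M)]

rng = random.Random(0)
for step in range(3):
    kept = select_query_prefix(latent_queries, rng)
    print(f"step {step}: training with the first {len(kept)} of {M} queries")
```

Because every training step uses a prefix, the earliest queries are trained most often, which is what would allow truncating to any m at inference time.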
-
Tailoring Self-Rationalizers with Multi-Reward Distillation
Authors:
Sahana Ramnath,
Brihi Joshi,
Skyler Hallinan,
Ximing Lu,
Liunian Harold Li,
Aaron Chan,
Jack Hessel,
Yejin Choi,
Xiang Ren
Abstract:
Large language models (LMs) are capable of generating free-text rationales to aid question answering. However, prior work 1) suggests that useful self-rationalization is emergent only at significant scales (e.g., 175B parameter GPT-3); and 2) focuses largely on downstream performance, ignoring the semantics of the rationales themselves, e.g., are they faithful, true, and helpful for humans? In this work, we enable small-scale LMs (approx. 200x smaller than GPT-3) to generate rationales that not only improve downstream task performance, but are also more plausible, consistent, and diverse, assessed both by automatic and human evaluation. Our method, MaRio (Multi-rewArd RatIOnalization), is a multi-reward conditioned self-rationalization algorithm that optimizes multiple distinct properties like plausibility, diversity and consistency. Results on five difficult question-answering datasets StrategyQA, QuaRel, OpenBookQA, NumerSense and QASC show that not only does MaRio improve task accuracy, but it also improves the self-rationalization quality of small LMs across the aforementioned axes better than a supervised fine-tuning (SFT) baseline. Extensive human evaluations confirm that MaRio rationales are preferred vs. SFT rationales, as well as qualitative improvements in plausibility and consistency.
Submitted 22 May, 2024; v1 submitted 5 November, 2023;
originally announced November 2023.
-
DesCo: Learning Object Recognition with Rich Language Descriptions
Authors:
Liunian Harold Li,
Zi-Yi Dou,
Nanyun Peng,
Kai-Wei Chang
Abstract:
Recent development in vision-language approaches has instigated a paradigm shift in learning visual recognition models from language supervision. These approaches align objects with language queries (e.g. "a photo of a cat") and improve the models' adaptability to identify novel objects and domains. Recently, several studies have attempted to query these models with complex language expressions that include specifications of fine-grained semantic details, such as attributes, shapes, textures, and relations. However, simply incorporating language descriptions as queries does not guarantee accurate interpretation by the models. In fact, our experiments show that GLIP, the state-of-the-art vision-language model for object detection, often disregards contextual information in the language descriptions and instead relies heavily on detecting objects solely by their names. To tackle these challenges, we propose a new description-conditioned (DesCo) paradigm of learning object recognition models with rich language descriptions consisting of two major innovations: 1) we employ a large language model as a commonsense knowledge engine to generate rich language descriptions of objects based on object names and the raw image-text caption; 2) we design context-sensitive queries to improve the model's ability to decipher intricate nuances embedded within descriptions and to force the model to focus on context rather than object names alone. On two novel object detection benchmarks, LVIS and OmniLabel, under the zero-shot detection setting, our approach achieves 34.8 APr minival (+9.1) and 29.3 AP (+3.6), respectively, surpassing the prior state-of-the-art models, GLIP and FIBER, by a large margin.
Submitted 24 June, 2023;
originally announced June 2023.
-
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step
Authors:
Liunian Harold Li,
Jack Hessel,
Youngjae Yu,
Xiang Ren,
Kai-Wei Chang,
Yejin Choi
Abstract:
Chain-of-thought prompting (e.g., "Let's think step-by-step") primes large language models to verbalize rationalization for their predictions. While chain-of-thought can lead to dramatic performance gains, benefits appear to emerge only for sufficiently large models (beyond 50B parameters). We show that orders-of-magnitude smaller models (125M -- 1.3B parameters) can still benefit from chain-of-thought prompting. To achieve this, we introduce Symbolic Chain-of-Thought Distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model. Experiments across several commonsense benchmarks show that: 1) SCoTD enhances the performance of the student model in both supervised and few-shot settings, and especially for challenge sets; 2) sampling many reasoning chains per instance from the teacher is paramount; and 3) after distillation, student chain-of-thoughts are judged by humans as comparable to the teacher, despite orders of magnitude fewer parameters. We test several hypotheses regarding what properties of chain-of-thought samples are important, e.g., diversity vs. teacher likelihood vs. open-endedness. We release our corpus of chain-of-thought samples and code.
Submitted 15 April, 2024; v1 submitted 24 June, 2023;
originally announced June 2023.
-
MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models
Authors:
Masoud Monajatipoor,
Liunian Harold Li,
Mozhdeh Rouhsedaghat,
Lin F. Yang,
Kai-Wei Chang
Abstract:
Large-scale language models have shown the ability to adapt to a new task via conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); then we transfer this model to perform VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves the in-context learning capability on VL tasks and can even compensate significantly for the size of the model. On VQA, OK-VQA, and GQA, our method could outperform the baseline model while having 20 times fewer parameters.
Submitted 2 June, 2023;
originally announced June 2023.
-
The Lobster Eye Imager for Astronomy Onboard the SATech-01 Satellite
Authors:
Z. X. Ling,
X. J. Sun,
C. Zhang,
S. L. Sun,
G. Jin,
S. N. Zhang,
X. F. Zhang,
J. B. Chang,
F. S. Chen,
Y. F. Chen,
Z. W. Cheng,
W. Fu,
Y. X. Han,
H. Li,
J. F. Li,
Y. Li,
Z. D. Li,
P. R. Liu,
Y. H. Lv,
X. H. Ma,
Y. J. Tang,
C. B. Wang,
R. J. Xie,
Y. L. Xue,
A. L. Yan
, et al. (101 additional authors not shown)
Abstract:
The Lobster Eye Imager for Astronomy (LEIA), a pathfinder of the Wide-field X-ray Telescope of the Einstein Probe (EP) mission, was successfully launched onboard the SATech-01 satellite of the Chinese Academy of Sciences on 27 July 2022. In this paper, we introduce the design and on-ground test results of the LEIA instrument. Using state-of-the-art Micro-Pore Optics (MPO), a wide field-of-view (FoV) of 346 square degrees (18.6 degrees $\times$ 18.6 degrees) of the X-ray imager is realized. An optical assembly composed of 36 MPO chips is used to focus incident X-ray photons, and four large-format complementary metal-oxide semiconductor (CMOS) sensors, each of 6 cm $\times$ 6 cm, are used as the focal plane detectors. The instrument has an angular resolution of 4 - 8 arcmin (in FWHM) for the central focal spot of the point spread function, and an effective area of 2 - 3 cm$^2$ at 1 keV in essentially all the directions within the field of view. The detection passband is 0.5 - 4 keV in the soft X-rays and the sensitivity is 2 - 3 $\times 10^{-11}$ erg s$^{-1}$ cm$^{-2}$ (about 1 mini-Crab) at 1,000 second observation. The total weight of LEIA is 56 kg and the power is 85 W. The satellite, with a design lifetime of 2 years, operates in a Sun-synchronous orbit of 500 km with an orbital period of 95 minutes. LEIA is paving the way for future missions by verifying in flight the technologies of both novel focusing imaging optics and CMOS sensors for X-ray observation, and by optimizing the working setups of the instrumental parameters. In addition, LEIA is able to carry out scientific observations to find new transients and to monitor known sources in the soft X-ray band, albeit with limited useful observing time available.
Submitted 24 May, 2023;
originally announced May 2023.
-
Evolution of $1/f$ Flux Noise in Superconducting Qubits with Weak Magnetic Fields
Authors:
David A. Rower,
Lamia Ateshian,
Lauren H. Li,
Max Hays,
Dolev Bluvstein,
Leon Ding,
Bharath Kannan,
Aziza Almanakly,
Jochen Braumüller,
David K. Kim,
Alexander Melville,
Bethany M. Niedzielski,
Mollie E. Schwartz,
Jonilyn L. Yoder,
Terry P. Orlando,
Joel I-Jan Wang,
Simon Gustavsson,
Jeffrey A. Grover,
Kyle Serniak,
Riccardo Comin,
William D. Oliver
Abstract:
The microscopic origin of $1/f$ magnetic flux noise in superconducting circuits has remained an open question for several decades despite extensive experimental and theoretical investigation. Recent progress in superconducting devices for quantum information has highlighted the need to mitigate sources of qubit decoherence, driving a renewed interest in understanding the underlying noise mechanism(s). Though a consensus has emerged attributing flux noise to surface spins, their identity and interaction mechanisms remain unclear, prompting further study. Here we apply weak in-plane magnetic fields to a capacitively-shunted flux qubit (where the Zeeman splitting of surface spins lies below the device temperature) and study the flux-noise-limited qubit dephasing, revealing previously unexplored trends that may shed light on the dynamics behind the emergent $1/f$ noise. Notably, we observe an enhancement (suppression) of the spin-echo (Ramsey) pure dephasing time in fields up to $B=100~\text{G}$. With direct noise spectroscopy, we further observe a transition from a $1/f$ to approximately Lorentzian frequency dependence below 10 Hz and a reduction of the noise above 1 MHz with increasing magnetic field. We suggest that these trends are qualitatively consistent with an increase of spin cluster sizes with magnetic field. These results should help to inform a complete microscopic theory of $1/f$ flux noise in superconducting circuits.
Submitted 18 January, 2023;
originally announced January 2023.
-
Observation of $e^+e^- \to p p \bar{p} \bar{n} π^{-} + c.c.$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
M. Albrecht,
R. Aliberti,
A. Amoroso,
M. R. An,
Q. An,
X. H. Bai,
Y. Bai,
O. Bakina,
R. Baldini Ferroli,
I. Balossino,
Y. Ban,
V. Batozskaya,
D. Becker,
K. Begzsuren,
N. Berger,
M. Bertani,
D. Bettoni,
F. Bianchi,
J. Bloms,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (545 additional authors not shown)
Abstract:
Using data taken at 29 center-of-mass energies between 4.16 and 4.70 GeV with the BESIII detector at the Beijing Electron Positron Collider corresponding to a total integrated luminosity of approximately 18.8 $\rm fb^{-1}$, the process $e^+e^- \to p p \bar{p} \bar{n} π^{-} + c.c.$ is observed for the first time with a statistical significance of $11.5σ$. The average Born cross sections in the energy ranges of (4.160, 4.380) GeV, (4.400, 4.600) GeV and (4.610, 4.700) GeV are measured to be $(21.5\pm5.7\pm1.2)$ fb, $(46.3\pm10.6\pm2.5)$ fb and $(59.0\pm9.4\pm3.2)$ fb, respectively, where the first uncertainties are statistical and the second are systematic. The line shapes of the $\bar{p}\bar{n}$ and $ppπ^-$ invariant mass spectra are consistent with phase space distributions, indicating that no hexaquark or di-baryon state is observed.
Submitted 23 November, 2022;
originally announced November 2022.
-
First wide field-of-view X-ray observations by a lobster eye focusing telescope in orbit
Authors:
C. Zhang,
Z. X. Ling,
X. J. Sun,
S. L. Sun,
Y. Liu,
Z. D. Li,
Y. L. Xue,
Y. F. Chen,
Y. F. Dai,
Z. Q. Jia,
H. Y. Liu,
X. F. Zhang,
Y. H. Zhang,
S. N. Zhang,
F. S. Chen,
Z. W. Cheng,
W. Fu,
Y. X. Han,
H. Li,
J. F. Li,
Y. Li,
P. R. Liu,
X. H. Ma,
Y. J. Tang,
C. B. Wang
, et al. (53 additional authors not shown)
Abstract:
As a novel X-ray focusing technology, lobster eye micro-pore optics (MPO) feature both a wide observing field of view and true imaging capability, promising sky monitoring with significantly improved sensitivity and spatial resolution in soft X-rays. Since first proposed by Angel (1979), the optics have been extensively studied, developed and trialed over the past decades. In this Letter, we report on the first-light results from a flight experiment of the Lobster Eye Imager for Astronomy ($LEIA$), a pathfinder of the wide-field X-ray telescope of the Einstein Probe mission. The piggyback imager, launched in July 2022, has a mostly un-vignetted field of view of $18.6^\circ \times 18.6^\circ $. Its spatial resolution is in the range of 4$-$7 arcmin in FWHM and the focal spot effective area is 2$-$3 cm$^2$, both showing only mild fluctuations across the field of view. We present images of the Galactic center region, Sco X-1 and the diffuse Cygnus Loop nebula taken in snapshot observations over 0.5$-$4 keV. These are truly wide-field X-ray images of celestial bodies observed, for the first time, by a focusing imaging telescope. Initial analyses of the in-flight data show excellent agreement between the observed images and the on-ground calibration and simulations. The instrument and its characterization are briefly described, as well as the flight experiment. The results provide a solid basis for the development of the present and proposed wide-field X-ray missions using lobster eye MPO.
Submitted 17 November, 2022;
originally announced November 2022.
-
Specialization at an expanding front
Authors:
Lauren H. Li,
Mehran Kardar
Abstract:
As a population grows, spreading to new environments may favor specialization. In this paper, we introduce and explore a model for specialization at the front of a colony expanding synchronously into new territory. We show through numerical simulations that, by gaining fitness through accumulating mutations, progeny of the initial seed population can differentiate into distinct specialists. With competition and selection limited to the growth front, the emerging specialists first segregate into sectors, which then expand to dominate the entire population. We quantify the scaling of the fixation time with the size of the population and observe different behaviors corresponding to distinct universality classes: unbounded and bounded gains in fitness lead to superdiffusive ($z=3/2$) and diffusive ($z=2$) stochastic wanderings of the sector boundaries, respectively.
Submitted 13 September, 2023; v1 submitted 11 October, 2022;
originally announced October 2022.
-
GLIPv2: Unifying Localization and Vision-Language Understanding
Authors:
Haotian Zhang,
Pengchuan Zhang,
Xiaowei Hu,
Yen-Chun Chen,
Liunian Harold Li,
Xiyang Dai,
Lijuan Wang,
Lu Yuan,
Jenq-Neng Hwang,
Jianfeng Gao
Abstract:
We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning). GLIPv2 elegantly unifies localization pre-training and Vision-Language Pre-training (VLP) with three pre-training tasks: phrase grounding as a VL reformulation of the detection task, region-word contrastive learning as a novel region-word level contrastive learning task, and the masked language modeling. This unification not only simplifies the previous multi-stage VLP procedure but also achieves mutual benefits between localization and understanding tasks. Experimental results show that a single GLIPv2 model (all model weights are shared) achieves near SoTA performance on various localization and understanding tasks. The model also shows (1) strong zero-shot and few-shot adaption performance on open-vocabulary object detection tasks and (2) superior grounding capability on VL understanding tasks. Code will be released at https://github.com/microsoft/GLIP.
Submitted 11 October, 2022; v1 submitted 12 June, 2022;
originally announced June 2022.
-
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Authors:
Jingnong Qu,
Liunian Harold Li,
Jieyu Zhao,
Sunipa Dev,
Kai-Wei Chang
Abstract:
Disinformation has become a serious problem on social media. In particular, given their short format, visual attraction, and humorous nature, memes have a significant advantage in dissemination among online communities, making them an effective vehicle for the spread of disinformation. We present DisinfoMeme to help detect disinformation memes. The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism. The dataset poses multiple unique challenges: limited data and label imbalance, reliance on external knowledge, multimodal reasoning, layout dependency, and noise from OCR. We test multiple widely-used unimodal and multimodal models on this dataset. The experiments show that the room for improvement is still huge for current models.
Submitted 25 May, 2022;
originally announced May 2022.
-
GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models
Authors:
Da Yin,
Hritik Bansal,
Masoud Monajatipoor,
Liunian Harold Li,
Kai-Wei Chang
Abstract:
Recent work has shown that Pre-trained Language Models (PLMs) store the relational knowledge learned from data and utilize it for performing downstream tasks. However, commonsense knowledge across different regions may vary. For instance, the color of a bridal dress is white in American weddings whereas it is red in Chinese weddings. In this paper, we introduce a benchmark dataset, Geo-Diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), for probing the diversity of the relational knowledge in multilingual PLMs. GeoMLAMA contains 3,125 prompts in English, Chinese, Hindi, Persian, and Swahili, with a wide coverage of concepts shared by people from American, Chinese, Indian, Iranian and Kenyan cultures. We benchmark 11 standard multilingual PLMs on GeoMLAMA. Interestingly, we find that 1) larger multilingual PLM variants do not necessarily store geo-diverse concepts better than their smaller variants; 2) multilingual PLMs are not intrinsically biased towards knowledge from the Western countries (the United States); 3) the native language of a country may not be the best language to probe its knowledge; and 4) a language may better probe knowledge about a non-native country than its native country. Code and data are released at https://github.com/WadeYin9712/GeoMLAMA.
Submitted 29 November, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
On the Paradox of Learning to Reason from Data
Authors:
Honghua Zhang,
Liunian Harold Li,
Tao Meng,
Kai-Wei Chang,
Guy Van den Broeck
Abstract:
Logical reasoning is needed in a wide range of NLP tasks. Can a BERT model be trained end-to-end to solve logical reasoning problems presented in natural language? We attempt to answer this question in a confined problem space where there exists a set of parameters that perfectly simulates logical reasoning. We make observations that seem to contradict each other: BERT attains near-perfect accuracy on in-distribution test examples while failing to generalize to other data distributions over the exact same problem space. Our study provides an explanation for this paradox: instead of learning to emulate the correct reasoning function, BERT has in fact learned statistical features that inherently exist in logical reasoning problems. We also show that it is infeasible to jointly remove statistical features from data, illustrating the difficulty of learning to reason in general. Our result naturally extends to other neural models and unveils the fundamental difference between learning to reason and learning to achieve high performance on NLP benchmarks using statistical features.
Submitted 24 May, 2022; v1 submitted 23 May, 2022;
originally announced May 2022.
-
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Authors:
Chunyuan Li,
Haotian Liu,
Liunian Harold Li,
Pengchuan Zhang,
Jyoti Aneja,
Jianwei Yang,
Ping Jin,
Houdong Hu,
Zicheng Liu,
Yong Jae Lee,
Jianfeng Gao
Abstract:
Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public benchmarks. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models. ELEVATER is composed of three components. (i) Datasets. As downstream evaluation suites, it consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to facilitate model evaluation on downstream tasks. (iii) Metrics. A variety of evaluation metrics are used to measure sample-efficiency (zero-shot and few-shot) and parameter-efficiency (linear probing and full model fine-tuning). ELEVATER is a platform for Computer Vision in the Wild (CVinW), and is publicly released at https://computer-vision-in-the-wild.github.io/ELEVATER/
Submitted 13 October, 2022; v1 submitted 19 April, 2022;
originally announced April 2022.
-
RegionCLIP: Region-based Language-Image Pretraining
Authors:
Yiwu Zhong,
Jianwei Yang,
Pengchuan Zhang,
Chunyuan Li,
Noel Codella,
Liunian Harold Li,
Luowei Zhou,
Xiyang Dai,
Lu Yuan,
Yin Li,
Jianfeng Gao
Abstract:
Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Our method leverages a CLIP model to match image regions with template captions and then pretrains our model to align these region-text pairs in the feature space. When transferring our pretrained model to the open-vocabulary object detection tasks, our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on COCO and LVIS datasets, respectively. Moreover, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. Our code is available at https://github.com/microsoft/RegionCLIP.
Submitted 16 December, 2021;
originally announced December 2021.
-
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Authors:
Zhecan Wang,
Haoxuan You,
Liunian Harold Li,
Alireza Zareian,
Suji Park,
Yiqing Liang,
Kai-Wei Chang,
Shih-Fu Chang
Abstract:
Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.
Submitted 15 December, 2021;
originally announced December 2021.
-
Grounded Language-Image Pre-training
Authors:
Liunian Harold Li,
Pengchuan Zhang,
Haotian Zhang,
Jianwei Yang,
Chunyuan Li,
Yiwu Zhong,
Lijuan Wang,
Lu Yuan,
Lei Zhang,
Jenq-Neng Hwang,
Kai-Wei Chang,
Jianfeng Gao
Abstract:
This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines. 2) After fine-tuned on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals with a fully-supervised Dynamic Head. Code is released at https://github.com/microsoft/GLIP.
Submitted 17 June, 2022; v1 submitted 7 December, 2021;
originally announced December 2021.
-
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Authors:
Da Yin,
Liunian Harold Li,
Ziniu Hu,
Nanyun Peng,
Kai-Wei Chang
Abstract:
Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations and they are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art Vision-and-Language models, VisualBERT and ViLBERT trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions including East Asia, South Asia, and Africa is significantly lower than that for Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition. Dataset and code are released at https://github.com/WadeYin9712/GD-VCR.
Submitted 14 September, 2021;
originally announced September 2021.
-
Quantum kernels with squeezed-state encoding for machine learning
Authors:
Long Hin Li,
Dan-Bo Zhang,
Z. D. Wang
Abstract:
Kernel methods are powerful for machine learning, as they can represent data in feature spaces in which similarities between samples may be faithfully captured. Recently, it has been realized that machine learning enhanced by quantum computing is closely related to kernel methods, where the exponentially large Hilbert space turns out to be a feature space more expressive than classical ones. In this paper, we generalize quantum kernel methods by encoding data into continuous-variable quantum states, which can benefit from the infinite-dimensional Hilbert space of continuous variables. Specifically, we propose squeezed-state encoding, in which data is encoded in either the amplitude or the phase. The kernels can be calculated on a quantum computer and then combined with classical machine learning, e.g. support vector machine, for training and predicting tasks. Their comparisons with other classical kernels are also addressed. Lastly, we discuss physical implementations of squeezed-state encoding for machine learning in quantum platforms such as trapped ions.
Submitted 25 August, 2021;
originally announced August 2021.
-
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis
Authors:
Masoud Monajatipoor,
Mozhdeh Rouhsedaghat,
Liunian Harold Li,
Aichi Chien,
C. -C. Jay Kuo,
Fabien Scalzo,
Kai-Wei Chang
Abstract:
Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. Prior studies show that pre-trained V&L models can significantly improve the model performance for downstream tasks such as Visual Question Answering (VQA). However, V&L models are less effective when applied in the medical domain (e.g., on X-ray images and clinical notes) due to the domain gap. In this paper, we investigate the challenges of applying pre-trained V&L models in medical applications. In particular, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12% which is 1.62% higher than state-of-the-art (SOTA) while it is trained on a 9 times smaller dataset.
Submitted 10 August, 2021;
originally announced August 2021.
-
How Much Can CLIP Benefit Vision-and-Language Tasks?
Authors:
Sheng Shen,
Liunian Harold Li,
Hao Tan,
Mohit Bansal,
Anna Rohrbach,
Kai-Wei Chang,
Zhewei Yao,
Kurt Keutzer
Abstract:
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world. However, it has been observed that large-scale pretraining usually can result in better generalization performance, e.g., CLIP (Contrastive Language-Image Pre-training), trained on a massive amount of image-caption pairs, has shown a strong zero-shot capability on various vision tasks. To further study the advantage brought by CLIP, we propose to use CLIP as the visual encoder in various V&L models in two typical scenarios: 1) plugging CLIP into task-specific fine-tuning; 2) combining CLIP with V&L pre-training and transferring to downstream tasks. We show that CLIP significantly outperforms widely-used visual encoders trained with in-domain annotated data, such as BottomUp-TopDown. We achieve competitive or better results on diverse V&L tasks, while establishing new state-of-the-art results on Visual Question Answering, Visual Entailment, and V&L Navigation tasks. We release our code at https://github.com/clip-vil/CLIP-ViL.
Submitted 13 July, 2021;
originally announced July 2021.
-
Mechanical Properties of Atomically Thin Tungsten Dichalcogenides: WS$_2$, WSe$_2$ and WTe$_2$
Authors:
Alexey Falin,
Matthew Holwill,
Haifeng Lv,
Wei Gan,
Jun Cheng,
Rui Zhang,
Dong Qian,
Matthew R. Barnett,
Elton J. G. Santos,
Konstantin S. Novoselov,
Tao Tao,
Xiaojun Wu,
Lu Hua Li
Abstract:
Two-dimensional (2D) tungsten disulfide (WS$_2$), tungsten diselenide (WSe$_2$), and tungsten ditelluride (WTe$_2$) draw increasing attention due to their attractive properties deriving from the heavy tungsten and chalcogenide atoms, but their mechanical properties are still mostly unknown. Here, we determine the intrinsic and air-aged mechanical properties of mono-, bi-, and trilayer (1-3L) WS$_2$, WSe$_2$ and WTe$_2$ using a complementary suite of experiments and theoretical calculations. High-quality 1L WS$_2$ has the highest Young's modulus (302.4 ± 24.1 GPa) and strength (47.0 ± 8.6 GPa) of the entire family, overpassing those of 1L WSe$_2$ (258.6 ± 38.3 and 38.0 ± 6.0 GPa, respectively) and WTe$_2$ (149.1 ± 9.4 and 6.4 ± 3.3 GPa, respectively). However, the elasticity and strength of WS$_2$ decrease most dramatically with increased thickness among the three materials. We interpret the phenomenon by the different tendencies for interlayer sliding in equilibrium state and under in-plane strain and out-of-plane compression conditions in the indentation process, revealed by finite element method (FEM) and density functional theory (DFT) calculations including van der Waals (vdW) interactions. We also demonstrate that the mechanical properties of the high-quality 1-3L WS$_2$ and WSe$_2$ are largely stable in the air for up to 20 weeks. Intriguingly, the 1-3L WSe$_2$ shows increased modulus and strength values with aging in the air. This is ascribed to oxygen doping, which reinforces the structure. The present study will facilitate the design and use of 2D tungsten dichalcogenides in applications, such as strain engineering and flexible field-effect transistors (FETs).
Submitted 28 January, 2021;
originally announced January 2021.
-
Layer-dependent mechanical properties and enhanced plasticity in the van der Waals chromium trihalide magnets
Authors:
Fernando Cantos-Prieto,
Alexey Falin,
Martin Alliati,
Dong Qian,
Rui Zhang,
Tao Tao,
Matthew R. Barnett,
Elton J. G. Santos,
Lu Hua Li,
Efren Navarro-Moratalla
Abstract:
The mechanical properties of magnetic materials are instrumental for the development of the magnetoelastic theory and the optimization of strain-modulated magnetic devices. In particular, two-dimensional (2D) magnets hold promise to enlarge these concepts into the realm of low-dimensional physics and ultrathin devices. However, no experimental study on the intrinsic mechanical properties of the archetypal 2D magnet family of the chromium trihalides has thus far been performed. Here, we report the room temperature layer-dependent mechanical properties of atomically thin CrI3 and CrCl3, finding that bilayers of CrI3 and CrCl3 have Young's moduli of 62.1 GPa and 43.4 GPa, with the highest sustained strain of 6.09% and 6.49% and breaking strengths of 3.6 GPa and 2.2 GPa, respectively. Both the elasticity and strength of the two materials decrease with increased thickness, which is attributed to a weak interlayer interaction that enables interlayer sliding under low levels of applied load. The mechanical properties observed in the few-layer chromium trihalide crystals provide evidence of outstanding plasticity in these materials, which is qualitatively demonstrated in their bulk counterparts. This study will contribute to various applications of the van der Waals magnetic materials, especially for their use in magnetostrictive and flexible devices.
Submitted 1 April, 2021; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Authors:
Liunian Harold Li,
Haoxuan You,
Zhecan Wang,
Alireza Zareian,
Shih-Fu Chang,
Kai-Wei Chang
Abstract:
Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks. However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and require cumbersome curation. Inspired by unsupervised machine translation, we investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora. In particular, we propose to conduct "mask-and-predict" pre-training on text-only and image-only corpora and introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. We find that such a simple approach achieves performance close to a model pre-trained with aligned data, on four English V&L benchmarks. Our work challenges the widely held notion that aligned data is necessary for V&L pre-training, while significantly reducing the amount of supervision needed for V&L models.
Submitted 11 April, 2021; v1 submitted 24 October, 2020;
originally announced October 2020.
-
Mechanical Properties of Atomically Thin Boron Nitride and the Role of Interlayer Interactions
Authors:
Aleksey Falin,
Qiran Cai,
Elton J. G. Santos,
Declan Scullion,
Dong Qian,
Rui Zhang,
Zhi Yang,
Shaoming Huang,
Kenji Watanabe,
Takashi Taniguchi,
Matthew R. Barnett,
Ying Chen,
Rodney S. Ruoff,
Lu Hua Li
Abstract:
Atomically thin boron nitride (BN) nanosheets are important two-dimensional nanomaterials with many unique properties distinct from those of graphene, but the investigation of their mechanical properties still greatly lacks. Here we report that high-quality single-crystalline mono- and few-layer BN nanosheets are one of the strongest electrically insulating materials. More intriguingly, few-layer BN shows mechanical behaviors quite different from those of few-layer graphene under indentation. In striking contrast to graphene, whose strength decreases by more than 30% when the number of layers increases from 1 to 8, the mechanical strength of BN nanosheets is not sensitive to increasing thickness. We attribute this difference to the distinct interlayer interactions and hence sliding tendencies in these two materials under indentation. The significantly better mechanical integrity of BN nanosheets makes them a more attractive candidate than graphene for several applications, e.g. as mechanical reinforcements.
Submitted 2 August, 2020;
originally announced August 2020.
-
Raman Signature and Phonon Dispersion of Atomically Thin Boron Nitride
Authors:
Qiran Cai,
Declan Scullion,
Aleksey Falin,
Kenji Watanabe,
Takashi Taniguchi,
Ying Chen,
Elton J. G. Santos,
Lu Hua Li
Abstract:
Raman spectroscopy has become an essential technique to characterize and investigate graphene and many other two-dimensional materials. However, there still lacks consensus on the Raman signature and phonon dispersion of atomically thin boron nitride (BN), which has many unique properties distinct from graphene. Such a knowledge gap greatly affects the understanding of basic physical and chemical properties of atomically thin BN as well as the use of Raman spectroscopy to study these nanomaterials. Here, we use both experiment and simulation to reveal the intrinsic Raman signature of monolayer and few-layer BN. We find experimentally that atomically thin BN without interaction with substrate has a G band frequency similar to that of bulk hexagonal BN, but strain induced by substrate can cause pronounced Raman shifts. This is in excellent agreement with our first-principles density functional theory (DFT) calculations at two levels of theory, including van der Waals dispersion forces (opt-vdW) and a fractional of the exact exchange from Hartree-Fock (HF) theory through hybrid HSE06 functional. Both calculations demonstrate that the intrinsic E2g mode of BN does not depend sensibly on the number of layers. Our simulations also suggest the importance of the exact exchange mixing parameter in calculating the vibrational modes in BN, as it determines the fraction of HF exchange included in the DFT calculations.
Submitted 2 August, 2020;
originally announced August 2020.
-
Atomically Thin Boron Nitride as an Ideal Spacer for Metal-Enhanced Fluorescence
Authors:
Wei Gan,
Christos Tserkezis,
Qiran Cai,
Alexey Falin,
Srikanth Mateti,
Minh Nguyen,
Igor Aharonovich,
Kenji Watanabe,
Takashi Taniguchi,
Fumin Huang,
Li Song,
Lingxue Kong,
Ying Chen,
Lu Hua Li
Abstract:
The metal-enhanced fluorescence (MEF) considerably enhances the luminescence for various applications, but its performance largely depends on the dielectric spacer between the fluorophore and plasmonic system. It is still challenging to produce a defect-free spacer having an optimized thickness with a subnanometer accuracy that enables reusability without affecting the enhancement. In this study, we demonstrate the use of atomically thin hexagonal boron nitride (BN) as an ideal MEF spacer owing to its multifold advantages over the traditional dielectric thin films. With rhodamine 6G as a representative fluorophore, it largely improves the enhancement factor (up to ~95 ± 5), sensitivity (10^-8 M), reproducibility, and reusability (~90% of the plasmonic activity is retained after 30 cycles of heating at 350 °C in air) of MEF. This can be attributed to its two-dimensional structure, thickness control at the atomic level, defect-free quality, high affinities to aromatic fluorophores, good thermal stability, and excellent impermeability. The atomically thin BN spacers could increase the use of MEF in different fields and industries.
Submitted 2 August, 2020;
originally announced August 2020.
-
Two-dimensional van der Waals Heterostructures for Synergistically Improved Surface Enhanced Raman Spectroscopy
Authors:
Qiran Cai,
Wei Gan,
Alexey Falin,
Kenji Watanabe,
Takashi Taniguchi,
Jincheng Zhuang,
Weichang Hao,
Shaoming Huang,
Tao Tao,
Ying Chen,
Lu Hua Li
Abstract:
Surface enhanced Raman spectroscopy (SERS) is a precise and non-invasive analytical technique that is widely used in chemical analysis, environmental protection, food processing, pharmaceutics, and diagnostic biology. However, it is still a challenge to produce highly sensitive and reusable SERS substrates with minimum fluorescence background. In this work, we propose the use of van der Waals heterostructures of two-dimensional materials (2D materials) to cover plasmonic metal nanoparticles to solve this challenge. The heterostructures of atomically thin boron nitride (BN) and graphene provide synergistic effects: (1) electrons could tunnel through the atomically thin BN, allowing the charge transfer between graphene and probe molecules to suppress fluorescence background; (2) the SERS sensitivity is enhanced by graphene via chemical enhancement mechanism (CM) in addition to electromagnetic field mechanism (EM); (3) the atomically thin BN protects the underlying graphene and Ag nanoparticles from oxidation during heating for regeneration at 360 °C in the air so that the SERS substrates could be reused. These advances will facilitate wider applications of SERS, especially on the detection of fluorescent molecules with higher sensitivity.
Submitted 2 August, 2020;
originally announced August 2020.
-
Outstanding Thermal Conductivity of Single Atomic Layer Isotope-Modified Boron Nitride
Authors:
Qiran Cai,
Declan Scullion,
Wei Gan,
Alexey Falin,
Pavel Cizek,
Song Liu,
James H. Edgar,
Rong Liu,
Bruce C. C. Cowie,
Elton J. G. Santos,
Lu Hua Li
Abstract:
Materials with high thermal conductivities (k) are valuable for solving the challenge of waste heat dissipation in highly integrated and miniaturized modern devices. Herein, we report the first synthesis of atomically thin isotopically pure hexagonal boron nitride (BN) and show that its k is one of the highest among all semiconductors and electric insulators. Single atomic layer (1L) BN enriched with 11B has a k up to 1009 W/mK at room temperature. We find that the isotope engineering mainly suppresses the out-of-plane optical (ZO) phonon scatterings in BN, which subsequently reduces acoustic-optical scatterings between ZO and transverse acoustic (TA) and longitudinal acoustic (LA) phonons. On the other hand, reducing the thickness to a single atomic layer diminishes the interlayer interactions and hence Umklapp scatterings of the out-of-plane acoustic (ZA) phonons, though this thickness-induced k enhancement is not as dramatic as that in naturally occurring BN. With many of its unique properties, atomically thin monoisotopic BN is promising for heat management in van der Waals (vdW) devices and future flexible electronics. The isotope engineering of atomically thin BN may also open up other appealing applications and opportunities in 2D materials yet to be explored.
Submitted 21 August, 2020; v1 submitted 2 August, 2020;
originally announced August 2020.
-
Electronic polarizability as the fundamental variable in the dielectric properties of two-dimensional materials
Authors:
Tian Tian,
Declan Scullion,
Dale Hughes,
Lu Hua Li,
Chih-Jen Shih,
Jonathan Coleman,
Manish Chhowalla,
Elton J. G. Santos
Abstract:
The dielectric constant, which defines the polarization of the media, is a key quantity in condensed matter. It determines several electronic and optoelectronic properties important for a plethora of modern technologies from computer memory to field effect transistors and communication circuits. Moreover, the importance of the dielectric constant in describing electromagnetic interactions through screening plays a critical role in understanding fundamental molecular interactions. Here we show that despite its fundamental transcendence, the dielectric constant does not define unequivocally the dielectric properties of two-dimensional (2D) materials due to the locality of their electrostatic screening. Instead, the electronic polarizability correctly captures the dielectric nature of a 2D material which is united to other physical quantities in an atomically thin layer. We reveal a long-sought universal formalism where electronic, geometrical and dielectric properties are intrinsically correlated through the polarizability opening the door to probe quantities yet not directly measurable including the real covalent thickness of a layer. We unify the concept of dielectric properties in any material dimension finding a global dielectric anisotropy index defining their controllability through dimensionality.
Submitted 22 December, 2019;
originally announced December 2019.
-
VisualBERT: A Simple and Performant Baseline for Vision and Language
Authors:
Liunian Harold Li,
Mark Yatskar,
Da Yin,
Cho-Jui Hsieh,
Kai-Wei Chang
Abstract:
We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.
Submitted 9 August, 2019;
originally announced August 2019.
-
High thermal conductivity of high-quality monolayer boron nitride and its thermal expansion
Authors:
Qiran Cai,
Declan Scullion,
Wei Gan,
Aleksey Falin,
Shunying Zhang,
Kenji Watanabe,
Takashi Taniguchi,
Ying Chen,
Elton J. G. Santos,
Lu Hua Li
Abstract:
Heat management becomes more and more critical, especially in miniaturized modern devices, so the exploration of highly thermally conductive materials with electrical insulation and favorable mechanical properties is of great importance. Here, we report that high-quality monolayer boron nitride (BN) has a thermal conductivity (κ) of 751 W/mK at room temperature. Though smaller than that of graphene, this value is larger than that of cubic boron nitride (cBN) and only second to those of diamond and lately discovered cubic boron arsenide (BAs). Monolayer BN has the second largest κ per unit weight among all semiconductors and insulators, just behind diamond, if density is considered. The κ of atomically thin BN decreases with increased thickness. Our large-scale molecular dynamic simulations using Green-Kubo formalism accurately reproduce this trend, and the density functional theory (DFT) calculations reveal the main scattering mechanism. The thermal expansion coefficients (TECs) of monolayer to trilayer BN at 300-400 K are also experimentally measured, and the results are comparable to atomistic ab initio DFT calculations in a wider range of temperatures. Thanks to its wide bandgap, high thermal conductivity, outstanding strength, good flexibility, and excellent thermal and chemical stability, atomically thin BN is a strong candidate for heat dissipation applications, especially in the next generation of flexible electronic devices.
Submitted 26 March, 2019; v1 submitted 21 March, 2019;
originally announced March 2019.
-
Efficient Contextual Representation Learning Without Softmax Layer
Authors:
Liunian Harold Li,
Patrick H. Chen,
Cho-Jui Hsieh,
Kai-Wei Chang
Abstract:
Contextual representation models have achieved great success in improving various downstream tasks. However, these language-model-based encoders are difficult to train due to the large parameter sizes and high computational complexity. By carefully examining the training procedure, we find that the softmax layer (the output layer) causes significant inefficiency due to the large vocabulary size. Therefore, we redesign the learning objective and propose an efficient framework for training contextual representation models. Specifically, the proposed approach bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings. Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer and performs language modeling without truncating the vocabulary. When applied to ELMo, our method achieves a 4 times speedup and eliminates 80% trainable parameters while achieving competitive performance on downstream tasks.
Submitted 28 February, 2019;
originally announced February 2019.
-
Giant optical nonlinearity cancellation in quantum wells
Authors:
S. Houver,
A. Lebreton,
T. A. S. Pereira,
G. Xu,
R. Colombelli,
I. Kundu,
L. H. Li,
E. H. Linfield,
A. G. Davies,
J. Mangeney,
J. Tignon,
R. Ferreira,
S. S. Dhillon
Abstract:
Second-order optical nonlinearities can be enhanced by orders of magnitude in resonantly excited nanostructures, as theoretically predicted and experimentally investigated in a variety of semiconductor systems. These resonant nonlinearities continually attract attention, particularly in newly discovered materials, but tend not to be as efficient as currently predicted. This limits their exploitation in frequency conversion. Here, we present a clear-cut theoretical and experimental demonstration that the second-order nonlinear susceptibility can vary by orders of magnitude as a result of giant cancellation effects in systems with many confined quantum states. Using terahertz quantum cascade lasers as a model source to investigate interband and intersubband resonant nonlinearities, we show that these giant cancellations are a result of interfering second-order nonlinear contributions of light and heavy hole states. As well as being important for understanding and engineering the resonant optical properties of materials, this work can be employed as a new, extremely sensitive tool to elucidate the bandstructure properties of complex quantum well systems.
Submitted 4 January, 2019;
originally announced January 2019.
-
Asymmetric Electric Field Screening in van der Waals Heterostructures
Authors:
Lu Hua Li,
Tian Tian,
Qiran Cai,
Chih-Jen Shih,
Elton J. G. Santos
Abstract:
Electric field screening plays an important role in the physical and chemical properties of materials and their devices. Here, we use a compelling set of theoretical and experimental techniques involving van der Waals (vdW) ab initio density functional theory (DFT) simulations, a quantum capacitance-based classical model and electric force microscopy (EFM) to elucidate the intrinsic dielectric screening properties of vdW heterostructures (vdWHs) formed by MoS2 and graphene layers. We experimentally observed an asymmetric electric response in the MoS2/graphene vdWHs under different directions of the external electric field. That is, when the electric fields are directed towards graphene, a large amount of polarized charge screens the fields, but when the sign of the field is reversed, a strong depolarization field is present and only partial screening is detected. This effect is thickness-dependent, in particular on the number of MoS2 layers, whereas increased thickness of graphene showed a small effect on the electrical and screening behavior. Our results indicate that asymmetric dipolar contributions at the interface between graphene and MoS2 are the main cause of the unusual field-effect screening in the vdWHs. This work not only provides new insights on the screening properties of the vast number of heterojunctions fabricated so far, but also uncovers the great potential of controlling a fundamental property, such as screening, for device applications.
Submitted 7 February, 2018;
originally announced February 2018.
-
Ultrafast terahertz detectors based on three-dimensional meta-atoms
Authors:
B. Paulillo,
S. Pirotta,
H. Nong,
P. Crozat,
S. Guilet,
G. Xu,
S. Dhillon,
L. H. Li,
A. G. Davies,
E. H. Linfield,
R. Colombelli
Abstract:
Terahertz (THz) and sub-THz frequency emitter and detector technologies are receiving increasing attention, underpinned by emerging applications in ultra-fast THz physics, frequency-comb technology and pulsed laser development in this relatively unexplored region of the electromagnetic spectrum. In particular, semiconductor-based ultrafast THz receivers are required for compact, ultrafast spectroscopy and communication systems, and to date, quantum well infrared photodetectors (QWIPs) have proved to be an excellent technology to address this given their intrinsic ps-range response. However, with research focused on diffraction-limited QWIP structures (lambda/2), RC constants cannot be reduced indefinitely, and detection speeds are bound to eventually meet an upper limit. The key to an ultra-fast response with no intrinsic upper limit even at tens of GHz is an aggressive reduction in device size, below the diffraction limit. Here we demonstrate sub-wavelength (lambda/10) THz QWIP detectors based on a 3D split-ring geometry, yielding ultra-fast operation at a wavelength of around 100 μm. Each sensing meta-atom pixel features a suspended loop antenna that feeds THz radiation into the ~20 μm³ active volume. Arrays of detectors as well as single-pixel detectors have been implemented with this new architecture, with the latter exhibiting ultra-low dark currents below the nA level. This extremely small resonator architecture leads to measured optical response speeds - on arrays of 300 devices - of up to ~3 GHz and an expected device operation of up to tens of GHz, based on the measured S-parameters on single devices and arrays.
Submitted 19 January, 2018;
originally announced January 2018.
-
Hunting micrometer-sized graphene flakes on gold substrate
Authors:
Joel M. Katzen,
Matěj Velický,
Yuefeng Huang,
Stacey Drakeley,
William Hendren,
Robert M. Bowman,
Qiran Cai,
Ying Chen,
Lu Hua Li,
Fumin Huang
Abstract:
Gold is widely used as the substrate material in many graphene devices, due to its superior optoelectronic properties and chemical stability. However, there has been little experimental investigation on the optical contrast of graphene films on Au substrates. Here we report accurate measurement of the optical contrast spectra of few-layer graphene flakes on bulk Au. We used high-resolution optical microscopy with a 100x magnification objective, accurately determining the thickness of flakes as small as one micrometer in lateral size, which are highly desired in many applications. The results are in excellent agreement with theoretical calculations and confirmed by Raman and AFM measurements. Furthermore, we demonstrate that optical contrast spectroscopy is sensitive enough to detect the adsorption of a sub-monolayer of airborne hydrocarbon molecules, which can reveal whether graphene is contaminated and opens the opportunity to develop miniaturized and ultrasensitive molecular sensors.
Submitted 17 January, 2018;
originally announced January 2018.
-
Molecule-Induced Conformational Change in Boron Nitride Nanosheets with Enhanced Surface Adsorption
Authors:
Qiran Cai,
Aijun Du,
Guoping Gao,
Srikanth Mateti,
Bruce C. C. Cowie,
Dong Qian,
Shuang Zhang,
Yuerui Lu,
Lan Fu,
Takashi Taniguchi,
Shaoming Huang,
Ying Chen,
Rodney S. Ruoff,
Lu Hua Li
Abstract:
Surface interaction is extremely important to both fundamental research and practical application. Physisorption can induce shape and structural distortion (i.e. conformational changes) in macromolecular and biomolecular adsorbates, but such phenomenon has rarely been observed on adsorbents. Here, we demonstrate theoretically and experimentally that atomically thin boron nitride (BN) nanosheets as an adsorbent experience conformational changes upon surface adsorption of molecules, increasing adsorption energy and efficiency. The study not only provides new perspectives on the strong adsorption capability of BN nanosheets and many other two-dimensional nanomaterials but also opens up possibilities for many novel applications. For example, we demonstrate that BN nanosheets with the same surface area as bulk hBN particles are more effective in purification and sensing.
Submitted 8 December, 2016;
originally announced December 2016.
-
Boron Nitride Nanosheets Improve Sensitivity and Reusability of Surface Enhanced Raman Spectroscopy
Authors:
Qiran Cai,
Srikanth Mateti,
Wenrong Yang,
Rob Jones,
Kenji Watanabe,
Takashi Taniguchi,
Shaoming Huang,
Ying Chen,
Lu Hua Li
Abstract:
Surface enhanced Raman spectroscopy (SERS) is a useful multidisciplinary analytic technique. However, it is still a challenge to produce SERS substrates that are highly sensitive, reproducible, stable, reusable, and scalable. Here, we demonstrate that atomically thin boron nitride (BN) nanosheets have many unique and desirable properties to help solve this challenge. The synergic effect of the atomic thickness, high flexibility, stronger surface adsorption capability, electrical insulation, impermeability, high thermal and chemical stability of BN nanosheets can increase the Raman sensitivity by up to two orders, and in the meantime attain long-term stability and extraordinary reusability not achievable by other materials. These advances will greatly facilitate the wider use of SERS in many fields.
Submitted 12 July, 2016;
originally announced July 2016.
-
Boron Nitride Nanosheet Veiled Gold Nanoparticles for Surface Enhanced Raman Scattering
Authors:
Qiran Cai,
Srikanth Mateti,
Kenji Watanabe,
Takashi Taniguchi,
Shaoming Huang,
Ying Chen,
Lu Hua Li
Abstract:
Atomically thin boron nitride (BN) nanosheets have many properties desirable for surface enhanced Raman spectroscopy (SERS). BN nanosheets have a strong surface adsorption capability towards airborne hydrocarbon and aromatic molecules. For maximized adsorption area and hence SERS sensitivity, atomically thin BN nanosheet covered gold nanoparticles have been prepared for the first time. When placed on top of metal nanoparticles, atomically thin BN nanosheets closely follow their contours so that the plasmonic hot spots are retained. Electrically insulating BN nanosheets also act as a barrier layer to eliminate metal-induced disturbance in SERS. Moreover, the SERS substrates veiled by BN nanosheets show an outstanding reusability in the long term. As the result, the sensitivity, reproducibility and reusability of SERS substrates can be greatly improved. We also demonstrate that large BN nanosheets produced by chemical vapor deposition can be used to scale up the proposed SERS substrate for practical application.
Submitted 23 June, 2016;
originally announced June 2016.
-
Distributed feedback terahertz frequency quantum cascade lasers with dual periodicity gratings
Authors:
F. Castellano,
S. Zanotto,
L. H. Li,
A. Pitanti,
A. Tredicucci,
E. H. Linfield,
A. G. Davies,
M. S. Vitiello
Abstract:
We have developed terahertz frequency quantum cascade lasers that exploit a double-periodicity distributed feedback grating to control the emission frequency and the output beam direction independently. The spatial refractive index modulation of the gratings necessary to provide optical feedback at a fixed frequency and, simultaneously, a far-field emission pattern centered at controlled angles, was designed through use of an appropriate wavevector scattering model. Single mode THz emission at angles tuned by design between 0° and 50° was realized, leading to an original phase-matching approach, lithographically independent, for highly collimated THz QCLs.
Submitted 30 May, 2016;
originally announced May 2016.
-
Atomically Thin Boron Nitride: Unique Properties and Applications
Authors:
Lu Hua Li,
Ying Chen
Abstract:
Atomically thin boron nitride (BN) is an important two-dimensional (2D) nanomaterial, with many properties distinct from graphene. In this feature article, these unique properties and associated applications often not possible from graphene are outlined. The article starts with characterization and identification of atomically thin BN. It is followed by demonstrating their strong oxidation resistance at high temperatures and applications in protecting metals from oxidation and corrosion. As flat insulators, BN nanosheets are ideal dielectric substrates for surface enhanced Raman spectroscopy (SERS) and electronic devices based on 2D heterostructures. The light emission of BN nanosheets in the deep ultraviolet (DUV) and ultraviolet (UV) regions are also included for its scientific and technological importance. The last part is dedicated to synthesis, characterization, and optical properties of BN nanoribbons, a special form of nanosheets.
Submitted 4 May, 2016;
originally announced May 2016.
-
Quantum Emission from Defects in Single Crystal Hexagonal Boron Nitride
Authors:
Toan Trong Tran,
Cameron Zachreson,
Amanuel Michael Berhane,
Kerem Bray,
Russell Guy Sandstrom,
Lu Hua Li,
Takashi Taniguchi,
Kenji Watanabe,
Igor Aharonovich,
Milos Toth
Abstract:
Bulk hexagonal boron nitride (hBN) is a highly nonlinear natural hyperbolic material that attracts major attention in modern nanophotonics applications. However, studies of its optical properties in the visible part of the spectrum and quantum emitters hosted by bulk hBN have not been reported to date. In this work we study the emission properties of hBN crystals in the red spectral range using sub-bandgap optical excitation. Quantum emission from defects is observed at room temperature and characterized in detail. Our results advance the use of hBN in quantum nanophotonics technologies and enhance our fundamental understanding of its optical properties.
Submitted 7 March, 2016;
originally announced March 2016.
-
Boron Nitride Nanosheets as Improved and Reusable Substrates for Gold Nanoparticles Enabled Surface Enhanced Raman Spectroscopy
Authors:
Qiran Cai,
Lu Hua Li,
Yuanlie Yu,
Yun Liu,
Shaoming Huang,
Ying Chen,
Kenji Watanabe,
Takashi Taniguchi
Abstract:
Atomically thin boron nitride (BN) nanosheets have been found to be an excellent substrate for noble metal particle-enabled surface enhanced Raman spectroscopy (SERS), thanks to their good adsorption of aromatic molecules, high thermal stability and weak Raman scattering. Faceted gold (Au) nanoparticles have been synthesized on BN nanosheets by a simple but controllable and reproducible sputtering and annealing method. The size and density of the Au particles can be controlled by sputtering time, current, annealing temperature, etc. Under the same sputtering and annealing conditions, the Au particles on BN of different thicknesses show various sizes because the surface diffusion coefficient of Au depends on the thickness of BN. Intriguingly, decorated with similar morphology and distribution of Au particles, BN nanosheets exhibit better Raman enhancements than silicon substrate as well as bulk BN crystals. Additionally, BN nanosheets show no noticeable SERS signal and hence cause no interference to the Raman signal of the analyte. The Au/BN substrates can be reused by heating in air to remove adsorbed analyte without loss of SERS enhancement.
Submitted 11 March, 2015;
originally announced March 2015.
-
Boron Nitride Nanosheets for Metal Protection
Authors:
Lu Hua Li,
Tan Xing,
Ying Chen,
Rob Jones
Abstract:
Although the high impermeability of graphene makes it an excellent barrier to inhibit metal oxidation and corrosion, graphene can form a galvanic cell with the underlying metal that promotes corrosion of the metal in the long term. Boron nitride (BN) nanosheets which have a similar impermeability could be a better choice as protective barrier, because they are more thermally and chemically stable than graphene and, more importantly, do not cause galvanic corrosion due to their electrical insulation. In this study, the performance of commercially available BN nanosheets grown by chemical vapor deposition as a protective coating on metal has been investigated. The heating of the copper foil covered with the BN nanosheet at 250 °C in air over 100 h results in dramatically less oxidation than the bare copper foil heated for 2 h under the same conditions. The electrochemical analyses reveal that the BN nanosheet coating can increase open circuit potential and possibly reduce oxidation of the underlying copper foil in sodium chloride solution. These results indicate that BN nanosheets are a good candidate for oxidation and corrosion protection, although conductive atomic force microscopy analyses show that the effectiveness of the protection relies on the quality of BN nanosheets.
Submitted 4 March, 2015;
originally announced March 2015.
-
Electric Contributions to Magnetic Force Microscopy Response from Graphene and MoS2 Nanosheets
Authors:
Lu Hua Li,
Ying Chen
Abstract:
Magnetic force microscopy (MFM) signals have recently been detected from whole pieces of mechanically exfoliated graphene and molybdenum disulfide (MoS2) nanosheets and magnetism of the two nanomaterials was claimed based on these observations. However, non-magnetic interactions or artefacts are commonly associated with MFM signals, which makes the interpretation of MFM signals not straightforward. A systematic investigation has been done to examine possible sources of the MFM signals from graphene and MoS2 nanosheets and whether the MFM signals can be correlated with magnetism. It is found that the MFM signals have significant non-magnetic contributions due to capacitive and electrostatic interactions between the nanosheets and conductive cantilever tip, as demonstrated by electric force microscopy (EFM) and scanning Kelvin probe microscopy (SKPM) analyses. In addition, the MFM signals of graphene and MoS2 nanosheets are not responsive to reversed magnetic field of the magnetic cantilever tip. Therefore, the observed MFM response is mainly from electric artefacts and not compelling enough to correlate with magnetism of graphene and MoS2 nanosheets.
Submitted 4 March, 2015;
originally announced March 2015.
-
Dielectric Screening in Atomically Thin Boron Nitride Nanosheets
Authors:
Lu Hua Li,
Elton J. G. Santos,
Tan Xing,
Emmanuele Cappelluti,
Rafael Roldán,
Ying Chen,
Kenji Watanabe,
Takashi Taniguchi
Abstract:
Two-dimensional (2D) hexagonal boron nitride (BN) nanosheets are excellent dielectric substrates for electronic and photonic devices based on graphene, molybdenum disulfide and many other 2D nanomaterials. To optimize the performance of these 2D devices, it is essential to understand the dielectric screening properties of BN nanosheets as a function of the thickness. Here, electric force microscopy along with theoretical calculations based on both state-of-the-art first-principles calculations with van der Waals interactions under consideration and non-linear Thomas-Fermi theory models are used to investigate the dielectric screening in high-quality BN nanosheets of different thicknesses. It is found that atomically thin BN nanosheets are less effective in electric field screening, but the screening capability of BN shows a relatively weak dependence on the layer thickness.
Submitted 1 March, 2015;
originally announced March 2015.
-
Observation of Active Sites for Oxygen Reduction Reaction on Nitrogen-doped Multilayer Graphene
Authors:
Tan Xing,
Yao Zheng,
Lu Hua Li,
Bruce C. C. Cowie,
Daniel Gunzelmann,
Shi Zhang Qiao,
Shaoming Huang,
Ying Chen
Abstract:
Active sites and catalytic mechanism of nitrogen-doped graphene in oxygen reduction reaction (ORR) have been extensively studied but are still inconclusive, partly due to the lack of an experimental method that can detect the active sites. It is proposed in this report that the active sites on nitrogen-doped graphene can be determined via the examination of its chemical composition change before and after ORR. Synchrotron-based X-ray photoelectron spectroscopy analyses of three nitrogen-doped multilayer graphene samples reveal that oxygen reduction intermediate OH(ads) which should chemically attach to the active sites remains on the carbon atoms neighboring pyridinic nitrogen after ORR. In addition, a high amount of the OH(ads) attachment after ORR corresponds to a high catalytic efficiency and vice versa. These pinpoint that the carbon atoms close to pyridinic nitrogen are the main active sites among the different nitrogen doping configurations.
Submitted 28 October, 2014;
originally announced October 2014.
-
Strong Oxidation Resistance of Atomically Thin Boron Nitride Nanosheets
Authors:
Lu Hua Li,
Jiri Cervenka,
Kenji Watanabe,
Takashi Taniguchi,
Ying Chen
Abstract:
Investigation on oxidation resistance of two-dimensional (2D) materials is critical for many of their applications, because 2D materials could have higher oxidation kinetics than their bulk counterparts due to predominant surface atoms and structural distortions. In this study, the oxidation behavior of high-quality boron nitride (BN) nanosheets 1-4 layers thick has been examined by heating in air. Atomic force microscopy and Raman spectroscopy analyses reveal that monolayer BN nanosheets can sustain up to 850 °C and the starting temperature of oxygen doping/oxidation of BN nanosheets only slightly increases with the number of nanosheet layers and depends on heating conditions. Elongated etch lines are found on the oxidized monolayer BN nanosheets, suggesting that the BN nanosheets are first cut along the chemisorbed oxygen chains and then the oxidative etching grows perpendicularly to these cut lines. The stronger oxidation resistance of BN nanosheets suggests that they are more preferable for high-temperature applications than graphene.
Submitted 4 March, 2014;
originally announced March 2014.