Search | arXiv e-print repository

arXiv:2405.19315 [pdf, other]

Matryoshka Query Transformer for Large Vision-Language Models

Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources? We answer this with an emphatic yes. Inspired by Matryoshka Representation Learning, we introduce the Matryoshka Query Transformer (MQT), capable of encoding an image into m visual tokens during inference, where m can be any number up to a predefined maximum. This is achieved by employing a query transformer with M latent query tokens to compress the visual embeddings. During each training step, we randomly select m <= M latent query tokens and train the model using only these first m tokens, discarding the rest. Combining MQT with LLaVA, we train a single model once, and flexibly and drastically reduce the number of inference-time visual tokens while maintaining similar or better performance compared to training independent models for each number of tokens. Our model, MQT-LLAVA, matches LLaVA-1.5 performance across 11 benchmarks using a maximum of 256 tokens instead of LLaVA's fixed 576. Reducing to 16 tokens (8x less TFLOPs) only sacrifices the performance by 2.4 points on MMBench. On certain tasks such as ScienceQA and MMMU, we can even go down to only 2 visual tokens with performance drops of just 3% and 6% each. Our exploration of the trade-off between the accuracy and computational cost brought about by the number of visual tokens facilitates future research to achieve the best of both worlds.

Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA
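
Note: the training-time token selection described in the MQT abstract above (pick m <= M at random, keep only the first m latent query tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `select_query_tokens`, the uniform choice of m over 1..M, and the use of string placeholders for tokens are all assumptions made for illustration.

```python
import random


def select_query_tokens(latent_queries, rng):
    """Return the first m latent query tokens, discarding the rest.

    Assumption: m is drawn uniformly from 1..M, where M = len(latent_queries).
    The abstract only says m is chosen randomly with m <= M.
    """
    max_tokens = len(latent_queries)
    m = rng.randint(1, max_tokens)  # randint is inclusive on both ends
    return latent_queries[:m]


# Hypothetical setup: M = 256 placeholder query tokens.
rng = random.Random(0)
queries = [f"q{i}" for i in range(256)]

# One simulated training step: only the kept prefix would be used downstream.
kept = select_query_tokens(queries, rng)
print(len(kept), kept[:3])
```

Because every training step keeps a prefix of the same ordered token list, a shorter prefix chosen at inference time is always one the model may have been trained on, which is the nesting idea the "Matryoshka" name refers to.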

arXiv:2311.02805 [pdf, other]

Tailoring Self-Rationalizers with Multi-Reward Distillation

Authors: Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li, Aaron Chan, Jack Hessel, Yejin Choi, Xiang Ren

Abstract: Large language models (LMs) are capable of generating free-text rationales to aid question answering. However, prior work 1) suggests that useful self-rationalization is emergent only at significant scales (e.g., 175B parameter GPT-3); and 2) focuses largely on downstream performance, ignoring the semantics of the rationales themselves, e.g., are they faithful, true, and helpful for humans? In this work, we enable small-scale LMs (approx. 200x smaller than GPT-3) to generate rationales that not only improve downstream task performance, but are also more plausible, consistent, and diverse, assessed both by automatic and human evaluation. Our method, MaRio (Multi-rewArd RatIOnalization), is a multi-reward conditioned self-rationalization algorithm that optimizes multiple distinct properties like plausibility, diversity and consistency. Results on five difficult question-answering datasets StrategyQA, QuaRel, OpenBookQA, NumerSense and QASC show that not only does MaRio improve task accuracy, but it also improves the self-rationalization quality of small LMs across the aforementioned axes better than a supervised fine-tuning (SFT) baseline. Extensive human evaluations confirm that MaRio rationales are preferred vs. SFT rationales, as well as qualitative improvements in plausibility and consistency.

Submitted 22 May, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

Journal ref: The Twelfth International Conference on Learning Representations, 2024

arXiv:2306.14060 [pdf, other]

DesCo: Learning Object Recognition with Rich Language Descriptions

Authors: Liunian Harold Li, Zi-Yi Dou, Nanyun Peng, Kai-Wei Chang

Abstract: Recent development in vision-language approaches has instigated a paradigm shift in learning visual recognition models from language supervision. These approaches align objects with language queries (e.g. "a photo of a cat") and improve the models' adaptability to identify novel objects and domains. Recently, several studies have attempted to query these models with complex language expressions that include specifications of fine-grained semantic details, such as attributes, shapes, textures, and relations. However, simply incorporating language descriptions as queries does not guarantee accurate interpretation by the models. In fact, our experiments show that GLIP, the state-of-the-art vision-language model for object detection, often disregards contextual information in the language descriptions and instead relies heavily on detecting objects solely by their names. To tackle the challenges, we propose a new description-conditioned (DesCo) paradigm of learning object recognition models with rich language descriptions consisting of two major innovations: 1) we employ a large language model as a commonsense knowledge engine to generate rich language descriptions of objects based on object names and the raw image-text caption; 2) we design context-sensitive queries to improve the model's ability in deciphering intricate nuances embedded within descriptions and enforce the model to focus on context rather than object names alone. On two novel object detection benchmarks, LVIS and OmniLabel, under the zero-shot detection setting, our approach achieves 34.8 APr minival (+9.1) and 29.3 AP (+3.6), respectively, surpassing the prior state-of-the-art models, GLIP and FIBER, by a large margin.

Submitted 24 June, 2023; originally announced June 2023.

arXiv:2306.14050 [pdf, other]

Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step

Authors: Liunian Harold Li, Jack Hessel, Youngjae Yu, Xiang Ren, Kai-Wei Chang, Yejin Choi

Abstract: Chain-of-thought prompting (e.g., "Let's think step-by-step") primes large language models to verbalize rationalization for their predictions. While chain-of-thought can lead to dramatic performance gains, benefits appear to emerge only for sufficiently large models (beyond 50B parameters). We show that orders-of-magnitude smaller models (125M -- 1.3B parameters) can still benefit from chain-of-thought prompting. To achieve this, we introduce Symbolic Chain-of-Thought Distillation (SCoTD), a method to train a smaller student model on rationalizations sampled from a significantly larger teacher model. Experiments across several commonsense benchmarks show that: 1) SCoTD enhances the performance of the student model in both supervised and few-shot settings, and especially for challenge sets; 2) sampling many reasoning chains per instance from the teacher is paramount; and 3) after distillation, student chain-of-thoughts are judged by humans as comparable to the teacher, despite orders of magnitude fewer parameters. We test several hypotheses regarding what properties of chain-of-thought samples are important, e.g., diversity vs. teacher likelihood vs. open-endedness. We release our corpus of chain-of-thought samples and code.

Submitted 15 April, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

Comments: ACL 2023

arXiv:2306.01311 [pdf, other]

MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models

Authors: Masoud Monajatipoor, Liunian Harold Li, Mozhdeh Rouhsedaghat, Lin F. Yang, Kai-Wei Chang

Abstract: Large-scale language models have shown the ability to adapt to a new task via conditioning on a few demonstrations (i.e., in-context learning). However, in the vision-language domain, most large-scale pre-trained vision-language (VL) models do not possess the ability to conduct in-context learning. How can we enable in-context learning for VL models? In this paper, we study an interesting hypothesis: can we transfer the in-context learning ability from the language domain to the VL domain? Specifically, we first meta-train a language model to perform in-context learning on NLP tasks (as in MetaICL); then we transfer this model to perform VL tasks by attaching a visual encoder. Our experiments suggest that in-context learning ability can indeed be transferred across modalities: our model considerably improves the in-context learning capability on VL tasks and can even compensate for the size of the model significantly. On VQA, OK-VQA, and GQA, our method could outperform the baseline model while having 20 times fewer parameters.

Submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.14895 [pdf, other]

doi 10.1088/1674-4527/acd593

The Lobster Eye Imager for Astronomy Onboard the SATech-01 Satellite

Authors: Z. X. Ling, X. J. Sun, C. Zhang, S. L. Sun, G. Jin, S. N. Zhang, X. F. Zhang, J. B. Chang, F. S. Chen, Y. F. Chen, Z. W. Cheng, W. Fu, Y. X. Han, H. Li, J. F. Li, Y. Li, Z. D. Li, P. R. Liu, Y. H. Lv, X. H. Ma, Y. J. Tang, C. B. Wang, R. J. Xie, Y. L. Xue, A. L. Yan, et al. (101 additional authors not shown)

Abstract: The Lobster Eye Imager for Astronomy (LEIA), a pathfinder of the Wide-field X-ray Telescope of the Einstein Probe (EP) mission, was successfully launched onboard the SATech-01 satellite of the Chinese Academy of Sciences on 27 July 2022. In this paper, we introduce the design and on-ground test results of the LEIA instrument. Using state-of-the-art Micro-Pore Optics (MPO), a wide field-of-view (FoV) of 346 square degrees (18.6 degrees * 18.6 degrees) of the X-ray imager is realized. An optical assembly composed of 36 MPO chips is used to focus incident X-ray photons, and four large-format complementary metal-oxide semiconductor (CMOS) sensors, each of 6 cm * 6 cm, are used as the focal plane detectors. The instrument has an angular resolution of 4 - 8 arcmin (in FWHM) for the central focal spot of the point spread function, and an effective area of 2 - 3 cm$^2$ at 1 keV in essentially all the directions within the field of view. The detection passband is 0.5 - 4 keV in the soft X-rays and the sensitivity is 2 - 3 * 10$^{-11}$ erg s$^{-1}$ cm$^{-2}$ (about 1 mini-Crab) at 1,000 second observation. The total weight of LEIA is 56 kg and the power is 85 W. The satellite, with a design lifetime of 2 years, operates in a Sun-synchronous orbit of 500 km with an orbital period of 95 minutes. LEIA is paving the way for future missions by verifying in flight the technologies of both novel focusing imaging optics and CMOS sensors for X-ray observation, and by optimizing the working setups of the instrumental parameters. In addition, LEIA is able to carry out scientific observations to find new transients and to monitor known sources in the soft X-ray band, albeit with limited useful observing time available.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted by RAA

arXiv:2301.07804 [pdf, other]

doi 10.1103/PhysRevLett.130.220602

Evolution of $1/f$ Flux Noise in Superconducting Qubits with Weak Magnetic Fields

Authors: David A. Rower, Lamia Ateshian, Lauren H. Li, Max Hays, Dolev Bluvstein, Leon Ding, Bharath Kannan, Aziza Almanakly, Jochen Braumüller, David K. Kim, Alexander Melville, Bethany M. Niedzielski, Mollie E. Schwartz, Jonilyn L. Yoder, Terry P. Orlando, Joel I-Jan Wang, Simon Gustavsson, Jeffrey A. Grover, Kyle Serniak, Riccardo Comin, William D. Oliver

Abstract: The microscopic origin of $1/f$ magnetic flux noise in superconducting circuits has remained an open question for several decades despite extensive experimental and theoretical investigation. Recent progress in superconducting devices for quantum information has highlighted the need to mitigate sources of qubit decoherence, driving a renewed interest in understanding the underlying noise mechanism(s). Though a consensus has emerged attributing flux noise to surface spins, their identity and interaction mechanisms remain unclear, prompting further study. Here we apply weak in-plane magnetic fields to a capacitively-shunted flux qubit (where the Zeeman splitting of surface spins lies below the device temperature) and study the flux-noise-limited qubit dephasing, revealing previously unexplored trends that may shed light on the dynamics behind the emergent $1/f$ noise. Notably, we observe an enhancement (suppression) of the spin-echo (Ramsey) pure dephasing time in fields up to $B=100~\text{G}$. With direct noise spectroscopy, we further observe a transition from a $1/f$ to approximately Lorentzian frequency dependence below 10 Hz and a reduction of the noise above 1 MHz with increasing magnetic field. We suggest that these trends are qualitatively consistent with an increase of spin cluster sizes with magnetic field. These results should help to inform a complete microscopic theory of $1/f$ flux noise in superconducting circuits.

Submitted 18 January, 2023; originally announced January 2023.

arXiv:2211.13390 [pdf, ps, other]

doi 10.1088/1674-1137/acb6eb

Observation of $e^+e^- \to p p \bar{p} \bar{n} π^{-} + c.c.$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, M. Albrecht, R. Aliberti, A. Amoroso, M. R. An, Q. An, X. H. Bai, Y. Bai, O. Bakina, R. Baldini Ferroli, I. Balossino, Y. Ban, V. Batozskaya, D. Becker, K. Begzsuren, N. Berger, M. Bertani, D. Bettoni, F. Bianchi, J. Bloms, A. Bortone, I. Boyko, R. A. Briere, et al. (545 additional authors not shown)

Abstract: Using data taken at 29 center-of-mass energies between 4.16 and 4.70 GeV with the BESIII detector at the Beijing Electron Positron Collider corresponding to a total integrated luminosity of approximately 18.8 $\rm fb^{-1}$, the process $e^+e^- \to p p \bar{p} \bar{n} π^{-} + c.c.$ is observed for the first time with a statistical significance of $11.5σ$. The average Born cross sections in the energy ranges of (4.160, 4.380) GeV, (4.400, 4.600) GeV and (4.610, 4.700) GeV are measured to be $(21.5\pm5.7\pm1.2)$ fb, $(46.3\pm10.6\pm2.5)$ fb and $(59.0\pm9.4\pm3.2)$ fb, respectively, where the first uncertainties are statistical and the second are systematic. The line shapes of the $\bar{p}\bar{n}$ and $ppπ^-$ invariant mass spectra are consistent with phase space distributions, indicating that no hexaquark or di-baryon state is observed.

Submitted 23 November, 2022; originally announced November 2022.

Comments: 10 pages, 4 figures. Submitted to Chinese Physics C

Journal ref: Chinese Physics C 47, (2023) 043001

arXiv:2211.10007 [pdf, other]

doi 10.3847/2041-8213/aca32f

First wide field-of-view X-ray observations by a lobster eye focusing telescope in orbit

Authors: C. Zhang, Z. X. Ling, X. J. Sun, S. L. Sun, Y. Liu, Z. D. Li, Y. L. Xue, Y. F. Chen, Y. F. Dai, Z. Q. Jia, H. Y. Liu, X. F. Zhang, Y. H. Zhang, S. N. Zhang, F. S. Chen, Z. W. Cheng, W. Fu, Y. X. Han, H. Li, J. F. Li, Y. Li, P. R. Liu, X. H. Ma, Y. J. Tang, C. B. Wang, et al. (53 additional authors not shown)

Abstract: As a novel X-ray focusing technology, lobster eye micro-pore optics (MPO) feature both a wide observing field of view and true imaging capability, promising sky monitoring with significantly improved sensitivity and spatial resolution in soft X-rays. Since first proposed by Angel (1979), the optics have been extensively studied, developed and trialed over the past decades. In this Letter, we report on the first-light results from a flight experiment of the Lobster Eye Imager for Astronomy ($LEIA$), a pathfinder of the wide-field X-ray telescope of the Einstein Probe mission. The piggyback imager, launched in July 2022, has a mostly un-vignetted field of view of $18.6^\circ \times 18.6^\circ $. Its spatial resolution is in the range of 4$-$7 arcmin in FWHM and the focal spot effective area is 2$-$3 cm$^2$, both showing only mild fluctuations across the field of view. We present images of the Galactic center region, Sco X-1 and the diffuse Cygnus Loop nebula taken in snapshot observations over 0.5$-$4 keV. These are truly wide-field X-ray images of celestial bodies observed, for the first time, by a focusing imaging telescope. Initial analyses of the in-flight data show excellent agreement between the observed images and the on-ground calibration and simulations. The instrument and its characterization are briefly described, as well as the flight experiment. The results provide a solid basis for the development of the present and proposed wide-field X-ray missions using lobster eye MPO.

Submitted 17 November, 2022; originally announced November 2022.

Comments: 11 pages, 4 figures. Accepted for publication in Astrophysical Journal Letter

arXiv:2210.05531 [pdf, other]

doi 10.1103/PhysRevE.108.L032402

Specialization at an expanding front

Authors: Lauren H. Li, Mehran Kardar

Abstract: As a population grows, spreading to new environments may favor specialization. In this paper, we introduce and explore a model for specialization at the front of a colony expanding synchronously into new territory. We show through numerical simulations that, by gaining fitness through accumulating mutations, progeny of the initial seed population can differentiate into distinct specialists. With competition and selection limited to the growth front, the emerging specialists first segregate into sectors, which then expand to dominate the entire population. We quantify the scaling of the fixation time with the size of the population and observe different behaviors corresponding to distinct universality classes: unbounded and bounded gains in fitness lead to superdiffusive ($z=3/2$) and diffusive ($z=2$) stochastic wanderings of the sector boundaries, respectively.

Submitted 13 September, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: 6+3 pages, 3+3 figures

Journal ref: Phys. Rev. E 108, L032402 (2023)

arXiv:2206.05836 [pdf, other]

GLIPv2: Unifying Localization and Vision-Language Understanding

Authors: Haotian Zhang, Pengchuan Zhang, Xiaowei Hu, Yen-Chun Chen, Liunian Harold Li, Xiyang Dai, Lijuan Wang, Lu Yuan, Jenq-Neng Hwang, Jianfeng Gao

Abstract: We present GLIPv2, a grounded VL understanding model, that serves both localization tasks (e.g., object detection, instance segmentation) and Vision-Language (VL) understanding tasks (e.g., VQA, image captioning). GLIPv2 elegantly unifies localization pre-training and Vision-Language Pre-training (VLP) with three pre-training tasks: phrase grounding as a VL reformulation of the detection task, region-word contrastive learning as a novel region-word level contrastive learning task, and masked language modeling. This unification not only simplifies the previous multi-stage VLP procedure but also achieves mutual benefits between localization and understanding tasks. Experimental results show that a single GLIPv2 model (all model weights are shared) achieves near SoTA performance on various localization and understanding tasks. The model also shows (1) strong zero-shot and few-shot adaptation performance on open-vocabulary object detection tasks and (2) superior grounding capability on VL understanding tasks. Code will be released at https://github.com/microsoft/GLIP.

Submitted 11 October, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022; updated with reviewers' comments addressed; Code is released at https://github.com/microsoft/GLIP

arXiv:2205.12617 [pdf, other]

DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation

Authors: Jingnong Qu, Liunian Harold Li, Jieyu Zhao, Sunipa Dev, Kai-Wei Chang

Abstract: Disinformation has become a serious problem on social media. In particular, given their short format, visual attraction, and humorous nature, memes have a significant advantage in dissemination among online communities, making them an effective vehicle for the spread of disinformation. We present DisinfoMeme to help detect disinformation memes. The dataset contains memes mined from Reddit covering three current topics: the COVID-19 pandemic, the Black Lives Matter movement, and veganism/vegetarianism. The dataset poses multiple unique challenges: limited data and label imbalance, reliance on external knowledge, multimodal reasoning, layout dependency, and noise from OCR. We test multiple widely-used unimodal and multimodal models on this dataset. The experiments show that the room for improvement is still huge for current models.

Submitted 25 May, 2022; originally announced May 2022.

arXiv:2205.12247 [pdf, other]

GeoMLAMA: Geo-Diverse Commonsense Probing on Multilingual Pre-Trained Language Models

Authors: Da Yin, Hritik Bansal, Masoud Monajatipoor, Liunian Harold Li, Kai-Wei Chang

Abstract: Recent work has shown that Pre-trained Language Models (PLMs) store the relational knowledge learned from data and utilize it for performing downstream tasks. However, commonsense knowledge across different regions may vary. For instance, the color of bridal dress is white in American weddings whereas it is red in Chinese weddings. In this paper, we introduce a benchmark dataset, Geo-Diverse Commonsense Multilingual Language Models Analysis (GeoMLAMA), for probing the diversity of the relational knowledge in multilingual PLMs. GeoMLAMA contains 3,125 prompts in English, Chinese, Hindi, Persian, and Swahili, with a wide coverage of concepts shared by people from American, Chinese, Indian, Iranian and Kenyan cultures. We benchmark 11 standard multilingual PLMs on GeoMLAMA. Interestingly, we find that 1) larger multilingual PLM variants do not necessarily store geo-diverse concepts better than their smaller variants; 2) multilingual PLMs are not intrinsically biased towards knowledge from the Western countries (the United States); 3) the native language of a country may not be the best language to probe its knowledge; and 4) a language may better probe knowledge about a non-native country than its native country. Code and data are released at https://github.com/WadeYin9712/GeoMLAMA.

Submitted 29 November, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: EMNLP 2022. Code and data are released at https://github.com/WadeYin9712/GeoMLAMA/

arXiv:2205.11502 [pdf, other]

On the Paradox of Learning to Reason from Data

Authors: Honghua Zhang, Liunian Harold Li, Tao Meng, Kai-Wei Chang, Guy Van den Broeck

Abstract: Logical reasoning is needed in a wide range of NLP tasks. Can a BERT model be trained end-to-end to solve logical reasoning problems presented in natural language? We attempt to answer this question in a confined problem space where there exists a set of parameters that perfectly simulates logical reasoning. We make observations that seem to contradict each other: BERT attains near-perfect accuracy on in-distribution test examples while failing to generalize to other data distributions over the exact same problem space. Our study provides an explanation for this paradox: instead of learning to emulate the correct reasoning function, BERT has in fact learned statistical features that inherently exist in logical reasoning problems. We also show that it is infeasible to jointly remove statistical features from data, illustrating the difficulty of learning to reason in general. Our result naturally extends to other neural models and unveils the fundamental difference between learning to reason and learning to achieve high performance on NLP benchmarks using statistical features.

Submitted 24 May, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: Table 1 & 2 numbers were out-dated in v1; we have updated them; the observations and conclusions remain unchanged

arXiv:2204.08790 [pdf, other]

ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

Authors: Chunyuan Li, Haotian Liu, Liunian Harold Li, Pengchuan Zhang, Jyoti Aneja, Jianwei Yang, Ping Jin, Houdong Hu, Zicheng Liu, Yong Jae Lee, Jianfeng Gao

Abstract: Learning visual representations from natural language supervision has recently shown great promise in a number of pioneering works. In general, these language-augmented visual models demonstrate strong transferability to a variety of datasets and tasks. However, it remains challenging to evaluate the transferability of these models due to the lack of easy-to-use evaluation toolkits and public benchmarks. To tackle this, we build ELEVATER (Evaluation of Language-augmented Visual Task-level Transfer), the first benchmark and toolkit for evaluating (pre-trained) language-augmented visual models. ELEVATER is composed of three components. (i) Datasets. As downstream evaluation suites, it consists of 20 image classification datasets and 35 object detection datasets, each of which is augmented with external knowledge. (ii) Toolkit. An automatic hyper-parameter tuning toolkit is developed to facilitate model evaluation on downstream tasks. (iii) Metrics. A variety of evaluation metrics are used to measure sample-efficiency (zero-shot and few-shot) and parameter-efficiency (linear probing and full model fine-tuning). ELEVATER is a platform for Computer Vision in the Wild (CVinW), and is publicly released at https://computer-vision-in-the-wild.github.io/ELEVATER/

Submitted 13 October, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

Comments: NeurIPS 2022 (Datasets and Benchmarks Track). The first two authors contribute equally. Benchmark page: https://computer-vision-in-the-wild.github.io/ELEVATER/

arXiv:2112.09106 [pdf, other]

RegionCLIP: Region-based Language-Image Pretraining

Authors: Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao

Abstract: Contrastive language-image pretraining (CLIP) using image-text pairs has achieved impressive results on image classification in both zero-shot and transfer learning settings. However, we show that directly applying such models to recognize image regions for object detection leads to poor performance due to a domain shift: CLIP was trained to match an image as a whole to a text description, without capturing the fine-grained alignment between image regions and text spans. To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions and textual concepts. Our method leverages a CLIP model to match image regions with template captions and then pretrains our model to align these region-text pairs in the feature space. When transferring our pretrained model to the open-vocabulary object detection tasks, our method significantly outperforms the state of the art by 3.8 AP50 and 2.2 AP for novel categories on COCO and LVIS datasets, respectively. Moreover, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets. Our code is available at https://github.com/microsoft/RegionCLIP.

Submitted 16 December, 2021; originally announced December 2021.

Comments: Technical report

arXiv:2112.08587 [pdf, other]

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Authors: Zhecan Wang, Haoxuan You, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang

Abstract: Answering complex questions about images is an ambitious goal for machine intelligence, which requires a joint understanding of images, text, and commonsense knowledge, as well as a strong reasoning ability. Recently, multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning (VCR), by jointly understanding visual objects and text tokens through layers of cross-modality attention. However, these approaches do not utilize the rich structure of the scene and the interactions between objects which are essential in answering complex commonsense questions. We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework to incorporate visual scene graphs in commonsense reasoning. To exploit the scene graph structure, at the model structure level, we propose a multihop graph transformer for regularizing attention interaction among hops. As for pre-training, a scene-graph-aware pre-training method is proposed to leverage structure knowledge extracted in the visual scene graph. Moreover, we introduce a method to train and generate domain-relevant visual scene graphs using textual annotations in a weakly-supervised manner. Extensive experiments on VCR and other tasks show a significant performance boost compared with the state-of-the-art methods and prove the efficacy of each proposed component.

Submitted 15 December, 2021; originally announced December 2021.

Comments: AAAI 2022

Journal ref: AAAI 2022

arXiv:2112.03857 [pdf, other]

Grounded Language-Image Pre-training

Authors: Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, Chunyuan Li, Yiwu Zhong, Lijuan Wang, Lu Yuan, Lei Zhang, Jenq-Neng Hwang, Kai-Wei Chang, Jianfeng Gao

Abstract: This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines. 2) After being fine-tuned on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals a fully-supervised Dynamic Head. Code is released at https://github.com/microsoft/GLIP.

Submitted 17 June, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: CVPR 2022; updated visualizations; fixed hyper-parameters in Appendix C.1

arXiv:2109.06860 [pdf, other]

Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning

Authors: Da Yin, Liunian Harold Li, Ziniu Hu, Nanyun Peng, Kai-Wei Chang

Abstract: Commonsense is defined as the knowledge that is shared by everyone. However, certain types of commonsense knowledge are correlated with culture and geographic locations and they are only shared locally. For example, the scenarios of wedding ceremonies vary across regions due to different customs influenced by historical and religious factors. Such regional characteristics, however, are generally omitted in prior work. In this paper, we construct a Geo-Diverse Visual Commonsense Reasoning dataset (GD-VCR) to test vision-and-language models' ability to understand cultural and geo-location-specific commonsense. In particular, we study two state-of-the-art Vision-and-Language models, VisualBERT and ViLBERT, trained on VCR, a standard multimodal commonsense benchmark with images primarily from Western regions. We then evaluate how well the trained models can generalize to answering the questions in GD-VCR. We find that the performance of both models for non-Western regions, including East Asia, South Asia, and Africa, is significantly lower than that for the Western region. We analyze the reasons behind the performance disparity and find that the performance gap is larger on QA pairs that: 1) are concerned with culture-related scenarios, e.g., weddings, religious activities, and festivals; 2) require high-level geo-diverse commonsense reasoning rather than low-order perception and recognition. Dataset and code are released at https://github.com/WadeYin9712/GD-VCR.

Submitted 14 September, 2021; originally announced September 2021.

Comments: EMNLP 2021. Code and data are available at https://github.com/WadeYin9712/GD-VCR

arXiv:2108.11114 [pdf, other]

doi 10.1016/j.physleta.2022.128088

Quantum kernels with squeezed-state encoding for machine learning

Authors: Long Hin Li, Dan-Bo Zhang, Z. D. Wang

Abstract: Kernel methods are powerful for machine learning, as they can represent data in feature spaces in which similarities between samples may be faithfully captured. Recently, it has been realized that machine learning enhanced by quantum computing is closely related to kernel methods, where the exponentially large Hilbert space turns out to be a feature space more expressive than classical ones. In this paper, we generalize quantum kernel methods by encoding data into continuous-variable quantum states, which can benefit from the infinite-dimensional Hilbert space of continuous variables. Specifically, we propose squeezed-state encoding, in which data is encoded in either the amplitude or the phase. The kernels can be calculated on a quantum computer and are then combined with classical machine learning, e.g. support vector machine, for training and predicting tasks. Their comparisons with other classical kernels are also addressed. Lastly, we discuss physical implementations of squeezed-state encoding for machine learning in quantum platforms such as trapped ions.

Submitted 25 August, 2021; originally announced August 2021.

Comments: 5 pages, 4 figures

Journal ref: Physics Letters A 436, 128088 (2022)

arXiv:2108.04938 [pdf, other]

BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis

Authors: Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, Aichi Chien, C. -C. Jay Kuo, Fabien Scalzo, Kai-Wei Chang

Abstract: Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them. Prior studies show that pre-trained V&L models can significantly improve the model performance for downstream tasks such as Visual Question Answering (VQA). However, V&L models are less effective when applied in the medical domain (e.g., on X-ray images and clinical notes) due to the domain gap. In this paper, we investigate the challenges of applying pre-trained V&L models in medical applications. In particular, we identify that the visual representation in general V&L models is not suitable for processing medical data. To overcome this limitation, we propose BERTHop, a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities. Experiments on the OpenI dataset, a commonly used thoracic disease diagnosis benchmark, show that BERTHop achieves an average Area Under the Curve (AUC) of 98.12%, which is 1.62% higher than state-of-the-art (SOTA), while it is trained on a 9 times smaller dataset.

Submitted 10 August, 2021; originally announced August 2021.

Comments: 10 pages, 8 figures, Accepted in ICCV workshop

arXiv:2107.06383 [pdf, other]

How Much Can CLIP Benefit Vision-and-Language Tasks?

Authors: Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer

Abstract: Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world. However, it has been observed that large-scale pretraining usually can result in better generalization performance, e.g., CLIP (Contrastive Language-Image Pre-training), trained on a massive amount of image-caption pairs, has shown a strong zero-shot capability on various vision tasks. To further study the advantage brought by CLIP, we propose to use CLIP as the visual encoder in various V&L models in two typical scenarios: 1) plugging CLIP into task-specific fine-tuning; 2) combining CLIP with V&L pre-training and transferring to downstream tasks. We show that CLIP significantly outperforms widely-used visual encoders trained with in-domain annotated data, such as BottomUp-TopDown. We achieve competitive or better results on diverse V&L tasks, while establishing new state-of-the-art results on Visual Question Answering, Visual Entailment, and V&L Navigation tasks. We release our code at https://github.com/clip-vil/CLIP-ViL.

Submitted 13 July, 2021; originally announced July 2021.

Comments: 14 pages

arXiv:2101.11869 [pdf]

doi 10.1021/acsnano.0c07430

Mechanical Properties of Atomically Thin Tungsten Dichalcogenides: WS$_2$, WSe$_2$ and WTe$_2$

Authors: Alexey Falin, Matthew Holwill, Haifeng Lv, Wei Gan, Jun Cheng, Rui Zhang, Dong Qian, Matthew R. Barnett, Elton J. G. Santos, Konstantin S. Novoselov, Tao Tao, Xiaojun Wu, Lu Hua Li

Abstract: Two-dimensional (2D) tungsten disulfide (WS$_2$), tungsten diselenide (WSe$_2$), and tungsten ditelluride (WTe$_2$) draw increasing attention due to their attractive properties deriving from the heavy tungsten and chalcogenide atoms, but their mechanical properties are still mostly unknown. Here, we determine the intrinsic and air-aged mechanical properties of mono-, bi-, and trilayer (1-3L) WS$_2$, WSe$_2$ and WTe$_2$ using a complementary suite of experiments and theoretical calculations. High-quality 1L WS$_2$ has the highest Young's modulus (302.4 ± 24.1 GPa) and strength (47.0 ± 8.6 GPa) of the entire family, surpassing those of 1L WSe$_2$ (258.6 ± 38.3 and 38.0 ± 6.0 GPa, respectively) and WTe$_2$ (149.1 ± 9.4 and 6.4 ± 3.3 GPa, respectively). However, the elasticity and strength of WS$_2$ decrease most dramatically with increased thickness among the three materials. We interpret the phenomenon by the different tendencies for interlayer sliding in equilibrium state and under in-plane strain and out-of-plane compression conditions in the indentation process, revealed by finite element method (FEM) and density functional theory (DFT) calculations including van der Waals (vdW) interactions. We also demonstrate that the mechanical properties of the high-quality 1-3L WS$_2$ and WSe$_2$ are largely stable in the air for up to 20 weeks. Intriguingly, the 1-3L WSe$_2$ shows increased modulus and strength values with aging in the air. This is ascribed to oxygen doping, which reinforces the structure. The present study will facilitate the design and use of 2D tungsten dichalcogenides in applications, such as strain engineering and flexible field-effect transistors (FETs).

Submitted 28 January, 2021; originally announced January 2021.

Journal ref: ACS Nano 2021

arXiv:2011.00942 [pdf]

doi 10.1021/acs.nanolett.0c04794

Layer-dependent mechanical properties and enhanced plasticity in the van der Waals chromium trihalide magnets

Authors: Fernando Cantos-Prieto, Alexey Falin, Martin Alliati, Dong Qian, Rui Zhang, Tao Tao, Matthew R. Barnett, Elton J. G. Santos, Lu Hua Li, Efren Navarro-Moratalla

Abstract: The mechanical properties of magnetic materials are instrumental for the development of the magnetoelastic theory and the optimization of strain-modulated magnetic devices. In particular, two-dimensional (2D) magnets hold promise to enlarge these concepts into the realm of low-dimensional physics and ultrathin devices. However, no experimental study on the intrinsic mechanical properties of the archetypal 2D magnet family of the chromium trihalides has thus far been performed. Here, we report the room temperature layer-dependent mechanical properties of atomically thin CrI3 and CrCl3, finding that bilayers of CrI3 and CrCl3 have Young's moduli of 62.1 GPa and 43.4 GPa, with the highest sustained strain of 6.09% and 6.49% and breaking strengths of 3.6 GPa and 2.2 GPa, respectively. Both the elasticity and strength of the two materials decrease with increased thickness, which is attributed to a weak interlayer interaction that enables interlayer sliding under low levels of applied load. The mechanical properties observed in the few-layer chromium trihalide crystals provide evidence of outstanding plasticity in these materials, which is qualitatively demonstrated in their bulk counterparts. This study will contribute to various applications of the van der Waals magnetic materials, especially for their use in magnetostrictive and flexible devices.

Submitted 1 April, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: Main text and supplementary information

arXiv:2010.12831 [pdf, other]

Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions

Authors: Liunian Harold Li, Haoxuan You, Zhecan Wang, Alireza Zareian, Shih-Fu Chang, Kai-Wei Chang

Abstract: Pre-trained contextual vision-and-language (V&L) models have achieved impressive performance on various benchmarks. However, existing models require a large amount of parallel image-caption data for pre-training. Such data are costly to collect and require cumbersome curation. Inspired by unsupervised machine translation, we investigate if a strong V&L representation model can be learned through unsupervised pre-training without image-caption corpora. In particular, we propose to conduct "mask-and-predict" pre-training on text-only and image-only corpora and introduce the object tags detected by an object recognition model as anchor points to bridge two modalities. We find that such a simple approach achieves performance close to a model pre-trained with aligned data, on four English V&L benchmarks. Our work challenges the widely held notion that aligned data is necessary for V&L pre-training, while significantly reducing the amount of supervision needed for V&L models.

Submitted 11 April, 2021; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: NAACL 2021 Camera Ready

arXiv:2008.01657 [pdf]

doi 10.1038/ncomms15815

Mechanical Properties of Atomically Thin Boron Nitride and the Role of Interlayer Interactions

Authors: Aleksey Falin, Qiran Cai, Elton J. G. Santos, Declan Scullion, Dong Qian, Rui Zhang, Zhi Yang, Shaoming Huang, Kenji Watanabe, Takashi Taniguchi, Matthew R. Barnett, Ying Chen, Rodney S. Ruoff, Lu Hua Li

Abstract: Atomically thin boron nitride (BN) nanosheets are important two-dimensional nanomaterials with many unique properties distinct from those of graphene, but the investigation of their mechanical properties is still greatly lacking. Here we report that high-quality single-crystalline mono- and few-layer BN nanosheets are among the strongest electrically insulating materials. More intriguingly, few-layer BN shows mechanical behaviors quite different from those of few-layer graphene under indentation. In striking contrast to graphene, whose strength decreases by more than 30% when the number of layers increases from 1 to 8, the mechanical strength of BN nanosheets is not sensitive to increasing thickness. We attribute this difference to the distinct interlayer interactions and hence sliding tendencies in these two materials under indentation. The significantly better mechanical integrity of BN nanosheets makes them a more attractive candidate than graphene for several applications, e.g. as mechanical reinforcements.

Submitted 2 August, 2020; originally announced August 2020.

Journal ref: Nature Communications 8, 15815 (2017)

arXiv:2008.01656 [pdf]

doi 10.1039/c6nr09312d

Raman Signature and Phonon Dispersion of Atomically Thin Boron Nitride

Authors: Qiran Cai, Declan Scullion, Aleksey Falin, Kenji Watanabe, Takashi Taniguchi, Ying Chen, Elton J. G. Santos, Lu Hua Li

Abstract: Raman spectroscopy has become an essential technique to characterize and investigate graphene and many other two-dimensional materials. However, there is still no consensus on the Raman signature and phonon dispersion of atomically thin boron nitride (BN), which has many unique properties distinct from graphene. Such a knowledge gap greatly affects the understanding of basic physical and chemical properties of atomically thin BN as well as the use of Raman spectroscopy to study these nanomaterials. Here, we use both experiment and simulation to reveal the intrinsic Raman signature of monolayer and few-layer BN. We find experimentally that atomically thin BN without interaction with substrate has a G band frequency similar to that of bulk hexagonal BN, but strain induced by substrate can cause pronounced Raman shifts. This is in excellent agreement with our first-principles density functional theory (DFT) calculations at two levels of theory, including van der Waals dispersion forces (opt-vdW) and a fraction of the exact exchange from Hartree-Fock (HF) theory through the hybrid HSE06 functional. Both calculations demonstrate that the intrinsic E2g mode of BN does not depend sensibly on the number of layers. Our simulations also suggest the importance of the exact exchange mixing parameter in calculating the vibrational modes in BN, as it determines the fraction of HF exchange included in the DFT calculations.

Submitted 2 August, 2020; originally announced August 2020.

Journal ref: Nanoscale 9, 3059-3067 (2017)

arXiv:2008.00451 [pdf]

doi 10.1021/acsnano.9b06858

Atomically Thin Boron Nitride as an Ideal Spacer for Metal-Enhanced Fluorescence

Authors: Wei Gan, Christos Tserkezis, Qiran Cai, Alexey Falin, Srikanth Mateti, Minh Nguyen, Igor Aharonovich, Kenji Watanabe, Takashi Taniguchi, Fumin Huang, Li Song, Lingxue Kong, Ying Chen, Lu Hua Li

Abstract: The metal-enhanced fluorescence (MEF) considerably enhances the luminescence for various applications, but its performance largely depends on the dielectric spacer between the fluorophore and plasmonic system. It is still challenging to produce a defect-free spacer having an optimized thickness with a subnanometer accuracy that enables reusability without affecting the enhancement. In this study, we demonstrate the use of atomically thin hexagonal boron nitride (BN) as an ideal MEF spacer owing to its multifold advantages over the traditional dielectric thin films. With rhodamine 6G as a representative fluorophore, it largely improves the enhancement factor (up to ~95 ± 5), sensitivity (10^-8 M), reproducibility, and reusability (~90% of the plasmonic activity is retained after 30 cycles of heating at 350 °C in air) of MEF. This can be attributed to its two-dimensional structure, thickness control at the atomic level, defect-free quality, high affinities to aromatic fluorophores, good thermal stability, and excellent impermeability. The atomically thin BN spacers could increase the use of MEF in different fields and industries.

Submitted 2 August, 2020; originally announced August 2020.

Journal ref: ACS Nano 13, 12184-12191 (2019)

arXiv:2008.00447 [pdf]

doi 10.1021/acsami.0c01157

Two-dimensional van der Waals Heterostructures for Synergistically Improved Surface Enhanced Raman Spectroscopy

Authors: Qiran Cai, Wei Gan, Alexey Falin, Kenji Watanabe, Takashi Taniguchi, Jincheng Zhuang, Weichang Hao, Shaoming Huang, Tao Tao, Ying Chen, Lu Hua Li

Abstract: Surface enhanced Raman spectroscopy (SERS) is a precise and non-invasive analytical technique that is widely used in chemical analysis, environmental protection, food processing, pharmaceutics, and diagnostic biology. However, it is still a challenge to produce highly sensitive and reusable SERS substrates with minimum fluorescence background. In this work, we propose the use of van der Waals heterostructures of two-dimensional materials (2D materials) to cover plasmonic metal nanoparticles to solve this challenge. The heterostructures of atomically thin boron nitride (BN) and graphene provide synergistic effects: (1) electrons could tunnel through the atomically thin BN, allowing the charge transfer between graphene and probe molecules to suppress fluorescence background; (2) the SERS sensitivity is enhanced by graphene via chemical enhancement mechanism (CM) in addition to electromagnetic field mechanism (EM); (3) the atomically thin BN protects the underlying graphene and Ag nanoparticles from oxidation during heating for regeneration at 360 °C in the air so that the SERS substrates could be reused. These advances will facilitate wider applications of SERS, especially on the detection of fluorescent molecules with higher sensitivity.

Submitted 2 August, 2020; originally announced August 2020.

Journal ref: ACS Applied Materials & Interfaces 12, 21985-21991 (2020)

arXiv:2008.00443 [pdf]

doi 10.1103/PhysRevLett.125.085902

Outstanding Thermal Conductivity of Single Atomic Layer Isotope-Modified Boron Nitride

Authors: Qiran Cai, Declan Scullion, Wei Gan, Alexey Falin, Pavel Cizek, Song Liu, James H. Edgar, Rong Liu, Bruce C. C. Cowie, Elton J. G. Santos, Lu Hua Li

Abstract: Materials with high thermal conductivities (k) are valuable to solve the challenge of waste heat dissipation in highly integrated and miniaturized modern devices. Herein, we report the first synthesis of atomically thin isotopically pure hexagonal boron nitride (BN) and show that it has one of the highest k among all semiconductors and electric insulators. Single atomic layer (1L) BN enriched with 11B has a k up to 1009 W/mK at room temperature. We find that the isotope engineering mainly suppresses the out-of-plane optical (ZO) phonon scatterings in BN, which subsequently reduces acoustic-optical scatterings between ZO and transverse acoustic (TA) and longitudinal acoustic (LA) phonons. On the other hand, reducing the thickness to single atomic layer diminishes the interlayer interactions and hence Umklapp scatterings of the out-of-plane acoustic (ZA) phonons, though this thickness-induced k enhancement is not as dramatic as that in naturally occurring BN. With many of its unique properties, atomically thin monoisotopic BN is promising for heat management in van der Waals (vdW) devices and future flexible electronics. The isotope engineering of atomically thin BN may also open up other appealing applications and opportunities in 2D materials yet to be explored.

Submitted 21 August, 2020; v1 submitted 2 August, 2020; originally announced August 2020.

Journal ref: Physical Review Letters 125, 085902 (2020)

arXiv:1912.10543 [pdf, other]

doi 10.1021/acs.nanolett.9b02982

Electronic polarizability as the fundamental variable in the dielectric properties of two-dimensional materials

Authors: Tian Tian, Declan Scullion, Dale Hughes, Lu Hua Li, Chih-Jen Shih, Jonathan Coleman, Manish Chhowalla, Elton J. G. Santos

Abstract: The dielectric constant, which defines the polarization of the media, is a key quantity in condensed matter. It determines several electronic and optoelectronic properties important for a plethora of modern technologies from computer memory to field effect transistors and communication circuits. Moreover, the importance of the dielectric constant in describing electromagnetic interactions through screening plays a critical role in understanding fundamental molecular interactions. Here we show that despite its fundamental transcendence, the dielectric constant does not define unequivocally the dielectric properties of two-dimensional (2D) materials due to the locality of their electrostatic screening. Instead, the electronic polarizability correctly captures the dielectric nature of a 2D material which is united to other physical quantities in an atomically thin layer. We reveal a long-sought universal formalism where electronic, geometrical and dielectric properties are intrinsically correlated through the polarizability opening the door to probe quantities yet not directly measurable including the real covalent thickness of a layer. We unify the concept of dielectric properties in any material dimension finding a global dielectric anisotropy index defining their controllability through dimensionality.

Submitted 22 December, 2019; originally announced December 2019.

arXiv:1908.03557 [pdf, other]

VisualBERT: A Simple and Performant Baseline for Vision and Language

Authors: Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang

Abstract: We propose VisualBERT, a simple and flexible framework for modeling a broad range of vision-and-language tasks. VisualBERT consists of a stack of Transformer layers that implicitly align elements of an input text and regions in an associated input image with self-attention. We further propose two visually-grounded language model objectives for pre-training VisualBERT on image caption data. Experiments on four vision-and-language tasks including VQA, VCR, NLVR2, and Flickr30K show that VisualBERT outperforms or rivals with state-of-the-art models while being significantly simpler. Further analysis demonstrates that VisualBERT can ground elements of language to image regions without any explicit supervision and is even sensitive to syntactic relationships, tracking, for example, associations between verbs and image regions corresponding to their arguments.

Submitted 9 August, 2019; originally announced August 2019.

Comments: Work in Progress

arXiv:1903.08862 [pdf]

doi 10.1126/sciadv.aav0129

High thermal conductivity of high-quality monolayer boron nitride and its thermal expansion

Authors: Qiran Cai, Declan Scullion, Wei Gan, Aleksey Falin, Shunying Zhang, Kenji Watanabe, Takashi Taniguchi, Ying Chen, Elton J. G. Santos, Lu Hua Li

Abstract: Heat management becomes more and more critical, especially in miniaturized modern devices, so the exploration of highly thermally conductive materials with electrical insulation and favorable mechanical properties is of great importance. Here, we report that high-quality monolayer boron nitride (BN) has a thermal conductivity (κ) of 751 W/mK at room temperature. Though smaller than that of graphene, this value is larger than that of cubic boron nitride (cBN) and only second to those of diamond and lately discovered cubic boron arsenide (BAs). Monolayer BN has the second largest κ per unit weight among all semiconductors and insulators, just behind diamond, if density is considered. The κ of atomically thin BN decreases with increased thickness. Our large-scale molecular dynamic simulations using Green-Kubo formalism accurately reproduce this trend, and the density functional theory (DFT) calculations reveal the main scattering mechanism. The thermal expansion coefficients (TECs) of monolayer to trilayer BN at 300-400 K are also experimentally measured, and the results are comparable to atomistic ab initio DFT calculations in a wider range of temperatures. Thanks to its wide bandgap, high thermal conductivity, outstanding strength, good flexibility, and excellent thermal and chemical stability, atomically thin BN is a strong candidate for heat dissipation applications, especially in the next generation of flexible electronic devices.

Submitted 26 March, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

Journal ref: Science Advances 2019

arXiv:1902.11269 [pdf, ps, other]

Efficient Contextual Representation Learning Without Softmax Layer

Authors: Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang

Abstract: Contextual representation models have achieved great success in improving various downstream tasks. However, these language-model-based encoders are difficult to train due to the large parameter sizes and high computational complexity. By carefully examining the training procedure, we find that the softmax layer (the output layer) causes significant inefficiency due to the large vocabulary size. Therefore, we redesign the learning objective and propose an efficient framework for training contextual representation models. Specifically, the proposed approach bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings. Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer and performs language modeling without truncating the vocabulary. When applied to ELMo, our method achieves a 4 times speedup and eliminates 80% trainable parameters while achieving competitive performance on downstream tasks.

Submitted 28 February, 2019; originally announced February 2019.

Comments: Work in progress

arXiv:1901.01033 [pdf]

Giant optical nonlinearity cancellation in quantum wells

Authors: S. Houver, A. Lebreton, T. A. S. Pereira, G. Xu, R. Colombelli, I. Kundu, L. H. Li, E. H. Linfield, A. G. Davies, J. Mangeney, J. Tignon, R. Ferreira, S. S. Dhillon

Abstract: Second-order optical nonlinearities can be greatly enhanced by orders of magnitude in resonantly excited nanostructures, theoretically predicted and experimentally investigated in a variety of semiconductor systems. These resonant nonlinearities continually attract attention, particularly in newly discovered materials, but tend not to be as efficient as currently predicted. This limits their exploitation in frequency conversion. Here, we present a clear-cut theoretical and experimental demonstration that the second-order nonlinear susceptibility can vary by orders of magnitude as a result of giant cancellation effects in systems with many confined quantum states. Using terahertz quantum cascade lasers as a model source to investigate interband and intersubband resonant nonlinearities, we show that these giant cancellations are a result of interfering second-order nonlinear contributions of light and heavy hole states. As well as of importance to understand and engineer the resonant optical properties of materials, this work can be employed as a new, extremely sensitive tool to elucidate the bandstructure properties of complex quantum well systems.

Submitted 4 January, 2019; originally announced January 2019.

arXiv:1802.02501 [pdf]

doi 10.1038/s41467-018-03592-3

Asymmetric Electric Field Screening in van der Waals Heterostructures

Authors: Lu Hua Li, Tian Tian, Qiran Cai, Chih-Jen Shih, Elton J. G. Santos

Abstract: Electric field screening plays an important role in the physical and chemical properties of materials and their devices. Here, we use a compelling set of theoretical and experimental techniques involving van der Waals (vdW) ab initio density functional theory (DFT) simulations, quantum capacitance-based classical model and electric force microscopy (EFM) to elucidate the intrinsic dielectric screening properties of vdW heterostructures (vdWHs) formed by MoS2 and graphene layers. We experimentally observed an asymmetric electric response in the MoS2/Graphene vdWHs under different directions of the external electric field. That is, when the electric fields are shed towards graphene, a large amount of polarized charges screen the fields, but as the sign of the field was reversed, a strong depolarization field was present, and a partial screening was detected. This effect is thickness-dependent, in particular on the number of the MoS2 layers; whereas increased thickness of graphene showed a small effect on their electrical and screening behavior. Our results indicate that asymmetric dipolar contributions at the interface between graphene and MoS2 are the main cause to the unusual field-effect screening in the vdWHs. This work not only provides new insights on the screening properties of a vast amount of heterojunction fabricated so far, but also uncovers the great potential of controlling a fundamental property, such as screening, for device applications.

Submitted 7 February, 2018; originally announced February 2018.

Comments: Final version to appear in Nature Communications (In Press)

arXiv:1801.06459 [pdf]

doi 10.1364/OPTICA.4.001451

Ultrafast terahertz detectors based on three-dimensional meta-atoms

Authors: B. Paulillo, S. Pirotta, H. Nong, P. Crozat, S. Guilet, G. Xu, S. Dhillon, L. H. Li, A. G. Davies, E. H. Linfield, R. Colombelli

Abstract: Terahertz (THz) and sub-THz frequency emitter and detector technologies are receiving increasing attention, underpinned by emerging applications in ultra-fast THz physics, frequency-combs technology and pulsed laser development in this relatively unexplored region of the electromagnetic spectrum. In particular, semiconductor-based ultrafast THz receivers are required for compact, ultrafast spectroscopy and communication systems, and to date, quantum well infrared photodetectors (QWIPs) have proved to be an excellent technology to address this given their intrinsic ps-range response. However, with research focused on diffraction-limited QWIP structures (lambda/2), RC constants cannot be reduced indefinitely, and detection speeds are bound to eventually meet an upper limit. The key to an ultra-fast response with no intrinsic upper limit even at tens of GHz is an aggressive reduction in device size, below the diffraction limit. Here we demonstrate sub-wavelength (lambda/10) THz QWIP detectors based on a 3D split-ring geometry, yielding ultra-fast operation at a wavelength of around 100 μm. Each sensing meta-atom pixel features a suspended loop antenna that feeds THz radiation in the ~20 μm³ active volume. Arrays of detectors as well as single-pixel detectors have been implemented with this new architecture, with the latter exhibiting ultra-low dark currents below the nA level. This extremely small resonator architecture leads to measured optical response speeds - on arrays of 300 devices - of up to ~3 GHz and an expected device operation of up to tens of GHz, based on the measured S-parameters on single devices and arrays.

Submitted 19 January, 2018; originally announced January 2018.

Journal ref: Optica 4, 1451-1456 (2017)

arXiv:1801.05622 [pdf]

Hunting micrometer-sized graphene flakes on gold substrate

Authors: Joel M. Katzen, Matěj Velický, Yuefeng Huang, Stacey Drakeley, William Hendren, Robert M. Bowman, Qiran Cai, Ying Chen, Lu Hua Li, Fumin Huang

Abstract: Gold is widely used as the substrate material in many graphene devices, due to its superior optoelectronic properties and chemical stability. However, there has been little experimental investigation on the optical contrast of graphene films on Au substrates. Here we report accurate measurement of the optical contrast spectra of few-layer graphene flakes on bulk Au. We used a high-resolution optical microscopy with a 100x magnification objective, accurately determining the thickness of flakes as small as one micrometer in lateral size, which are highly desired in many applications. The results are in excellent agreement with theoretical calculations and confirmed by Raman and AFM measurements. Furthermore, we demonstrate that the optical contrast spectroscopy is sensitive enough to detect the adsorption of a sub-monolayer of airborne hydrocarbon molecules, which can reveal whether graphene is contaminated and opens the opportunity to develop miniaturized and ultrasensitive molecular sensors.

Submitted 17 January, 2018; originally announced January 2018.

arXiv:1612.02883 [pdf]

doi 10.1002/adfm.201603160

Molecule-Induced Conformational Change in Boron Nitride Nanosheets with Enhanced Surface Adsorption

Authors: Qiran Cai, Aijun Du, Guoping Gao, Srikanth Mateti, Bruce C. C. Cowie, Dong Qian, Shuang Zhang, Yuerui Lu, Lan Fu, Takashi Taniguchi, Shaoming Huang, Ying Chen, Rodney S. Ruoff, Lu Hua Li

Abstract: Surface interaction is extremely important to both fundamental research and practical application. Physisorption can induce shape and structural distortion (i.e. conformational changes) in macromolecular and biomolecular adsorbates, but such phenomenon has rarely been observed on adsorbents. Here, we demonstrate theoretically and experimentally that atomically thin boron nitride (BN) nanosheets as an adsorbent experience conformational changes upon surface adsorption of molecules, increasing adsorption energy and efficiency. The study not only provides new perspectives on the strong adsorption capability of BN nanosheets and many other two-dimensional nanomaterials but also opens up possibilities for many novel applications. For example, we demonstrate that BN nanosheets with the same surface area as bulk hBN particles are more effective in purification and sensing.

Submitted 8 December, 2016; originally announced December 2016.

Journal ref: Adv. Funct. Mater. 26, 8202-8210 (2016)

arXiv:1607.03555 [pdf]

doi 10.1002/anie.201600517

Boron Nitride Nanosheets Improve Sensitivity and Reusability of Surface Enhanced Raman Spectroscopy

Authors: Qiran Cai, Srikanth Mateti, Wenrong Yang, Rob Jones, Kenji Watanabe, Takashi Taniguchi, Shaoming Huang, Ying Chen, Lu Hua Li

Abstract: Surface enhanced Raman spectroscopy (SERS) is a useful multidisciplinary analytic technique. However, it is still a challenge to produce SERS substrates that are highly sensitive, reproducible, stable, reusable, and scalable. Here, we demonstrate that atomically thin boron nitride (BN) nanosheets have many unique and desirable properties to help solve this challenge. The synergic effect of the atomic thickness, high flexibility, stronger surface adsorption capability, electrical insulation, impermeability, high thermal and chemical stability of BN nanosheets can increase the Raman sensitivity by up to two orders, and in the meantime attain long-term stability and extraordinary reusability not achievable by other materials. These advances will greatly facilitate the wider use of SERS in many fields.

Submitted 12 July, 2016; originally announced July 2016.

Journal ref: Angew. Chem. Int. Ed. 2016, 55, 8405-8409

arXiv:1606.07183 [pdf]

doi 10.1021/acsami.6b04320

Boron Nitride Nanosheet Veiled Gold Nanoparticles for Surface Enhanced Raman Scattering

Authors: Qiran Cai, Srikanth Mateti, Kenji Watanabe, Takashi Taniguchi, Shaoming Huang, Ying Chen, Lu Hua Li

Abstract: Atomically thin boron nitride (BN) nanosheets have many properties desirable for surface enhanced Raman spectroscopy (SERS). BN nanosheets have a strong surface adsorption capability towards airborne hydrocarbon and aromatic molecules. For maximized adsorption area and hence SERS sensitivity, atomically thin BN nanosheet covered gold nanoparticles have been prepared for the first time. When placed on top of metal nanoparticles, atomically thin BN nanosheets closely follow their contours so that the plasmonic hot spots are retained. Electrically insulating BN nanosheets also act as a barrier layer to eliminate metal-induced disturbance in SERS. Moreover, the SERS substrates veiled by BN nanosheets show an outstanding reusability in the long term. As a result, the sensitivity, reproducibility and reusability of SERS substrates can be greatly improved. We also demonstrate that large BN nanosheets produced by chemical vapor deposition can be used to scale up the proposed SERS substrate for practical application.

Submitted 23 June, 2016; originally announced June 2016.

Journal ref: ACS Appl. Mater. Interfaces 2016, 8(24), 15630-15636

arXiv:1605.09416 [pdf]

doi 10.1063/1.4905338

Distributed feedback terahertz frequency quantum cascade lasers with dual periodicity gratings

Authors: F. Castellano, S. Zanotto, L. H. Li, A. Pitanti, A. Tredicucci, E. H. Linfield, A. G. Davies, M. S. Vitiello

Abstract: We have developed terahertz frequency quantum cascade lasers that exploit a double-periodicity distributed feedback grating to control the emission frequency and the output beam direction independently. The spatial refractive index modulation of the gratings necessary to provide optical feedback at a fixed frequency and, simultaneously, a far-field emission pattern centered at controlled angles, was designed through use of an appropriate wavevector scattering model. Single mode THz emission at angles tuned by design between 0° and 50° was realized, leading to an original phase-matching approach, lithographically independent, for highly collimated THz QCLs.

Submitted 30 May, 2016; originally announced May 2016.

arXiv:1605.01136 [pdf]

doi 10.1002/adfm.201504606

Atomically Thin Boron Nitride: Unique Properties and Applications

Authors: Lu Hua Li, Ying Chen

Abstract: Atomically thin boron nitride (BN) is an important two-dimensional (2D) nanomaterial, with many properties distinct from graphene. In this feature article, these unique properties and associated applications often not possible from graphene are outlined. The article starts with characterization and identification of atomically thin BN. It is followed by demonstrating their strong oxidation resistance at high temperatures and applications in protecting metals from oxidation and corrosion. As flat insulators, BN nanosheets are ideal dielectric substrates for surface enhanced Raman spectroscopy (SERS) and electronic devices based on 2D heterostructures. The light emission of BN nanosheets in the deep ultraviolet (DUV) and ultraviolet (UV) regions are also included for its scientific and technological importance. The last part is dedicated to synthesis, characterization, and optical properties of BN nanoribbons, a special form of nanosheets.

Submitted 4 May, 2016; originally announced May 2016.

Journal ref: Advanced Functional Materials 26, 2594-2608, 2016

arXiv:1603.02305 [pdf]

Quantum Emission from Defects in Single Crystal Hexagonal Boron Nitride

Authors: Toan Trong Tran, Cameron Zachreson, Amanuel Michael Berhane, Kerem Bray, Russell Guy Sandstrom, Lu Hua Li, Takashi Taniguchi, Kenji Watanabe, Igor Aharonovich, Milos Toth

Abstract: Bulk hexagonal boron nitride (hBN) is a highly nonlinear natural hyperbolic material that attracts major attention in modern nanophotonics applications. However, studies of its optical properties in the visible part of the spectrum and quantum emitters hosted by bulk hBN have not been reported to date. In this work we study the emission properties of hBN crystals in the red spectral range using sub-bandgap optical excitation. Quantum emission from defects is observed at room temperature and characterized in detail. Our results advance the use of hBN in quantum nanophotonics technologies and enhance our fundamental understanding of its optical properties.

Submitted 7 March, 2016; originally announced March 2016.

Comments: In press in Physical Review Applied

arXiv:1503.03498 [pdf]

doi 10.1039/C5CP00532A

Boron Nitride Nanosheets as Improved and Reusable Substrates for Gold Nanoparticles Enabled Surface Enhanced Raman Spectroscopy

Authors: Qiran Cai, Lu Hua Li, Yuanlie Yu, Yun Liu, Shaoming Huang, Ying Chen, Kenji Watanabe, Takashi Taniguchi

Abstract: Atomically thin boron nitride (BN) nanosheets have been found an excellent substrate for noble metal particles enabled surface enhanced Raman spectroscopy (SERS), thanks to their good adsorption of aromatic molecules, high thermal stability and weak Raman scattering. Faceted gold (Au) nanoparticles have been synthesized on BN nanosheets by a simple but controllable and reproducible sputtering and annealing method. The size and density of the Au particles can be controlled by sputtering time, current and annealing temperature etc. Under the same sputtering and annealing conditions, the Au particles on BN of different thicknesses show various sizes because the surface diffusion coefficients of Au depend on the thickness of BN. Intriguingly, decorated with similar morphology and distribution of Au particles, BN nanosheets exhibit better Raman enhancements than silicon substrate as well as bulk BN crystals. Additionally, BN nanosheets show no noticeable SERS signal and hence cause no interference to the Raman signal of analyte. The Au/BN substrates can be reused by heating in air to remove adsorbed analyte without loss of SERS enhancement.

Submitted 11 March, 2015; originally announced March 2015.

Comments: Complementary Info included

Journal ref: Physical Chemistry Chemical Physics 17, 7761-7766, 2015

arXiv:1503.01295 [pdf]

doi 10.1002/admi.201300132

Boron Nitride Nanosheets for Metal Protection

Authors: Lu Hua Li, Tan Xing, Ying Chen, Rob Jones

Abstract: Although the high impermeability of graphene makes it an excellent barrier to inhibit metal oxidation and corrosion, graphene can form a galvanic cell with the underlying metal that promotes corrosion of the metal in the long term. Boron nitride (BN) nanosheets which have a similar impermeability could be a better choice as protective barrier, because they are more thermally and chemically stable than graphene and, more importantly, do not cause galvanic corrosion due to their electrical insulation. In this study, the performance of commercially available BN nanosheets grown by chemical vapor deposition as a protective coating on metal has been investigated. The heating of the copper foil covered with the BN nanosheet at 250 °C in air over 100 h results in dramatically less oxidation than the bare copper foil heated for 2 h under the same conditions. The electrochemical analyses reveal that the BN nanosheet coating can increase open circuit potential and possibly reduce oxidation of the underlying copper foil in sodium chloride solution. These results indicate that BN nanosheets are a good candidate for oxidation and corrosion protection, although conductive atomic force microscopy analyses show that the effectiveness of the protection relies on the quality of BN nanosheets.

Submitted 4 March, 2015; originally announced March 2015.

Comments: With Supporting Information

Journal ref: Advanced Materials Interfaces 2014, 1, 1300132

arXiv:1503.01292 [pdf]

doi 10.1063/1.4903040

Electric Contributions to Magnetic Force Microscopy Response from Graphene and MoS2 Nanosheets

Authors: Lu Hua Li, Ying Chen

Abstract: Magnetic force microscopy (MFM) signals have recently been detected from whole pieces of mechanically exfoliated graphene and molybdenum disulfide (MoS2) nanosheets and magnetism of the two nanomaterials was claimed based on these observations. However, non-magnetic interactions or artefacts are commonly associated with MFM signals, which makes the interpretation of MFM signals not straightforward. A systematic investigation has been done to examine possible sources of the MFM signals from graphene and MoS2 nanosheets and whether the MFM signals can be correlated with magnetism. It is found that the MFM signals have significant non-magnetic contributions due to capacitive and electrostatic interactions between the nanosheets and conductive cantilever tip, as demonstrated by electric force microscopy (EFM) and scanning Kelvin probe microscopy (SKPM) analyses. In addition, the MFM signals of graphene and MoS2 nanosheets are not responsive to reversed magnetic field of the magnetic cantilever tip. Therefore, the observed MFM response is mainly from electric artefacts and not compelling enough to correlate with magnetism of graphene and MoS2 nanosheets.

Submitted 4 March, 2015; originally announced March 2015.

Comments: With Supporting Information

Journal ref: Journal of Applied Physics 116, 213904 (2014)

arXiv:1503.00380 [pdf]

doi 10.1021/nl503411a

Dielectric Screening in Atomically Thin Boron Nitride Nanosheets

Authors: Lu Hua Li, Elton J. G. Santos, Tan Xing, Emmanuele Cappelluti, Rafael Roldán, Ying Chen, Kenji Watanabe, Takashi Taniguchi

Abstract: Two-dimensional (2D) hexagonal boron nitride (BN) nanosheets are excellent dielectric substrate for graphene, molybdenum disulfide and many other 2D nanomaterials based electronic and photonic devices. To optimize the performance of these 2D devices, it is essential to understand the dielectric screening properties of BN nanosheets as a function of the thickness. Here, electric force microscopy along with theoretical calculations based on both state-of-the-art first-principles calculations with van der Waals interactions under consideration and non-linear Thomas-Fermi theory models are used to investigate the dielectric screening in high-quality BN nanosheets of different thicknesses. It is found that atomically thin BN nanosheets are less effective in electric field screening, but the screening capability of BN shows a relatively weak dependence on the layer thickness.

Submitted 1 March, 2015; originally announced March 2015.

Journal ref: Nano Letters 15(1), 218-223, 2015

arXiv:1410.7529 [pdf]

doi 10.1021/nn501506p

Observation of Active Sites for Oxygen Reduction Reaction on Nitrogen-doped Multilayer Graphene

Authors: Tan Xing, Yao Zheng, Lu Hua Li, Bruce C. C. Cowie, Daniel Gunzelmann, Shi Zhang Qiao, Shaoming Huang, Ying Chen

Abstract: Active sites and catalytic mechanism of nitrogen-doped graphene in oxygen reduction reaction (ORR) have been extensively studied but are still inconclusive, partly due to the lack of an experimental method that can detect the active sites. It is proposed in this report that the active sites on nitrogen-doped graphene can be determined via the examination of its chemical composition change before and after ORR. Synchrotron-based X-ray photoelectron spectroscopy analyses of three nitrogen-doped multilayer graphene samples reveal that oxygen reduction intermediate OH(ads) which should chemically attach to the active sites remains on the carbon atoms neighboring pyridinic nitrogen after ORR. In addition, a high amount of the OH(ads) attachment after ORR corresponds to a high catalytic efficiency and vice versa. These pinpoint that the carbon atoms close to pyridinic nitrogen are the main active sites among the different nitrogen doping configurations.

Submitted 28 October, 2014; originally announced October 2014.

Journal ref: ACS Nano 8(7), 6856-6862, 2014

arXiv:1403.1002 [pdf]

doi 10.1021/nn500059s

Strong Oxidation Resistance of Atomically Thin Boron Nitride Nanosheets

Authors: Lu Hua Li, Jiri Cervenka, Kenji Watanabe, Takashi Taniguchi, Ying Chen

Abstract: Investigation on oxidation resistance of two-dimensional (2D) materials is critical for many of their applications, because 2D materials could have higher oxidation kinetics than their bulk counterparts due to predominant surface atoms and structural distortions. In this study, the oxidation behavior of high-quality boron nitride (BN) nanosheets of 1-4 layer thick has been examined by heating in air. Atomic force microscopy and Raman spectroscopy analyses reveal that monolayer BN nanosheets can sustain up to 850 °C and the starting temperature of oxygen doping/oxidation of BN nanosheets only slightly increases with the increase of nanosheet layer and depends on heating conditions. Elongated etch lines are found on the oxidized monolayer BN nanosheets, suggesting that the BN nanosheets are first cut along the chemisorbed oxygen chains and then the oxidative etching grows perpendicularly to these cut lines. The stronger oxidation resistance of BN nanosheets suggests that they are more preferable for high-temperature applications than graphene.

Submitted 4 March, 2014; originally announced March 2014.

Journal ref: ACS Nano 8, 1457, 2014

Showing 1–50 of 70 results for author: Li, L H