Search | arXiv e-print repository

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Authors: Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman

Abstract: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism and to the KV cache. The result is a quantized model where all matrix multiplications are performed in 4 bits, without any channels identified for retention in higher precision. Our quantized LLaMa2-70B model has losses of at most 0.29 WikiText-2 perplexity and retains 99% of the zero-shot performance. Code is available at: https://github.com/spcl/QuaRot.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 19 pages, 6 figures

arXiv:2402.02622 [pdf, other]

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging

Authors: Matteo Pagliardini, Amirkeivan Mohtashami, Francois Fleuret, Martin Jaggi

Abstract: The transformer architecture by Vaswani et al. (2017) is now ubiquitous across application domains, from natural language processing to speech processing and image understanding. We propose DenseFormer, a simple modification to the standard architecture that improves the perplexity of the model without increasing its size -- adding a few thousand parameters for large-scale models in the 100B parameters range. Our approach relies on an additional averaging step after each transformer block, which computes a weighted average of current and past representations -- we refer to this operation as Depth-Weighted-Average (DWA). The learned DWA weights exhibit coherent patterns of information flow, revealing the strong and structured reuse of activations from distant layers. Experiments demonstrate that DenseFormer is more data efficient, reaching the same perplexity as much deeper transformer models, and that for the same perplexity, these new models outperform transformer baselines in terms of memory efficiency and inference time.

Submitted 21 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2312.11441 [pdf, other]

Social Learning: Towards Collaborative Learning with Large Language Models

Authors: Amirkeivan Mohtashami, Florian Hartmann, Sian Gooding, Lukas Zilka, Matt Sharifi, Blaise Aguera y Arcas

Abstract: We introduce the framework of "social learning" in the context of large language models (LLMs), whereby models share knowledge with each other in a privacy-aware manner using natural language. We present and evaluate two approaches for knowledge transfer between LLMs. In the first scenario, we allow the model to generate abstract prompts aiming to teach the task. In our second approach, models transfer knowledge by generating synthetic examples. We evaluate these methods across diverse datasets and quantify memorization as a proxy for privacy loss. These techniques inspired by social learning yield promising results with low memorization of the original data. In particular, we show that performance using these methods is comparable to results with the use of original labels and prompts. Our work demonstrates the viability of social learning for LLMs, establishes baseline approaches and highlights several unexplored areas for future work.

Submitted 8 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.16079 [pdf, other]

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Authors: Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, Antoine Bosselut

Abstract: Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer), and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally-recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2310.10845 [pdf, other]

CoTFormer: More Tokens With Attention Make Up For Less Depth

Authors: Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi

Abstract: The race to continually develop ever larger and deeper foundational models is underway. However, techniques like the Chain-of-Thought (CoT) method continue to play a pivotal role in achieving optimal downstream performance. In this work, we establish an approximate parallel between using chain-of-thought and employing a deeper transformer. Building on this insight, we introduce CoTFormer, a transformer variant that employs an implicit CoT-like mechanism to achieve capacity comparable to a deeper model. Our empirical findings demonstrate the effectiveness of CoTFormers, as they significantly outperform larger standard transformers.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2305.16300 [pdf, other]

Landmark Attention: Random-Access Infinite Context Length for Transformers

Authors: Amirkeivan Mohtashami, Martin Jaggi

Abstract: While Transformers have shown remarkable success in natural language processing, their attention mechanism's large memory requirements have limited their ability to handle longer contexts. Prior approaches, such as recurrent memory or retrieval-based augmentation, have either compromised the random-access flexibility of attention (i.e., the capability to select any token in the entire context) or relied on separate mechanisms for relevant context retrieval, which may not be compatible with the model's attention. In this paper, we present a novel approach that allows access to the complete context while retaining random-access flexibility, closely resembling running attention on the entire context. Our method uses a landmark token to represent each block of the input and trains the attention to use it for selecting relevant blocks, enabling retrieval of blocks directly through the attention mechanism instead of by relying on a separate mechanism. Our approach seamlessly integrates with specialized data structures and the system's memory hierarchy, enabling processing of arbitrarily long context lengths. We demonstrate that our method can obtain comparable performance with Transformer-XL while significantly reducing the number of retrieved tokens in each step. Finally, we show that fine-tuning LLaMA 7B with our method successfully extends its context length capacity to over 32k tokens, allowing for inference at the context lengths of GPT-4. We release the implementation of landmark attention and the code to reproduce our experiments at https://github.com/epfml/landmark-attention/.

Submitted 19 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: Published as a conference paper at NeurIPS 2023 - 37th Conference on Neural Information Processing Systems

arXiv:2302.03491 [pdf, ps, other]

Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

Authors: Amirkeivan Mohtashami, Mauro Verzetti, Paul K. Rubenstein

Abstract: Learned metrics such as BLEURT have in recent years become widely employed to evaluate the quality of machine translation systems. Training such metrics requires data which can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from Large Language Models (LLMs) to improve upon such learned metrics without requiring human annotators, by creating synthetic datasets which can be mixed into existing datasets, requiring only a corpus of text in the target language. We show that the performance of a BLEURT-like model on lower resource languages can be improved in this way.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2205.15142 [pdf, other]

Special Properties of Gradient Descent with Large Learning Rates

Authors: Amirkeivan Mohtashami, Martin Jaggi, Sebastian Stich

Abstract: When training neural networks, it has been widely observed that a large step size is essential in stochastic gradient descent (SGD) for obtaining superior models. However, the effect of large step sizes on the success of SGD is not well understood theoretically. Several previous works have attributed this success to the stochastic noise present in SGD. However, we show through a novel set of experiments that the stochastic noise is not sufficient to explain good non-convex training, and that instead the effect of a large learning rate itself is essential for obtaining best performance. We demonstrate the same effects also in the noise-less case, i.e. for full-batch GD. We formally prove that GD with large step size -- on certain non-convex function classes -- follows a different trajectory than GD with a small step size, which can lead to convergence to a global minimum instead of a local one. Our settings provide a framework for future analysis which allows comparing algorithms based on behaviors that cannot be observed in the traditional settings.

Submitted 16 February, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: A short version of this work appeared in ICML 22 ICML Workshop on Continuous Time Methods for Machine Learning under the title "The Gap Between Continuous and Discrete Gradient Descent"

arXiv:2202.01838 [pdf, other]

Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods

Authors: Amirkeivan Mohtashami, Sebastian Stich, Martin Jaggi

Abstract: While SGD, which samples from the data with replacement, is widely studied in theory, a variant called Random Reshuffling (RR) is more common in practice. RR iterates through random permutations of the dataset and has been shown to converge faster than SGD. When the order is chosen deterministically, a variant called incremental gradient descent (IG), the existing convergence bounds show improvement over SGD but are worse than RR. However, these bounds do not differentiate between a good and a bad ordering and hold for the worst choice of order. Meanwhile, in some cases, choosing the right order when using IG can lead to convergence faster than RR. In this work, we quantify the effect of order on convergence speed, obtaining convergence bounds based on the chosen sequence of permutations while also recovering previous results for RR. In addition, we show benefits of using structured shuffling when various levels of abstractions (e.g. tasks, classes, augmentations, etc.) exist in the dataset in theory and in practice. Finally, relying on our measure, we develop a greedy algorithm for choosing good orders during training, achieving superior performance (by more than 14 percent in accuracy) over RR.

Submitted 3 February, 2022; originally announced February 2022.

arXiv:2106.08895 [pdf, other]

Masked Training of Neural Networks with Partial Gradients

Authors: Amirkeivan Mohtashami, Martin Jaggi, Sebastian U. Stich

Abstract: State-of-the-art training algorithms for deep learning models are based on stochastic gradient descent (SGD). Recently, many variations have been explored: perturbing parameters for better accuracy (such as in Extragradient), limiting SGD updates to a subset of parameters for increased efficiency (such as meProp) or a combination of both (such as Dropout). However, the convergence of these methods is often not studied in theory. We propose a unified theoretical framework to study such SGD variants -- encompassing the aforementioned algorithms and additionally a broad variety of methods used for communication efficient training or model compression. Our insights can be used as a guide to improve the efficiency of such methods and facilitate generalization to new applications. As an example, we tackle the task of jointly training networks, a version of which (limited to sub-networks) is used to create Slimmable Networks. By training a low-rank Transformer jointly with a standard one we obtain better performance than when it is trained separately.

Submitted 22 March, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

arXiv:2103.02351 [pdf, other]

Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates

Authors: Sebastian U. Stich, Amirkeivan Mohtashami, Martin Jaggi

Abstract: It has been experimentally observed that the efficiency of distributed training with stochastic gradient descent (SGD) depends decisively on the batch size and -- in asynchronous implementations -- on the gradient staleness. Especially, it has been observed that the speedup saturates beyond a certain batch size and/or when the delays grow too large. We identify a data-dependent parameter that explains the speedup saturation in both these settings. Our comprehensive theoretical analysis, for strongly convex, convex and non-convex settings, unifies and generalizes prior work directions that often focused on only one of these two aspects. In particular, our approach allows us to derive improved speedup results under frequently considered sparsity assumptions. Our insights give rise to theoretically based guidelines on how the learning rates can be adjusted in practice. We show that our results are tight and illustrate key findings in numerical experiments.

Submitted 3 March, 2021; originally announced March 2021.

Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021

arXiv:2008.01009 [pdf, other]

The Splay-List: A Distribution-Adaptive Concurrent Skip-List

Authors: Vitaly Aksenov, Dan Alistarh, Alexandra Drozdova, Amirkeivan Mohtashami

Abstract: The design and implementation of efficient concurrent data structures have seen significant attention. However, most of this work has focused on concurrent data structures providing good worst-case guarantees. In real workloads, objects are often accessed at different rates, since access distributions may be non-uniform. Efficient distribution-adaptive data structures are known in the sequential case, e.g. the splay-trees; however, they often are hard to translate efficiently in the concurrent case. In this paper, we investigate distribution-adaptive concurrent data structures and propose a new design called the splay-list. At a high level, the splay-list is similar to a standard skip-list, with the key distinction that the height of each element adapts dynamically to its access rate: popular elements "move up," whereas rarely-accessed elements decrease in height. We show that the splay-list provides order-optimal amortized complexity bounds for a subset of operations while being amenable to efficient concurrent implementation. Experimental results show that the splay-list can leverage distribution-adaptivity to improve on the performance of classic concurrent designs, and can outperform the only previously-known distribution-adaptive design in certain settings.

Submitted 3 August, 2020; originally announced August 2020.

arXiv:2007.01662 [pdf, other]

Scattering contrast in GHz frequency ultrasound subsurface atomic force microscopy for detection of deeply buried features

Authors: Maarten H. van Es, Benoit A. J. Quesson, Abbas Mohtashami, Daniele Piras, Kodai Hatakeyama, Laurent Fillinger, Paul L. M. J. van Neer

Abstract: While Atomic Force Microscopy is mostly used to investigate surface properties, people have almost since its invention sought to apply its high resolution capability to image also structures buried within samples. One of the earliest techniques for this was based on using ultrasound excitations to visualize local differences in effective tip-sample stiffness caused by the presence of buried structures with different visco-elasticity from their surroundings. While the use of ultrasound has often triggered discussions on the contribution of diffraction or scattering of acoustic waves in visualizing buried structures, no conclusive papers on this topic have been published. Here we demonstrate and discuss how such acoustical effects can be unambiguously recognized and can be used with Atomic Force Microscopy to visualize deeply buried structures.

Submitted 3 July, 2020; originally announced July 2020.

Comments: 18 pages, 5 figures

arXiv:1506.00140 [pdf, other]

doi 10.1103/PhysRevApplied.4.054014

Angle-resolved polarimetry measurements of antenna-mediated fluorescence

Authors: Abbas Mohtashami, Clara I. Osorio, A. Femius Koenderink

Abstract: Optical phase-array antennas can be used to control not only the angular distribution but also the polarization of fluorescence from quantum emitters. The emission pattern of the resulting system is determined by the properties of the antenna, the properties of the emitters and the strength of the antenna-emitter coupling. Here we show that Fourier polarimetry can be used to characterize these three contributions. To this end, we measured the angle and Stokes-parameter resolved emission of bullseye plasmon antennas as well as spiral antennas excited by an ensemble of emitters. We estimate the antenna-emitter coupling on the basis of the degree of polarization, and determine the effect of anisotropy in the intrinsic emitter orientation on polarization of the resulting emission pattern. Our results not only provide new insights into the behavior of bullseye and spiral antennas, but also demonstrate the potential of Fourier polarimetry when characterizing antenna mediated fluorescence.

Submitted 30 May, 2015; originally announced June 2015.

Comments: 7 pages, 5 figures

Journal ref: Phys. Rev. Applied 4, 054014 (2015)

arXiv:1212.5172 [pdf, ps, other]

doi 10.1088/1367-2630/15/4/043017

Suitability of nanodiamond NV centers for spontaneous emission control experiments

Authors: Abbas Mohtashami, A. Femius Koenderink

Abstract: NV centers in diamond are generally recognized as highly promising as indefinitely stable highly efficient single-photon sources. We report an experimental quantification of the brightness, radiative decay rate, nonradiative decay rate and quantum efficiency of single NV centers in diamond nanocrystals. Our experiments show that the commonly observed large spread in fluorescence decay rates of NV centers in nanodiamond is inconsistent with the common explanation of large nanophotonic mode-density variations in the ultra-small high-index crystals at near-unity quantum efficiency. We report that NV centers in 25 nm nanocrystals are essentially insensitive to local density of states (LDOS) variations that we induce at a dielectric interface by using liquids to vary the refractive index, and propose that quantum efficiencies in such nanocrystals are widely distributed between 0% and 20%. For single NV centers in larger 100 nm nanocrystals, we show that decay rate changes can be reversibly induced by nanomechanically approaching a mirror to change the LDOS. Using this scanning mirror method, for the first time we report calibrated quantum efficiencies of NV centers, and show that different but nominally identical nanocrystals have widely distributed quantum efficiencies between 10% and 90%. Our measurements imply that nanocrystals that are to be assembled into hybrid photonic structures for cavity QED should first be individually screened to assess fluorescence properties in detail.

Submitted 20 December, 2012; originally announced December 2012.

Comments: 27 pages, 6 figures

arXiv:1212.5081 [pdf, ps, other]

doi 10.1063/1.4798327

Nanomechanical method to gauge emission quantum yield applied to NV-centers in nanodiamond

Authors: Martin Frimmer, Abbas Mohtashami, A. Femius Koenderink

Abstract: We present a technique to nanomechanically vary the distance between a fluorescent source and a mirror, thereby varying the local density of optical states at the source position. Our method can therefore serve to measure the quantum efficiency of fluorophores. Application of our technique to NV defects in diamond nanocrystals shows that their quantum yield can significantly differ from unity. Relying on a lateral scanning mechanism with shear-force probe-sample distance control our technique is straightforwardly implemented in most state-of-the-art near-field microscopes.

Submitted 20 December, 2012; originally announced December 2012.

Comments: 11 pages, 3 figures

arXiv:1007.3032 [pdf, other]

doi 10.1063/1.3576137

Non-resonant feeding of photonic crystal nanocavity modes by quantum dots

Authors: A. Laucht, N. Hauke, A. Neumann, T. Günthner, F. Hofbauer, A. Mohtashami, K. Müller, G. Böhm, M. Bichler, M. -C. Amann, M. Kaniber, J. J. Finley

Abstract: We experimentally probe the non-resonant feeding of photons into the optical mode of a two dimensional photonic crystal nanocavity from the discrete emission from a quantum dot. For a strongly coupled system of a single exciton and the cavity mode, we track the detuning-dependent photoluminescence intensity of the polariton peaks at different lattice temperatures. At low temperatures we observe a clear asymmetry in the emission intensity depending on whether the exciton is at higher or lower energy than the cavity mode. At high temperatures this asymmetry vanishes when the probabilities to emit or absorb a phonon become similar. For a different dot-cavity system where the cavity mode is detuned by ΔE>5 meV to lower energy than the single exciton transitions emission from the mode remains correlated with the quantum dot as demonstrated unambiguously by cross-correlation photon counting experiments. By monitoring the temporal evolution of the photoluminescence spectrum, we show that feeding of photons into the mode occurs from multi-exciton transitions. We observe a clear anti-correlation of the mode and single exciton emission; the mode emission quenches as the population in the system reduces towards the single exciton level whilst the intensity of the mode emission tracks the multi-exciton transitions.

Submitted 18 July, 2010; originally announced July 2010.

Journal ref: J. Appl. Phys. 109, 102404 (2011)

arXiv:1003.2946 [pdf, other]

doi 10.1103/PhysRevB.81.241302

Temporal Monitoring of Non-resonant Feeding of Semiconductor Nanocavity Modes by Quantum Dot Multiexciton Transitions

Authors: A. Laucht, M. Kaniber, A. Mohtashami, N. Hauke, M. Bichler, J. J. Finley

Abstract: We experimentally investigate the non-resonant feeding of photons into the optical mode of a zero dimensional nanocavity by quantum dot multiexciton transitions. Power dependent photoluminescence measurements reveal a super-linear power dependence of the mode emission, indicating that the emission stems from multiexcitons. By monitoring the temporal evolution of the photoluminescence spectrum, we observe a clear anticorrelation of the mode and single exciton emission; the mode emission quenches as the population in the system reduces towards the single exciton level whilst the intensity of the mode emission tracks the multi-exciton transitions. Our results lend strong support to a recently proposed mechanism mediating the strongly non-resonant feeding of photons into the cavity mode.

Submitted 15 March, 2010; originally announced March 2010.

Journal ref: Physical Review B 81, 241302(R) (2010)

arXiv:0910.3749 [pdf, other]

doi 10.1103/PhysRevB.80.201311

Phonon-assisted transitions from quantum dot excitons to cavity photons

Authors: Ulrich Hohenester, Arne Laucht, Michael Kaniber, Norman Hauke, Abbas Mohtashami, Marek Seliger, Jonathan J. Finley

Abstract: For a single semiconductor quantum dot embedded in a microcavity, we theoretically and experimentally investigate phonon-assisted transitions between excitons and the cavity mode. Within the framework of the independent boson model we find that such transitions can be very efficient, even for relatively large exciton-cavity detunings of several millielectron volts. Furthermore, we predict a strong detuning asymmetry for the exciton lifetime that vanishes for elevated lattice temperature. Our findings are corroborated by experiment, which turns out to be in good quantitative and qualitative agreement with theory.

Submitted 20 October, 2009; originally announced October 2009.

Journal ref: Phys. Rev. B 80, 201311(R) (2009)

Showing 1–19 of 19 results for author: Mohtashami, A