arXiv:2406.05803 [pdf]

Production and distribution planning, scheduling, and routing optimization in a yogurt supply chain under demand uncertainty: A case study

Authors: Babak Javadi, Zeinab Salimzadeh, Amir Hossein Akbari, Mahla Yadegari, Mohammadreza Abdali

Abstract: Considering the evolution of the food industry and its challenges, like high perishability, managing the food industry supply chain is a key focus for researchers and decision-makers. Uncertainty in decision-making has gained importance, particularly in the yogurt industry, known for its complexity. This study addresses production and distribution planning, scheduling, and routing in the yogurt supply chain. The problem is characterized by multiple products, a single plant, multiple distribution centers, multiple periods, and various transportation methods. A mixed-integer non-linear programming (MINLP) model is used to minimize total costs, including production, setup, overtime, unmet demand, and transportation. Additionally, a robust fuzzy programming approach is applied under uncertainty, with linearization procedures proposed to convert it into a linearized mixed-integer programming formulation. The problem is tested with two data types: a sample problem in three sizes (small, medium, and large) and real data from Kalle Dairy Company, Iran. A Genetic Algorithm (GA) is developed to solve the problem, with necessary modifications made for its application. The GA's performance is compared to an exact algorithm (Branch & Cut), showing that the company's production policy adapts daily to meet demand precisely. The shift to smaller batch production and longer shelf life allows better stock allocation and avoids shortages in uncertain conditions. The company's policies adapt to severe fluctuations in the business environment, though this requires high costs, such as inventory maintenance.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2403.12291 [pdf]

Photon statistics analysis of h-BN quantum emitters with pulsed and continuous-wave excitation

Authors: Hamidreza Akbari, Pankaj K. Jha, Kristina Malinowski, Benjamin E. C. Koltenbah, Harry A. Atwater

Abstract: We report on the quantum photon statistics of hexagonal boron nitride (h-BN) quantum emitters by analyzing the Mandel Q parameter. We have measured the Mandel Q parameter for h-BN quantum emitters under various temperatures and pump power excitation conditions. Under pulsed excitation we can achieve a Mandel Q of -0.002 and under continuous-wave (CW) excitation this parameter can reach -0.0025. We investigate the effect of cryogenic temperatures on Mandel Q and conclude that the photon statistics vary weakly with temperature. Through calculation of spontaneous emission from an excited two-level emitter model, we demonstrate good agreement between measured and calculated Mandel Q parameter when accounting for the experimental photon collection efficiency. Finally, we illustrate the usefulness of Mandel Q in quantum applications by the example of random number generation and analyze the effect of Mandel Q on the speed of generating random bits via this method.
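For readers unfamiliar with the quantity, the Mandel Q parameter is conventionally defined as Q = (Var(n) - mean(n)) / mean(n), where n is the photon count per time bin; Q = 0 for Poissonian light and Q < 0 for sub-Poissonian light. The following is a minimal sketch of that textbook definition only. The function name `mandel_q` and the list-of-counts input format are assumptions made for illustration and are not taken from the paper.

```python
from statistics import fmean, pvariance


def mandel_q(counts):
    """Textbook Mandel Q from per-bin photon counts (hypothetical helper).

    Q = (variance - mean) / mean, using the population variance.
    """
    mean = fmean(counts)
    if mean == 0:
        raise ValueError("mean photon count is zero; Q is undefined")
    return (pvariance(counts) - mean) / mean


# Exactly one photon in every bin: zero variance, so Q reaches its minimum of -1.
print(mandel_q([1, 1, 1, 1]))
```

Values close to zero, such as those quoted in the abstract, are what one would expect when only a small fraction of emitted photons is detected.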

Submitted 18 March, 2024; originally announced March 2024.

Comments: Main text: 22 pages, 8 figures, 1 table. Supplemental document: 2 pages, 1 figure. Submitted to APL Quantum

arXiv:2312.14125 [pdf, other]

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Authors: Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, et al. (6 additional authors not shown)

Abstract: We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/

Submitted 4 June, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: To appear at ICML 2024; Project page: http://sites.research.google/videopoet/

arXiv:2305.06324 [pdf, other]

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Authors: Hassan Akbari, Dan Kondratyuk, Yin Cui, Rachel Hornung, Huisheng Wang, Hartwig Adam

Abstract: We present Integrated Multimodal Perception (IMP), a simple and scalable multimodal multi-task training and modeling approach. IMP integrates multimodal inputs including image, video, text, and audio into a single Transformer encoder with minimal modality-specific components. IMP makes use of a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient model and task scaling. We conduct extensive empirical studies and reveal the following key insights: 1) Performing gradient descent updates by alternating on diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model. 2) Sparsification with MoE on a single modality-agnostic encoder substantially improves the performance, outperforming dense models that use modality-specific encoders or additional fusion layers and greatly mitigates the conflicts between modalities. IMP achieves competitive performance on a wide range of downstream tasks including video classification, image classification, image-text, and video-text retrieval. Most notably, we train a sparse IMP-MoE-L variant focusing on video tasks that achieves new state-of-the-art in zero-shot video classification: 77.0% on Kinetics-400, 76.8% on Kinetics-600, and 68.3% on Kinetics-700, improving the previous state-of-the-art by +5%, +6.7%, and +5.8%, respectively, while using only 15% of their total training computational cost.

Submitted 11 December, 2023; v1 submitted 10 May, 2023; originally announced May 2023.

arXiv:2302.03720 [pdf]

Rydberg Excitons and Trions in Monolayer MoTe$_2$

Authors: Souvik Biswas, Aurélie Champagne, Jonah B. Haber, Supavit Pokawanvit, Joeson Wong, Hamidreza Akbari, Sergiy Krylyuk, Kenji Watanabe, Takashi Taniguchi, Albert V. Davydov, Zakaria Y. Al Balushi, Diana Y. Qiu, Felipe H. da Jornada, Jeffrey B. Neaton, Harry A. Atwater

Abstract: Monolayer transition metal dichalcogenide (TMDC) semiconductors exhibit strong excitonic optical resonances which serve as a microscopic, non-invasive probe into their fundamental properties. Like the hydrogen atom, such excitons can exhibit an entire Rydberg series of resonances. Excitons have been extensively studied in most TMDCs (MoS$_2$, MoSe$_2$, WS$_2$ and WSe$_2$), but detailed exploration of excitonic phenomena has been lacking in the important TMDC material molybdenum ditelluride (MoTe$_2$). Here, we report an experimental investigation of excitonic luminescence properties of monolayer MoTe$_2$ to understand the excitonic Rydberg series, up to 3s. We report significant modification of emission energies with temperature (4K to 300K), quantifying the exciton-phonon coupling. Furthermore, we observe a strongly gate-tunable exciton-trion interplay for all the Rydberg states governed mainly by free-carrier screening, Pauli blocking, and band-gap renormalization in agreement with the results of first-principles GW plus Bethe-Salpeter equation approach calculations. Our results help bring monolayer MoTe$_2$ closer to its potential applications in near-infrared optoelectronics and photonic devices.

Submitted 7 February, 2023; originally announced February 2023.

Comments: 5 figures (main text)

arXiv:2211.02077 [pdf, other]

Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization

Authors: Junru Wu, Yi Liang, Feng Han, Hassan Akbari, Zhangyang Wang, Cong Yu

Abstract: Self-supervised pre-training recently demonstrates success on large-scale multimodal data, and state-of-the-art contrastive learning methods often enforce the feature consistency from cross-modality inputs, such as video/audio or video/text pairs. Despite its convenience to formulate and leverage in practice, such cross-modality alignment (CMA) is only a weak and noisy supervision, since two modalities can be semantically misaligned even if they are temporally aligned. For example, even in the commonly adopted instructional videos, a speaker can sometimes refer to something that is not visually present in the current frame; and the semantic misalignment would only be more unpredictable for the raw videos from the internet. We conjecture that this might cause conflicts and biases among modalities, and may hence prohibit CMA from scaling up to training with larger and more heterogeneous data. This paper first verifies our conjecture by observing that, even in the latest VATT pre-training using only instructional videos, there exist strong gradient conflicts between different CMA losses within the same video, audio, text triplet, indicating them as the noisy source of supervision. We then propose to harmonize such gradients, via two techniques: (i) cross-modality gradient realignment: modifying different CMA loss gradients for each sample triplet, so that their gradient directions are more aligned; and (ii) gradient-based curriculum learning: leveraging the gradient conflict information as an indicator of sample noisiness, to develop a curriculum learning strategy to prioritize training on less noisy sample triplets. Applying those techniques to pre-training VATT on the HowTo100M dataset, we consistently improve its performance on different downstream tasks. Moreover, we are able to scale VATT pre-training to the more complicated non-narrative Youtube8M dataset to further improve the state-of-the-art.
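As a rough illustration of what a "gradient conflict" between two losses can mean, the sketch below measures the cosine similarity of two gradient vectors and, when they point in opposing directions, removes the conflicting component by projection. This is a generic projection rule in the style of PCGrad-type gradient surgery, swapped in here for illustration; the abstract does not specify the authors' realignment procedure, and the function names and plain-list gradient representation are assumptions.

```python
import math


def cosine_similarity(g1, g2):
    """Cosine of the angle between two gradient vectors; negative means conflict."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # treat a zero gradient as non-conflicting
    return dot / (n1 * n2)


def project_out_conflict(g, ref):
    """If g conflicts with ref (negative dot product), drop g's component along ref."""
    dot = sum(a * b for a, b in zip(g, ref))
    if dot >= 0.0:
        return list(g)  # no conflict: leave the gradient unchanged
    ref_sq = sum(b * b for b in ref)
    return [a - (dot / ref_sq) * b for a, b in zip(g, ref)]


# Hypothetical gradients of two alignment losses for one sample triplet.
g_video_audio = [1.0, 1.0]
g_video_text = [-1.0, 0.0]
print(cosine_similarity(g_video_audio, g_video_text))
print(project_out_conflict(g_video_audio, g_video_text))
```

The same cosine value could also serve as a per-sample noisiness score for ordering a curriculum, though how the paper constructs its curriculum is not stated in the abstract.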

Submitted 3 November, 2022; originally announced November 2022.

Comments: Accepted at NeurIPS 2022

arXiv:2209.06794 [pdf, other]

PaLI: A Jointly-Scaled Multilingual Language-Image Model

Authors: Xi Chen, Xiao Wang, Soravit Changpinyo, AJ Piergiovanni, Piotr Padlewski, Daniel Salz, Sebastian Goodman, Adam Grycner, Basil Mustafa, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Nan Ding, Keran Rong, Hassan Akbari, Gaurav Mishra, Linting Xue, Ashish Thapliyal, James Bradbury, Weicheng Kuo, Mojtaba Seyedhosseini, Chao Jia, Burcu Karagol Ayan, Carlos Riquelme, Andreas Steiner, et al. (4 additional authors not shown)

Abstract: Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pre-trained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. Since existing Transformers for language are much larger than their vision counterparts, we train a large, 4-billion parameter ViT (ViT-e) to quantify the benefits from even larger-capacity vision models. To train PaLI, we create a large multilingual mix of pretraining tasks, based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art in multiple vision and language tasks (such as captioning, visual question-answering, scene-text understanding), while retaining a simple, modular, and scalable design.

Submitted 5 June, 2023; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: ICLR 2023 (Notable-top-5%)

arXiv:2208.14445 [pdf]

Artificial intelligence-based locoregional markers of brain peritumoral microenvironment

Authors: Zahra Riahi Samani, Drew Parker, Hamed Akbari, Spyridon Bakas, Ronald L. Wolf, Steven Brem, Ragini Verma

Abstract: In malignant primary brain tumors, cancer cells infiltrate into the peritumoral brain structures which results in inevitable recurrence. Quantitative assessment of infiltrative heterogeneity in the peritumoral region, the area where biopsy or resection can be hazardous, is important for clinical decision making. Previous work on characterizing the infiltrative heterogeneity in the peritumoral region used various imaging modalities, but information of extracellular free water movement restriction has been limitedly explored. Here, we derive a unique set of Artificial Intelligence (AI)-based markers capturing the heterogeneity of tumor infiltration, by characterizing free water movement restriction in the peritumoral region using Diffusion Tensor Imaging (DTI)-based free water volume fraction maps. A novel voxel-wise deep learning-based peritumoral microenvironment index (PMI) is first extracted by leveraging the widely different water diffusivity properties of glioblastomas and brain metastases as regions with and without infiltrations in the peritumoral tissue. Descriptive characteristics of locoregional hubs of uniformly high PMI values are extracted as AI-based markers to capture distinct aspects of infiltrative heterogeneity. The proposed markers are applied to two clinical use cases on an independent population of 275 adult-type diffuse gliomas (CNS WHO grade 4), analyzing the duration of survival among Isocitrate-Dehydrogenase 1 (IDH1)-wildtypes and the differences with IDH1-mutants. Our findings provide a panel of markers as surrogates of infiltration that captures unique insight about underlying biology of peritumoral microstructural heterogeneity, establishing them as biomarkers of prognosis pertaining to survival and molecular stratification, with potential applicability in clinical decision making.

Submitted 29 August, 2022; originally announced August 2022.

arXiv:2206.10725 [pdf]

doi 10.1021/acs.nanolett.2c02163

Lifetime Limited and Tunable Quantum Light Emission in h-BN via Electric Field Modulation

Authors: Hamidreza Akbari, Souvik Biswas, Pankaj K. Jha, Joeson Wong, Benjamin Vest, Harry A. Atwater

Abstract: Color center-based single photon emitters in hexagonal boron nitride (h-BN) have shown promising photophysical properties as sources for quantum light emission. Despite significant advances towards such a goal, achieving lifetime-limited quantum light emission in h-BN has proven to be challenging, primarily due to various broadening mechanisms including spectral diffusion. Here, we propose and experimentally demonstrate suppression of spectral diffusion by applying an electrostatic field. We observe both Stark shift tuning of the resonant emission wavelength, and emission linewidth reduction nearly to the homogeneously broadened lifetime limit. Lastly, we find a cubic dependence of the linewidth with respect to temperature at the homogeneous broadening regime. Our results suggest that field tuning in electrostatically gated heterostructures is promising as an approach to control the emission characteristics of h-BN color centers, removing spectral diffusion and providing the energy tunability necessary for integration of quantum light emission in nanophotonic architectures.

Submitted 21 June, 2022; originally announced June 2022.

arXiv:2112.06979 [pdf, other]

The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients

Authors: Bhakti Baheti, Satrajit Chakrabarty, Hamed Akbari, Michel Bilello, Benedikt Wiestler, Julian Schwarting, Evan Calabrese, Jeffrey Rudie, Syed Abidi, Mina Mousa, Javier Villanueva-Meyer, Brandon K. K. Fields, Florian Kofler, Russell Takeshi Shinohara, Juan Eugenio Iglesias, Tony C. W. Mok, Albert C. S. Chung, Marek Wodzinski, Artur Jurgas, Niccolo Marini, Manfredo Atzori, Henning Muller, Christoph Grobroehmer, Hanna Siebert, Lasse Hansen, et al. (48 additional authors not shown)

Abstract: Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in developing general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registration (BraTS-Reg) challenge, as the first public benchmark environment for deformable registration algorithms focusing on estimating correspondences between pre-operative and follow-up scans of the same patient diagnosed with a diffuse brain glioma. The BraTS-Reg data comprise de-identified multi-institutional multi-parametric MRI (mpMRI) scans, curated for size and resolution according to a canonical anatomical template, and divided into training, validation, and testing sets. Clinical experts annotated ground truth (GT) landmark points of anatomical locations distinct across the temporal domain. Quantitative evaluation and ranking were based on the Median Euclidean Error (MEE), Robustness, and the determinant of the Jacobian of the displacement field. The top-ranked methodologies yielded similar performance across all evaluation metrics and shared several methodological commonalities, including pre-alignment, deep neural networks, inverse consistency analysis, and test-time instance optimization on a per-case basis as a post-processing step. The top-ranked method attained the MEE at or below that of the inter-rater variability for approximately 60% of the evaluated landmarks, underscoring the scope for further accuracy and robustness improvements, especially relative to human experts. The aim of BraTS-Reg is to continue to serve as an active resource for research, with the data and online evaluation tools accessible at https://bratsreg.github.io/.
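To make the landmark-based metric concrete, the sketch below computes a median Euclidean error between predicted and ground-truth landmark coordinates for a single case. It is an illustrative reading of the metric's name only: the function name, the tuple-based landmark format, and the per-case aggregation are assumptions, and the challenge's official definition (units, coordinate space, aggregation across cases) should be taken from its own evaluation tools.

```python
import math
from statistics import median


def median_euclidean_error(predicted, reference):
    """Median of point-wise Euclidean distances between paired landmarks."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("landmark lists must be non-empty and of equal length")
    return median(math.dist(p, r) for p, r in zip(predicted, reference))


# Hypothetical landmarks for one case: per-landmark errors are 0, 1 and 5.
predicted = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (3.0, 4.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 2.0), (0.0, 0.0, 0.0)]
print(median_euclidean_error(predicted, reference))
```

Using the median rather than the mean keeps a single badly registered landmark from dominating the score for a case.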

Submitted 17 April, 2024; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2104.11178 [pdf, other]

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

Authors: Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Yin Cui, Boqing Gong

Abstract: We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations that are rich enough to benefit a variety of downstream tasks. We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval. Furthermore, we study a modality-agnostic, single-backbone Transformer by sharing weights among the three modalities. We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks. Especially, VATT's vision Transformer achieves the top-1 accuracy of 82.1% on Kinetics-400, 83.6% on Kinetics-600, 72.7% on Kinetics-700, and 41.1% on Moments in Time, new records while avoiding supervised pre-training. Transferring to image classification leads to 78.7% top-1 accuracy on ImageNet compared to 64.7% by training the same Transformer from scratch, showing the generalizability of our model despite the domain gap between videos and images. VATT's audio Transformer also sets a new record on waveform-based audio event recognition by achieving the mAP of 39.4% on AudioSet without any supervised pre-training. VATT's source code is publicly available.

Submitted 6 December, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

Comments: Published in the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

arXiv:2104.08145 [pdf, other]

KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding

Authors: Keyur Faldu, Amit Sheth, Prashant Kikani, Hemang Akbari

Abstract: Contextualized entity representations learned by state-of-the-art transformer-based language models (TLMs) like BERT, GPT, T5, etc., leverage the attention mechanism to learn the data context from training data corpus. However, these models do not use the knowledge context. Knowledge context can be understood as semantics about entities and their relationship with neighboring entities in knowledge graphs. We propose a novel and effective technique to infuse knowledge context from multiple knowledge graphs for conceptual and ambiguous entities into TLMs during fine-tuning. It projects knowledge graph embeddings in the homogeneous vector-space, introduces new token-types for entities, aligns entity position ids, and a selective attention mechanism. We take BERT as a baseline model and implement the "Knowledge-Infused BERT" by infusing knowledge context from ConceptNet and WordNet, which significantly outperforms BERT and other recent knowledge-aware BERT variants like ERNIE, SenseBERT, and BERT_CS over eight different subtasks of GLUE benchmark. The KI-BERT-base model even significantly outperforms BERT-large for domain-specific tasks like SciTail and academic subsets of QQP, QNLI, and MNLI.

Submitted 3 September, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

Comments: 10 pages, 4 figures, 4 tables

arXiv:2011.09530 [pdf, other]

Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language

Authors: Hassan Akbari, Hamid Palangi, Jianwei Yang, Sudha Rao, Asli Celikyilmaz, Roland Fernandez, Paul Smolensky, Jianfeng Gao, Shih-Fu Chang

Abstract: Neuro-symbolic representations have proved effective in learning structure information in vision and language. In this paper, we propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning. Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions. We refer to these relations as relative roles and leverage them to make each token role-aware using attention. This results in a more structured and interpretable architecture that incorporates modality-specific inductive biases for the captioning task. Intuitively, the model is able to learn spatial, temporal, and cross-modal relations in a given pair of video and text. The disentanglement achieved by our proposal gives the model more capacity to capture multi-modal structures which result in captions with higher quality for videos. Our experiments on two established video captioning datasets verify the effectiveness of the proposed approach based on automatic metrics. We further conduct a human evaluation to measure the grounding and relevance of the generated captions and observe consistent improvement for the proposed model. The codes and trained models can be found at https://github.com/hassanhub/R3Transformer

Submitted 18 November, 2020; originally announced November 2020.

arXiv:2008.07628 [pdf, other]

A Deep Network for Joint Registration and Reconstruction of Images with Pathologies

Authors: Xu Han, Zhengyang Shen, Zhenlin Xu, Spyridon Bakas, Hamed Akbari, Michel Bilello, Christos Davatzikos, Marc Niethammer

Abstract: Registration of images with pathologies is challenging due to tissue appearance changes and missing correspondences caused by the pathologies. Moreover, mass effects as observed for brain tumors may displace tissue, creating larger deformations over time than what is observed in a healthy brain. Deep learning models have successfully been applied to image registration to offer dramatic speed up and to use surrogate information (e.g., segmentations) during training. However, existing approaches focus on learning registration models using images from healthy patients. They are therefore not designed for the registration of images with strong pathologies, for example in the context of brain tumors and traumatic brain injuries. In this work, we explore a deep learning approach to register images with brain tumors to an atlas. Our model learns an appearance mapping from images with tumors to the atlas, while simultaneously predicting the transformation to atlas space. Using separate decoders, the network disentangles the tumor mass effect from the reconstruction of quasi-normal images. Results on both synthetic and real brain tumor scans show that our approach outperforms cost function masking for registration to the atlas and that reconstructed quasi-normal images can be used for better longitudinal registrations.

Submitted 17 August, 2020; originally announced August 2020.

arXiv:2007.07811 [pdf]

Nanoscale axial position and orientation measurement of hexagonal boron nitride quantum emitters using a tunable nanophotonic environment

Authors: Pankaj K. Jha, Hamidreza Akbari, Yonghwi Kim, Souvik Biswas, Harry A. Atwater

Abstract: Color centers in hexagonal boron nitride (hBN) have emerged as promising candidates for single-photon emitters (SPEs) due to their bright emission characteristics at room temperature. In contrast to mono- and few-layered hBN, color centers in multi-layered flakes show superior emission characteristics such as higher saturation counts and spectral stability. Here, we report a method for determining both the axial position and three-dimensional dipole orientation of SPEs in thick hBN flakes by tuning the photonic local density of states using vanadium dioxide (VO2), a phase change material. Emitters under study exhibit a strong surface-normal dipole orientation, providing some insight on the atomic structure of hBN SPEs, deeply embedded in thick crystals. We have optimized a hot pickup technique to reproducibly transfer flakes of hBN from VO2 onto SiO2/Si substrate and relocated the same emitters. Our approach serves as a practical method to systematically characterize SPEs in hBN prior to integration in quantum photonics systems.

Submitted 23 February, 2021; v1 submitted 15 July, 2020; originally announced July 2020.

Comments: 33 pages, 5 figures, 1 Table, and 13 supplementary figures

arXiv:1906.08886 [pdf]

Identifying the magnetospheric driver of STEVE

Authors: Xiangning Chu, David Malaspina, Bea Gallardo-Lacourt, Jun Liang, Laila Andersson, Qianli Ma, Anton Artemyev, Jiang Liu, Bob Ergun, Scott Thaller, Hassanali Akbari, Hong Zhao, Brian Larsen, Geoffrey Reeves, John Wygant, Aaron Breneman, Sheng Tian, Martin Connors, Eric Donovan, William Archer, Elizabeth A. MacDonald

Abstract: For the first time, we identify the magnetospheric driver of STEVE, east-west aligned narrow emissions in the subauroral region. In the ionosphere, STEVE is associated with subauroral ion drift (SAID) features of high electron temperature peak, density gradient, and strong westward ion flow. In this study, we present STEVE's magnetospheric driver region at a sharp plasmapause containing: strong tailward quasi-static electric field, kinetic Alfven waves, parallel electron acceleration, perpendicular ion drift. The observed continuous emissions of STEVE are possibly caused by ionospheric electron heating due to heat conduction and/or auroral acceleration process powered by Alfven waves, both driven by the observed equatorial magnetospheric processes. The observed green emissions are likely optical manifestations of electron precipitations associated with wave structures traveling along the plasmapause. The observed SAR arc at lower latitudes likely corresponds to the formation of low-energy plasma inside the plasmapause by Coulomb collisions between ring current ions and plasmaspheric plasma.

Submitted 20 June, 2019; originally announced June 2019.

Comments: Presentation for AGU Fall meeting 2018 on December 12, 2018

arXiv:1903.09728 [pdf]

Classification of seizure and seizure-free EEG signals based on empirical wavelet transform and phase space reconstruction

Authors: Hesam Akbari, Somayeh Saraf Esmaili, Sima Farzollah Zadeh

Abstract: Epilepsy is a brain disorder due to abnormal activity of neurons, and recording of seizures is of primary interest in the evaluation of epileptic patients. A seizure is the phenomenon of rhythmic discharge from either a local area or the whole brain, and the individual behavior usually lasts from seconds to minutes. In this work, empirical wavelet transform (EWT) is applied to decompose signals into electroencephalography (EEG) rhythms. EEG signals are separated into delta, theta, alpha, beta and gamma rhythms using EWT. The proposed method has been evaluated on a benchmark dataset which is freely downloadable from the Bonn University website. The 95% confidence ellipse area is computed from the 2D projection of the reconstructed phase space (RPS) of the rhythms as features and fed to a K-nearest neighbor classifier for detection of seizure (S) and seizure-free (SF) EEG signals. Our proposed method achieved 98% accuracy in classification of S and SF EEG signals with a tenfold cross-validation strategy, which is higher than previous techniques.

Submitted 22 March, 2019; originally announced March 2019.

arXiv:1811.11683 [pdf, other]

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding

Authors: Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang

Abstract: We address the problem of phrase grounding by learning a multi-level common semantic space shared by the textual and visual modalities. We exploit multiple levels of feature maps of a Deep Convolutional Neural Network, as well as contextualized word and sentence embeddings extracted from a character-based language model. Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content are performed with cosine similarity. We guide the model by a multi-level multimodal attention mechanism which outputs attended visual features at each level. The best level is chosen to be compared with text content for maximizing the pertinence scores of image-sentence pairs of the ground truth. Experiments conducted on three publicly available datasets show significant performance gains (20%-60% relative) over the state-of-the-art in phrase localization and set a new performance record on those datasets. We provide a detailed ablation study to show the contribution of each element of our approach and release our code on GitHub.

Submitted 29 May, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

Comments: Accepted in CVPR 2019

arXiv:1710.09798 [pdf, other]

Lip2AudSpec: Speech reconstruction from silent lip movements video

Authors: Hassan Akbari, Himani Arora, Liangliang Cao, Nima Mesgarani

Abstract: In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use the auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, resulting in more natural sounding reconstructed speech. Our proposed network consists of an autoencoder to extract bottleneck features from the auditory spectrogram, which are then used as the target for our main lip reading network comprising CNN, LSTM and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of reconstructed speech from the main lip reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results of reconstructing intelligible speech with superior word recognition accuracy.

Submitted 26 October, 2017; originally announced October 2017.

arXiv:1707.04797 [pdf, other]

doi 10.1063/1.4985879

Perspective: Surface Freezing in Water: A Nexus of Experiments and Simulations

Authors: Amir Haji Akbari, Pablo G. Debenedetti

Abstract: Surface freezing is a phenomenon in which crystallization is enhanced at a vapor-liquid interface. In some systems, such as $n$-alkanes, this enhancement is dramatic, and results in the formation of a crystalline layer at the free interface even at temperatures slightly above the equilibrium bulk freezing temperature. There are, however, systems in which the enhancement is purely kinetic, and only involves faster nucleation at or near the interface. The first, thermodynamic, type of surface freezing is easier to confirm in experiments, requiring only the verification of the existence of crystalline order at the interface. The second, kinetic, type of surface freezing is far more difficult to prove experimentally. One material that is suspected of undergoing the second type of surface freezing is liquid water. Despite strong indications that the freezing of liquid water is kinetically enhanced at vapor-liquid interfaces, the findings are far from conclusive, and the topic remains controversial. In this perspective, we present a simple thermodynamic framework to understand conceptually and distinguish these two types of surface freezing. We then briefly survey fifteen years of experimental and computational work aimed at elucidating the surface freezing conundrum in water.

Submitted 15 July, 2017; originally announced July 2017.

Comments: 14 pages, 8 figures

arXiv:1512.01460 [pdf]

doi 10.1109/TGRS.2015.2505686

Reconstruction of Fine Scale Auroral Dynamics

Authors: Michael Hirsch, Joshua Semeter, Matthew Zettergren, Hanna Dahlgren, Chhavi Goenka, Hassanali Akbari

Abstract: We present a feasibility study for a high frame rate, short baseline auroral tomographic imaging system useful for estimating parametric variations in the precipitating electron number flux spectrum of dynamic auroral events. Of particular interest are auroral substorms, characterized by spatial variations of order 100 m and temporal variations of order 10 ms. These scales are thought to be produced by dispersive Alfvén waves in the near-Earth magnetosphere. The auroral tomography system characterized in this paper reconstructs the auroral volume emission rate to estimate the characteristic energy and location in the direction perpendicular to the geomagnetic field of peak electron precipitation flux using a distributed network of precisely synchronized ground-based cameras. As the observing baseline decreases, the tomographic inverse problem becomes highly ill-conditioned; as the sampling rate increases, the signal-to-noise ratio degrades and synchronization requirements become increasingly critical. Our approach to these challenges uses a physics-based auroral model to regularize the poorly-observed vertical dimension. Specifically, the vertical dimension is expanded in a low-dimensional basis consisting of eigenprofiles computed over the range of expected energies in the precipitating electron flux, while the horizontal dimension retains a standard orthogonal pixel basis.
Simulation results show typical characteristic energy estimation error less than 30% for a 3 km baseline achievable within the confines of the Poker Flat Research Range, using GPS-synchronized Electron Multiplying CCD cameras with broad-band BG3 optical filters that pass prompt auroral emissions.

Submitted 3 December, 2015; originally announced December 2015.

Comments: 13 pages, 11 figures, Accepted for publication Nov. 24, 2015 by IEEE Transactions on Geospace and Remote Sensing

arXiv:1412.7018 [pdf, other]

Discrete Load Balancing in Heterogeneous Networks with a Focus on Second-Order Diffusion

Authors: Hoda Akbari, Petra Berenbrink, Robert Elsässer, Dominik Kaaser

Abstract: In this paper we consider a wide class of discrete diffusion load balancing algorithms. The problem is defined as follows. We are given an interconnection network and a number of load items, which are arbitrarily distributed among the nodes of the network. The goal is to redistribute the load in iterative discrete steps such that at the end each node has (almost) the same number of items. In diffusion load balancing, nodes are only allowed to balance their load with their direct neighbors. We show three main results. Firstly, we present a general framework for randomly rounding the flow generated by continuous diffusion schemes over the edges of a graph in order to obtain corresponding discrete schemes. Compared to the results of Rabani, Sinclair, and Wanka, FOCS'98, which are only valid w.r.t. the class of homogeneous first order schemes, our framework can be used to analyze a larger class of diffusion algorithms, such as algorithms for heterogeneous networks and second order schemes. Secondly, we bound the deviation between randomized second order schemes and their continuous counterparts. Finally, we provide a bound for the minimum initial load in a network that is sufficient to prevent the occurrence of negative load at a node during the execution of second order diffusion schemes. Our theoretical results are complemented with extensive simulations on different graph classes.
We show empirically that second order schemes, which are usually much faster than first order schemes, will not balance the load completely on a number of networks within reasonable time. However, the maximum load difference at the end seems to be bounded by a constant value, which can be further decreased if a first order scheme is applied once this value is achieved by the second order scheme.

Submitted 22 December, 2014; originally announced December 2014.

Comments: Full version of paper submitted to ICDCS 2015

arXiv:1407.1395 [pdf]

CB-REFIM: A Practical Coordinated Beamforming in Multicell Networks

Authors: Mohammad Hossein Akbari, Vahid Tabataba Vakili

Abstract: Performance of multicell systems is inevitably limited by interference and available resources. Although intercell interference can be mitigated by Base Station (BS) coordination, the demand on inter-BS information exchange and computational complexity grows rapidly with the number of cells, subcarriers, and users. On the other hand, some of the existing coordinated beamforming methods need computation of the pseudo-inverse or generalized eigenvector of a matrix, which is practically difficult to implement in a real system. To handle these issues, we propose a novel linear beamforming scheme across a set of coordinated cells with only limited backhaul signalling. Resource allocation (i.e. precoding and power control) is formulated as an optimization problem whose objective is a function of the signal-to-interference-plus-noise ratios (SINRs), in order to maximize the instantaneous weighted sum-rate subject to power constraints. Although the primal problem is nonconvex and difficult to solve optimally, an iterative algorithm is presented based on the Karush-Kuhn-Tucker (KKT) condition. To have a practical solution with low computational complexity and signalling overhead, we present CB-REFIM (coordination beamforming-reference based interference management) and show that the recently proposed REFIM algorithm can be interpreted as a special case of CB-REFIM. We evaluate CB-REFIM through extensive simulation and observe that the proposed strategies achieve close-to-optimal performance.

Submitted 9 July, 2014; v1 submitted 5 July, 2014; originally announced July 2014.

Comments: 20 pages, 8 figures, to appear in IET Communication

Showing 1–23 of 23 results for author: Akbari, H