-
Production and distribution planning, scheduling, and routing optimization in a yogurt supply chain under demand uncertainty: A case study
Authors:
Babak Javadi,
Zeinab Salimzadeh,
Amir Hossein Akbari,
Mahla Yadegari,
Mohammadreza Abdali
Abstract:
Considering the evolution of the food industry and its challenges, like high perishability, managing the food industry supply chain is a key focus for researchers and decision-makers. Uncertainty in decision-making has gained importance, particularly in the yogurt industry, known for its complexity. This study addresses production and distribution planning, scheduling, and routing in the yogurt supply chain. The problem is characterized by multiple products, a single plant, multiple distribution centers, multiple periods, and various transportation methods. A mixed-integer non-linear programming (MINLP) model is used to minimize total costs, including production, setup, overtime, unmet demand, and transportation. Additionally, a robust fuzzy programming approach is applied under uncertainty, with linearization procedures proposed to convert it into a linearized mixed-integer programming formulation. The problem is tested with two data types: a sample problem in three sizes (small, medium, and large) and real data from Kalle Dairy Company, Iran. A Genetic Algorithm (GA) is developed to solve the problem, with necessary modifications made for its application. The GA's performance is compared to an exact algorithm (Branch & Cut), showing that the company's production policy adapts daily to meet demand precisely. The shift to smaller batch production and longer shelf life allows better stock allocation and avoids shortages in uncertain conditions. The company's policies adapt to severe fluctuations in the business environment, though this requires high costs, such as inventory maintenance.
Submitted 9 June, 2024;
originally announced June 2024.
-
Photon statistics analysis of h-BN quantum emitters with pulsed and continuous-wave excitation
Authors:
Hamidreza Akbari,
Pankaj K. Jha,
Kristina Malinowski,
Benjamin E. C. Koltenbah,
Harry A. Atwater
Abstract:
We report on the quantum photon statistics of hexagonal boron nitride (h-BN) quantum emitters by analyzing the Mandel Q parameter. We have measured the Mandel Q parameter for h-BN quantum emitters under various temperatures and pump power excitation conditions. Under pulsed excitation we can achieve a Mandel Q of -0.002 and under continuous-wave (CW) excitation this parameter can reach -0.0025. We investigate the effect of cryogenic temperatures on Mandel Q and conclude that the photon statistics vary weakly with temperature. Through calculation of spontaneous emission from an excited two-level emitter model, we demonstrate good agreement between measured and calculated Mandel Q parameter when accounting for the experimental photon collection efficiency. Finally, we illustrate the usefulness of Mandel Q in quantum applications by the example of random number generation and analyze the effect of Mandel Q on the speed of generating random bits via this method.
Submitted 18 March, 2024;
originally announced March 2024.
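For readers unfamiliar with the quantity named in this abstract: the Mandel Q parameter is conventionally defined as Q = (Var(n) − ⟨n⟩)/⟨n⟩ for the photon count n per time bin, so Q = 0 for Poissonian light and Q < 0 for sub-Poissonian (nonclassical) light. The sketch below is only an illustration of that textbook definition, not the authors' analysis code; the function name `mandel_q` and the count lists are hypothetical and are not measured data.

```python
def mandel_q(counts):
    """Return the Mandel Q parameter, (variance - mean) / mean, of per-bin photon counts."""
    n_bins = len(counts)
    if n_bins == 0:
        raise ValueError("need at least one time bin")
    mean = sum(counts) / n_bins
    if mean == 0:
        raise ValueError("mean photon count is zero; Q is undefined")
    # Population variance of the counts across bins.
    variance = sum((c - mean) ** 2 for c in counts) / n_bins
    return (variance - mean) / mean


# Hypothetical, illustrative count records (not data from the paper).
ideal_single_photon = [1, 1, 1, 1, 1, 1]  # exactly one photon in every bin
bunched_light = [0, 0, 4, 0, 0, 2]        # same mean, much larger spread

print(mandel_q(ideal_single_photon))  # negative: sub-Poissonian
print(mandel_q(bunched_light))        # positive: super-Poissonian
```

With finite collection efficiency most bins are empty, which pushes the measured Q toward zero; that is consistent with the small negative values quoted in the abstract.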
-
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Authors:
Dan Kondratyuk,
Lijun Yu,
Xiuye Gu,
José Lezama,
Jonathan Huang,
Grant Schindler,
Rachel Hornung,
Vighnesh Birodkar,
Jimmy Yan,
Ming-Chang Chiu,
Krishna Somandepalli,
Hassan Akbari,
Yair Alon,
Yong Cheng,
Josh Dillon,
Agrim Gupta,
Meera Hahn,
Anja Hauth,
David Hendon,
Alonso Martinez,
David Minnen,
Mikhail Sirotenko,
Kihyuk Sohn,
Xuan Yang,
Hartwig Adam
, et al. (6 additional authors not shown)
Abstract:
We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions. Project page: http://sites.research.google/videopoet/
Submitted 4 June, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Authors:
Hassan Akbari,
Dan Kondratyuk,
Yin Cui,
Rachel Hornung,
Huisheng Wang,
Hartwig Adam
Abstract:
We present Integrated Multimodal Perception (IMP), a simple and scalable multimodal multi-task training and modeling approach. IMP integrates multimodal inputs including image, video, text, and audio into a single Transformer encoder with minimal modality-specific components. IMP makes use of a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient model and task scaling. We conduct extensive empirical studies and reveal the following key insights: 1) Performing gradient descent updates by alternating on diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model. 2) Sparsification with MoE on a single modality-agnostic encoder substantially improves the performance, outperforming dense models that use modality-specific encoders or additional fusion layers and greatly mitigates the conflicts between modalities. IMP achieves competitive performance on a wide range of downstream tasks including video classification, image classification, image-text, and video-text retrieval. Most notably, we train a sparse IMP-MoE-L variant focusing on video tasks that achieves new state-of-the-art in zero-shot video classification: 77.0% on Kinetics-400, 76.8% on Kinetics-600, and 68.3% on Kinetics-700, improving the previous state-of-the-art by +5%, +6.7%, and +5.8%, respectively, while using only 15% of their total training computational cost.
Submitted 11 December, 2023; v1 submitted 10 May, 2023;
originally announced May 2023.
-
Rydberg Excitons and Trions in Monolayer MoTe$_2$
Authors:
Souvik Biswas,
Aurélie Champagne,
Jonah B. Haber,
Supavit Pokawanvit,
Joeson Wong,
Hamidreza Akbari,
Sergiy Krylyuk,
Kenji Watanabe,
Takashi Taniguchi,
Albert V. Davydov,
Zakaria Y. Al Balushi,
Diana Y. Qiu,
Felipe H. da Jornada,
Jeffrey B. Neaton,
Harry A. Atwater
Abstract:
Monolayer transition metal dichalcogenide (TMDC) semiconductors exhibit strong excitonic optical resonances which serve as a microscopic, non-invasive probe into their fundamental properties. Like the hydrogen atom, such excitons can exhibit an entire Rydberg series of resonances. Excitons have been extensively studied in most TMDCs (MoS$_2$, MoSe$_2$, WS$_2$ and WSe$_2$), but detailed exploration of excitonic phenomena has been lacking in the important TMDC material molybdenum ditelluride (MoTe$_2$). Here, we report an experimental investigation of excitonic luminescence properties of monolayer MoTe$_2$ to understand the excitonic Rydberg series, up to 3s. We report significant modification of emission energies with temperature (4K to 300K), quantifying the exciton-phonon coupling. Furthermore, we observe a strongly gate-tunable exciton-trion interplay for all the Rydberg states governed mainly by free-carrier screening, Pauli blocking, and band-gap renormalization in agreement with the results of first-principles GW plus Bethe-Salpeter equation approach calculations. Our results help bring monolayer MoTe$_2$ closer to its potential applications in near-infrared optoelectronics and photonic devices.
Submitted 7 February, 2023;
originally announced February 2023.
-
Scaling Multimodal Pre-Training via Cross-Modality Gradient Harmonization
Authors:
Junru Wu,
Yi Liang,
Feng Han,
Hassan Akbari,
Zhangyang Wang,
Cong Yu
Abstract:
Self-supervised pre-training has recently demonstrated success on large-scale multimodal data, and state-of-the-art contrastive learning methods often enforce feature consistency across cross-modality inputs, such as video/audio or video/text pairs. Despite being convenient to formulate and leverage in practice, such cross-modality alignment (CMA) is only a weak and noisy supervision, since two modalities can be semantically misaligned even when they are temporally aligned. For example, even in the commonly adopted instructional videos, a speaker can sometimes refer to something that is not visually present in the current frame; and the semantic misalignment would only be more unpredictable for raw videos from the internet. We conjecture that this might cause conflicts and biases among modalities, and may hence prohibit CMA from scaling up to training with larger and more heterogeneous data. This paper first verifies our conjecture by observing that, even in the latest VATT pre-training using only instructional videos, there exist strong gradient conflicts between different CMA losses within the same video, audio, text triplet, indicating them as a noisy source of supervision. We then propose to harmonize such gradients via two techniques: (i) cross-modality gradient realignment: modifying different CMA loss gradients for each sample triplet, so that their gradient directions are more aligned; and (ii) gradient-based curriculum learning: leveraging the gradient conflict information as an indicator of sample noisiness, to develop a curriculum learning strategy that prioritizes training on less noisy sample triplets. Applying those techniques to pre-training VATT on the HowTo100M dataset, we consistently improve its performance on different downstream tasks. Moreover, we are able to scale VATT pre-training to the more complicated non-narrative Youtube8M dataset to further improve the state of the art.
Submitted 3 November, 2022;
originally announced November 2022.
-
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Authors:
Xi Chen,
Xiao Wang,
Soravit Changpinyo,
AJ Piergiovanni,
Piotr Padlewski,
Daniel Salz,
Sebastian Goodman,
Adam Grycner,
Basil Mustafa,
Lucas Beyer,
Alexander Kolesnikov,
Joan Puigcerver,
Nan Ding,
Keran Rong,
Hassan Akbari,
Gaurav Mishra,
Linting Xue,
Ashish Thapliyal,
James Bradbury,
Weicheng Kuo,
Mojtaba Seyedhosseini,
Chao Jia,
Burcu Karagol Ayan,
Carlos Riquelme,
Andreas Steiner
, et al. (4 additional authors not shown)
Abstract:
Effective scaling and a flexible task interface enable large language models to excel at many tasks. We present PaLI (Pathways Language and Image model), a model that extends this approach to the joint modeling of language and vision. PaLI generates text based on visual and textual inputs, and with this interface performs many vision, language, and multimodal tasks, in many languages. To train PaLI, we make use of large pre-trained encoder-decoder language models and Vision Transformers (ViTs). This allows us to capitalize on their existing capabilities and leverage the substantial cost of training them. We find that joint scaling of the vision and language components is important. Since existing Transformers for language are much larger than their vision counterparts, we train a large, 4-billion parameter ViT (ViT-e) to quantify the benefits from even larger-capacity vision models. To train PaLI, we create a large multilingual mix of pretraining tasks, based on a new image-text training set containing 10B images and texts in over 100 languages. PaLI achieves state-of-the-art in multiple vision and language tasks (such as captioning, visual question-answering, scene-text understanding), while retaining a simple, modular, and scalable design.
Submitted 5 June, 2023; v1 submitted 14 September, 2022;
originally announced September 2022.
-
Artificial intelligence-based locoregional markers of brain peritumoral microenvironment
Authors:
Zahra Riahi Samani,
Drew Parker,
Hamed Akbari,
Spyridon Bakas,
Ronald L. Wolf,
Steven Brem,
Ragini Verma
Abstract:
In malignant primary brain tumors, cancer cells infiltrate into the peritumoral brain structures which results in inevitable recurrence. Quantitative assessment of infiltrative heterogeneity in the peritumoral region, the area where biopsy or resection can be hazardous, is important for clinical decision making. Previous work on characterizing the infiltrative heterogeneity in the peritumoral region used various imaging modalities, but information of extracellular free water movement restriction has been limitedly explored. Here, we derive a unique set of Artificial Intelligence (AI)-based markers capturing the heterogeneity of tumor infiltration, by characterizing free water movement restriction in the peritumoral region using Diffusion Tensor Imaging (DTI)-based free water volume fraction maps. A novel voxel-wise deep learning-based peritumoral microenvironment index (PMI) is first extracted by leveraging the widely different water diffusivity properties of glioblastomas and brain metastases as regions with and without infiltrations in the peritumoral tissue. Descriptive characteristics of locoregional hubs of uniformly high PMI values are extracted as AI-based markers to capture distinct aspects of infiltrative heterogeneity. The proposed markers are applied to two clinical use cases on an independent population of 275 adult-type diffuse gliomas (CNS WHO grade 4), analyzing the duration of survival among Isocitrate-Dehydrogenase 1 (IDH1)-wildtypes and the differences with IDH1-mutants. Our findings provide a panel of markers as surrogates of infiltration that captures unique insight about underlying biology of peritumoral microstructural heterogeneity, establishing them as biomarkers of prognosis pertaining to survival and molecular stratification, with potential applicability in clinical decision making.
Submitted 29 August, 2022;
originally announced August 2022.
-
arXiv:2206.10725
physics.optics
cond-mat.mtrl-sci
physics.app-ph
physics.atom-ph
quant-ph
Lifetime Limited and Tunable Quantum Light Emission in h-BN via Electric Field Modulation
Authors:
Hamidreza Akbari,
Souvik Biswas,
Pankaj K. Jha,
Joeson Wong,
Benjamin Vest,
Harry A. Atwater
Abstract:
Color center-based single-photon emitters in hexagonal boron nitride (h-BN) have shown promising photophysical properties as sources for quantum light emission. Despite significant advances towards such a goal, achieving lifetime-limited quantum light emission in h-BN has proven to be challenging, primarily due to various broadening mechanisms including spectral diffusion. Here, we propose and experimentally demonstrate suppression of spectral diffusion by applying an electrostatic field. We observe both Stark shift tuning of the resonant emission wavelength, and emission linewidth reduction nearly to the homogeneously broadened lifetime limit. Lastly, we find a cubic dependence of the linewidth with respect to temperature in the homogeneous broadening regime. Our results suggest that field tuning in electrostatically gated heterostructures is promising as an approach to control the emission characteristics of h-BN color centers, removing spectral diffusion and providing the energy tunability necessary for integration of quantum light emission in nanophotonic architectures.
Submitted 21 June, 2022;
originally announced June 2022.
-
The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients
Authors:
Bhakti Baheti,
Satrajit Chakrabarty,
Hamed Akbari,
Michel Bilello,
Benedikt Wiestler,
Julian Schwarting,
Evan Calabrese,
Jeffrey Rudie,
Syed Abidi,
Mina Mousa,
Javier Villanueva-Meyer,
Brandon K. K. Fields,
Florian Kofler,
Russell Takeshi Shinohara,
Juan Eugenio Iglesias,
Tony C. W. Mok,
Albert C. S. Chung,
Marek Wodzinski,
Artur Jurgas,
Niccolo Marini,
Manfredo Atzori,
Henning Muller,
Christoph Grobroehmer,
Hanna Siebert,
Lasse Hansen
, et al. (48 additional authors not shown)
Abstract:
Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance. Although there has been progress in developing general-purpose medical image registration techniques, they have not yet attained the requisite precision and reliability for this task, highlighting its inherent complexity. Here we describe the Brain Tumor Sequence Registration (BraTS-Reg) challenge, as the first public benchmark environment for deformable registration algorithms focusing on estimating correspondences between pre-operative and follow-up scans of the same patient diagnosed with a diffuse brain glioma. The BraTS-Reg data comprise de-identified multi-institutional multi-parametric MRI (mpMRI) scans, curated for size and resolution according to a canonical anatomical template, and divided into training, validation, and testing sets. Clinical experts annotated ground truth (GT) landmark points of anatomical locations distinct across the temporal domain. Quantitative evaluation and ranking were based on the Median Euclidean Error (MEE), Robustness, and the determinant of the Jacobian of the displacement field. The top-ranked methodologies yielded similar performance across all evaluation metrics and shared several methodological commonalities, including pre-alignment, deep neural networks, inverse consistency analysis, and test-time instance optimization on a per-case basis as a post-processing step. The top-ranked method attained the MEE at or below that of the inter-rater variability for approximately 60% of the evaluated landmarks, underscoring the scope for further accuracy and robustness improvements, especially relative to human experts. The aim of BraTS-Reg is to continue to serve as an active resource for research, with the data and online evaluation tools accessible at https://bratsreg.github.io/.
Submitted 17 April, 2024; v1 submitted 13 December, 2021;
originally announced December 2021.
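The Median Euclidean Error named in this abstract can be read as the median, over annotated landmarks, of the distance between where a registration method places each landmark and where the experts marked it. The sketch below illustrates that reading only; it is not the challenge's evaluation code, and the function name `median_euclidean_error`, the coordinate tuples, and the assumption that coordinates are already in a common physical unit are all hypothetical.

```python
import math
from statistics import median


def median_euclidean_error(predicted, reference):
    """Median of per-landmark Euclidean distances between two landmark lists."""
    if len(predicted) != len(reference):
        raise ValueError("landmark lists must have the same length")
    if not predicted:
        raise ValueError("need at least one landmark pair")
    # math.dist computes the Euclidean distance between two points.
    return median(math.dist(p, r) for p, r in zip(predicted, reference))


# Hypothetical 3D landmark coordinates (illustrative values, not challenge data).
warped_landmarks = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]
expert_landmarks = [(3.0, 4.0, 0.0), (1.0, 1.0, 1.0), (2.0, 2.0, 3.0)]

# Per-landmark distances are 5.0, 0.0 and 1.0, so the median is 1.0.
print(median_euclidean_error(warped_landmarks, expert_landmarks))
```

Using the median rather than the mean keeps one badly placed landmark (the 5.0 above) from dominating the score.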
-
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Authors:
Hassan Akbari,
Liangzhe Yuan,
Rui Qian,
Wei-Hong Chuang,
Shih-Fu Chang,
Yin Cui,
Boqing Gong
Abstract:
We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations that are rich enough to benefit a variety of downstream tasks. We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval. Furthermore, we study a modality-agnostic, single-backbone Transformer by sharing weights among the three modalities. We show that the convolution-free VATT outperforms state-of-the-art ConvNet-based architectures in the downstream tasks. Especially, VATT's vision Transformer achieves the top-1 accuracy of 82.1% on Kinetics-400, 83.6% on Kinetics-600, 72.7% on Kinetics-700, and 41.1% on Moments in Time, new records while avoiding supervised pre-training. Transferring to image classification leads to 78.7% top-1 accuracy on ImageNet compared to 64.7% by training the same Transformer from scratch, showing the generalizability of our model despite the domain gap between videos and images. VATT's audio Transformer also sets a new record on waveform-based audio event recognition by achieving the mAP of 39.4% on AudioSet without any supervised pre-training. VATT's source code is publicly available.
Submitted 6 December, 2021; v1 submitted 22 April, 2021;
originally announced April 2021.
-
KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
Authors:
Keyur Faldu,
Amit Sheth,
Prashant Kikani,
Hemang Akbari
Abstract:
Contextualized entity representations learned by state-of-the-art transformer-based language models (TLMs) like BERT, GPT, T5, etc., leverage the attention mechanism to learn the data context from training data corpus. However, these models do not use the knowledge context. Knowledge context can be understood as semantics about entities and their relationship with neighboring entities in knowledge graphs. We propose a novel and effective technique to infuse knowledge context from multiple knowledge graphs for conceptual and ambiguous entities into TLMs during fine-tuning. It projects knowledge graph embeddings in the homogeneous vector-space, introduces new token-types for entities, aligns entity position ids, and a selective attention mechanism. We take BERT as a baseline model and implement the "Knowledge-Infused BERT" by infusing knowledge context from ConceptNet and WordNet, which significantly outperforms BERT and other recent knowledge-aware BERT variants like ERNIE, SenseBERT, and BERT_CS over eight different subtasks of GLUE benchmark. The KI-BERT-base model even significantly outperforms BERT-large for domain-specific tasks like SciTail and academic subsets of QQP, QNLI, and MNLI.
Submitted 3 September, 2021; v1 submitted 9 April, 2021;
originally announced April 2021.
-
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language
Authors:
Hassan Akbari,
Hamid Palangi,
Jianwei Yang,
Sudha Rao,
Asli Celikyilmaz,
Roland Fernandez,
Paul Smolensky,
Jianfeng Gao,
Shih-Fu Chang
Abstract:
Neuro-symbolic representations have proved effective in learning structure information in vision and language. In this paper, we propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning. Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions. We refer to these relations as relative roles and leverage them to make each token role-aware using attention. This results in a more structured and interpretable architecture that incorporates modality-specific inductive biases for the captioning task. Intuitively, the model is able to learn spatial, temporal, and cross-modal relations in a given pair of video and text. The disentanglement achieved by our proposal gives the model more capacity to capture multi-modal structures, which results in higher-quality captions for videos. Our experiments on two established video captioning datasets verify the effectiveness of the proposed approach based on automatic metrics. We further conduct a human evaluation to measure the grounding and relevance of the generated captions and observe consistent improvement for the proposed model. The code and trained models can be found at https://github.com/hassanhub/R3Transformer
Submitted 18 November, 2020;
originally announced November 2020.
-
A Deep Network for Joint Registration and Reconstruction of Images with Pathologies
Authors:
Xu Han,
Zhengyang Shen,
Zhenlin Xu,
Spyridon Bakas,
Hamed Akbari,
Michel Bilello,
Christos Davatzikos,
Marc Niethammer
Abstract:
Registration of images with pathologies is challenging due to tissue appearance changes and missing correspondences caused by the pathologies. Moreover, mass effects as observed for brain tumors may displace tissue, creating larger deformations over time than what is observed in a healthy brain. Deep learning models have successfully been applied to image registration to offer dramatic speed up and to use surrogate information (e.g., segmentations) during training. However, existing approaches focus on learning registration models using images from healthy patients. They are therefore not designed for the registration of images with strong pathologies, for example in the context of brain tumors and traumatic brain injuries. In this work, we explore a deep learning approach to register images with brain tumors to an atlas. Our model learns an appearance mapping from images with tumors to the atlas, while simultaneously predicting the transformation to atlas space. Using separate decoders, the network disentangles the tumor mass effect from the reconstruction of quasi-normal images. Results on both synthetic and real brain tumor scans show that our approach outperforms cost function masking for registration to the atlas and that reconstructed quasi-normal images can be used for better longitudinal registrations.
Submitted 17 August, 2020;
originally announced August 2020.
-
Nanoscale axial position and orientation measurement of hexagonal boron nitride quantum emitters using a tunable nanophotonic environment
Authors:
Pankaj K. Jha,
Hamidreza Akbari,
Yonghwi Kim,
Souvik Biswas,
Harry A. Atwater
Abstract:
Color centers in hexagonal boron nitride (hBN) have emerged as promising candidates for single-photon emitters (SPEs) due to their bright emission characteristics at room temperature. In contrast to mono- and few-layered hBN, color centers in multi-layered flakes show superior emission characteristics such as higher saturation counts and spectral stability. Here, we report a method for determining both the axial position and three-dimensional dipole orientation of SPEs in thick hBN flakes by tuning the photonic local density of states using vanadium dioxide (VO2), a phase change material. Emitters under study exhibit a strong surface-normal dipole orientation, providing some insight on the atomic structure of hBN SPEs, deeply embedded in thick crystals. We have optimized a hot pickup technique to reproducibly transfer flakes of hBN from VO2 onto SiO2/Si substrate and relocated the same emitters. Our approach serves as a practical method to systematically characterize SPEs in hBN prior to integration in quantum photonics systems.
Submitted 23 February, 2021; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Identifying the magnetospheric driver of STEVE
Authors:
Xiangning Chu,
David Malaspina,
Bea Gallardo-Lacourt,
Jun Liang,
Laila Andersson,
Qianli Ma,
Anton Artemyev,
Jiang Liu,
Bob Ergun,
Scott Thaller,
Hassanali Akbari,
Hong Zhao,
Brian Larsen,
Geoffrey Reeves,
John Wygant,
Aaron Breneman,
Sheng Tian,
Martin Connors,
Eric Donovan,
William Archer,
Elizabeth A. MacDonald
Abstract:
For the first time, we identify the magnetospheric driver of STEVE, east-west aligned narrow emissions in the subauroral region. In the ionosphere, STEVE is associated with subauroral ion drift (SAID) features of high electron temperature peak, density gradient, and strong westward ion flow. In this study, we present STEVE's magnetospheric driver region at a sharp plasmapause containing: strong tailward quasi-static electric field, kinetic Alfven waves, parallel electron acceleration, perpendicular ion drift. The observed continuous emissions of STEVE are possibly caused by ionospheric electron heating due to heat conduction and/or auroral acceleration process powered by Alfven waves, both driven by the observed equatorial magnetospheric processes. The observed green emissions are likely optical manifestations of electron precipitations associated with wave structures traveling along the plasmapause. The observed SAR arc at lower latitudes likely corresponds to the formation of low-energy plasma inside the plasmapause by Coulomb collisions between ring current ions and plasmaspheric plasma.
Submitted 20 June, 2019;
originally announced June 2019.
-
Classification of seizure and seizure-free EEG signals based on empirical wavelet transform and phase space reconstruction
Authors:
Hesam Akbari,
Somayeh Saraf Esmaili,
Sima Farzollah Zadeh
Abstract:
Epilepsy is a brain disorder due to abnormal activity of neurons, and recording of seizures is of primary interest in the evaluation of epileptic patients. A seizure is the phenomenon of rhythmic discharge from either a local area or the whole brain, and the individual behavior usually lasts from seconds to minutes. In this work, the empirical wavelet transform (EWT) is applied to decompose signals into electroencephalography (EEG) rhythms. EEG signals are separated into delta, theta, alpha, beta, and gamma rhythms using EWT. The proposed method has been evaluated on a benchmark dataset that is freely downloadable from the Bonn University website. The 95% confidence ellipse area is computed from the 2D projection of the reconstructed phase space (RPS) of the rhythms as features and fed to a K-nearest neighbor classifier for detection of seizure (S) and seizure-free (SF) EEG signals. Our proposed method achieved 98% accuracy in classification of S and SF EEG signals with a tenfold cross-validation strategy, which is higher than previous techniques.
Submitted 22 March, 2019;
originally announced March 2019.
-
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding
Authors:
Hassan Akbari,
Svebor Karaman,
Surabhi Bhargava,
Brian Chen,
Carl Vondrick,
Shih-Fu Chang
Abstract:
We address the problem of phrase grounding by learning a multi-level common semantic space shared by the textual and visual modalities. We exploit multiple levels of feature maps of a Deep Convolutional Neural Network, as well as contextualized word and sentence embeddings extracted from a character-based language model. Following dedicated non-linear mappings for visual features at each level, word, and sentence embeddings, we obtain multiple instantiations of our common semantic space in which comparisons between any target text and the visual content are performed with cosine similarity. We guide the model by a multi-level multimodal attention mechanism which outputs attended visual features at each level. The best level is chosen to be compared with text content for maximizing the pertinence scores of image-sentence pairs of the ground truth. Experiments conducted on three publicly available datasets show significant performance gains (20%-60% relative) over the state-of-the-art in phrase localization and set a new performance record on those datasets. We provide a detailed ablation study to show the contribution of each element of our approach and release our code on GitHub.
Submitted 29 May, 2019; v1 submitted 28 November, 2018;
originally announced November 2018.
-
Lip2AudSpec: Speech reconstruction from silent lip movements video
Authors:
Hassan Akbari,
Himani Arora,
Liangliang Cao,
Nima Mesgarani
Abstract:
In this study, we propose a deep neural network for reconstructing intelligible speech from silent lip movement videos. We use an auditory spectrogram as the spectral representation of speech, together with its corresponding sound generation method, resulting in more natural-sounding reconstructed speech. Our proposed network consists of an autoencoder to extract bottleneck features from the auditory spectrogram, which are then used as the target for our main lip reading network comprising CNN, LSTM, and fully connected layers. Our experiments show that the autoencoder is able to reconstruct the original auditory spectrogram with a 98% correlation and also improves the quality of reconstructed speech from the main lip reading network. Our model, trained jointly on different speakers, is able to extract individual speaker characteristics and gives promising results in reconstructing intelligible speech with superior word recognition accuracy.
Submitted 26 October, 2017;
originally announced October 2017.
-
Perspective: Surface Freezing in Water: A Nexus of Experiments and Simulations
Authors:
Amir Haji Akbari,
Pablo G. Debenedetti
Abstract:
Surface freezing is a phenomenon in which crystallization is enhanced at a vapor-liquid interface. In some systems, such as $n$-alkanes, this enhancement is dramatic, and results in the formation of a crystalline layer at the free interface even at temperatures slightly above the equilibrium bulk freezing temperature. There are, however, systems in which the enhancement is purely kinetic, and only involves faster nucleation at or near the interface. The first, thermodynamic, type of surface freezing is easier to confirm in experiments, requiring only the verification of the existence of crystalline order at the interface. The second, kinetic, type of surface freezing is far more difficult to prove experimentally. One material that is suspected of undergoing the second type of surface freezing is liquid water. Despite strong indications that the freezing of liquid water is kinetically enhanced at vapor-liquid interfaces, the findings are far from conclusive, and the topic remains controversial. In this perspective, we present a simple thermodynamic framework to understand conceptually and distinguish these two types of surface freezing. We then briefly survey fifteen years of experimental and computational work aimed at elucidating the surface freezing conundrum in water.
Submitted 15 July, 2017;
originally announced July 2017.
-
Reconstruction of Fine Scale Auroral Dynamics
Authors:
Michael Hirsch,
Joshua Semeter,
Matthew Zettergren,
Hanna Dahlgren,
Chhavi Goenka,
Hassanali Akbari
Abstract:
We present a feasibility study for a high frame rate, short baseline auroral tomographic imaging system useful for estimating parametric variations in the precipitating electron number flux spectrum of dynamic auroral events. Of particular interest are auroral substorms, characterized by spatial variations of order 100 m and temporal variations of order 10 ms. These scales are thought to be produced by dispersive Alfvén waves in the near-Earth magnetosphere. The auroral tomography system characterized in this paper reconstructs the auroral volume emission rate to estimate the characteristic energy and location in the direction perpendicular to the geomagnetic field of peak electron precipitation flux using a distributed network of precisely synchronized ground-based cameras. As the observing baseline decreases, the tomographic inverse problem becomes highly ill-conditioned; as the sampling rate increases, the signal-to-noise ratio degrades and synchronization requirements become increasingly critical. Our approach to these challenges uses a physics-based auroral model to regularize the poorly-observed vertical dimension. Specifically, the vertical dimension is expanded in a low-dimensional basis consisting of eigenprofiles computed over the range of expected energies in the precipitating electron flux, while the horizontal dimension retains a standard orthogonal pixel basis. Simulation results show typical characteristic energy estimation error less than 30% for a 3 km baseline achievable within the confines of the Poker Flat Research Range, using GPS-synchronized Electron Multiplying CCD cameras with broad-band BG3 optical filters that pass prompt auroral emissions.
Submitted 3 December, 2015;
originally announced December 2015.
-
Discrete Load Balancing in Heterogeneous Networks with a Focus on Second-Order Diffusion
Authors:
Hoda Akbari,
Petra Berenbrink,
Robert Elsässer,
Dominik Kaaser
Abstract:
In this paper we consider a wide class of discrete diffusion load balancing algorithms. The problem is defined as follows. We are given an interconnection network and a number of load items, which are arbitrarily distributed among the nodes of the network. The goal is to redistribute the load in iterative discrete steps such that at the end each node has (almost) the same number of items. In diffusion load balancing nodes are only allowed to balance their load with their direct neighbors.
We show three main results. Firstly, we present a general framework for randomly rounding the flow generated by continuous diffusion schemes over the edges of a graph in order to obtain corresponding discrete schemes. Compared to the results of Rabani, Sinclair, and Wanka, FOCS'98, which are only valid w.r.t. the class of homogeneous first order schemes, our framework can be used to analyze a larger class of diffusion algorithms, such as algorithms for heterogeneous networks and second order schemes. Secondly, we bound the deviation between randomized second order schemes and their continuous counterparts. Finally, we provide a bound for the minimum initial load in a network that is sufficient to prevent the occurrence of negative load at a node during the execution of second order diffusion schemes.
Our theoretical results are complemented with extensive simulations on different graph classes. We show empirically that second order schemes, which are usually much faster than first order schemes, will not balance the load completely on a number of networks within reasonable time. However, the maximum load difference at the end seems to be bounded by a constant value, which can be further decreased if a first order scheme is applied once this value is achieved by the second order scheme.
Submitted 22 December, 2014;
originally announced December 2014.
-
CB-REFIM: A Practical Coordinated Beamforming in Multicell Networks
Authors:
Mohammad Hossein Akbari,
Vahid Tabataba Vakili
Abstract:
Performance of multicell systems is inevitably limited by interference and available resources. Although intercell interference can be mitigated by Base Station (BS) coordination, the demand on inter-BS information exchange and computational complexity grows rapidly with the number of cells, subcarriers, and users. On the other hand, some of the existing coordination beamforming methods need computation of the pseudo-inverse or generalized eigenvector of a matrix, which is practically difficult to implement in a real system. To handle these issues, we propose a novel linear beamforming across a set of coordinated cells with only limited backhaul signalling. Resource allocation (i.e. precoding and power control) is formulated as an optimization problem whose objective is a function of the signal-to-interference-plus-noise ratios (SINRs), in order to maximize the instantaneous weighted sum-rate subject to power constraints. Although the primal problem is nonconvex and difficult to solve optimally, an iterative algorithm is presented based on the Karush-Kuhn-Tucker (KKT) condition. To have a practical solution with low computational complexity and signalling overhead, we present CB-REFIM (coordination beamforming-reference based interference management) and show that the recently proposed REFIM algorithm can be interpreted as a special case of CB-REFIM. We evaluate CB-REFIM through extensive simulation and observe that the proposed strategies achieve close-to-optimal performance.
Submitted 9 July, 2014; v1 submitted 5 July, 2014;
originally announced July 2014.