-
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Authors:
Marah Abdin,
Sam Ade Jacobs,
Ammar Ahmad Awan,
Jyoti Aneja,
Ahmed Awadallah,
Hany Awadalla,
Nguyen Bach,
Amit Bahree,
Arash Bakhtiari,
Jianmin Bao,
Harkirat Behl,
Alon Benhaim,
Misha Bilenko,
Johan Bjorck,
Sébastien Bubeck,
Qin Cai,
Martin Cai,
Caio César Teodoro Mendes,
Weizhu Chen,
Vishrav Chaudhary,
Dong Chen,
Dongdong Chen,
Yen-Chun Chen,
Yi-Ling Chen,
Parul Chopra
, et al. (90 additional authors not shown)
Abstract:
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.
Submitted 23 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
Authors:
Juan Manuel Zambrano Chaves,
Shih-Cheng Huang,
Yanbo Xu,
Hanwen Xu,
Naoto Usuyama,
Sheng Zhang,
Fei Wang,
Yujia Xie,
Mahmoud Khademi,
Ziyi Yang,
Hany Awadalla,
Julia Gong,
Houdong Hu,
Jianwei Yang,
Chunyuan Li,
Jianfeng Gao,
Yu Gu,
Cliff Wong,
Mu Wei,
Tristan Naumann,
Muhao Chen,
Matthew P. Lungren,
Akshay Chaudhari,
Serena Yeung-Levy,
Curtis P. Langlotz
, et al. (2 additional authors not shown)
Abstract:
The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant performance gaps in multimodal biomedical applications. More importantly, less-acknowledged pragmatic issues, including accessibility, model cost, and tedious manual evaluation make it hard for clinicians to use state-of-the-art large models directly on private patient data. Here, we explore training open-source small multimodal models (SMMs) to bridge competency gaps for unmet clinical needs in radiology. To maximize data efficiency, we adopt a modular approach by incorporating state-of-the-art pre-trained models for image and text modalities, and focusing on training a lightweight adapter to ground each modality to the text embedding space, as exemplified by LLaVA-Med. For training, we assemble a large dataset of over 697 thousand radiology image-text pairs. For evaluation, we propose CheXprompt, a GPT-4-based metric for factuality evaluation, and demonstrate its parity with expert evaluation. For best practice, we conduct a systematic ablation study on various choices in data engineering and multimodal training. The resulting LlaVA-Rad (7B) model attains state-of-the-art results on standard radiology tasks such as report generation and cross-modal retrieval, even outperforming much larger models such as GPT-4V and Med-PaLM M (84B). The inference of LlaVA-Rad is fast and can be performed on a single V100 GPU in private settings, offering a promising state-of-the-art tool for real-world clinical applications.
Submitted 26 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Large Non-Volatile Frequency Tuning of Spin Hall Nano-Oscillators using Circular Memristive Nano-Gates
Authors:
Maha Khademi,
Akash Kumar,
Mona Rajabali,
Saroj P. Dash,
Johan Åkerman
Abstract:
Spin Hall nano oscillators (SHNOs) are promising candidates for neuromorphic computing due to their miniaturized dimensions, non-linearity, fast dynamics, and ability to synchronize in long chains and arrays. However, tuning the individual SHNOs in large chains/arrays, which is key to implementing synaptic control, has remained a challenge. Here, we demonstrate circular memristive nano-gates, both precisely aligned and shifted with respect to nano-constriction SHNOs of W/CoFeB/HfOx, with increased quality of the device tunability. Gating at the exact center of the nano-constriction region is found to cause irreversible degradation to the oxide layer, resulting in a permanent frequency shift of the auto-oscillating modes. As a remedy, gates shifted outside of the immediate nano-constriction region can tune the frequency dramatically (>200 MHz) without causing any permanent change to the constriction region. Circular memristive nano-gates can, therefore, be used in SHNO chains/arrays to manipulate the synchronization states precisely over large networks of oscillators.
Submitted 18 January, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Authors:
Zineng Tang,
Ziyi Yang,
Mahmoud Khademi,
Yang Liu,
Chenguang Zhu,
Mohit Bansal
Abstract:
We present CoDi-2, a versatile and interactive Multimodal Large Language Model (MLLM) that can follow complex multimodal interleaved instructions, conduct in-context learning (ICL), reason, chat, edit, etc., in an any-to-any input-output modality paradigm. By aligning modalities with language for both encoding and generation, CoDi-2 empowers Large Language Models (LLMs) to not only understand complex modality-interleaved instructions and in-context examples, but also autoregressively generate grounded and coherent multimodal outputs in the continuous feature space. To train CoDi-2, we build a large-scale generation dataset encompassing in-context multimodal instructions across text, vision, and audio. CoDi-2 demonstrates a wide range of zero-shot capabilities for multimodal generation, such as in-context learning, reasoning, and compositionality of any-to-any modality generation through multi-round interactive conversation. CoDi-2 surpasses previous domain-specific models on tasks such as subject-driven image generation, vision transformation, and audio editing. CoDi-2 signifies a substantial breakthrough in developing a comprehensive multimodal foundation model adept at interpreting in-context language-vision-audio interleaved instructions and producing multimodal outputs.
Submitted 30 November, 2023;
originally announced November 2023.
-
Black hole solutions to Einstein-Bel-Robinson gravity
Authors:
S. N. Sajadi,
Robert B. Mann,
H. Sheikhahmadi,
M. Khademi
Abstract:
In this paper, we study the physical properties of black holes in the framework of the recently proposed Einstein-Bel-Robinson gravity. We show that, interestingly, the theory propagates a transverse and massive graviton on a maximally symmetric background with positive energy. There is also a single ghost-free branch that returns to the Einstein case when $\beta \to 0$. We find new black hole solutions to the equations, both approximate and exact, the latter being a constant curvature black hole solution, and discuss inconsistencies with metrics that were previously claimed to be approximate solutions to the equations. We obtain the conserved charges of the theory and briefly study the thermodynamics of the black hole solutions.
Submitted 28 January, 2024; v1 submitted 2 August, 2023;
originally announced August 2023.
-
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Authors:
Yuwei Fang,
Mahmoud Khademi,
Chenguang Zhu,
Ziyi Yang,
Reid Pryzant,
Yichong Xu,
Yao Qian,
Takuya Yoshioka,
Lu Yuan,
Michael Zeng,
Xuedong Huang
Abstract:
Artificial General Intelligence (AGI) requires comprehensive understanding and generation capabilities for a variety of tasks spanning different modalities and functionalities. Integrative AI is one important direction to approach AGI, through combining multiple models to tackle complex multimodal tasks. However, there is a lack of a flexible and composable platform to facilitate efficient and effective model composition and coordination. In this paper, we propose the i-Code Studio, a configurable and composable framework for Integrative AI. The i-Code Studio orchestrates multiple pre-trained models in a finetuning-free fashion to conduct complex multimodal tasks. Instead of simple model composition, the i-Code Studio provides an integrative, flexible, and composable setting for developers to quickly and easily compose cutting-edge services and technologies tailored to their specific requirements. The i-Code Studio achieves impressive results on a variety of zero-shot multimodal tasks, such as video-to-text retrieval, speech-to-speech translation, and visual question answering. We also demonstrate how to quickly build a multimodal agent based on the i-Code Studio that can communicate and personalize for users.
Submitted 23 May, 2023;
originally announced May 2023.
-
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Authors:
Ziyi Yang,
Mahmoud Khademi,
Yichong Xu,
Reid Pryzant,
Yuwei Fang,
Chenguang Zhu,
Dongdong Chen,
Yao Qian,
Mei Gao,
Yi-Ling Chen,
Robert Gmyr,
Naoyuki Kanda,
Noel Codella,
Bin Xiao,
Yu Shi,
Lu Yuan,
Takuya Yoshioka,
Michael Zeng,
Xuedong Huang
Abstract:
The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence; however, the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.
Submitted 20 May, 2023;
originally announced May 2023.
-
Improving Dense Contrastive Learning with Dense Negative Pairs
Authors:
Berk Iskender,
Zhenlin Xu,
Simon Kornblith,
En-Hung Chu,
Maryam Khademi
Abstract:
Many contrastive representation learning methods learn a single global representation of an entire image. However, dense contrastive representation learning methods such as DenseCL (Wang et al., 2021) can learn better representations for tasks requiring stronger spatial localization of features, such as multi-label classification, detection, and segmentation. In this work, we study how to improve the quality of the representations learned by DenseCL by modifying the training scheme and objective function, and propose DenseCL++. We also conduct several ablation studies to better understand the effects of: (i) various techniques to form dense negative pairs among augmentations of different images, (ii) cross-view dense negative and positive pairs, and (iii) an auxiliary reconstruction task. Our results show 3.5% and 4% mAP improvement over SimCLR (Chen et al., 2020a) and DenseCL in COCO multi-label classification. In COCO and VOC segmentation tasks, we achieve 1.8% and 0.7% mIoU improvements over SimCLR, respectively.
Submitted 10 January, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
A Study on Self-Supervised Object Detection Pretraining
Authors:
Trung Dang,
Simon Kornblith,
Huy Thong Nguyen,
Peter Chin,
Maryam Khademi
Abstract:
In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework to learn a spatially consistent dense representation from an image, by randomly sampling and projecting boxes to each augmented view and maximizing the similarity between corresponding box features. We study existing design choices in the literature, such as box generation, feature extraction strategies, and using multiple views inspired by its success on instance-level image representation learning techniques. Our results suggest that the method is robust to different choices of hyperparameters, and using multiple views is not as effective as shown for instance-level image representation learning. We also design two auxiliary tasks to predict boxes in one view from their features in the other view, by (1) predicting boxes from the sampled set by using a contrastive loss, and (2) predicting box coordinates using a transformer, which potentially benefits downstream object detection tasks. We found that these tasks do not lead to better object detection performance when finetuning the pretrained model on labeled data.
Submitted 10 August, 2022; v1 submitted 8 July, 2022;
originally announced July 2022.
-
Influence of Magnetic Fields on the Gas Rotation in the Galaxy NGC 6946
Authors:
M. Khademi,
S. Nasiri,
F. S. Tabatabaei
Abstract:
Magnetic fields can play an important role in the energy balance and formation of gas structures in galaxies. However, their dynamical effect on the rotation curve of galaxies is largely unexplored. We investigate the dynamical effect of the known magnetic arms of NGC 6946 on its circular gas rotation traced in HI, considering two dark matter mass density models, ISO and the universal NFW profile. We used a three-dimensional model for the magnetic field structure to fit the modeled rotation curve to the observed data via a $\chi$-squared minimization method. The shape of the HI gas rotation curve is reproduced better when the effect of the magnetic field is included, especially in the outer part, where the dynamical effect of the magnetic field could become important. The typical amplitude of the regular magnetic field contribution to the rotation curve is about $6$ to $14\ \mathrm{km\,s^{-1}}$ in the outer gaseous disk of NGC 6946. The contribution ratio of the regular magnetic field to the observed circular velocity and to dark matter increases with the galactocentric radius. Its ratio to the observed rotational velocity is about 5 percent and to dark matter is about 10 percent in the outer regions of NGC 6946. Therefore, the large-scale magnetic fields cannot be completely ignored in the large-scale dynamics of spiral galaxies, especially in the outer parts of galaxies.
Submitted 7 February, 2023; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Kinematical asymmetry in the dwarf irregular galaxy WLM and a perturbed halo potential
Authors:
M. Khademi,
Y. Yang,
F. Hammer,
S. Nasiri
Abstract:
WLM is a dwarf irregular galaxy seen almost edge-on, which has prompted a number of kinematical studies investigating its rotation curve and its dark matter content. In this paper, we investigate the origin of the strong asymmetry of the rotation curve, which shows a significant discrepancy between the approaching and the receding side. We first examine whether an $m = 1$ perturbation (lopsidedness) in the halo potential could be a mechanism creating such kinematical asymmetry. To do so, we fit a theoretical rotational velocity associated with an $m = 1$ perturbation in the halo potential model to the observed data via a $\chi$-squared minimization method. We show that a lopsided halo potential model can explain the asymmetry in the kinematic data reasonably well. We then verify that the kinematical classification of WLM shows that its velocity field is significantly perturbed due to both its asymmetrical rotation curve and its peculiar velocity dispersion map. In addition, based on a kinemetry analysis, we find that it is possible for WLM to lie in the transition region, where the disk and merger coexist. In conclusion, it appears that the rotation curve of WLM diverges significantly from that of an ideal rotating disk, which may significantly affect investigations of its dark matter content.
Submitted 22 August, 2021; v1 submitted 6 July, 2021;
originally announced July 2021.
-
Boosting Contrastive Self-Supervised Learning with False Negative Cancellation
Authors:
Tri Huynh,
Simon Kornblith,
Matthew R. Walter,
Michael Maire,
Maryam Khademi
Abstract:
Self-supervised representation learning has made significant leaps fueled by progress in contrastive learning, which seeks to learn transformations that embed positive input pairs nearby, while pushing negative pairs far apart. While positive pairs can be generated reliably (e.g., as different views of the same image), it is difficult to accurately establish negative pairs, defined as samples from different images regardless of their semantic content or visual features. A fundamental problem in contrastive learning is mitigating the effects of false negatives. Contrasting false negatives induces two critical issues in representation learning: discarding semantic information and slow convergence. In this paper, we propose novel approaches to identify false negatives, as well as two strategies to mitigate their effect, i.e. false negative elimination and attraction, while systematically performing rigorous evaluations to study this problem in detail. Our method exhibits consistent improvements over existing contrastive learning-based methods. Without labels, we identify false negatives with 40% accuracy among 1000 semantic classes on ImageNet, and achieve 5.8% absolute improvement in top-1 accuracy over the previous state-of-the-art when finetuning with 1% labels. Our code is available at https://github.com/google-research/fnc.
Submitted 2 January, 2022; v1 submitted 23 November, 2020;
originally announced November 2020.
-
Physical Properties of a Regular Rotating Black Hole: Thermodynamics, Stability, Quasinormal Modes
Authors:
S. H. Hendi,
S. N. Sajadi,
Maryam Khademi
Abstract:
Respecting the angular momentum conservation of torque-free systems, it is natural to consider rotating solutions of massive objects. Moreover, motivated by the fact that realistic astrophysical black holes rotate, we use the Newman-Janis formalism to construct a regular rotating black hole. We start with a nonlinearly charged regular static black hole in the framework of standard general relativity and then obtain the associated rotating solution through this formalism. We investigate the geometrical properties of the metric by studying the boundary of the ergosphere. We also analyze thermodynamic properties of the solution in AdS spacetime and examine thermal stability and possible phase transitions. In addition, we perturb the black hole using a real massless scalar field as a probe to investigate its dynamic stability. We obtain an analytic expression for the real and imaginary parts of the quasinormal frequencies. Finally, we look for a connection between the quasinormal frequencies and the properties of the photon sphere in the eikonal limit.
Submitted 20 June, 2020;
originally announced June 2020.
-
Does Homophily Make Socialbots More Influential? Exploring Infiltration Strategies
Authors:
Samaneh Hosseini Moghaddam,
Mandana Khademi,
Maghsoud Abbaspour
Abstract:
Socialbots are intelligent software controlling all the behavior of fake accounts in an online social network. They use artificial intelligence techniques to pass themselves off as human social media users. Socialbots exploit user trust to achieve their malicious goals, such as astroturfing, performing Sybil attacks, spamming, and harvesting private data. The first step in countering the malicious activities of socialbots is studying their characteristics and revealing the strategies they can employ to stealthily infiltrate a target online social network. In this paper, we investigate the success of different infiltration strategies in terms of infiltration performance and stealthiness. Every strategy is characterized by the socialbot's profile and behavioral characteristics. The findings from this study illustrate that assuming a specific taste for the tweets a socialbot retweets and/or likes makes it more influential. Furthermore, the experimental results indicate that the presence of common characteristics and similarity increases the probability of being followed by other users. This is in complete agreement with the concept of homophily, which is the tendency of individuals to associate and bond with similar others in social networks.
Submitted 15 March, 2019;
originally announced March 2019.
-
Learning to Represent Programs with Graphs
Authors:
Miltiadis Allamanis,
Marc Brockschmidt,
Mahmoud Khademi
Abstract:
Learning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code's known syntax. For example, long-range dependencies induced by using the same variable or function in distant locations are often not considered. We propose to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures.
In this work, we present how to construct graphs from source code and how to scale Gated Graph Neural Networks training to such large graphs. We evaluate our method on two tasks: VarNaming, in which a network attempts to predict the name of a variable given its usage, and VarMisuse, in which the network learns to reason about selecting the correct variable that should be used at a given program location. Our comparison to methods that use less structured program representations shows the advantages of modeling known structure, and suggests that our models learn to infer meaningful names and to solve the VarMisuse task in many cases. Additionally, our testing showed that VarMisuse identifies a number of bugs in mature open-source projects.
Submitted 4 May, 2018; v1 submitted 1 November, 2017;
originally announced November 2017.
-
Conceptual Text Summarizer: A new model in continuous vector space
Authors:
Mohammad Ebrahim Khademi,
Mohammad Fakhredanesh,
Seyed Mojtaba Hoseini
Abstract:
Traditional methods of summarization are neither cost-effective nor feasible today. Extractive summarization is a process that automatically extracts the most important sentences from a text and generates a short informative summary. In this work, we propose an unsupervised method to summarize Persian texts. This method is a novel hybrid approach that clusters the concepts of the text using deep learning and traditional statistical methods. First, we produce a word embedding based on the Hamshahri2 corpus and a dictionary of word frequencies. Then the proposed algorithm extracts the keywords of the document, clusters its concepts, and finally ranks the sentences to produce the summary. We evaluated the proposed method on the Pasokh single-document corpus using the ROUGE evaluation measure. Without using any hand-crafted features, our proposed method achieves state-of-the-art results. We compared our unsupervised method with the best supervised Persian methods and achieved an overall improvement of 7.5% in ROUGE-2 recall.
Submitted 1 September, 2018; v1 submitted 30 October, 2017;
originally announced October 2017.
-
A dynamic Stackelberg game for green supply chain management
Authors:
Mehrnoosh Khademi,
Massimiliano Ferrara,
Mehdi Salimi,
Somayeh Sharifi
Abstract:
In this paper, we establish a dynamic game to allocate CSR (Corporate Social Responsibility) to the members of a supply chain. We propose a model of a three-tier supply chain in a decentralized state which includes a supplier, a manufacturer and a retailer. For analyzing supply chain performance in the decentralized state and the relationships between the members of the supply chain, we use a Stackelberg game and consider a hierarchical equilibrium solution for a two-level game. In particular, we formulate a model that spans multiple periods with the help of a dynamic discrete Stackelberg game. We obtain an equilibrium point at which both the profits of members and the level of CSR taken up by supply chains are maximized.
Submitted 21 June, 2015;
originally announced June 2015.
-
A dynamic game on Green Supply Chain Management
Authors:
Mehrnoosh Khademi,
Massimiliano Ferrara,
Bruno Pansera,
Mehdi Salimi
Abstract:
In this paper, we establish a dynamic game to allocate CSR (Corporate Social Responsibility) to the members of a supply chain. We propose a model of a three-tier supply chain in a decentralized state that includes a supplier, a manufacturer and a retailer. For analyzing supply chain performance in a decentralized state and the relationships between the members of the supply chain, we use a Stackelberg game and consider in this paper a hierarchical equilibrium solution for a two-level game. In particular, we formulate a model that crosses through multi-periods by a dynamic discrete Stackelberg game. We try to obtain an equilibrium point at which both the profits of members and the level of CSR taken up by supply chains are maximized.
Submitted 12 March, 2015;
originally announced March 2015.
-
3D Hand Pose Detection in Egocentric RGB-D Images
Authors:
Gregory Rogez,
James S. Supancic III,
Maryam Khademi,
Jose Maria Martinez Montiel,
Deva Ramanan
Abstract:
We focus on the task of everyday hand pose estimation from egocentric viewpoints. For this task, we show that depth sensors are particularly informative for extracting near-field interactions of the camera wearer with his/her environment. Despite the recent advances in full-body pose estimation using Kinect-like sensors, reliable monocular hand pose estimation in RGB-D images is still an unsolved problem. The problem is considerably exacerbated when analyzing hands performing daily activities from a first-person viewpoint, due to severe occlusions arising from object manipulations and a limited field-of-view. Our system addresses these difficulties by exploiting strong priors over viewpoint and pose in a discriminative tracking-by-detection framework. Our priors are operationalized through a photorealistic synthetic model of egocentric scenes, which is used to generate training data for learning depth-based pose classifiers. We evaluate our approach on an annotated dataset of real egocentric object manipulation scenes and compare to both commercial and academic approaches. Our method provides state-of-the-art performance for both hand detection and pose estimation in egocentric RGB-D images.
Submitted 28 November, 2014;
originally announced December 2014.
-
Relative Facial Action Unit Detection
Authors:
Mahmoud Khademi,
Louis-Philippe Morency
Abstract:
This paper presents a subject-independent facial action unit (AU) detection method by introducing the concept of relative AU detection, for scenarios where the neutral face is not provided. We propose a new classification objective function which analyzes the temporal neighborhood of the current frame to decide if the expression recently increased, decreased or showed no change. This approach is a significant change from the conventional absolute method which decides about AU classification using the current frame, without an explicit comparison with its neighboring frames. Our proposed method improves robustness to individual differences such as face scale and shape, age-related wrinkles, and transitions among expressions (e.g., lower intensity of expressions). Our experiments on three publicly available datasets (Extended Cohn-Kanade (CK+), Bosphorus, and DISFA databases) show significant improvement of our approach over conventional absolute techniques. Keywords: facial action coding system (FACS); relative facial action unit detection; temporal information;
Submitted 30 April, 2014;
originally announced May 2014.
-
Predicting a Business Star in Yelp from Its Reviews Text Alone
Authors:
Mingming Fan,
Maryam Khademi
Abstract:
Yelp online reviews are an invaluable source of information for users choosing where to visit or what to eat among numerous available options. But due to the overwhelming number of reviews, it is almost impossible for users to go through all of them and find the information they are looking for. To provide a business overview, one solution is to give the business a rating of 1 to 5 stars. This rating can be subjective and biased toward users' personalities. In this paper, we predict a business rating based on user-generated review texts alone. This not only provides an overview of plentiful long review texts but also cancels out subjectivity. Selecting the restaurant category from the Yelp Dataset Challenge, we use a combination of three feature generation methods as well as four machine learning models to find the best prediction result. Our approach is to create a bag of words from the most frequent words in all raw text reviews, or from the most frequent words/adjectives in the results of Part-of-Speech (POS) analysis. Our results show a Root Mean Square Error (RMSE) of 0.6 for the combination of Linear Regression with either the most frequent words from raw data or the most frequent adjectives after POS analysis.
Submitted 4 January, 2014;
originally announced January 2014.
-
Extended Active Learning Method
Authors:
Ali Akbar Kiaei,
Saeed Bagheri Shouraki,
Seyed Hossein Khasteh,
Mahmoud Khademi,
Alireza Ghatreh Samani
Abstract:
Active Learning Method (ALM) is a soft computing method, based on fuzzy logic, which is used for modeling and control. Although ALM has been shown to act well in dynamic environments, its operators cannot support it very well in complex situations due to losing data. Thus ALM can find better membership functions if more appropriate operators are chosen for it. This paper substitutes two new operators for ALM's original ones, which yields a way of finding membership functions superior to that of conventional ALM. This new method is called Extended Active Learning Method (EALM).
Submitted 17 January, 2011; v1 submitted 10 November, 2010;
originally announced November 2010.
-
Local Component Analysis for Nonparametric Bayes Classifier
Authors:
Mahmoud Khademi,
Mohammad T. Manzuri-Shalmani,
Mehran Safayani
Abstract:
The decision boundaries of the Bayes classifier are optimal because they lead to the maximum probability of a correct decision. That is, if we knew the prior probabilities and the class-conditional densities, we could design a classifier which gives the lowest probability of error. However, in classification based on nonparametric density estimation methods such as Parzen windows, the decision regions depend on the choice of parameters such as window width. Moreover, these methods suffer from the curse of dimensionality of the feature space and the small sample size problem, which severely restrict their practical applications. In this paper, we address these problems by introducing a novel dimension reduction and classification method based on local component analysis. In this method, by adopting an iterative cross-validation algorithm, we simultaneously estimate the optimal transformation matrices (for dimension reduction) and classifier parameters based on local information. The proposed method can classify data with complicated boundaries and also alleviate the curse of dimensionality dilemma. Experiments on real data show the superiority of the proposed algorithm in terms of classification accuracy for pattern classification applications like age, facial expression and character recognition. Keywords: Bayes classifier, curse of dimensionality dilemma, Parzen window, pattern classification, subspace learning.
Submitted 19 July, 2012; v1 submitted 24 October, 2010;
originally announced October 2010.
-
New S-norm and T-norm Operators for Active Learning Method
Authors:
Ali Akbar Kiaei,
Saeed Bagheri Shouraki,
Seyed Hossein Khasteh,
Mahmoud Khademi,
Ali Reza Ghatreh Samani
Abstract:
Active Learning Method (ALM) is a soft computing method used for modeling and control based on fuzzy logic. All operators defined for fuzzy sets must serve as either a fuzzy S-norm or a fuzzy T-norm. Despite being a powerful modeling method, ALM does not possess operators which serve as S-norms and T-norms, which deprives it of a profound analytical expression/form. This paper introduces two new operators based on morphology which satisfy the following conditions: First, they serve as a fuzzy S-norm and T-norm. Second, they satisfy De Morgan's laws, so they complement each other perfectly. These operators are investigated from three viewpoints: mathematics, geometry and fuzzy logic.
Submitted 6 February, 2011; v1 submitted 21 October, 2010;
originally announced October 2010.
-
Extended Two-Dimensional PCA for Efficient Face Representation and Recognition
Authors:
Mehran Safayani,
Mohammad T. Manzuri-Shalmani,
Mahmoud Khademi
Abstract:
In this paper a novel method called Extended Two-Dimensional PCA (E2DPCA) is proposed which is an extension to the original 2DPCA. We state that the covariance matrix of 2DPCA is equivalent to the average of the main diagonal of the covariance matrix of PCA. This implies that 2DPCA eliminates some covariance information that can be useful for recognition. E2DPCA, instead of just using the main diagonal, considers a radius of r diagonals around it and expands the averaging so as to include the covariance information within those diagonals. The parameter r unifies PCA and 2DPCA: r = 1 produces the covariance of 2DPCA, r = n that of PCA. Hence, by controlling r it is possible to control the trade-offs between recognition accuracy and energy compression (fewer coefficients), and between training and recognition complexity. Experiments on the ORL face database show improvement in both recognition accuracy and recognition time over the original 2DPCA.
Submitted 5 April, 2010;
originally announced April 2010.
-
Multilinear Biased Discriminant Analysis: A Novel Method for Facial Action Unit Representation
Authors:
Mahmoud Khademi,
Mehran Safayani,
Mohammad T. Manzuri-Shalmani
Abstract:
In this paper a novel efficient method for representation of facial action units by encoding an image sequence as a fourth-order tensor is presented. The multilinear tensor-based extension of the biased discriminant analysis (BDA) algorithm, called multilinear biased discriminant analysis (MBDA), is first proposed. Then, we apply the MBDA and two-dimensional BDA (2DBDA) algorithms, as dimensionality reduction techniques, to Gabor representations and the geometric features of the input image sequence respectively. The proposed scheme can deal with the asymmetry between positive and negative samples as well as the curse of dimensionality dilemma. Extensive experiments on the Cohn-Kanade database show the superiority of the proposed method for representation of the subtle changes and the temporal information involved in the formation of facial expressions. As an accurate tool, this representation can be applied to many areas such as recognition of spontaneous and deliberate facial expressions, multi modal/media human computer interaction and lie detection efforts.
Submitted 4 April, 2010;
originally announced April 2010.
-
Recognizing Combinations of Facial Action Units with Different Intensity Using a Mixture of Hidden Markov Models and Neural Network
Authors:
Mahmoud Khademi,
Mohammad T. Manzuri-Shalmani,
Mohammad H. Kiapour,
Ali A. Kiaei
Abstract:
The Facial Action Coding System consists of 44 action units (AUs) and more than 7000 combinations. Hidden Markov model (HMM) classifiers have been used successfully to recognize facial action units (AUs) and expressions due to their ability to deal with AU dynamics. However, a separate HMM is necessary for each single AU and each AU combination. Since AU combinations number in the thousands, a more efficient method is needed. In this paper an accurate real-time sequence-based system for representation and recognition of facial AUs is presented. Our system has the following characteristics: 1) employing a mixture of HMMs and a neural network, we develop a novel accurate classifier, which can deal with AU dynamics, recognize subtle changes, and is also robust to intensity variations, 2) although we use an HMM for each single AU only, by employing a neural network we can recognize each single AU and each AU combination, and 3) using both geometric and appearance-based features, and applying efficient dimension reduction techniques, our system is robust to illumination changes and can represent the temporal information involved in the formation of facial expressions. Extensive experiments on the Cohn-Kanade database show the superiority of the proposed method in comparison with other classifiers. Keywords: classifier design and evaluation, data fusion, facial action units (AUs), hidden Markov models (HMMs), neural network (NN).
Submitted 4 April, 2010;
originally announced April 2010.
-
Analysis, Interpretation, and Recognition of Facial Action Units and Expressions Using Neuro-Fuzzy Modeling
Authors:
Mahmoud Khademi,
Mohammad Hadi Kiapour,
Mohammad T. Manzuri-Shalmani,
Ali A. Kiaei
Abstract:
In this paper an accurate real-time sequence-based system for representation, recognition, interpretation, and analysis of facial action units (AUs) and expressions is presented. Our system has the following characteristics: 1) employing adaptive-network-based fuzzy inference systems (ANFIS) and temporal information, we developed a classification scheme based on neuro-fuzzy modeling of the AU intensity, which is robust to intensity variations, 2) using both geometric and appearance-based features, and applying efficient dimension reduction techniques, our system is robust to illumination changes and can represent the subtle changes as well as the temporal information involved in the formation of facial expressions, and 3) by using continuous values of intensity and employing top-down hierarchical rule-based classifiers, we can develop accurate human-interpretable AU-to-expression converters. Extensive experiments on the Cohn-Kanade database show the superiority of the proposed method in comparison with support vector machines, hidden Markov models, and neural network classifiers. Keywords: biased discriminant analysis (BDA), classifier design and evaluation, facial action units (AUs), hybrid learning, neuro-fuzzy modeling.
Submitted 4 April, 2010;
originally announced April 2010.
-
Facial Expression Representation and Recognition Using 2DHLDA, Gabor Wavelets, and Ensemble Learning
Authors:
Mahmoud Khademi,
Mohammad H. Kiapour,
Mehran Safayani,
Mohammad T. Manzuri,
M. Shojaei
Abstract:
In this paper, a novel method for representation and recognition of facial expressions in two-dimensional image sequences is presented. We apply a variation of the two-dimensional heteroscedastic linear discriminant analysis (2DHLDA) algorithm, as an efficient dimensionality reduction technique, to the Gabor representation of the input sequence. 2DHLDA is an extension of the two-dimensional linear discriminant analysis (2DLDA) approach, and it removes the equal within-class covariance assumption. By applying 2DHLDA in two directions, we eliminate the correlations between both image columns and image rows. Then, we perform a one-dimensional LDA on the new features. This combined method can alleviate the small sample size problem and the instability encountered by HLDA. Also, employing both geometric and appearance features and using an ensemble learning scheme based on data fusion, we create a classifier which can efficiently classify facial expressions. The proposed method is robust to illumination changes and can properly represent temporal information as well as subtle changes in facial muscles. We provide experiments on the Cohn-Kanade database that show the superiority of the proposed method. Keywords: two-dimensional heteroscedastic linear discriminant analysis (2DHLDA), subspace learning, facial expression analysis, Gabor wavelets, ensemble learning.
Submitted 19 July, 2012; v1 submitted 2 April, 2010;
originally announced April 2010.