-
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Authors:
Yatong Bai,
Utsav Garg,
Apaar Shanker,
Haoming Zhang,
Samyak Parajuli,
Erhan Bas,
Isidora Filipovic,
Amelia N. Chu,
Eugenia D Fomitcheva,
Elliot Branson,
Aerin Kim,
Somayeh Sojoudi,
Kyunghyun Cho
Abstract:
Vision and vision-language applications of neural networks, such as image classification and captioning, rely on large-scale annotated datasets that require non-trivial data-collecting processes. This time-consuming endeavor hinders the emergence of large-scale datasets, limiting researchers and practitioners to a small number of choices. Therefore, we seek more efficient ways to collect and annotate images. Previous initiatives have gathered captions from HTML alt-texts and crawled social media postings, but these data sources suffer from noise, sparsity, or subjectivity. For this reason, we turn to commercial shopping websites whose data meet three criteria: cleanliness, informativeness, and fluency. We introduce the Let's Go Shopping (LGS) dataset, a large-scale public dataset with 15 million image-caption pairs from publicly available e-commerce websites. When compared with existing general-domain datasets, the LGS images focus on the foreground object and have less complex backgrounds. Our experiments on LGS show that the classifiers trained on existing benchmark datasets do not readily generalize to e-commerce data, while specific self-supervised visual feature extractors can better generalize. Furthermore, LGS's high-quality e-commerce-focused images and bimodal nature make it advantageous for vision-language bi-modal tasks: LGS enables image-captioning models to generate richer captions and helps text-to-image generation models achieve e-commerce style transfer.
Submitted 5 March, 2024; v1 submitted 9 January, 2024;
originally announced January 2024.
-
On the Performance of Multimodal Language Models
Authors:
Utsav Garg,
Erhan Bas
Abstract:
Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks. Recent research has introduced multimodal capabilities to LLMs by integrating independently pretrained vision encoders through model grafting. These multimodal variants undergo instruction tuning, similar to LLMs, enabling effective zero-shot generalization for multimodal tasks. This study conducts a comparative analysis of different multimodal instruction tuning approaches and evaluates their performance across a range of tasks, including complex reasoning, conversation, image captioning, multiple-choice questions (MCQs), and binary classification. Through rigorous benchmarking and ablation experiments, we reveal key insights for guiding architectural choices when incorporating multimodal capabilities into LLMs. However, current approaches have limitations; they do not sufficiently address the need for a diverse multimodal instruction dataset, which is crucial for enhancing task generalization. Additionally, they overlook issues related to truthfulness and factuality when generating responses. These findings illuminate current methodological constraints in adapting language models for image comprehension and provide valuable guidance for researchers and practitioners seeking to harness multimodal versions of LLMs.
Submitted 27 November, 2023; v1 submitted 4 October, 2023;
originally announced October 2023.
-
Detecting and Preventing Hallucinations in Large Vision Language Models
Authors:
Anisha Gunjal,
Jihan Yin,
Erhan Bas
Abstract:
Instruction tuned Large Vision Language Models (LVLMs) have significantly advanced in generalizing across a diverse set of multi-modal tasks, especially for Visual Question Answering (VQA). However, generating detailed responses that are visually grounded is still a challenging task for these models. We find that even the current state-of-the-art LVLMs (InstructBLIP) still contain a staggering 30 percent of the hallucinatory text in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships. To address this, we introduce M-HalDetect, a (M)ultimodal (Hal)lucination (Detect)ion Dataset that can be used to train and benchmark models for hallucination detection and prevention. M-HalDetect consists of 16k fine-grained annotations on VQA examples, making it the first comprehensive multi-modal hallucination detection dataset for detailed image descriptions. Unlike previous work that only consider object hallucination, we additionally annotate both entity descriptions and relationships that are unfaithful. To demonstrate the potential of this dataset for hallucination prevention, we optimize InstructBLIP through our novel Fine-grained Direct Preference Optimization (FDPO). We also train fine-grained multi-modal reward models from InstructBLIP and evaluate their effectiveness with best-of-n rejection sampling. We perform human evaluation on both FDPO and rejection sampling, and find that they reduce hallucination rates in InstructBLIP by 41% and 55% respectively. We also find that our reward model generalizes to other multi-modal models, reducing hallucinations in LLaVA and mPLUG-OWL by 15% and 57% respectively, and has strong correlation with human evaluated accuracy scores.
Submitted 11 February, 2024; v1 submitted 11 August, 2023;
originally announced August 2023.
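The best-of-n rejection sampling mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: `best_of_n`, `toy_reward`, and the `HALLUCINATED` word list are hypothetical stand-ins, and a real setup would score candidates with a trained multimodal reward model rather than a word list.

```python
from typing import Callable, List

# Hypothetical stand-in for objects that are NOT in the image.
HALLUCINATED = {"unicorn", "dragon"}


def toy_reward(text: str) -> float:
    """Toy reward (assumption): subtract one point per hallucinated word."""
    return -float(sum(word in HALLUCINATED for word in text.lower().split()))


def best_of_n(generate: Callable[[], str],
              reward: Callable[[str], float],
              n: int) -> str:
    """Draw n candidate responses and keep the one with the highest reward."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate() for _ in range(n)]
    return max(candidates, key=reward)


if __name__ == "__main__":
    # A fixed candidate stream stands in for sampling from a model.
    stream = iter([
        "a dog next to a unicorn",
        "a dog on a sofa",
        "a dragon and a unicorn on a sofa",
    ])
    print(best_of_n(lambda: next(stream), toy_reward, n=3))
```

The selection step is independent of how candidates are produced, so the same function works with any sampler and any scalar reward.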
-
Masked Vision and Language Modeling for Multi-modal Representation Learning
Authors:
Gukyeong Kwon,
Zhaowei Cai,
Avinash Ravichandran,
Erhan Bas,
Rahul Bhotika,
Stefano Soatto
Abstract:
In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help of another modality. This is motivated by the nature of image-text paired data, in which both the image and the text convey almost the same information but in different formats. The masked signal reconstruction of one modality conditioned on another modality can also implicitly learn cross-modal alignment between language tokens and image patches. Our experiments on various V+L tasks show that the proposed method, along with common V+L alignment losses, achieves state-of-the-art performance in the regime of millions of pre-training data. Also, our method outperforms other competitors by a significant margin in limited data scenarios.
Submitted 14 March, 2023; v1 submitted 3 August, 2022;
originally announced August 2022.
-
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
Authors:
Zhaowei Cai,
Gukyeong Kwon,
Avinash Ravichandran,
Erhan Bas,
Zhuowen Tu,
Rahul Bhotika,
Stefano Soatto
Abstract:
In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image. To address these tasks, we propose X-DETR, whose architecture has three major components: an object detector, a language encoder, and vision-language alignment. The vision and language streams are independent until the end and they are aligned using an efficient dot-product operation. The whole network is trained end-to-end, such that the detector is optimized for the vision-language tasks instead of an off-the-shelf component. To overcome the limited size of paired object-language annotations, we leverage other weak types of supervision to expand the knowledge coverage. This simple yet effective architecture of X-DETR shows good accuracy and fast speeds for multiple instance-wise vision-language tasks, e.g., 16.4 AP on LVIS detection of 1.2K categories at ~20 frames per second without using any LVIS annotation during training.
Submitted 12 April, 2022;
originally announced April 2022.
-
Predicting Nonlinear Seismic Response of Structural Braces Using Machine Learning
Authors:
Elif Ecem Bas,
Denis Aslangil,
Mohamed A. Moustafa
Abstract:
Numerical modeling of different structural materials that have highly nonlinear behaviors has always been a challenging problem in engineering disciplines. Experimental data is commonly used to characterize this behavior. This study aims to improve the modeling capabilities by using state of the art Machine Learning techniques, and attempts to answer several scientific questions: (i) Which ML algorithm is capable and is more efficient to learn such a complex and nonlinear problem? (ii) Is it possible to artificially reproduce structural brace seismic behavior that can represent real physics? (iii) How can our findings be extended to the different engineering problems that are driven by similar nonlinear dynamics? To answer these questions, the presented methods are validated by using experimental brace data. The paper shows that after proper data preparation, the long-short term memory (LSTM) method is highly capable of capturing the nonlinear behavior of braces. Additionally, the effects of tuning the hyperparameters on the models, such as layer numbers, neuron numbers, and the activation functions, are presented. Finally, the ability to learn nonlinear dynamics by using deep neural network algorithms and their advantages are briefly discussed.
Submitted 27 July, 2020;
originally announced July 2020.
-
Using Machine Learning Approach for Computational Substructure in Real-Time Hybrid Simulation
Authors:
Elif Ecem Bas,
Mohamed A. Moustafa,
David Feil-Seifer,
Janelle Blankenburg
Abstract:
Hybrid simulation (HS) is a widely used structural testing method that combines a computational substructure with a numerical model for well-understood components and an experimental substructure for other parts of the structure that are physically tested. One challenge for fast HS or real-time HS (RTHS) is associated with the analytical substructures of relatively complex structures, which could have, for instance, a large number of degrees of freedom (DOFs). Computations with this many DOFs can be hard to perform in real time, even with current hardware capacities. In this study, a metamodeling technique is proposed to represent the structural dynamic behavior of the analytical substructure. A preliminary study is conducted where a one-bay one-story concentrically braced frame (CBF) is tested under earthquake loading by using a compact HS setup at the University of Nevada, Reno. The experimental setup allows for using a small-scale brace as the experimental substructure combined with a steel frame at the prototype full-scale for the analytical substructure. Two different machine learning algorithms are evaluated to provide a valid and useful metamodeling solution for the analytical substructure. The metamodels are trained with the available data that is obtained from the pure analytical solution of the prototype steel frame. The two algorithms used for developing the metamodels are: (1) linear regression (LR) model, and (2) basic recurrent neural network (RNN). The metamodels are first validated against the pure analytical response of the structure. Next, RTHS experiments are conducted by using metamodels. RTHS test results using both LR and RNN models are evaluated, and the advantages and disadvantages of these models are discussed.
Submitted 4 April, 2020;
originally announced April 2020.
-
A New Perspective on Newton's Law of Cooling in Frame of Newly Defined Fractional Conformable Derivative
Authors:
Erdal Bas,
Ramazan Ozarslan,
Ahu Ercan
Abstract:
In this paper, Newton's law of cooling is considered from a different perspective using a newly defined fractional conformable derivative. The obtained results are compared with experimental results, and optimal fractional orders that fit the real data better are found. The results show that Newton's law of cooling with the fractional conformable derivative gives better results than with the integer-order derivative. Results are presented in comparison with the integer-order Newton's law of cooling and with experimental data, and the advantages of the fractional conformable derivative are supported by numerical illustrations and error analysis.
Submitted 30 October, 2018;
originally announced November 2018.
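For orientation, the classical integer-order baseline that the abstract above compares against can be written out as follows. The symbols $T_m$ (ambient temperature), $T_0$ (initial temperature), and $k$ (cooling constant) are notation assumed here, not taken from the paper, and the fractional conformable variant is not reproduced because the abstract does not state its definition.

```latex
\frac{dT}{dt} = -k\,\bigl(T(t) - T_m\bigr), \qquad T(0) = T_0, \qquad k > 0,
\qquad\Longrightarrow\qquad
T(t) = T_m + (T_0 - T_m)\,e^{-kt}.
```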
-
Comparison Criteria for Discrete Fractional Sturm-Liouville Equations
Authors:
Ramazan Ozarslan,
Erdal Bas
Abstract:
In this study, we give the Sturm comparison theorems for discrete fractional Sturm-Liouville (DFSL) equations within Riemann-Liouville and Grünwald-Letnikov sense. The emergence of Sturm-Liouville equations began as one dimensional Schrödinger equation in quantum mechanics and one of the most important results is Sturm comparison theorems [27]. These theorems give information about the properties of zeros of two equations having different potentials.
Submitted 8 February, 2018;
originally announced February 2018.
-
p-Laplacian Fractional Sturm-Liouville Problem for Diffusion Operator via Impulsive Condition
Authors:
Funda Metin Turk,
Erdal Bas
Abstract:
In this study, existence results for solutions are given for the fractional p-Laplacian Sturm-Liouville problem for diffusion operator of order with impulsive conditions. The derivatives are described in the Riemann-Liouville and Caputo sense. The Riemann-Liouville integral operator is used to obtain the integral representation of the solution. The existence of a solution is demonstrated via the Schaefer fixed point theorem.
Submitted 5 February, 2018;
originally announced February 2018.
-
A New Approach for Higher Order Difference Equations and Eigenvalue problems via Physical Potentials
Authors:
Erdal Bas,
Ramazan Ozarslan
Abstract:
In this study, we give the variation of parameters method from a different viewpoint for Nth-order inhomogeneous linear ordinary difference equations with constant coefficients by means of the delta exponential function. The advantage of this new approach is that it enables us to investigate the solution of difference equations in closed form. Also, the method is supported with three difference eigenvalue problems: the second-order Sturm-Liouville problem, also called the one-dimensional Schrödinger equation, having Coulomb potential; the hydrogen atom equation; and the fourth-order relaxation difference equation. We find a sum representation of the solution for the second-order discrete Sturm-Liouville problem having Coulomb potential and for the hydrogen atom equation, and an analytical solution of the fourth-order discrete relaxation problem, by the variation of parameters method via delta exponential and delta trigonometric functions.
Submitted 3 February, 2018;
originally announced February 2018.
-
Discrete Fractional Sturm-Liouville Equations
Authors:
Erdal Bas,
Ramazan Ozarslan
Abstract:
In this study, we define discrete fractional Sturm-Liouville (DFSL) operators within Riemann-Liouville and Grünwald-Letnikov fractional operators with both delta and nabla operators. We show self-adjointness of the DFSL operator for the first time and prove some spectral properties, such as orthogonality of distinct eigenfunctions and reality of eigenvalues, in parallel with the integer- and fractional-order differential operator counterparts.
Submitted 11 May, 2017;
originally announced May 2017.
-
Sturm-Liouville Difference Equations Having Special Potentials
Authors:
Erdal Bas,
Ramazan Ozarslan
Abstract:
In this paper, we present a new approach for the Sturm-Liouville problem having special potentials. We obtain the representations of solutions and asymptotic formulas for solutions with regard to initial conditions. Also, a few applications are given to show the requirement of Sturm-Liouville difference equations having potential function in view of suitability to the spectral theory. The approximate numerical outcomes for the eigenfunctions are compared with each other.
Submitted 21 April, 2017; v1 submitted 30 March, 2017;
originally announced March 2017.
-
The Diffusion Difference Equation
Authors:
Erdal Bas,
Ramazan Ozarslan
Abstract:
In this work, we introduce a new difference equation which is a discrete analogue of the diffusion differential equation and analyze some essential spectral properties: the diffusion difference operator is self-adjoint, the eigenvalues of this problem are simple and real, and the eigenfunctions corresponding to distinct eigenvalues are orthogonal. Also, a useful sum representation for the linearly independent solutions of the diffusion difference equation with Dirichlet boundary conditions is obtained, and by means of this result, an asymptotic formula for the eigenfunctions is analyzed and proved.
Submitted 2 May, 2017; v1 submitted 30 March, 2017;
originally announced March 2017.
-
New Estimations for Sturm-Liouville Problems in Difference Equations
Authors:
Erdal Bas,
Ramazan Ozarslan
Abstract:
In this paper, the Sturm-Liouville problem for difference equations is considered with potential function q(n). Representations of the solutions are obtained by the variation of parameters method and proved using summation by parts. Estimates of the asymptotic expansion of the solutions are also established.
Submitted 24 April, 2015;
originally announced May 2015.
-
A Note Basis Properties for Fractional Hydrogen Atom Equation
Authors:
E. Bas,
F. Metin
Abstract:
In this paper, we carry out a spectral analysis of the fractional Sturm-Liouville problem defined on (0,1], which has a singularity at zero, and investigate the fundamental properties of the eigenfunctions and eigenvalues of the operator. We show that the eigenvalues and eigenfunctions of the problem are real and orthogonal, respectively. Furthermore, we give some important theorems and lemmas for the fractional hydrogen atom equation.
Submitted 24 July, 2013; v1 submitted 12 March, 2013;
originally announced March 2013.
-
Spectral Properties of Fractional Sturm-Liouville Problem for Diffusion Operator
Authors:
Erdal Bas,
Funda Metin
Abstract:
In this study, we give a regular fractional Sturm-Liouville problem for the diffusion operator (FSLPDO) and investigate the spectral properties of its eigenfunctions and eigenvalues. We show that the eigenvalues and eigenfunctions of the FSLPDO are real and orthogonal, respectively, and that the fractional diffusion operator is self-adjoint.
Submitted 6 November, 2013; v1 submitted 19 December, 2012;
originally announced December 2012.
-
Load-Balancing Spatially Located Computations using Rectangular Partitions
Authors:
Erik Saule,
Erdeniz Ö. Baş,
Ümit V. Çatalyürek
Abstract:
Distributing spatially located heterogeneous workloads is an important problem in parallel scientific computing. We investigate the problem of partitioning such workloads (represented as a matrix of non-negative integers) into rectangles, such that the load of the most loaded rectangle (processor) is minimized. Since finding the optimal arbitrary rectangle-based partition is an NP-hard problem, we investigate particular classes of solutions: rectilinear, jagged and hierarchical. We present a new class of solutions called m-way jagged partitions, propose new optimal algorithms for m-way jagged partitions and hierarchical partitions, propose new heuristic algorithms, and provide worst case performance analyses for some existing and new heuristics. Moreover, the algorithms are tested in simulation on a wide set of instances. Results show that two of the algorithms we introduce lead to a much better load balance than the state-of-the-art algorithms. We also show how to design a two-phase algorithm that reaches different time/quality tradeoff.
Submitted 13 April, 2011;
originally announced April 2011.
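The objective described in the abstract above (the load of the most loaded rectangle) can be sketched for the rectilinear class, where the same row and column cuts apply across the whole matrix. This only evaluates a given partition; it is not one of the paper's algorithms, and the function names and the cut-list representation are assumptions made for illustration.

```python
from typing import List, Sequence


def prefix_sums(load: Sequence[Sequence[int]]) -> List[List[int]]:
    """2D prefix sums: ps[i][j] is the total load of rows < i and columns < j."""
    rows, cols = len(load), len(load[0])
    ps = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            ps[i + 1][j + 1] = load[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    return ps


def rect_load(ps: List[List[int]], r0: int, r1: int, c0: int, c1: int) -> int:
    """Load of the rectangle covering rows r0..r1-1 and columns c0..c1-1."""
    return ps[r1][c1] - ps[r0][c1] - ps[r1][c0] + ps[r0][c0]


def max_load_rectilinear(load: Sequence[Sequence[int]],
                         row_cuts: Sequence[int],
                         col_cuts: Sequence[int]) -> int:
    """Maximum rectangle load when the matrix is cut at the given indices.

    A cut at index k separates row (or column) k-1 from k; cuts are assumed
    to be strictly increasing and strictly inside the matrix.
    """
    ps = prefix_sums(load)
    row_bounds = [0, *row_cuts, len(load)]
    col_bounds = [0, *col_cuts, len(load[0])]
    return max(
        rect_load(ps, row_bounds[i], row_bounds[i + 1],
                  col_bounds[j], col_bounds[j + 1])
        for i in range(len(row_bounds) - 1)
        for j in range(len(col_bounds) - 1)
    )


if __name__ == "__main__":
    workload = [[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]]
    print(max_load_rectilinear(workload, row_cuts=[1], col_cuts=[2]))
    print(max_load_rectilinear(workload, row_cuts=[2], col_cuts=[1]))
```

With prefix sums, each rectangle's load is read in constant time, so comparing candidate cut positions costs time proportional to the number of rectangles rather than the number of matrix cells.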