-
Spectral Determinants of Almost Equilateral Quantum Graphs
Authors:
Jonathan Harrison,
Tracy Weyand
Abstract:
Kirchhoff's matrix tree theorem of 1847 connects the number of spanning trees of a graph to the spectral determinant of the discrete Laplacian [6]. Recently an analogue was obtained for quantum graphs relating the number of spanning trees to the spectral determinant of a Laplacian acting on functions on a metric graph with standard (Neumann-like) vertex conditions [11]. This result holds for quantum graphs where the edge lengths are close together. A quantum graph where the edge lengths are all equal is called equilateral. Here we consider equilateral graphs where we perturb the length of a single edge (almost equilateral graphs). We analyze the spectral determinant of almost equilateral complete graphs, complete bipartite graphs, and circulant graphs. This provides a measure of how fast the spectral determinant changes with respect to changes in an edge length. We apply these results to estimate the width of a window of edge lengths where the connection between the number of spanning trees and the spectral determinant can be observed. The results suggest the connection holds for a much wider window of edge lengths than is required in [11].
Submitted 25 June, 2024;
originally announced June 2024.
-
VideoPrism: A Foundational Visual Encoder for Video Understanding
Authors:
Long Zhao,
Nitesh B. Gundavarapu,
Liangzhe Yuan,
Hao Zhou,
Shen Yan,
Jennifer J. Sun,
Luke Friedman,
Rui Qian,
Tobias Weyand,
Yue Zhao,
Rachel Hornung,
Florian Schroff,
Ming-Hsuan Yang,
David A. Ross,
Huisheng Wang,
Hartwig Adam,
Mikhail Sirotenko,
Ting Liu,
Boqing Gong
Abstract:
We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips with noisy parallel text (e.g., ASR transcripts). The pretraining approach improves upon masked autoencoding by global-local distillation of semantic video embeddings and a token shuffling scheme, enabling VideoPrism to focus primarily on the video modality while leveraging the invaluable text associated with videos. We extensively test VideoPrism on four broad groups of video understanding tasks, from web video question answering to CV for science, achieving state-of-the-art performance on 31 out of 33 video understanding benchmarks.
Submitted 15 June, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Authors:
Liangzhe Yuan,
Nitesh Bharadwaj Gundavarapu,
Long Zhao,
Hao Zhou,
Yin Cui,
Lu Jiang,
Xuan Yang,
Menglin Jia,
Tobias Weyand,
Luke Friedman,
Mikhail Sirotenko,
Huisheng Wang,
Florian Schroff,
Hartwig Adam,
Ming-Hsuan Yang,
Ting Liu,
Boqing Gong
Abstract:
We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task. Moreover, we propose a scalar VideoGLUE score (VGS) to measure an FM's efficacy and efficiency when adapting to general video understanding tasks. Our main findings are as follows. First, task-specialized models significantly outperform the six FMs studied in this work, in sharp contrast to what FMs have achieved in natural language and image understanding. Second, video-native FMs, whose pretraining data contains the video modality, are generally better than image-native FMs in classifying motion-rich videos, localizing actions in time, and understanding a video of more than one action. Third, the video-native FMs can perform well on video tasks under light adaptations to downstream tasks (e.g., freezing the FM backbones), while image-native FMs win in full end-to-end finetuning. The first two observations reveal the need and tremendous opportunities to conduct research on video-focused FMs, and the last confirms that both tasks and adaptation methods matter when it comes to the evaluation of FMs. Our code is released under: https://github.com/tensorflow/models/tree/master/official/projects/videoglue.
Submitted 1 December, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Can One Hear the Spanning Trees of a Quantum Graph?
Authors:
Jonathan Harrison,
Tracy Weyand
Abstract:
Kirchhoff showed that the number of spanning trees of a graph is the spectral determinant of the combinatorial Laplacian divided by the number of vertices; we reframe this result in the quantum graph setting. We prove that the spectral determinant of the Laplace operator on a finite connected metric graph with standard (Neumann-Kirchhoff) vertex conditions determines the number of spanning trees when the lengths of the edges of the metric graph are sufficiently close together. To obtain this result, we analyze an equilateral quantum graph whose spectrum is closely related to spectra of discrete graph operators and then use the continuity of the spectral determinant under perturbations of the edge lengths.
Submitted 3 March, 2023; v1 submitted 2 September, 2022;
originally announced September 2022.
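As an illustration of the discrete result this abstract starts from, the following is a minimal sketch of Kirchhoff's matrix tree theorem. It uses the standard equivalent cofactor form (the determinant of the combinatorial Laplacian with one row and column removed) rather than the product of nonzero eigenvalues divided by the number of vertices. The function names and example graphs are assumptions made for this sketch and do not come from the paper; nothing here touches the quantum graph setting.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Combinatorial Laplacian (degree matrix minus adjacency matrix)."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    size = len(m)
    det = Fraction(1)
    for col in range(size):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def count_spanning_trees(n, edges):
    """Matrix tree theorem: any cofactor of the Laplacian counts spanning trees."""
    L = laplacian(n, edges)
    minor = [row[1:] for row in L[1:]]  # delete row 0 and column 0
    return int(determinant(minor))


# Hypothetical example graphs: complete graph K4, cycle C5, bipartite K3,3.
complete_4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
cycle_5 = [(i, (i + 1) % 5) for i in range(5)]
bipartite_3_3 = [(u, v) for u in range(3) for v in range(3, 6)]

print(count_spanning_trees(4, complete_4))
print(count_spanning_trees(5, cycle_5))
print(count_spanning_trees(6, bipartite_3_3))
```

Exact rational arithmetic is used so the count comes out as an integer without rounding; for large graphs a floating-point eigenvalue routine would be the more practical choice.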
-
Improving Fairness in Large-Scale Object Recognition by CrowdSourced Demographic Information
Authors:
Zu Kim,
André Araujo,
Bingyi Cao,
Cam Askew,
Jack Sim,
Mike Green,
N'Mah Fodiatu Yilla,
Tobias Weyand
Abstract:
There has been increasing awareness of ethical issues in machine learning, and fairness has become an important research topic. Most fairness efforts in computer vision have been focused on human sensing applications and preventing discrimination by people's physical attributes such as race, skin color or age by increasing visual representation for particular demographic groups. We argue that ML fairness efforts should extend to object recognition as well. Buildings, artwork, food and clothing are examples of the objects that define human culture. Representing these objects fairly in machine learning datasets will lead to models that are less biased towards a particular culture and more inclusive of different traditions and values. There exist many research datasets for object recognition, but they have not carefully considered which classes should be included, or how much training data should be collected per class. To address this, we propose a simple and general approach, based on crowdsourcing the demographic composition of the contributors: we define fair relevance scores, estimate them, and assign them to each class. We showcase its application to the landmark recognition domain, presenting a detailed analysis and the final fairer landmark rankings. We present analysis which leads to a much fairer coverage of the world compared to existing datasets. The evaluation dataset was used for the 2021 Google Landmark Challenges, which was the first of a kind with an emphasis on fairness in generic object recognition.
Submitted 2 June, 2022;
originally announced June 2022.
-
Towards A Fairer Landmark Recognition Dataset
Authors:
Zu Kim,
André Araujo,
Bingyi Cao,
Cam Askew,
Jack Sim,
Mike Green,
N'Mah Fodiatu Yilla,
Tobias Weyand
Abstract:
We introduce a new landmark recognition dataset, which is created with a focus on fair worldwide representation. While previous work proposes to collect as many images as possible from web repositories, we instead argue that such approaches can lead to biased data. To create a more comprehensive and equitable dataset, we start by defining the fair relevance of a landmark to the world population. These relevances are estimated by combining anonymized Google Maps user contribution statistics with the contributors' demographic information. We present a stratification approach and analysis which leads to a much fairer coverage of the world, compared to existing datasets. The resulting datasets are used to evaluate computer vision models as part of the Google Landmark Recognition and Retrieval Challenges 2021.
Submitted 6 June, 2022; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food
Authors:
Quin Thames,
Arjun Karpur,
Wade Norris,
Fangting Xia,
Liviu Panait,
Tobias Weyand,
Jack Sim
Abstract:
Understanding the nutritional content of food from visual data is a challenging computer vision problem, with the potential to have a positive and widespread impact on public health. Studies in this area are limited to existing datasets in the field that lack sufficient diversity or labels required for training models with nutritional understanding capability. We introduce Nutrition5k, a novel dataset of 5k diverse, real world food dishes with corresponding video streams, depth images, component weights, and high accuracy nutritional content annotation. We demonstrate the potential of this dataset by training a computer vision algorithm capable of predicting the caloric and macronutrient values of a complex, real world dish at an accuracy that outperforms professional nutritionists. Further we present a baseline for incorporating depth sensor data to improve nutrition predictions. We will publicly release Nutrition5k in the hope that it will accelerate innovation in the space of nutritional understanding.
Submitted 22 June, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval
Authors:
Tobias Weyand,
Andre Araujo,
Bingyi Cao,
Jack Sim
Abstract:
While image retrieval and instance recognition techniques are progressing rapidly, there is a need for challenging datasets to accurately measure their performance -- while posing novel challenges that are relevant for practical applications. We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground truth annotations for both the retrieval and recognition tasks. The ground truth construction involved over 800 hours of human annotator work. Our new dataset has several challenging properties inspired by real world applications that previous datasets did not consider: An extremely long-tailed class distribution, a large fraction of out-of-domain test photos and large intra-class variability. The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos. We provide baseline results for both recognition and retrieval tasks based on state-of-the-art methods as well as competitive results from a public challenge. We further demonstrate the suitability of the dataset for transfer learning by showing that image embeddings trained on it achieve competitive retrieval performance on independent datasets. The dataset images, ground-truth and metric scoring code are available at https://github.com/cvdfoundation/google-landmark.
Submitted 2 November, 2020; v1 submitted 3 April, 2020;
originally announced April 2020.
-
CPlaNet: Enhancing Image Geolocalization by Combinatorial Partitioning of Maps
Authors:
Paul Hongsuck Seo,
Tobias Weyand,
Jack Sim,
Bohyung Han
Abstract:
Image geolocalization is the task of identifying the location depicted in a photo based only on its visual information. This task is inherently challenging since many photos have only few, possibly ambiguous cues to their geolocation. Recent work has cast this task as a classification problem by partitioning the earth into a set of discrete cells that correspond to geographic regions. The granularity of this partitioning presents a critical trade-off; using fewer but larger cells results in lower location accuracy while using more but smaller cells reduces the number of training examples per class and increases model size, making the model prone to overfitting. To tackle this issue, we propose a simple but effective algorithm, combinatorial partitioning, which generates a large number of fine-grained output classes by intersecting multiple coarse-grained partitionings of the earth. Each classifier votes for the fine-grained classes that overlap with their respective coarse-grained ones. This technique allows us to predict locations at a fine scale while maintaining sufficient training examples per class. Our algorithm achieves the state-of-the-art performance in location recognition on multiple benchmark datasets.
Submitted 6 August, 2018;
originally announced August 2018.
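The combinatorial partitioning idea described in this abstract can be sketched on a toy example. The sketch assumes a one-dimensional "map" of six base cells, two hypothetical coarse partitions, and an additive vote in which each fine-grained class receives the sum of the scores of the coarse classes it lies in. The data, the function name `combine_partitions`, and the additive aggregation are all assumptions for illustration; the paper's actual partitioning and scoring may differ.

```python
def combine_partitions(partitions, scores):
    """Intersect coarse partitions and aggregate classifier votes.

    partitions: list of dicts mapping base cell -> coarse class id.
    scores: list of dicts mapping coarse class id -> classifier score,
            one dict per partition (same order as `partitions`).
    Returns a dict mapping each fine-grained class (a tuple of coarse
    class ids, one per partition) to its summed score.
    """
    fine_scores = {}
    for cell in partitions[0]:
        # A fine class is identified by the coarse class it has in each partition.
        fine_class = tuple(p[cell] for p in partitions)
        fine_scores[fine_class] = sum(s[c] for s, c in zip(scores, fine_class))
    return fine_scores


# Hypothetical coarse partitions of base cells 0..5.
partition_a = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}  # 2 coarse classes
partition_b = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}  # 3 coarse classes

# Hypothetical per-classifier scores over their own coarse classes.
scores_a = {0: 0.7, 1: 0.3}
scores_b = {0: 0.2, 1: 0.6, 2: 0.2}

fine_scores = combine_partitions([partition_a, partition_b], [scores_a, scores_b])
best_fine_class = max(fine_scores, key=fine_scores.get)
print(len(fine_scores), best_fine_class)
```

In this toy setup the two coarse partitions (two and three classes) intersect into four fine-grained classes, while each classifier is still trained on only its own coarse classes, which is the trade-off the abstract describes.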
-
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Authors:
Andrew G. Howard,
Menglong Zhu,
Bo Chen,
Dmitry Kalenichenko,
Weijun Wang,
Tobias Weyand,
Marco Andreetto,
Hartwig Adam
Abstract:
We present a class of efficient models called MobileNets for mobile and embedded vision applications. MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build light weight deep neural networks. We introduce two simple global hyper-parameters that efficiently trade off between latency and accuracy. These hyper-parameters allow the model builder to choose the right sized model for their application based on the constraints of the problem. We present extensive experiments on resource and accuracy tradeoffs and show strong performance compared to other popular models on ImageNet classification. We then demonstrate the effectiveness of MobileNets across a wide range of applications and use cases including object detection, finegrain classification, face attributes and large scale geo-localization.
Submitted 16 April, 2017;
originally announced April 2017.
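To make the "light weight" claim about depth-wise separable convolutions concrete, here is a minimal sketch of the usual multiply-add cost comparison between a standard convolution and a depthwise convolution followed by a 1x1 pointwise convolution. It assumes stride 1, an output feature map with the same spatial size as the input, and a hypothetical layer shape (3x3 kernel, 512 input and output channels, 14x14 feature map); the function names are illustrative only.

```python
def standard_conv_mult_adds(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a standard kernel x kernel convolution on a fmap x fmap map."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


def separable_conv_mult_adds(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of a depthwise conv plus a 1x1 pointwise conv."""
    depthwise = kernel * kernel * in_ch * fmap * fmap  # one filter per input channel
    pointwise = in_ch * out_ch * fmap * fmap           # 1x1 conv mixes channels
    return depthwise + pointwise


# Hypothetical layer shape chosen for illustration.
kernel, in_ch, out_ch, fmap = 3, 512, 512, 14

standard = standard_conv_mult_adds(kernel, in_ch, out_ch, fmap)
separable = separable_conv_mult_adds(kernel, in_ch, out_ch, fmap)
ratio = separable / standard

print(standard, separable, round(ratio, 4))
```

The ratio works out algebraically to 1/out_ch + 1/kernel^2, so for 3x3 kernels and wide layers the separable form needs roughly one ninth of the multiply-adds.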
-
Large-Scale Image Retrieval with Attentive Deep Local Features
Authors:
Hyeonwoo Noh,
Andre Araujo,
Jack Sim,
Tobias Weyand,
Bohyung Han
Abstract:
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for keypoint selection, which shares most network layers with the descriptor. This framework can be used for image retrieval as a drop-in replacement for other keypoint detectors and descriptors, enabling more accurate feature matching and geometric verification. Our system produces reliable confidence scores to reject false positives---in particular, it is robust against queries that have no correct match in the database. To evaluate the proposed descriptor, we introduce a new large-scale dataset, referred to as Google-Landmarks dataset, which involves challenges in both database and query such as background clutter, partial occlusion, multiple landmarks, objects in variable scales, etc. We show that DELF outperforms the state-of-the-art global and local descriptors in the large-scale setting by significant margins. Code and dataset can be found at the project webpage: https://github.com/tensorflow/models/tree/master/research/delf .
Submitted 2 February, 2018; v1 submitted 19 December, 2016;
originally announced December 2016.
-
Relating Zeta Functions of Discrete and Quantum Graphs
Authors:
Jonathan Harrison,
Tracy Weyand
Abstract:
We write the spectral zeta function of the Laplace operator on an equilateral metric graph in terms of the spectral zeta function of the normalized Laplace operator on the corresponding discrete graph. To do this, we apply a relation between the spectrum of the Laplacian on a discrete graph and that of the Laplacian on an equilateral metric graph. As a by-product, we determine how the multiplicity of eigenvalues of the quantum graph, that are also in the spectrum of the graph with Dirichlet conditions at the vertices, depends on the graph geometry. Finally we apply the result to calculate the vacuum energy and spectral determinant of a complete bipartite graph and compare our results with those for a star graph, a graph in which all vertices are connected to a central vertex by a single edge.
Submitted 17 October, 2017; v1 submitted 13 December, 2016;
originally announced December 2016.
-
Zeta Functions of the Dirac Operator on Quantum Graphs
Authors:
J. M. Harrison,
T. Weyand,
K. Kirsten
Abstract:
We construct spectral zeta functions for the Dirac operator on metric graphs. We start with the case of a rose graph, a graph with a single vertex where every edge is a loop. The technique is then developed to cover any finite graph with general energy independent matching conditions at the vertices. The regularized spectral determinant of the Dirac operator is also obtained as the derivative of the zeta function at a special value. In each case the zeta function is formulated using a contour integral method, which extends results obtained for Laplace and Schrödinger operators on graphs.
Submitted 24 June, 2016;
originally announced June 2016.
-
PlaNet - Photo Geolocation with Convolutional Neural Networks
Authors:
Tobias Weyand,
Ilya Kostrikov,
James Philbin
Abstract:
Is it possible to build a system to determine the location where a photo was taken using just its pixels? In general, the problem seems exceptionally difficult: it is trivial to construct situations where no location can be inferred. Yet images often contain informative cues such as landmarks, weather patterns, vegetation, road markings, and architectural details, which in combination may allow one to determine an approximate location and occasionally an exact location. Websites such as GeoGuessr and View from your Window suggest that humans are relatively good at integrating these cues to geolocate images, especially en-masse. In computer vision, the photo geolocation problem is usually approached using image retrieval methods. In contrast, we pose the problem as one of classification by subdividing the surface of the earth into thousands of multi-scale geographic cells, and train a deep network using millions of geotagged images. While previous approaches only recognize landmarks or perform approximate matching using global image descriptors, our model is able to use and integrate multiple visible cues. We show that the resulting model, called PlaNet, outperforms previous approaches and even attains superhuman levels of accuracy in some cases. Moreover, we extend our model to photo albums by combining it with a long short-term memory (LSTM) architecture. By learning to exploit temporal coherence to geolocate uncertain photos, we demonstrate that this model achieves a 50% performance improvement over the single-image model.
Submitted 17 February, 2016;
originally announced February 2016.
-
Anomalous nodal count and singularities in the dispersion relation of honeycomb graphs
Authors:
Ram Band,
Gregory Berkolaiko,
Tracy Weyand
Abstract:
We study the nodal count of the so-called bi-dendral graphs and show that it exhibits an anomaly: the nodal surplus is never equal to 0 or $β$, the first Betti number of the graph. According to the nodal-magnetic theorem, this means that bands of the magnetic spectrum (dispersion relation) of such graphs do not have maxima or minima at the "usual" symmetry points of the fundamental domain of the reciprocal space of magnetic parameters.
In search of the missing extrema we prove a necessary condition for a smooth critical point to happen inside the reciprocal fundamental domain. Using this condition, we identify the extrema as the singularities in the dispersion relation of the maximal abelian cover of the graph (the honeycomb graph being an important example).
In particular, our results show that the anomalous nodal count is an indication of the presence of the conical points in the dispersion relation of the maximal universal cover. Also, we discover that the conical points are present in the dispersion relation of graphs with much less symmetry than was required in previous investigations.
Submitted 14 November, 2015; v1 submitted 24 March, 2015;
originally announced March 2015.
-
Visual Landmark Recognition from Internet Photo Collections: A Large-Scale Evaluation
Authors:
Tobias Weyand,
Bastian Leibe
Abstract:
The task of a visual landmark recognition system is to identify photographed buildings or objects in query photos and to provide the user with relevant information on them. With their increasing coverage of the world's landmark buildings and objects, Internet photo collections are now being used as a source for building such systems in a fully automatic fashion. This process typically consists of three steps: clustering large amounts of images by the objects they depict; determining object names from user-provided tags; and building a robust, compact, and efficient recognition index. To this date, however, there is little empirical information on how well current approaches for those steps perform in a large-scale open-set mining and recognition task. Furthermore, there is little empirical information on how recognition performance varies for different types of landmark objects and where there is still potential for improvement. With this paper, we intend to fill these gaps. Using a dataset of 500k images from Paris, we analyze each component of the landmark recognition pipeline in order to answer the following questions: How many and what kinds of objects can be discovered automatically? How can we best use the resulting image clusters to recognize the object in a query? How can the object be efficiently represented in memory for recognition? How reliably can semantic information be extracted? And finally: What are the limiting factors in the resulting pipeline from query to semantics? We evaluate how different choices of methods and parameters for the individual pipeline steps affect overall system performance and examine their effects for different query categories such as buildings, paintings or sculptures.
Submitted 18 September, 2014;
originally announced September 2014.
-
Stability of eigenvalues of quantum graphs with respect to magnetic perturbation and the nodal count of the eigenfunctions
Authors:
G. Berkolaiko,
T. Weyand
Abstract:
We prove an analogue of the magnetic nodal theorem on quantum graphs: the number of zeros $φ$ of the $n$-th eigenfunction of the Schrödinger operator on a quantum graph is related to the stability of the $n$-th eigenvalue of the perturbation of the operator by magnetic potential. More precisely, we consider the $n$-th eigenvalue as a function of the magnetic perturbation and show that its Morse index at zero magnetic field is equal to $φ- (n-1)$.
Submitted 21 December, 2013; v1 submitted 18 December, 2012;
originally announced December 2012.