
Data-dependent and Oracle Bounds on Forgetting in Continual Learning

Abstract: In continual learning, knowledge must be preserved and re-used between tasks, maintaining good transfer to future tasks and minimizing forgetting of previously learned ones. While several practical algorithms have been devised for this setting, there have been few theoretical works aiming to quantify and bound the degree of forgetting in general settings. We provide both data-dependent and oracle upper bounds that apply regardless of model and algorithm choice, as well as bounds for Gibbs posteriors. We derive an algorithm inspired by our bounds and demonstrate empirically that our approach yields improved forward and backward transfer.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.01496 [pdf, other]

Evidence for five types of fixation during a random saccade eye tracking task: Implications for the study of oculomotor fatigue

Authors: Lee Friedman, Oleg V. Komogortsev

Abstract: Our interest was to evaluate changes in fixation duration as a function of time-on-task (TOT) during a random saccade task. We employed a large, publicly available dataset. The frequency histogram of fixation durations was multimodal and modelled as a Gaussian mixture. We found five fixation types. The ``ideal'' response would be a single accurate saccade after each target movement, with a typical saccade latency of 200-250 msec, followed by a long fixation (> 800 msec) until the next target jump. We found fixations like this, but they comprised only 10% of all fixations and were the first fixation after target movement only 23.4% of the time. More frequently (57.4% of the time), the first fixation after target movement was short (117.7 msec mean) and was commonly followed by a corrective saccade. Across the entire 100 sec of the task, median total fixation duration decreased. This decrease was approximated with a power law fit with R^2=0.94. A detailed examination of the frequency of each of our five fixation types over TOT revealed that the three shortest duration fixation types became more and more frequent with TOT whereas the two longest fixations became less and less frequent. In all cases, the changes over TOT followed power law relationships, with R^2 values between 0.73 and 0.93. We concluded that, over the 100 second duration of our task, long fixations are common in the first 15 to 22 seconds but become less common after that. Short fixations are relatively uncommon in the first 15 to 22 seconds but become more and more common as the task progresses. Apparently, the ability to produce an ideal response, although somewhat likely in the first 22 seconds, rapidly declines. This might be related to a noted decline in saccade accuracy over time.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 23 pages, 19 figures
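
The abstract above reports power law fits of fixation measures against time-on-task, summarized by R^2. As a minimal sketch of one common way to obtain such a fit (ordinary least squares on log-transformed data, with R^2 computed in log space), the following may help; the abstract does not state the authors' fitting procedure, and the function name `fit_power_law` and the synthetic data are hypothetical.

```python
import math


def fit_power_law(t, y):
    """Fit y = a * t**b by least squares on (log t, log y).

    Returns (a, b, r2), where r2 is the coefficient of determination
    computed in log-log space (an assumption of this sketch).
    """
    lx = [math.log(v) for v in t]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (v - my) for x, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    log_a = my - b * mx                # intercept in log-log space
    ss_res = sum((v - (log_a + b * x)) ** 2 for x, v in zip(lx, ly))
    ss_tot = sum((v - my) ** 2 for v in ly)
    r2 = 1.0 - ss_res / ss_tot
    return math.exp(log_a), b, r2


# Hypothetical, noise-free example: durations that decay as 400 * t**-0.2
times = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]      # seconds on task
durations = [400.0 * s ** -0.2 for s in times]       # synthetic values
a, b, r2 = fit_power_law(times, durations)
```

A negative exponent `b` corresponds to a measure that declines with time-on-task. A nonlinear fit on the untransformed data would weight the points differently and can yield a different R^2.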

arXiv:2403.07210 [pdf, other]

Evaluation of Eye Tracking Signal Quality for Virtual Reality Applications: A Case Study in the Meta Quest Pro

Authors: Samantha Aziz, Dillon J Lohr, Lee Friedman, Oleg Komogortsev

Abstract: We present an extensive, in-depth analysis of the eye tracking capabilities of the Meta Quest Pro virtual reality headset using a dataset of eye movement recordings collected from 78 participants. In addition to presenting classical signal quality metrics--spatial accuracy, spatial precision and linearity--in ideal settings, we also study the impact of background luminance and headset slippage on device performance. We additionally present a user-centered analysis of eye tracking signal quality, where we highlight the potential differences in user experience as a function of device performance. This work contributes to a growing understanding of eye tracking signal quality in virtual reality headsets, where the performance of applications such as gaze-based interaction, foveated rendering, and social gaze is directly dependent on the quality of the eye tracking signal.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 14 pages

arXiv:2403.05726 [pdf, other]

Augmentations vs Algorithms: What Works in Self-Supervised Learning

Authors: Warren Morningstar, Alex Bijamov, Chris Duvarney, Luke Friedman, Neha Kalibhat, Luyang Liu, Philip Mansfield, Renan Rojas-Gomez, Karan Singhal, Bradley Green, Sushant Prakash

Abstract: We study the relative effects of data augmentations, pretraining algorithms, and model architectures in Self-Supervised Learning (SSL). While the recent literature in this space leaves the impression that the pretraining algorithm is of critical importance to performance, understanding its effect is complicated by the difficulty in making objective and direct comparisons between methods. We propose a new framework which unifies many seemingly disparate SSL methods into a single shared template. Using this framework, we identify aspects in which methods differ and observe that in addition to changing the pretraining algorithm, many works also use new data augmentations or more powerful model architectures. We compare several popular SSL methods using our framework and find that many algorithmic additions, such as prediction networks or new losses, have a minor impact on downstream task performance (often less than $1\%$), while enhanced augmentation techniques offer more significant performance improvements ($2-4\%$). Our findings challenge the premise that SSL is being driven primarily by algorithmic improvements, and suggest instead a bitter lesson for SSL: that augmentation diversity and data / model scale are more critical contributors to recent advances in self-supervised learning.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 18 pages, 1 figure

arXiv:2402.16399 [pdf, other]

Analysis of Embeddings Learned by End-to-End Machine Learning Eye Movement-driven Biometrics Pipeline

Authors: Mehedi Hasan Raju, Lee Friedman, Dillon J Lohr, Oleg V Komogortsev

Abstract: This paper expands on the foundational concept of temporal persistence in biometric systems, specifically focusing on the domain of eye movement biometrics facilitated by machine learning. Unlike previous studies that primarily focused on developing biometric authentication systems, our research delves into the embeddings learned by these systems, particularly examining their temporal persistence, reliability, and biometric efficacy in response to varying input data. Utilizing two publicly available eye-movement datasets, we employed the state-of-the-art Eye Know You Too machine learning pipeline for our analysis. We aim to validate whether the machine learning-derived embeddings in eye movement biometrics mirror the temporal persistence observed in traditional biometrics. Our methodology involved conducting extensive experiments to assess how different lengths and qualities of input data influence the performance of eye movement biometrics and, more specifically, how they impact the learned embeddings. We also explored the reliability and consistency of the embeddings under varying data conditions. Three key metrics (Kendall's coefficient of concordance, intercorrelations, and equal error rate) were employed to quantitatively evaluate our findings. The results reveal that while data length significantly impacts the stability of the learned embeddings, the intercorrelations among embeddings show minimal effect.

Submitted 26 February, 2024; originally announced February 2024.

Comments: 15 pages, 10 Figures
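
The abstract above names Kendall's coefficient of concordance as one of its metrics. As a minimal sketch of the textbook statistic for rankings without ties, W = 12 S / (m^2 (n^3 - n)), the following may be useful; how the authors apply it to embeddings is not described in the abstract, and the function name `kendalls_w` and the example rankings are hypothetical.

```python
def kendalls_w(rankings):
    """Kendall's W for m rankings of the same n items (no ties assumed).

    rankings: list of m lists, each a permutation of 1..n giving the rank
    that one "rater" assigns to each of the n items.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Sum of ranks received by each item across all raters
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    # S: squared deviations of the rank sums from their mean
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12.0 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical examples
w_agree = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])   # full agreement
w_oppose = kendalls_w([[1, 2, 3], [3, 2, 1]])                      # opposite orders
```

W ranges from 0 (no agreement among the rankings) to 1 (identical rankings). Tied ranks require a correction term that this sketch omits.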

arXiv:2402.13217 [pdf, other]

VideoPrism: A Foundational Visual Encoder for Video Understanding

Authors: Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

Abstract: We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips with noisy parallel text (e.g., ASR transcripts). The pretraining approach improves upon masked autoencoding by global-local distillation of semantic video embeddings and a token shuffling scheme, enabling VideoPrism to focus primarily on the video modality while leveraging the invaluable text associated with videos. We extensively test VideoPrism on four broad groups of video understanding tasks, from web video question answering to CV for science, achieving state-of-the-art performance on 31 out of 33 video understanding benchmarks.

Submitted 15 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted to ICML 2024. v2: added retrieval results on MSRVTT (1K-A), more data analyses, and ablation studies

arXiv:2307.03166 [pdf, other]

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Authors: Liangzhe Yuan, Nitesh Bharadwaj Gundavarapu, Long Zhao, Hao Zhou, Yin Cui, Lu Jiang, Xuan Yang, Menglin Jia, Tobias Weyand, Luke Friedman, Mikhail Sirotenko, Huisheng Wang, Florian Schroff, Hartwig Adam, Ming-Hsuan Yang, Ting Liu, Boqing Gong

Abstract: We evaluate existing foundation models' video understanding capabilities using a carefully designed experiment protocol consisting of three hallmark tasks (action recognition, temporal localization, and spatiotemporal localization), eight datasets well received by the community, and four adaptation methods tailoring a foundation model (FM) for a downstream task. Moreover, we propose a scalar VideoGLUE score (VGS) to measure an FM's efficacy and efficiency when adapting to general video understanding tasks. Our main findings are as follows. First, task-specialized models significantly outperform the six FMs studied in this work, in sharp contrast to what FMs have achieved in natural language and image understanding. Second, video-native FMs, whose pretraining data contains the video modality, are generally better than image-native FMs in classifying motion-rich videos, localizing actions in time, and understanding a video of more than one action. Third, the video-native FMs can perform well on video tasks under light adaptations to downstream tasks (e.g., freezing the FM backbones), while image-native FMs win in full end-to-end finetuning. The first two observations reveal the need and tremendous opportunities to conduct research on video-focused FMs, and the last confirms that both tasks and adaptation methods matter when it comes to the evaluation of FMs. Our code is released under: https://github.com/tensorflow/models/tree/master/official/projects/videoglue.

Submitted 1 December, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: Fixes some typos and includes the project open-source page: https://github.com/tensorflow/models/tree/master/official/projects/videoglue

arXiv:2305.07961 [pdf, other]

Leveraging Large Language Models in Conversational Recommender Systems

Authors: Luke Friedman, Sameer Ahuja, David Allen, Zhenning Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara, Brian Chu, Zexi Chen, Manoj Tiwari

Abstract: A Conversational Recommender System (CRS) offers increased transparency and control to users by enabling them to engage with the system through a real-time multi-turn dialogue. Recently, Large Language Models (LLMs) have exhibited an unprecedented ability to converse naturally and incorporate world knowledge and common-sense reasoning into language understanding, unlocking the potential of this paradigm. However, effectively leveraging LLMs within a CRS introduces new technical challenges, including properly understanding and controlling a complex conversation and retrieving from external sources of information. These issues are exacerbated by a large, evolving item corpus and a lack of conversational data for training. In this paper, we provide a roadmap for building an end-to-end large-scale CRS using LLMs. In particular, we propose new implementations for user preference understanding, flexible dialogue management and explainable recommendations as part of an integrated architecture powered by LLMs. For improved personalization, we describe how an LLM can consume interpretable natural language user profiles and use them to modulate session-level context. To overcome conversational data limitations in the absence of an existing production CRS, we propose techniques for building a controllable LLM-based user simulator to generate synthetic conversations. As a proof of concept we introduce RecLLM, a large-scale CRS for YouTube videos built on LaMDA, and demonstrate its fluency and diverse functionality through some illustrative example conversations.

Submitted 16 May, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

arXiv:2305.04413 [pdf, other]

doi 10.1145/3649902.3653353

Signal vs Noise in Eye-tracking Data: Biometric Implications and Identity Information Across Frequencies

Authors: Mehedi H. Raju, Lee Friedman, Dillon Lohr, Oleg Komogortsev

Abstract: Prior research states that frequencies below 75 Hz in eye-tracking data represent the primary eye movement termed ``signal'' while those above 75 Hz are deemed ``noise''. This study examines the biometric significance of this signal-noise distinction and its privacy implications. There are important individual differences in a person's eye movement, which lead to reliable biometric performance in the ``signal'' part. Despite minimal eye-movement information in the ``noise'' recordings, there might be significant individual differences. Our results confirm the ``signal'' predominantly contains identity-specific information, yet the ``noise'' also possesses unexpected identity-specific data. This consistency holds for both short-term (approx. 20 min) and long-term (approx. 1 year) biometric evaluations. Understanding the location of identity data within the eye movement spectrum is essential for privacy preservation.

Submitted 17 April, 2024; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: 10 pages, 2 figures, 1 table

arXiv:2303.02134 [pdf, other]

doi 10.16910/jemr.14.3.6

Filtering Eye-Tracking Data From an EyeLink 1000: Comparing Heuristic, Savitzky-Golay, IIR and FIR Digital Filters

Authors: Mehedi H. Raju, Lee Friedman, Troy M. Bouman, Oleg V. Komogortsev

Abstract: In a previous report (Raju et al., 2023) we concluded that, if the goal was to preserve events such as saccades, microsaccades, and smooth pursuit in eye-tracking recordings, data with sine wave frequencies less than 100 Hz (-3 dB) were the signal and data above 100 Hz were noise. We compare 5 filters in their ability to preserve signal and remove noise. Specifically, we compared the proprietary STD and EXTRA heuristic filters provided by our EyeLink 1000 (SR-Research, Ottawa, Canada), a Savitzky-Golay (SG) filter, an infinite impulse response (IIR) filter (low-pass Butterworth), and a finite impulse response (FIR) filter. For each of the non-heuristic filters, we systematically searched for optimal parameters. Both the IIR and the FIR filters were zero-phase filters. Mean frequency response profiles and amplitude spectra for all 5 filters are provided. In addition, we examined the effect of our filters on a noisy recording. Our FIR filter had the sharpest roll-off of any filter. Therefore, it maintained the signal and removed noise more effectively than any other filter. On this basis, we recommend the use of our FIR filter. Several reports have shown that filtering increased the temporal autocorrelation of a signal. To address this, the present filters were also evaluated in terms of autocorrelation (specifically the first 3 lags). Of all our filters, the STD filter introduced the least amount of autocorrelation.

Submitted 3 March, 2023; originally announced March 2023.

Comments: 10 pages, 5 figures. arXiv admin note: text overlap with arXiv:2209.07657

Journal ref: Journal of Eye Movement Research. 14, 3, 6 (Oct. 2023)

arXiv:2302.00029 [pdf]

doi 10.16910/jemr.14.3.5

Determining Which Sine Wave Frequencies Correspond to Signal and Which Correspond to Noise in Eye-Tracking Time-Series

Authors: Mehedi H. Raju, Lee Friedman, Troy M. Bouman, Oleg V. Komogortsev

Abstract: The Fourier theorem states that any time-series can be decomposed into a set of sinusoidal frequencies, each with its own phase and amplitude. The literature suggests that some frequencies are important to reproduce key qualities of eye-movements ("signal") and some frequencies are not important ("noise"). To investigate what is signal and what is noise, we analyzed our dataset in three ways: (1) visual inspection of plots of saccade, microsaccade and smooth pursuit exemplars; (2) analysis of the percentage of variance accounted for (PVAF) in 1,033 unfiltered saccade trajectories by each frequency band; (3) analysis of the main sequence relationship between saccade peak velocity and amplitude, based on a power law fit. Visual inspection suggested that frequencies up to 75 Hz are required to represent microsaccades. Our PVAF analysis indicated that signals in the 0-25 Hz band account for nearly 100% of the variance in saccade trajectories. Power law coefficients (a, b) return to unfiltered levels for signals low-pass filtered at 75 Hz or higher. We conclude that to maintain eye movement signal and reduce noise, a cutoff frequency of 75 Hz is appropriate. We explain why, given this finding, a minimum sampling rate of 750 Hz is suggested.

Submitted 19 October, 2023; v1 submitted 31 January, 2023; originally announced February 2023.

Comments: 16 pages, 11 figures, 4 tables. arXiv admin note: text overlap with arXiv:2209.07657

Journal ref: Journal of Eye Movement Research. 14, 3, 5 (Dec. 2023)

arXiv:2210.07533 [pdf, other]

GazeBaseVR, a large-scale, longitudinal, binocular eye-tracking dataset collected in virtual reality

Authors: Dillon Lohr, Samantha Aziz, Lee Friedman, Oleg V Komogortsev

Abstract: We present GazeBaseVR, a large-scale, longitudinal, binocular eye-tracking (ET) dataset collected at 250 Hz with an ET-enabled virtual-reality (VR) headset. GazeBaseVR comprises 5,020 binocular recordings from a diverse population of 407 college-aged participants. Participants were recorded up to six times each over a 26-month period, each time performing a series of five different ET tasks: (1) a vergence task, (2) a horizontal smooth pursuit task, (3) a video-viewing task, (4) a self-paced reading task, and (5) a random oblique saccade task. Many of these participants have also been recorded for two previously published datasets with different ET devices, and some participants were recorded before and after COVID-19 infection and recovery. GazeBaseVR is suitable for a wide range of research on ET data in VR devices, especially eye movement biometrics due to its large population and longitudinal nature. In addition to ET data, additional participant details are provided to enable further research on topics such as fairness.

Submitted 14 October, 2022; originally announced October 2022.

Comments: Data publicly available on figshare at https://doi.org/10.6084/m9.figshare.21308391

arXiv:2209.07657 [pdf, other]

Analysis of Heuristic and Digital Filters as Applied to Video-oculography Signals

Authors: Mehedi H. Raju, Lee Friedman, Troy M. Bouman, Oleg V. Komogortsev

Abstract: In 1993, Stampe [1993] suggested two "heuristic" filters that were designed for video-oculography data. Several manufacturers (e.g., SR-Research, Tobii T60 XL and SMI) have employed these filters as an option for recording eye-movements. For the EyeLink family of eye-trackers, these two filters are referred to as standard (STD) or EXTRA. We have implemented these filters as software functions. For those who use their eye-trackers for data-collection only, this will allow users to collect unfiltered data and simultaneously have access to unfiltered, STD filtered and EXTRA filtered data for the exact same recording. Based on the literature, which has employed various eye-tracking technologies, and our analysis of our EyeLink-1000 data, we conclude that the highest signal frequency content needed for most eye-tracking studies (i.e., saccades, microsaccades and smooth pursuit) is around 100 Hz, excluding fixation microtremor. For those who collect their data at 1000 Hz or higher, we test two zero-phase low-pass digital filters, one with a cutoff of 50 Hz and one with a cutoff of 100 Hz. We perform a Fourier (FFT) analysis to examine the frequency content for unfiltered data, STD data, EXTRA filtered data, and data filtered by low-pass digital filters. We also examine the frequency response of these filters. The digital filter with the 100 Hz cutoff dramatically outperforms both heuristic filters because the heuristic filters leave noise above 100 Hz. In the paper we provide additional conclusions and suggest the use of digital filters in scenarios where offline data processing is an option.

Submitted 15 September, 2022; originally announced September 2022.

Comments: 19 pages, 12 figures, 5 tables

arXiv:2001.09100 [pdf, other]

Why Temporal Persistence of Biometric Features is so Valuable for Classification Performance

Authors: Lee Friedman, Hal Stern, Larry R. Price, Oleg V. Komogortsev

Abstract: It is generally accepted that relatively more permanent (i.e., more temporally persistent) traits are more valuable for biometric performance than less permanent traits. Although this finding is intuitive, there is no current work identifying exactly where in the biometric analysis temporal persistence makes a difference. In this paper, we answer this question. In a recent report, we introduced the intraclass correlation coefficient (ICC) as an index of temporal persistence for such features. In that report, we also showed that choosing only the most temporally persistent features yielded superior performance in 12 of 14 datasets. Motivated by those empirical results, we present a novel approach using synthetic features to study which aspects of a biometric identification study are influenced by the temporal persistence of features. What we show is that using more temporally persistent features produces effects on the similarity score distributions that explain why this quality is so key to biometric performance. The results identified with the synthetic data are largely reinforced by an analysis of two datasets, one based on eye-movements and one based on gait. There was one difference between the synthetic and real data: In real data, features are intercorrelated, with the level of intercorrelation increasing with increasing ICC. This increased intercorrelation in real data was associated with an increase in the spread of the impostor similarity score distributions. Removing these intercorrelations for real datasets with a decorrelation step produced results which were very similar to those obtained with synthetic features.

Submitted 24 January, 2020; originally announced January 2020.

Comments: 19 pages, 8 figures, 7 tables, 2 Appendices
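
The abstract above uses the intraclass correlation coefficient (ICC) as an index of temporal persistence. Several ICC forms exist and the abstract does not say which one the authors use, so the following is only a sketch of the one-way random-effects form, ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) MSW), for one feature measured k times per subject. The function name `icc_oneway` and the example data are hypothetical.

```python
def icc_oneway(data):
    """One-way random-effects ICC(1,1) for a single feature.

    data: list of n subjects, each a list of k repeated measurements
    (e.g. the same feature recorded in k sessions).
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject mean square
    msb = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square
    msw = sum(
        (x - m) ** 2 for row, m in zip(data, subject_means) for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical examples with two sessions per subject
icc_stable = icc_oneway([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])   # values repeat exactly
icc_unstable = icc_oneway([[1.0, 2.0], [2.0, 1.0]])             # no stable subject effect
```

Values near 1 indicate that differences between subjects dominate session-to-session variation, which is the sense in which a feature is temporally persistent.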

arXiv:1912.02083 [pdf, other]

Evaluating the Data Quality of Eye Tracking Signals from a Virtual Reality System: Case Study using SMI's Eye-Tracking HTC Vive

Authors: Dillon J. Lohr, Lee Friedman, Oleg V. Komogortsev

Abstract: We evaluated the data quality of SMI's tethered eye-tracking head-mounted display based on the HTC Vive (ET-HMD) during a random saccade task. We measured spatial accuracy, spatial precision, temporal precision, linearity, and crosstalk. We proposed the use of a non-parametric spatial precision measure based on the median absolute deviation (MAD). Our linearity analysis considered both the slope and adjusted R-squared of a best-fitting line. We were the first to test for a quadratic component to crosstalk. We prepended a calibration task to the random saccade task and evaluated 2 methods to employ this user-supplied calibration. For this, we used a unique binning approach to choose samples to be included in the recalibration analyses. We compared our quality measures between the ET-HMD and our EyeLink 1000 (SR-Research, Ottawa, Ontario, CA). We found that the ET-HMD had significantly better spatial accuracy and linearity fit than our EyeLink, but both devices had similar spatial precision and linearity slope. We also found that, while the EyeLink had no significant crosstalk, the ET-HMD generally exhibited quadratic crosstalk. Fourier analysis revealed that the binocular signal was a low-pass filtered version of the monocular signal. Such filtering resulted in the binocular signal being useless for the study of high-frequency components such as saccade dynamics.

Submitted 4 December, 2019; originally announced December 2019.

Comments: Data publicly available at https://doi.org/10.18738/T8/N8EIVG, 35 pages, 13 figures
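
The abstract above proposes a spatial precision measure based on the median absolute deviation (MAD). As a minimal sketch of the basic statistic, MAD = median(|x - median(x)|), applied to one gaze axis within a fixation: the abstract does not give the authors' exact definition (for example any scaling constant or how axes are combined), so the function name `mad` and the sample values are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation: median of |x - median(x)|."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Hypothetical horizontal gaze positions (degrees) during one fixation,
# with a single outlying sample at the end.
gaze_x = [1.0, 2.0, 3.0, 4.0, 100.0]
precision_mad = mad(gaze_x)
```

Because it is built from medians, the MAD is barely affected by the outlying sample, whereas a standard-deviation-based precision measure would be dominated by it.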

arXiv:1906.06272 [pdf, other]

Biometric Performance as a Function of Gallery Size

Authors: Lee Friedman, Hal S Stern, Vladyslav Prokopenko, Shagen Djanian, Henry K. Griffith, Oleg V. Komogortsev

Abstract: Many developers of biometric systems start with modest samples before general deployment. They are interested in how their systems will work with much larger samples. We evaluated the effect of gallery size on biometric performance. Identification rates describe the performance of biometric identification, whereas ROC-based measures describe the performance of biometric authentication (verification). Therefore, we examined how increases in gallery size affected identification rates (i.e., Rank-1 Identification Rate, or Rank-1 IR) and ROC-based measures such as equal error rate (EER). We studied these phenomena with synthetic data as well as real data from a face recognition study. It is well known that the Rank-1 IR declines with increasing gallery size. We have provided further insight into this decline. We have shown that this relationship is linear in log(Gallery Size). We have also shown that this decline can be counteracted with the inclusion of additional information (features) for larger gallery sizes. We have also described the curves which can be used to predict how much additional information is required to stabilize the Rank-1 IR as a function of gallery size. These equations are also linear in log(gallery size). We have also shown that the entire ROC curve is not systematically affected by gallery size, and so ROC-based scalar performance metrics such as EER are also stable across gallery size.

Submitted 23 January, 2020; v1 submitted 14 June, 2019; originally announced June 2019.

Comments: 19 pages, 9 Figures, 0 Tables

arXiv:1906.06262 [pdf]

The Linear Relationship between Temporal Persistence, Number of Independent Features and Target EER

Authors: Lee Friedman, Hal S. Stern, Oleg V. Komogortsev

Abstract: If you have a target level of biometric performance (e.g. EER = 5% or 0.1%), how many units of unique information (uncorrelated features) are needed to achieve that target? We show, for normally distributed features, that the answer to that question depends on the temporal persistence of the feature set. We address these questions with synthetic features introduced in a prior report. We measure temporal persistence with an intraclass correlation coefficient (ICC). For 5 separate EER targets (5.0%, 2.0%, 1.0%, 0.5% and 0.1%) we provide linear relationships between the temporal persistence of the feature set and the log10(number of features). These linear relationships will help those in the planning stage, prior to setting up a new biometric system, determine the required temporal persistence and number of independent features needed to achieve certain EER targets.

Submitted 14 June, 2019; originally announced June 2019.

Comments: 4 pages, 3 figures, 4 tables

arXiv:1904.07361 [pdf]

Custom Video-Oculography Device and Its Application to Fourth Purkinje Image Detection during Saccades

Authors: Evgeniy Abdulin, Lee Friedman, Oleg Komogortsev

Abstract: We built a custom video-based eye-tracker that saves every video frame as a full resolution image (MJPEG). Images can be processed offline for the detection of ocular features, including the pupil and corneal reflection (First Purkinje Image, P1) position. A comparison of multiple algorithms for detection of pupil and corneal reflection can be performed. The system provides for highly flexible stimulus creation, with mixing of graphic, image, and video stimuli. We can change cameras and infrared illuminators depending on the image qualities and frame rate desired. Using this system, we have detected the position of the Fourth Purkinje image (P4) in the frames. We show that when we estimate gaze by calculating P1-P4, signal compares well with gaze estimated with a DPI eye-tracker, which natively detects and tracks the P1 and P4.

Submitted 15 April, 2019; originally announced April 2019.

Comments: 8 pages, 8 figures, appendix

arXiv:1812.10381 [pdf]

doi 10.13140/RG.2.2.18890.00965

Decision Support System for Renal Transplantation

Authors: Ehsan Khan, Avishek Choudhury, Amy L Friedman, Daehan Won

Abstract: The burgeoning need for kidney transplantation mandates immediate attention. Mismatch of deceased donor-recipient kidney leads to post-transplant death. To ensure ideal kidney donor-recipient match and minimize post-transplant deaths, the paper develops a prediction model that identifies factors that determine the probability of success of renal transplantation, that is, if the kidney procured from the deceased donor can be transplanted or discarded. The paper conducts a study enveloping data for 584 imported kidneys collected from 12 transplant centers associated with an organ procurement organization located in New York City, NY. The predicting model yielding best performance measures can be beneficial to the healthcare industry. Transplant centers and organ procurement organizations can take advantage of the prediction model to efficiently predict the outcome of kidney transplantation. Consequently, it will reduce the mortality rate caused by mismatching of donor-recipient kidney transplantation during the surgery.

Submitted 11 December, 2018; originally announced December 2018.

Journal ref: In: Proceedings of the 2018 IISE Annual Conference: 2018; Orlando: IISE; 2018: 431-436

arXiv:1802.00050 [pdf, other]

Recursive Feature Generation for Knowledge-based Learning

Authors: Lior Friedman, Shaul Markovitch

Abstract: When humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using feature generation. Given a feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated features significantly improve the performance of existing learning algorithms.

Submitted 31 January, 2018; originally announced February 2018.

arXiv:1709.02700 [pdf]

Method to Detect Eye Position Noise from Video-Oculography when Detection of Pupil or Corneal Reflection Position Fails

Authors: Evgeny Abdulin, Lee Friedman, Oleg V. Komogortsev

Abstract: We present software to detect noise in eye position signals from video-based eye-tracking systems that depend on accurate pupil and corneal reflection position estimation. When such systems transiently fail to properly detect the pupil or the corneal reflection due to occlusion from eyelids, eye lashes or various shadows, the estimated gaze position is false. This produces an artifactual signal in the position trace that is rapidly, irregularly oscillating between true and false gaze positions. We refer to this noise as RIONEPS (Rapid Irregularly Oscillating Noise of the Eye Position Signal). Our method for detecting these periods automatically is based on an estimate of the relative inefficiency of the eye position signal. We look for RIONEPS in the horizontal and vertical traces separately, and although we typically use it offline, it is suitable to adaptation for real time use. This method requires a threshold to be set, and although we provide some guidance, thresholds will have to be estimated empirically.

Submitted 8 September, 2017; originally announced September 2017.

Comments: 9 figures, 20 pages, pseudocode and Matlab code

arXiv:1707.09543 [pdf]

Synthetic Database for Evaluation of General, Fundamental Biometric Principles

Authors: Lee Friedman, Oleg Komogortsev

Abstract: We create synthetic biometric databases to study general, fundamental, biometric principles. First, we check the validity of the synthetic database design by comparing it to real data in terms of biometric performance. The real data used for this validity check was from an eye-movement related biometric database. Next, we employ our database to evaluate the impact of variations of temporal persistence of features on biometric performance. We index temporal persistence with the intraclass correlation coefficient (ICC). We find that variations in temporal persistence are extremely highly correlated with variations in biometric performance. Finally, we use our synthetic database strategy to determine how many features are required to achieve particular levels of performance as the number of subjects in the database increases from 100 to 10,000. An important finding is that the number of features required to achieve various EER values (2%, 0.3%, 0.15%) is essentially constant in the database sizes that we studied. We hypothesize that the insights obtained from our study would be applicable to many biometric modalities where extracted feature properties resemble the properties of the synthetic features we discuss in this work.

Submitted 29 July, 2017; originally announced July 2017.

Comments: 8 pages, 8 figures

arXiv:1703.09167 [pdf]

A Study on the Extraction and Analysis of a Large Set of Eye Movement Features during Reading

Authors: Ioannis Rigas, Lee Friedman, Oleg Komogortsev

Abstract: This work presents a study on the extraction and analysis of a set of 101 categories of eye movement features from three types of eye movement events: fixations, saccades, and post-saccadic oscillations. The eye movements were recorded during a reading task. For the categories of features with multiple instances in a recording we extract corresponding feature subtypes by calculating descriptive statistics on the distributions of these instances. A unified framework of detailed descriptions and mathematical formulas are provided for the extraction of the feature set. The analysis of feature values is performed using a large database of eye movement recordings from a normative population of 298 subjects. We demonstrate the central tendency and overall variability of feature values over the experimental population, and more importantly, we quantify the test-retest reliability (repeatability) of each separate feature. The described methods and analysis can provide valuable tools in fields exploring the eye movements, such as in behavioral studies, attention and cognition research, medical research, biometric recognition, and human-computer interaction.

Submitted 27 March, 2017; originally announced March 2017.

Comments: 38 pages, 10 figures

arXiv:1609.03948 [pdf]

doi 10.1371/journal.pone.0178501

Method to Assess the Temporal Persistence of Potential Biometric Features: Application to Oculomotor, and Gait-Related Databases

Authors: Lee Friedman, Ioannis Rigas, Mark S. Nixon, Oleg V. Komogortsev

Abstract: Although temporal persistence, or permanence, is a well understood requirement for optimal biometric features, there is no general agreement on how to assess temporal persistence. We suggest that the best way to assess temporal persistence is to perform a test-retest study, and assess test-retest reliability. For ratio-scale features that are normally distributed, this is best done using the Intraclass Correlation Coefficient (ICC). For 10 distinct data sets (8 eye-movement related, and 2 gait related), we calculated the test-retest reliability ('Temporal persistence') of each feature, and compared biometric performance of high-ICC features to lower ICC features, and to the set of all features. We demonstrate that using a subset of only high-ICC features produced superior Rank-1-Identification Rate (Rank-1-IR) performance in 9 of 10 databases (p = 0.01, one-tailed). For Equal Error Rate (EER), using a subset of only high-ICC features produced superior performance in 8 of 10 databases (p = 0.055, one-tailed). In general, then, prescreening potential biometric features, and choosing only highly reliable features will yield better performance than lower ICC features or than the set of all features combined. We hypothesize that this would likely be the case for any biometric modality where the features can be expressed as quantitative values on an interval or ratio scale, assuming an adequate number of relatively independent features.

Submitted 13 September, 2016; originally announced September 2016.

Comments: 13 pages, 8 figures, 5 tables

arXiv:1406.7658 [pdf, ps, other]

doi 10.2168/LMCS-10(3:5)2014

Reductions to the set of random strings: The resource-bounded case

Authors: Eric Allender, Harry Buhrman, Luke Friedman, Bruno Loff

Abstract: This paper is motivated by a conjecture that BPP can be characterized in terms of polynomial-time nonadaptive reductions to the set of Kolmogorov-random strings. In this paper we show that an approach laid out in [Allender et al] to settle this conjecture cannot succeed without significant alteration, but that it does bear fruit if we consider time-bounded Kolmogorov complexity instead. We show that if a set A is reducible in polynomial time to the set of time-t-bounded Kolmogorov random strings (for all large enough time bounds t), then A is in P/poly, and that if in addition such a reduction exists for any universal Turing machine one uses in the definition of Kolmogorov complexity, then A is in PSPACE.

Submitted 17 August, 2014; v1 submitted 30 June, 2014; originally announced June 2014.

Comments: Conference version in MFCS 2012

Journal ref: Logical Methods in Computer Science, Volume 10, Issue 3 (August 19, 2014) lmcs:723

arXiv:0807.1253 [pdf, ps, other]

doi 10.1098/rspa.2008.0465

Informed Traders

Authors: Dorje C. Brody, Mark H. A. Davis, Robyn L. Friedman, Lane P. Hughston

Abstract: An asymmetric information model is introduced for the situation in which there is a small agent who is more susceptible to the flow of information in the market than the general market participant, and who tries to implement strategies based on the additional information. In this model market participants have access to a stream of noisy information concerning the future return of an asset, whereas the informed trader has access to a further information source which is obscured by an additional noise that may be correlated with the market noise. The informed trader uses the extraneous information source to seek statistical arbitrage opportunities, while at the same time accommodating the additional risk. The amount of information available to the general market participant concerning the asset return is measured by the mutual information of the asset price and the associated cash flow. The worth of the additional information source is then measured in terms of the difference of mutual information between the general market participant and the informed trader. This difference is shown to be nonnegative when the signal-to-noise ratio of the information flow is known in advance. Explicit trading strategies leading to statistical arbitrage opportunities, taking advantage of the additional information, are constructed, illustrating how excess information can be translated into profit.

Submitted 17 November, 2008; v1 submitted 8 July, 2008; originally announced July 2008.

Comments: 20 pages, 5 figures. Version to appear in the Proceedings of the Royal Society A

Journal ref: Proceedings of the Royal Society London A465, 1103-1122 (2009)

Showing 1–26 of 26 results for author: Friedman, L