-
FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining
Authors:
Shaheer Mohamed,
Maryam Haghighat,
Tharindu Fernando,
Sridha Sridharan,
Clinton Fookes,
Peyman Moghadam
Abstract:
Hyperspectral images (HSIs) contain rich spectral and spatial information. Motivated by the success of transformers in natural language processing and computer vision, where they have shown the ability to learn long-range dependencies within input data, recent research has focused on using transformers for HSIs. However, current state-of-the-art hyperspectral transformers only tokenize the input HSI sample along the spectral dimension, resulting in the under-utilization of spatial information. Moreover, transformers are known to be data-hungry and their performance relies heavily on large-scale pretraining, which is challenging due to limited annotated hyperspectral data. Therefore, the potential of HSI transformers has not been fully realized. To overcome these limitations, we propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pretraining procedures, leading to significant improvements in performance. The factorization of the inputs allows the spectral and spatial transformers to better capture the interactions within the hyperspectral data cubes. Inspired by masked image modeling pretraining, we also devise efficient masking strategies for pretraining each of the spectral and spatial transformers. We conduct experiments on six publicly available datasets for the HSI classification task and demonstrate that our model achieves state-of-the-art performance on all of them. The code for our model will be made available at https://github.com/csiro-robotics/factoformer.
Submitted 3 January, 2024; v1 submitted 17 September, 2023;
originally announced September 2023.
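To make the FactoFormer abstract's contrast between spectral and spatial tokenization concrete, the following is a minimal sketch on a toy data cube. The function names, the nested-list cube layout (bands, rows, columns), and the uniform random masking are all assumptions for illustration; the abstract does not specify the paper's actual tokenization or masking details.

```python
import random

def spectral_tokens(cube):
    """One token per band: flatten each band's H x W patch (assumed layout)."""
    return [[v for row in band for v in row] for band in cube]

def spatial_tokens(cube):
    """One token per pixel: the spectrum observed at each (row, col)."""
    bands, height, width = len(cube), len(cube[0]), len(cube[0][0])
    return [[cube[b][r][c] for b in range(bands)]
            for r in range(height) for c in range(width)]

def random_mask(num_tokens, ratio, seed=0):
    """Hypothetical masking: choose token indices to hide during pretraining."""
    rng = random.Random(seed)
    k = int(num_tokens * ratio)
    return sorted(rng.sample(range(num_tokens), k))

# Toy cube with 4 bands and a 3 x 3 spatial patch.
cube = [[[b * 100 + r * 10 + c for c in range(3)] for r in range(3)]
        for b in range(4)]

spec = spectral_tokens(cube)   # 4 tokens, each of length 9
spat = spatial_tokens(cube)    # 9 tokens, each of length 4
masked = random_mask(len(spec), ratio=0.5)  # indices of 2 hidden spectral tokens
```

The same cube yields two different token sequences, which is what would let separate spectral and spatial transformers each attend along their own dimension.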
-
Observation of flat and weakly dispersing bands in a van der Waals semiconductor Nb3Br8 with breathing kagome lattice
Authors:
Sabin Regmi,
Anup Pradhan Sakhya,
Tharindu Fernando,
Yuzhou Zhao,
Dylan Jeff,
Milo Sprague,
Favian Gonzalez,
Iftakhar Bin Elius,
Mazharul Islam Mondal,
Nathan Valadez,
Damani Jarrett,
Alexis Agosto,
Jihui Yang,
Jiun-Haw Chu,
Saiful I. Khondaker,
Xiaodong Xu,
Ting Cao,
Madhab Neupane
Abstract:
Niobium halides, Nb3X8 (X = Cl,Br,I), which are predicted two-dimensional magnets, have recently gotten attention due to their breathing kagome geometry. Here, we have studied the electronic structure of Nb3Br8 by using angle-resolved photoemission spectroscopy (ARPES) and first-principles calculations. ARPES results depict the presence of multiple flat and weakly dispersing bands. These bands are well explained by the theoretical calculations, which show they have Nb d character indicating their origination from the Nb atoms forming the breathing kagome plane. This van der Waals material can be easily thinned down via mechanical exfoliation to the ultrathin limit and such ultrathin samples are stable as depicted from the time-dependent Raman spectroscopy measurements at room temperature. These results demonstrate that Nb3Br8 is an excellent material not only for studying breathing kagome induced flat band physics and its connection with magnetism, but also for heterostructure fabrication for application purposes.
Submitted 9 September, 2023;
originally announced September 2023.
-
Who Is Alyx? A new Behavioral Biometric Dataset for User Identification in XR
Authors:
Christian Rack,
Tamara Fernando,
Murat Yalcin,
Andreas Hotho,
Marc Erich Latoschik
Abstract:
This article presents a new dataset containing motion and physiological data of users playing the game "Half-Life: Alyx". The dataset specifically targets behavioral and biometric identification of XR users. It includes motion and eye-tracking data captured by an HTC Vive Pro of 71 users playing the game on two separate days for 45 minutes. Additionally, we collected physiological data from 31 of these users. We provide benchmark performances for the task of motion-based identification of XR users with two prominent state-of-the-art deep learning architectures (GRU and CNN). After training on the first session of each user, the best model can identify the 71 users in the second session with a mean accuracy of 95% within 2 minutes. The dataset is freely available at https://github.com/cschell/who-is-alyx
Submitted 4 August, 2023;
originally announced August 2023.
-
A Quantized Interband Topological Index in Two-Dimensional Systems
Authors:
Tharindu Fernando,
Ting Cao
Abstract:
We introduce a novel gauge-invariant, quantized interband index in two-dimensional (2D) multiband systems. It provides a bulk topological classification of a submanifold of parameter space (e.g., an electron valley in a Brillouin zone), and therefore overcomes difficulties in characterizing topology of submanifolds. We confirm its topological nature by numerically demonstrating a one-to-one correspondence to the valley Chern number in $k\cdot p$ models (e.g., gapped Dirac fermion model), and the first Chern number in lattice models (e.g., Haldane model). Furthermore, we derive a band-resolved topological charge and demonstrate that it can be used to investigate the nature of edge states due to band inversion in valley systems like multilayer graphene.
Submitted 31 July, 2023;
originally announced July 2023.
-
Functional Observability, Structural Functional Observability and Optimal Sensor Placement
Authors:
Yuan Zhang,
Tyrone Fernando,
Mohamed Darouach
Abstract:
In this paper, functional observability and detectability, structural functional observability (SFO), and the related sensor placement problems are investigated. A new concept of modal functional observability coinciding with the notion of modal observability is proposed. This notion introduces new necessary and sufficient conditions for functional observability and detectability without using system observability decomposition and facilitates the design of a functionally observable/detectable system. Afterwards, SFO is redefined rigorously from a generic perspective, in contrast to the definition of structural observability. A new and complete graph-theoretic characterization for SFO is proposed. Based on these results, the problems of selecting the minimal sensors from a prior set to achieve functional observability and SFO are shown to be NP-hard. Nevertheless, supermodular set functions are established, leading to greedy heuristics that can find approximate solutions to these problems with provable guarantees in polynomial time. A closed-form solution along with a constructive procedure is also given for the unconstrained case on systems with diagonalizable state matrices. Notably, our results also yield a corollary: polynomial time verification of the structural target controllability of $n-1$ state variables is achievable, where $n$ is the system state dimension, a problem that may be hard otherwise.
Submitted 28 August, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
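The greedy heuristic mentioned in the sensor placement abstract above can be illustrated with a generic sketch. The cost used here (the number of target states not yet covered by the chosen sensors) is a toy supermodular, non-increasing set function standing in for the paper's functions, which the abstract does not give; the sensor names and coverage sets are hypothetical.

```python
def greedy_select(candidates, cost):
    """Greedily add the candidate that lowers cost(S) most until cost reaches 0."""
    chosen = []
    remaining = list(candidates)
    current = cost(chosen)
    while current > 0 and remaining:
        # Evaluate the cost of adding each remaining candidate to the selection.
        best = min(remaining, key=lambda s: cost(chosen + [s]))
        best_cost = cost(chosen + [best])
        if best_cost >= current:
            break  # no remaining candidate improves the cost
        chosen.append(best)
        remaining.remove(best)
        current = best_cost
    return chosen

# Hypothetical sensors and the target states each one covers.
coverage = {"s1": {1, 2}, "s2": {2, 3, 4}, "s3": {4, 5}, "s4": {1, 5}}
targets = {1, 2, 3, 4, 5}

def uncovered(selection):
    """Toy cost: how many target states no selected sensor covers."""
    covered = set().union(*(coverage[s] for s in selection))
    return len(targets - covered)

picked = greedy_select(coverage, uncovered)
```

Each greedy step needs one cost evaluation per remaining candidate, so the whole procedure runs in polynomial time whenever the cost itself can be evaluated in polynomial time.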
-
Raman Study of Layered Breathing Kagome Lattice Semiconductor Nb3Cl8
Authors:
Dylan A. Jeff,
Favian Gonzalez,
Kamal Harrison,
Yuzhou Zhao,
Tharindu Fernando,
Sabin Regmi,
Zhaoyu Liu,
Humberto R. Gutierrez,
Madhab Neupane,
Jihui Yang,
Jiun-Haw Chu,
Xiaodong Xu,
Ting Cao,
Saiful I. Khondaker
Abstract:
Niobium chloride (Nb3Cl8) is a layered 2D semiconducting material with many exotic properties including a breathing kagome lattice, a topological flat band in its band structure, and a crystal structure that undergoes a structural and magnetic phase transition at temperatures below 90 K. Despite being a remarkable material with fascinating new physics, the understanding of its phonon properties is in its infancy. In this study, we investigate the phonon dynamics of Nb3Cl8 in bulk and few-layer flakes using polarized Raman spectroscopy and density functional theory (DFT) analysis to determine the material's vibrational modes, as well as their symmetrical representations and atomic displacements. We experimentally resolved 12 phonon modes, 5 of which are A1g modes while the remaining 7 are Eg modes, which is in strong agreement with our DFT calculation. Layer-dependent results suggest that the Raman peak positions are mostly insensitive to changes in layer thickness, while peak intensity and FWHM are affected. Raman measurements as a function of excitation wavelength (473-785 nm) show a significant increase of the peak intensities when using a 473 nm excitation source, suggesting a near resonant condition. Temperature-dependent Raman experiments carried out above and below the transition temperature did not show any change in the symmetries of the phonon modes, suggesting that the structural phase transition is likely from the high temperature P3m1 phase to the low-temperature R3m phase. Magneto-Raman measurements carried out at 140 and 2 K between -2 and 2 T show that the Raman modes are not magnetically coupled. Overall, our study significantly advances the fundamental understanding of layered Nb3Cl8, which can be further exploited for future applications.
Submitted 25 October, 2023; v1 submitted 20 June, 2023;
originally announced June 2023.
-
Remembering What Is Important: A Factorised Multi-Head Retrieval and Auxiliary Memory Stabilisation Scheme for Human Motion Prediction
Authors:
Tharindu Fernando,
Harshala Gammulle,
Sridha Sridharan,
Simon Denman,
Clinton Fookes
Abstract:
Humans exhibit complex motions that vary depending on the task that they are performing, the interactions they engage in, as well as subject-specific preferences. Therefore, forecasting future poses based on the history of the previous motions is a challenging task. This paper presents an innovative auxiliary-memory-powered deep neural network framework for the improved modelling of historical knowledge. Specifically, we disentangle subject-specific, task-specific, and other auxiliary information from the observed pose sequences and utilise these factorised features to query the memory. A novel Multi-Head knowledge retrieval scheme leverages these factorised feature embeddings to perform multiple querying operations over the historical observations captured within the auxiliary memory. Moreover, our proposed dynamic masking strategy makes this feature disentanglement process dynamic. Two novel loss functions are introduced to encourage diversity within the auxiliary memory while ensuring the stability of the memory contents, such that it can locate and store salient information that can aid the long-term prediction of future motion, irrespective of data imbalances or the diversity of the input data distribution. With extensive experiments conducted on two public benchmarks, Human3.6M and CMU-Mocap, we demonstrate that these design choices collectively allow the proposed approach to outperform the current state-of-the-art methods by significant margins: > 17% on the Human3.6M dataset and > 9% on the CMU-Mocap dataset.
Submitted 18 May, 2023;
originally announced May 2023.
-
Physical Adversarial Attacks for Surveillance: A Survey
Authors:
Kien Nguyen,
Tharindu Fernando,
Clinton Fookes,
Sridha Sridharan
Abstract:
Modern automated surveillance techniques are heavily reliant on deep learning methods. Despite the superior performance, these learning systems are inherently vulnerable to adversarial attacks - maliciously crafted inputs that are designed to mislead, or trick, models into making incorrect predictions. An adversary can physically change their appearance by wearing adversarial t-shirts, glasses, or hats or by specific behavior, to potentially avoid various forms of detection, tracking and recognition of surveillance systems; and obtain unauthorized access to secure properties and assets. This poses a severe threat to the security and safety of modern surveillance systems. This paper reviews recent attempts and findings in learning and designing physical adversarial attacks for surveillance applications. In particular, we propose a framework to analyze physical adversarial attacks and provide a comprehensive survey of physical adversarial attacks on four key surveillance tasks: detection, identification, tracking, and action recognition under this framework. Furthermore, we review and analyze strategies to defend against the physical adversarial attacks and the methods for evaluating the strengths of the defense. The insights in this paper present an important step in building resilience within surveillance systems to physical adversarial attacks.
Submitted 14 October, 2023; v1 submitted 1 May, 2023;
originally announced May 2023.
-
Versatile User Identification in Extended Reality using Pretrained Similarity-Learning
Authors:
Christian Rack,
Konstantin Kobs,
Tamara Fernando,
Andreas Hotho,
Marc Erich Latoschik
Abstract:
Various machine learning approaches have proven to be useful for user verification and identification based on motion data in eXtended Reality (XR). However, their real-world application still faces significant challenges concerning versatility, i.e., in terms of extensibility and generalization capability. This article presents a solution that is both extensible to new users without expensive retraining, and that generalizes well across different sessions, devices, and user tasks. To this end, we developed a similarity-learning model and pretrained it on the "Who Is Alyx?" dataset. This dataset features a wide array of tasks and hence motions from users playing the VR game "Half-Life: Alyx". In contrast to previous works, we used a dedicated set of users for model validation and final evaluation. Furthermore, we extended this evaluation using an independent dataset that features completely different users, tasks, and three different XR devices. In comparison with a traditional classification-learning baseline, our model shows superior performance, especially in scenarios with limited enrollment data. The pretraining process allows immediate deployment in a diverse range of XR applications while maintaining high versatility. Looking ahead, our approach paves the way for easy integration of pretrained motion-based identification models in production XR systems.
Submitted 15 April, 2024; v1 submitted 15 February, 2023;
originally announced February 2023.
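The extensibility claim in the similarity-learning abstract above (new users can be added without retraining) can be sketched as nearest-neighbour matching in an embedding space. This is a generic illustration, not the authors' model: the embeddings are hand-written toy vectors, and the `identify` helper and the use of cosine similarity are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(query, gallery):
    """Return the enrolled user whose reference embeddings best match the query."""
    scores = {user: max(cosine(query, ref) for ref in refs)
              for user, refs in gallery.items()}
    return max(scores, key=scores.get), scores

# Enrollment data: user -> list of reference embeddings (toy values).
gallery = {"alice": [[1.0, 0.1, 0.0]], "bob": [[0.0, 1.0, 0.2]]}

# Enrolling a new user only adds reference embeddings; no model is retrained.
gallery["carol"] = [[0.1, 0.0, 1.0]]

user, scores = identify([0.05, 0.1, 0.9], gallery)
```

In a classification-learning baseline the set of users is fixed by the output layer, whereas here the pretrained embedding model would stay frozen and only the gallery grows.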
-
Finite Element Methods for Linear Maxwell's Equations in Bianisotropic Media Permitting Polarization Fields and Magnetic Currents
Authors:
Tharindu Fernando,
Martin Licht,
Michael Holst
Abstract:
We review Maxwell's equations and constitutive relations for 3D bianisotropic media in a generalized form: we consider all four variables and allow for nonzero polarization or magnetization, and also nonzero magnetic charge or current. After a discussion of general boundary conditions, we introduce a time-harmonic variational formulation of linear Maxwell's equations within 3D bianisotropic media in terms of the electric and magnetic fields. We showcase a finite element approximation of our variational formulation, using curl-conforming Nédélec edge elements of the first kind. Numerical examples illustrate the convergence of the method.
Submitted 22 December, 2022;
originally announced December 2022.
-
Using Auxiliary Information for Person Re-Identification -- A Tutorial Overview
Authors:
Tharindu Fernando,
Clinton Fookes,
Sridha Sridharan,
Dana Michalski
Abstract:
Person re-identification (re-id) is a pivotal task within an intelligent surveillance pipeline and there exist numerous re-id frameworks that achieve satisfactory performance in challenging benchmarks. However, these systems struggle to generate acceptable results when there are significant differences between the camera views, illumination conditions, or occlusions. This result can be attributed to the deficiency that exists within many recently proposed re-id pipelines where they are predominantly driven by appearance-based features and little attention is paid to other auxiliary information that could aid the re-id. In this paper, we systematically review the current State-Of-The-Art (SOTA) methods in both uni-modal and multimodal person re-id. Extending beyond a conceptual framework, we illustrate how the existing SOTA methods can be extended to support this additional auxiliary information and quantitatively evaluate the utility of such auxiliary feature information, ranging from logos printed on the objects carried by the subject or printed on the clothes worn by the subject, through to his or her behavioural trajectories. To the best of our knowledge, this is the first work that explores the fusion of multiple sources of information to generate a more discriminant person descriptor and the principal aim of this paper is to provide a thorough theoretical analysis regarding the implementation of such a framework. In addition, using model interpretation techniques, we validate the contributions from different combinations of the auxiliary information versus the original features that the SOTA person re-id models extract. We outline the limitations of the proposed approaches and propose future research directions that could be pursued to advance the area of multi-modal person re-id.
Submitted 15 November, 2022;
originally announced November 2022.
-
Towards On-Board Panoptic Segmentation of Multispectral Satellite Images
Authors:
Tharindu Fernando,
Clinton Fookes,
Harshala Gammulle,
Simon Denman,
Sridha Sridharan
Abstract:
With tremendous advancements in low-power embedded computing devices and remote sensing instruments, the traditional satellite image processing pipeline which includes an expensive data transfer step prior to processing data on the ground is being replaced by on-board processing of captured data. This paradigm shift enables critical and time-sensitive analytic intelligence to be acquired in a timely manner on-board the satellite itself. However, at present, the on-board processing of multi-spectral satellite images is limited to classification and segmentation tasks. Extending this processing to its next logical level, in this paper we propose a lightweight pipeline for on-board panoptic segmentation of multi-spectral satellite images. Panoptic segmentation offers major economic and environmental insights, ranging from yield estimation from agricultural lands to intelligence for complex military applications. Nevertheless, the on-board intelligence extraction raises several challenges due to the loss of temporal observations and the need to generate predictions from a single image sample. To address this challenge, we propose a multimodal teacher network based on a cross-modality attention-based fusion strategy to improve the segmentation accuracy by exploiting data from multiple modes. We also propose an online knowledge distillation framework to transfer the knowledge learned by this multi-modal teacher network to a uni-modal student which receives only a single frame input, and is more appropriate for an on-board environment. We benchmark our approach against existing state-of-the-art panoptic segmentation models using the PASTIS multi-spectral panoptic segmentation dataset considering an on-board processing setting. Our evaluations demonstrate a substantial increase in accuracy metrics compared to the existing state-of-the-art models.
Submitted 4 April, 2022;
originally announced April 2022.
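The teacher-to-student transfer described in the panoptic segmentation abstract above can be illustrated with a generic knowledge distillation loss. This sketch uses the common temperature-softened KL divergence between teacher and student class distributions; it is an assumption for illustration, since the abstract does not state which loss the online distillation framework uses.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy per-pixel class logits from a multi-modal teacher and a uni-modal student.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]

loss_mismatch = distillation_loss(student, teacher)
loss_match = distillation_loss(teacher, teacher)
```

The loss is zero when the student reproduces the teacher's distribution and positive otherwise, so minimising it pulls the single-frame student towards the teacher's predictions.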
-
Spectroscopic evidence of flat bands in breathing kagome semiconductor Nb3I8
Authors:
Sabin Regmi,
Tharindu Fernando,
Yuzhou Zhao,
Anup Pradhan Sakhya,
Gyanendra Dhakal,
Iftakhar Bin Elius,
Hector Vazquez,
Jonathan D Denlinger,
Jihui Yang,
Jiun-Haw Chu,
Xiaodong Xu,
Ting Cao,
Madhab Neupane
Abstract:
Kagome materials have become solid grounds to study the interplay among geometry, topology, correlation, and magnetism. Recently, semiconductors Nb3X8 (X = Cl, Br, I) have been predicted to be two-dimensional (2D) magnets and importantly these materials possess breathing kagome geometry. Electronic structure study of these promising materials is still lacking. Here, we report the spectroscopic evidence of flat and weakly dispersing bands in breathing-kagome semiconductor Nb3I8 around 500 meV binding energy, which is well supported by our first-principles calculations. These bands originate from the breathing kagome lattice of Niobium atoms and have Nb d character. They are found to be sensitive to polarization of the incident photon beam. Our study provides insight into the electronic structure and flat band topology in an exfoliable kagome semiconductor thereby providing an important platform to understand the interaction of geometry and electron correlations in 2D material.
Submitted 21 December, 2022; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Multi-Slice Net: A novel light weight framework for COVID-19 Diagnosis
Authors:
Harshala Gammulle,
Tharindu Fernando,
Sridha Sridharan,
Simon Denman,
Clinton Fookes
Abstract:
This paper presents a novel lightweight COVID-19 diagnosis framework using CT scans. Our system utilises a novel two-stage approach to generate robust and efficient diagnoses across heterogeneous patient level inputs. We use a powerful backbone network as a feature extractor to capture discriminative slice-level features. These features are aggregated by a lightweight network to obtain a patient level diagnosis. The aggregation network is carefully designed to have a small number of trainable parameters while also possessing sufficient capacity to generalise to diverse variations within different CT volumes and to adapt to noise introduced during the data acquisition. We achieve a significant performance increase over the baselines when benchmarked on the SPGC COVID-19 Radiomics Dataset, despite having only 2.5 million trainable parameters and requiring only 0.623 seconds on average to process a single patient's CT volume using an Nvidia-GeForce RTX 2080 GPU.
Submitted 8 August, 2021;
originally announced August 2021.
-
Robust and Interpretable Temporal Convolution Network for Event Detection in Lung Sound Recordings
Authors:
Tharindu Fernando,
Sridha Sridharan,
Simon Denman,
Houman Ghaemmaghami,
Clinton Fookes
Abstract:
This paper proposes a novel framework for lung sound event detection, segmenting continuous lung sound recordings into discrete events and performing recognition on each event. Exploiting the lightweight nature of Temporal Convolution Networks (TCNs) and their superior results compared to their recurrent counterparts, we propose a lightweight, yet robust, and completely interpretable framework for lung sound event detection. We propose the use of a multi-branch TCN architecture and exploit a novel fusion strategy to combine the resultant features from these branches. This not only allows the network to retain the most salient information across different temporal granularities and disregard irrelevant information, but also allows our network to process recordings of arbitrary length. Results: The proposed method is evaluated on multiple public and in-house benchmarks of irregular and noisy recordings of the respiratory auscultation process for the identification of numerous auscultation events including inhalation, exhalation, crackles, wheeze, stridor, and rhonchi. We exceed the state-of-the-art results in all evaluations. Furthermore, we empirically analyse the effect of the proposed multi-branch TCN architecture and the feature fusion strategy and provide quantitative and qualitative evaluations to illustrate their efficiency. Moreover, we provide an end-to-end model interpretation pipeline that interprets the operations of all the components of the proposed framework. Our analysis of different feature fusion strategies shows that the proposed feature concatenation method leads to better suppression of non-informative features, which drastically reduces the classifier overhead resulting in a robust lightweight network. The lightweight nature of our model allows it to be deployed in end-user devices such as smartphones, and it has the ability to generate predictions in real-time.
Submitted 30 June, 2021;
originally announced June 2021.
-
An In-depth Analysis of Passage-Level Label Transfer for Contextual Document Ranking
Authors:
Koustav Rudra,
Zeon Trevor Fernando,
Avishek Anand
Abstract:
Pre-trained contextual language models such as BERT, GPT, and XLnet work quite well for document retrieval tasks. Such models are fine-tuned based on the query-document/query-passage level relevance labels to capture the ranking signals. However, the documents are longer than the passages and such document ranking models suffer from the token limitation (512) of BERT. Researchers proposed ranking…
▽ More
Pre-trained contextual language models such as BERT, GPT, and XLnet work quite well for document retrieval tasks. Such models are fine-tuned based on the query-document/query-passage level relevance labels to capture the ranking signals. However, the documents are longer than the passages and such document ranking models suffer from the token limitation (512) of BERT. Researchers proposed ranking strategies that either truncate the documents beyond the token limit or chunk the documents into units that can fit into the BERT. In the later case, the relevance labels are either directly transferred from the original query-document pair or learned through some external model. In this paper, we conduct a detailed study of the design decisions about splitting and label transfer on retrieval effectiveness and efficiency. We find that direct transfer of relevance labels from documents to passages introduces label noise that strongly affects retrieval effectiveness for large training datasets. We also find that query processing times are adversely affected by fine-grained splitting schemes. As a remedy, we propose a careful passage level labelling scheme using weak supervision that delivers improved performance (3-14% in terms of nDCG score) over most of the recently proposed models for ad-hoc retrieval while maintaining manageable computational complexity on four diverse document retrieval datasets.
Submitted 6 December, 2023; v1 submitted 30 March, 2021;
originally announced March 2021.
-
Finite Difference Weerakoon-Fernando Method to solve nonlinear equations without using derivatives
Authors:
S. L. Heenatigala,
S. Weerakoon,
T. G. I. Fernando
Abstract:
This research was conducted mainly to explore the possibility of formulating an efficient algorithm for finding roots of nonlinear equations without using the derivative of the function. The Weerakoon-Fernando method was taken as the base for the new derivative-free method because it gives third-order convergence. After several unsuccessful attempts, we were able to formulate the Finite Difference Weerakoon-Fernando Method (FDWFM) presented here. We noticed that the FDWFM approaches the root faster than any other existing derivative-free method, for example the popular secant method (order of convergence 1.618). The FDWFM uses three function evaluations, whereas the secant method uses two. By implementing the FDWFM on nonlinear equations with complex roots and also on systems of nonlinear equations, we obtained very encouraging results. When applying the FDWFM to systems of nonlinear equations, we resolved the problem of the Jacobian by following the procedure in Broyden's method. The computational order of convergence of the FDWFM was close to 2.5 in all these cases. This will undoubtedly provide scientists with the efficient numerical algorithm, one that does not need the derivative of the function to solve nonlinear equations, that they have been seeking for centuries.
Submitted 3 February, 2021;
originally announced February 2021.
-
Deep Learning for Medical Anomaly Detection -- A Survey
Authors:
Tharindu Fernando,
Harshala Gammulle,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
Machine learning-based medical anomaly detection is an important problem that has been extensively studied. Numerous approaches have been proposed across various medical application domains and we observe several similarities across these distinct applications. Despite this comparability, we observe a lack of structured organisation of these diverse research applications such that their advantages and limitations can be studied. The principal aim of this survey is to provide a thorough theoretical analysis of popular deep learning techniques in medical anomaly detection. In particular, we contribute a coherent and systematic review of state-of-the-art techniques, comparing and contrasting their architectural differences as well as training algorithms. Furthermore, we provide a comprehensive overview of deep model interpretation strategies that can be used to interpret model decisions. In addition, we outline the key limitations of existing deep medical anomaly detection techniques and propose key research directions for further investigation.
Submitted 13 April, 2021; v1 submitted 3 December, 2020;
originally announced December 2020.
-
Patient-independent Epileptic Seizure Prediction using Deep Learning Models
Authors:
Theekshana Dissanayake,
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
Objective: Epilepsy is one of the most prevalent neurological diseases among humans and can lead to severe brain injuries, strokes, and brain tumors. Early detection of seizures can help to mitigate injuries, and can be used to aid the treatment of patients with epilepsy. The purpose of a seizure prediction system is to successfully identify the pre-ictal brain stage, which occurs before a seizure event. Patient-independent seizure prediction models are designed to offer accurate performance across multiple subjects within a dataset, and have been identified as a real-world solution to the seizure prediction problem. However, little attention has been given to designing such models to adapt to the high inter-subject variability in EEG data. Methods: We propose two patient-independent deep learning architectures with different learning strategies that can learn a global function utilizing data from multiple subjects. Results: The proposed models achieve state-of-the-art performance for seizure prediction on the CHB-MIT-EEG dataset, demonstrating 88.81% and 91.54% accuracy respectively. Conclusions: The Siamese model trained on the proposed learning strategy is able to learn patterns related to patient variations in data while predicting seizures. Significance: Our models show superior performance for patient-independent seizure prediction, and the same architecture can be used as a patient-specific classifier after model adaptation. This is the first study to employ model interpretation to understand classifier behavior for the task of seizure prediction, and we also show that the MFCC feature map utilized by our models contains predictive biomarkers related to interictal and pre-ictal brain states.
Submitted 18 November, 2020;
originally announced November 2020.
-
Domain Generalization in Biosignal Classification
Authors:
Theekshana Dissanayake,
Tharindu Fernando,
Simon Denman,
Houman Ghaemmaghami,
Sridha Sridharan,
Clinton Fookes
Abstract:
Objective: When training machine learning models, we often assume that the training data and evaluation data are sampled from the same distribution. However, this assumption is violated when the model is evaluated on another unseen but similar database, even if that database contains the same classes. This problem is caused by domain-shift and can be solved using two approaches: domain adaptation and domain generalization. Put simply, domain adaptation methods can access data from unseen domains during training, whereas in domain generalization the unseen data is not available during training. Hence, domain generalization concerns models that perform well on inaccessible, domain-shifted data. Method: Our proposed domain generalization method represents an unseen domain using a set of known basis domains, after which we classify the unseen domain using classifier fusion. To demonstrate our system, we employ a collection of heart sound databases that contain normal and abnormal sounds (classes). Results: Our proposed classifier fusion method achieves accuracy gains of up to 16% for four completely unseen domains. Conclusion: Recognizing the complexity induced by the inherent temporal nature of biosignal data, the two-stage method proposed in this study is able to effectively simplify the whole process of domain generalization while demonstrating good results on unseen domains and the adopted basis domains. Significance: To the best of our knowledge, this is the first study that investigates domain generalization for biosignal data. Our proposed learning strategy can be used to effectively learn domain-relevant features while being aware of the class differences in the data.
Submitted 12 November, 2020;
originally announced November 2020.
-
Fast & Slow Learning: Incorporating Synthetic Gradients in Neural Memory Controllers
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
Neural Memory Networks (NMNs) have received increased attention in recent years compared to deep architectures that use a constrained memory. Despite their new appeal, the success of NMNs hinges on the ability of the gradient-based optimiser to perform incremental training of the NMN controllers, determining how to leverage their high capacity for knowledge retrieval. This means that while excellent performance can be achieved when the training data is consistent and well distributed, rare data samples are hard to learn from as the controllers fail to incorporate them effectively during model training. Drawing inspiration from the human cognition process, in particular the utilisation of neuromodulators in the human brain, we propose to decouple the learning process of the NMN controllers to allow them to achieve flexible, rapid adaptation in the presence of new information. This trait is highly beneficial for meta-learning tasks where the memory controllers must quickly grasp abstract concepts in the target domain, and adapt stored knowledge. This allows the NMN controllers to quickly determine which memories are to be retained and which are to be erased, and swiftly adapt their strategy to the new task at hand. Through both quantitative and qualitative evaluations on multiple public benchmarks, including classification and regression tasks, we demonstrate the utility of the proposed approach. Our evaluations not only highlight the ability of the proposed NMN architecture to outperform the current state-of-the-art methods, but also provide insights on how the proposed augmentations help achieve such superior results. In addition, we demonstrate the practical implications of the proposed learning strategy, where the feedback path can be shared among multiple neural memory networks as a mechanism for knowledge sharing.
Submitted 10 November, 2020;
originally announced November 2020.
-
Attention Driven Fusion for Multi-Modal Emotion Recognition
Authors:
Darshana Priyasad,
Tharindu Fernando,
Simon Denman,
Clinton Fookes,
Sridha Sridharan
Abstract:
Deep learning has emerged as a powerful alternative to hand-crafted methods for emotion recognition on combined acoustic and text modalities. Baseline systems model emotion information in text and acoustic modes independently using Deep Convolutional Neural Networks (DCNN) and Recurrent Neural Networks (RNN), followed by applying attention, fusion, and classification. In this paper, we present a deep learning-based approach to exploit and fuse text and acoustic data for emotion classification. We utilize a SincNet layer, based on parameterized sinc functions with band-pass filters, to extract acoustic features from raw audio followed by a DCNN. This approach learns filter banks tuned for emotion recognition and provides more effective features compared to directly applying convolutions over the raw speech signal. For text processing, we use two branches (a DCNN and a bidirectional RNN followed by a DCNN) in parallel where cross attention is introduced to infer the N-gram level correlations on hidden representations received from the Bi-RNN. Following existing state-of-the-art, we evaluate the performance of the proposed system on the IEMOCAP dataset. Experimental results indicate that the proposed system outperforms existing methods, achieving a 3.5% improvement in weighted accuracy.
Submitted 10 October, 2020; v1 submitted 23 September, 2020;
originally announced September 2020.
-
Memory based fusion for multi-modal deep learning
Authors:
Darshana Priyasad,
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
The use of multi-modal data for deep machine learning has shown promise when compared to uni-modal approaches, with fusion of multi-modal features resulting in improved performance in several applications. However, most state-of-the-art methods use naive fusion which processes feature streams independently, ignoring possible long-term dependencies within the data during fusion. In this paper, we present a novel Memory based Attentive Fusion (MBAF) layer, which fuses modes by incorporating both the current features and long-term dependencies in the data, thus allowing the model to understand the relative importance of modes over time. We introduce an explicit memory block within the fusion layer which stores features containing long-term dependencies of the fused data. The feature inputs from uni-modal encoders are fused through attentive composition and transformation, followed by naive fusion of the resultant memory-derived features with layer inputs. Following state-of-the-art methods, we have evaluated the performance and the generalizability of the proposed fusion approach on two different datasets with different modalities. In our experiments, we replace the naive fusion layer in benchmark networks with our proposed layer to enable a fair comparison. Experimental results indicate that the MBAF layer can generalise across different modalities and networks to enhance fusion and improve performance.
Submitted 23 October, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.
-
A Robust Interpretable Deep Learning Classifier for Heart Anomaly Detection Without Segmentation
Authors:
Theekshana Dissanayake,
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Houman Ghaemmaghami,
Clinton Fookes
Abstract:
Traditionally, abnormal heart sound classification is framed as a three-stage process. The first stage involves segmenting the phonocardiogram to detect fundamental heart sounds, after which features are extracted and classification is performed. Some researchers in the field argue the segmentation step is an unwanted computational burden, whereas others embrace it as a prior step to feature extraction. When comparing accuracies achieved by studies that have segmented heart sounds before analysis with those that have omitted that step, the question of whether to segment heart sounds before feature extraction is still open. In this study, we explicitly examine the importance of heart sound segmentation as a prior step for heart sound classification, and then seek to apply the obtained insights to propose a robust classifier for abnormal heart sound detection. Furthermore, recognizing the pressing need for explainable Artificial Intelligence (AI) models in the medical domain, we also unveil hidden representations learned by the classifier using model interpretation techniques. Experimental results demonstrate that segmentation plays an essential role in abnormal heart sound classification. Our new classifier is also shown to be robust, stable and, most importantly, explainable, with an accuracy of almost 100% on the widely used PhysioNet dataset.
Submitted 29 September, 2020; v1 submitted 21 May, 2020;
originally announced May 2020.
-
Heart Sound Segmentation using Bidirectional LSTMs with Attention
Authors:
Tharindu Fernando,
Houman Ghaemmaghami,
Simon Denman,
Sridha Sridharan,
Nayyar Hussain,
Clinton Fookes
Abstract:
This paper proposes a novel framework for the segmentation of phonocardiogram (PCG) signals into heart states, exploiting the temporal evolution of the PCG as well as considering the salient information that it provides for the detection of the heart state. We propose the use of recurrent neural networks and exploit recent advancements in attention-based learning to segment the PCG signal. This allows the network to identify the most salient aspects of the signal and disregard uninformative information. The proposed method attains state-of-the-art performance on multiple benchmarks including both human and animal heart recordings. Furthermore, we empirically analyse different feature combinations including envelope features, wavelet and Mel Frequency Cepstral Coefficients (MFCC), and provide quantitative measurements that explore the importance of different features in the proposed approach. We demonstrate that a recurrent neural network coupled with attention mechanisms can effectively learn from irregular and noisy PCG recordings. Our analysis of different feature combinations shows that MFCC features and their derivatives offer the best performance compared to classical wavelet and envelope features. Heart sound segmentation is a crucial pre-processing step for many diagnostic applications. The proposed method provides a cost-effective alternative to labour-intensive manual segmentation, and provides a more accurate segmentation than existing methods. As such, it can improve the performance of further analysis including the detection of murmurs and ejection clicks. The proposed method is also applicable for detection and segmentation of other one-dimensional biomedical signals.
Submitted 1 April, 2020;
originally announced April 2020.
-
Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection
Authors:
Tharindu Fernando,
Sridha Sridharan,
Mitchell McLaren,
Darshana Priyasad,
Simon Denman,
Clinton Fookes
Abstract:
This paper presents a novel framework for Speech Activity Detection (SAD). Inspired by the recent success of multi-task learning approaches in the speech processing domain, we propose a novel joint learning framework for SAD. We utilise generative adversarial networks to automatically learn a loss function for joint prediction of the frame-wise speech/non-speech classifications together with the next audio segment. In order to exploit the temporal relationships within the input signal, we propose a temporal discriminator which aims to ensure that the predicted signal is temporally consistent. We evaluate the proposed framework on multiple public benchmarks, including NIST OpenSAT'17, AMI Meeting and HAVIC, where we demonstrate its capability to outperform state-of-the-art SAD approaches. Furthermore, our cross-database evaluations demonstrate the robustness of the proposed approach across different languages, accents, and acoustic environments.
Submitted 1 April, 2020;
originally announced April 2020.
-
Neural Memory Networks for Seizure Type Classification
Authors:
David Ahmedt-Aristizabal,
Tharindu Fernando,
Simon Denman,
Lars Petersson,
Matthew J. Aburn,
Clinton Fookes
Abstract:
Classification of seizure type is a key step in the clinical process for evaluating an individual who presents with seizures. It determines the course of clinical diagnosis and treatment, and its impact stretches beyond the clinical domain to epilepsy research and the development of novel therapies. Automated identification of seizure type may facilitate understanding of the disease, and seizure detection and prediction have been the focus of recent research that has sought to exploit the benefits of machine learning and deep learning architectures. Nevertheless, there is not yet a definitive solution for automating the classification of seizure type, a task that must currently be performed by an expert epileptologist. Inspired by recent advances in neural memory networks (NMNs), we introduce a novel approach for the classification of seizure type using electrophysiological data. We first explore the performance of traditional deep learning techniques which use convolutional and recurrent neural networks, and enhance these architectures by using external memory modules with trainable neural plasticity. We show that our model achieves a state-of-the-art weighted F1 score of 0.945 for seizure type classification on the TUH EEG Seizure Corpus with the IBM TUSZ preprocessed data. This work highlights the potential of neural memory networks to support the field of epilepsy research, along with biomedical research and signal analysis more broadly.
Submitted 29 January, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Exploiting Human Social Cognition for the Detection of Fake and Fraudulent Faces via Memory Networks
Authors:
Tharindu Fernando,
Clinton Fookes,
Simon Denman,
Sridha Sridharan
Abstract:
Advances in computer vision have brought us to the point where we have the ability to synthesise realistic fake content. Such approaches are seen as a source of disinformation and mistrust, and pose serious concerns to governments around the world. Convolutional Neural Networks (CNNs) demonstrate encouraging results when detecting fake images that arise from the specific type of manipulation they are trained on. However, this success has not transitioned to unseen manipulation types, resulting in a significant gap in the line-of-defense. We propose a Hierarchical Memory Network (HMN) architecture, which is able to successfully detect faked faces by utilising knowledge stored in neural memories as well as visual cues to reason about the perceived face and anticipate its future semantic embeddings. This renders a generalisable face tampering detection framework. Experimental results demonstrate the proposed approach achieves superior performance for fake and fraudulent face detection compared to the state-of-the-art.
Submitted 17 November, 2019;
originally announced November 2019.
-
Neural Memory Plasticity for Anomaly Detection
Authors:
Tharindu Fernando,
Simon Denman,
David Ahmedt-Aristizabal,
Sridha Sridharan,
Kristin Laurens,
Patrick Johnston,
Clinton Fookes
Abstract:
In the domain of machine learning, Neural Memory Networks (NMNs) have recently achieved impressive results in a variety of application areas including visual question answering, trajectory prediction, object tracking, and language modelling. However, we observe that the attention-based knowledge retrieval mechanisms used in current NMNs restrict them from achieving their full potential, as the attention process retrieves information based on a set of static connection weights. This is suboptimal in a setting where there are vast differences among samples in the data domain, such as anomaly detection, where there are no consistent criteria for what constitutes an anomaly. In this paper, we propose a plastic neural memory access mechanism which exploits both static and dynamic connection weights in the memory read, write and output generation procedures. We demonstrate the effectiveness and flexibility of the proposed memory model in three challenging anomaly detection tasks in the medical domain: abnormal EEG identification, MRI tumour type classification and schizophrenia risk detection in children. In all settings, the proposed approach outperforms the current state-of-the-art. Furthermore, we perform an in-depth analysis demonstrating the utility of neural plasticity for the knowledge retrieval process and provide evidence on how the proposed memory model generates sparse yet informative memory outputs.
Submitted 11 October, 2019;
originally announced October 2019.
-
Coupled Generative Adversarial Network for Continuous Fine-grained Action Segmentation
Authors:
Harshala Gammulle,
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
We propose a novel conditional GAN (cGAN) model for continuous fine-grained human action segmentation that utilises multi-modal data and learned scene context information. The proposed approach utilises two GANs, termed Action GAN and Auxiliary GAN, where the Action GAN is trained to operate over the current RGB frame while the Auxiliary GAN utilises supplementary information such as depth or optical flow. The goal of both GANs is to generate similar 'action codes', a vector representation of the current action. To facilitate this process, a context extractor that incorporates data and recent outputs from both modes is used to extract context information to aid recognition. The result is a recurrent GAN architecture which learns a task-specific loss function from multiple feature modalities. Extensive evaluations on variants of the proposed model show the importance of utilising different information streams such as context and auxiliary information in the proposed network, and show that our model is capable of outperforming state-of-the-art methods on three widely used datasets: 50 Salads, MERL Shopping and Georgia Tech Egocentric Activities, comprising both static and dynamic camera settings.
Submitted 19 September, 2019;
originally announced September 2019.
-
A study on the Interpretability of Neural Retrieval Models using DeepSHAP
Authors:
Zeon Trevor Fernando,
Jaspreet Singh,
Avishek Anand
Abstract:
A recent trend in IR has been the usage of neural networks to learn retrieval models for text-based ad hoc search. While various approaches and architectures have yielded significantly better performance than traditional retrieval models such as BM25, it is still difficult to understand exactly why a document is relevant to a query. In the ML community, several approaches for explaining decisions made by deep neural networks have been proposed, including DeepSHAP, which modifies the DeepLIFT algorithm to estimate the relative importance (Shapley values) of input features for a given decision by comparing the activations in the network for a given image against the activations caused by a reference input. In image classification, the reference input tends to be a plain black image. While DeepSHAP has been well studied for image classification tasks, it remains to be seen how we can adapt it to explain the output of Neural Retrieval Models (NRMs). In particular, what is a good "black" image in the context of IR? In this paper we explored various reference input document construction techniques. Additionally, we compared the explanations generated by DeepSHAP to LIME (a model-agnostic approach) and found that the explanations differ considerably. Our study raises concerns regarding the robustness and accuracy of explanations produced for NRMs. With this paper we aim to shed light on interesting problems surrounding interpretability in NRMs and highlight areas of future work.
Submitted 15 July, 2019;
originally announced July 2019.
-
Memory Augmented Deep Generative models for Forecasting the Next Shot Location in Tennis
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
This paper presents a novel framework for predicting shot location and type in tennis. Inspired by recent neuroscience discoveries we incorporate neural memory modules to model the episodic and semantic memory components of a tennis player. We propose a Semi Supervised Generative Adversarial Network architecture that couples these memory models with the automatic feature learning power of deep neural networks and demonstrate methodologies for learning player level behavioural patterns with the proposed framework. We evaluate the effectiveness of the proposed model on tennis tracking data from the 2012 Australian Tennis open and exhibit applications of the proposed method in discovering how players adapt their style depending on the match context.
Submitted 15 January, 2019;
originally announced January 2019.
-
GD-GAN: Generative Adversarial Networks for Trajectory Prediction and Group Detection in Crowds
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
This paper presents a novel deep learning framework for human trajectory prediction and detecting social group membership in crowds. We introduce a generative adversarial pipeline which preserves the spatio-temporal structure of the pedestrian's neighbourhood, enabling us to extract relevant attributes describing their social identity. We formulate the group detection task as an unsupervised learning problem, obviating the need for supervised learning of group memberships via hand labeled databases, allowing us to directly employ the proposed framework in different surveillance settings. We evaluate the proposed trajectory prediction and group detection frameworks on multiple public benchmarks, and for both tasks the proposed method demonstrates its capability to better anticipate human sociological behaviour compared to the existing state-of-the-art methods.
Submitted 18 December, 2018;
originally announced December 2018.
-
LogCanvas: Visualizing Search History Using Knowledge Graphs
Authors:
Luyan Xu,
Zeon Trevor Fernando,
Xuan Zhou,
Wolfgang Nejdl
Abstract:
In this demo paper, we introduce LogCanvas, a platform for user search history visualisation. Different from the existing visualisation tools, LogCanvas focuses on helping users re-construct the semantic relationship among their search activities. LogCanvas segments a user's search history into different sessions and generates a knowledge graph to represent the information exploration process in each session. A knowledge graph is composed of the most important concepts or entities discovered by each search query as well as their relationships. It thus captures the semantic relationship among the queries. LogCanvas offers a session timeline viewer and a snippets viewer to enable users to re-find their previous search results efficiently. LogCanvas also provides a collaborative perspective to support a group of users in sharing search results and experience.
Submitted 15 August, 2018;
originally announced August 2018.
-
Pedestrian Trajectory Prediction with Structured Memory Hierarchies
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
This paper presents a novel framework for human trajectory prediction based on multimodal data (video and radar). Motivated by recent neuroscience discoveries, we propose incorporating a structured memory component in the human trajectory prediction pipeline to capture historical information to improve performance. We introduce structured LSTM cells for modelling the memory content hierarchically, preserving the spatiotemporal structure of the information and enabling us to capture both short-term and long-term context. We demonstrate how this architecture can be extended to integrate salient information from multiple modalities to automatically store and retrieve important information for decision making without any supervision. We evaluate the effectiveness of the proposed models on a novel multimodal dataset that we introduce, consisting of 40,000 pedestrian trajectories, acquired jointly from a radar system and a CCTV camera system installed in a public place. The performance is also evaluated on the publicly available New York Grand Central pedestrian database. In both settings, the proposed models demonstrate their capability to better anticipate future pedestrian motion compared to existing state of the art.
Submitted 22 July, 2018;
originally announced July 2018.
-
Meta-Learning by the Baldwin Effect
Authors:
Chrisantha Thomas Fernando,
Jakub Sygnowski,
Simon Osindero,
Jane Wang,
Tom Schaul,
Denis Teplyashin,
Pablo Sprechmann,
Alexander Pritzel,
Andrei A. Rusu
Abstract:
The scope of the Baldwin effect was recently called into question by two papers that closely examined the seminal work of Hinton and Nowlan. To this date there has been no demonstration of its necessity in empirically challenging tasks. Here we show that the Baldwin effect is capable of evolving few-shot supervised and reinforcement learning mechanisms, by shaping the hyperparameters and the initial parameters of deep learning algorithms. Furthermore it can genetically accommodate strong learning biases on the same set of problems as a recent machine learning algorithm called MAML "Model Agnostic Meta-Learning" which uses second-order gradients instead of evolution to learn a set of reference parameters (initial weights) that can allow rapid adaptation to tasks sampled from a distribution. Whilst in simple cases MAML is more data efficient than the Baldwin effect, the Baldwin effect is more general in that it does not require gradients to be backpropagated to the reference parameters or hyperparameters, and permits effectively any number of gradient updates in the inner loop. The Baldwin effect learns strong learning dependent biases, rather than purely genetically accommodating fixed behaviours in a learning independent manner.
Submitted 22 June, 2018; v1 submitted 6 June, 2018;
originally announced June 2018.
-
Deep Decision Trees for Discriminative Dictionary Learning with Adversarial Multi-Agent Trajectories
Authors:
Tharindu Fernando,
Sridha Sridharan,
Clinton Fookes,
Simon Denman
Abstract:
With the explosion in the availability of spatio-temporal tracking data in modern sports, there is an enormous opportunity to better analyse, learn and predict important events in adversarial group environments. In this paper, we propose a deep decision tree architecture for discriminative dictionary learning from adversarial multi-agent trajectories. We first build up a hierarchy for the tree structure by adding each layer and performing feature weight based clustering in the forward pass. We then fine tune the player role weights using back propagation. The hierarchical architecture ensures the interpretability and the integrity of the group representation. The resulting architecture is a decision tree, with leaf-nodes capturing a dictionary of multi-agent group interactions. Due to the ample volume of data available, we focus on soccer tracking data, although our approach can be used in any adversarial multi-agent domain. We present applications of proposed method for simulating soccer games as well as evaluating and quantifying team strategies.
Submitted 14 May, 2018;
originally announced May 2018.
-
Learning Temporal Strategic Relationships using Generative Adversarial Imitation Learning
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
This paper presents a novel framework for automatic learning of complex strategies in human decision making. The task that we are interested in is to better facilitate long term planning for complex, multi-step events. We observe temporal relationships at the subtask level of expert demonstrations, and determine the different strategies employed in order to successfully complete a task. To capture the relationship between the subtasks and the overall goal, we utilise two external memory modules, one for capturing dependencies within a single expert demonstration, such as the sequential relationship among different sub tasks, and a global memory module for modelling task level characteristics such as best practice employed by different humans based on their domain expertise. Furthermore, we demonstrate how the hidden state representation of the memory can be used as a reward signal to smooth the state transitions, eradicating subtle changes. We evaluate the effectiveness of the proposed model for an autonomous highway driving application, where we demonstrate its capability to learn different expert policies and outperform state-of-the-art methods. The scope in industrial applications extends to any robotics and automation application which requires learning from complex demonstrations containing series of subtasks.
Submitted 13 May, 2018;
originally announced May 2018.
-
Task Specific Visual Saliency Prediction with Memory Augmented Conditional Generative Adversarial Networks
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
Visual saliency patterns are the result of a variety of factors aside from the image being parsed, however existing approaches have ignored these. To address this limitation, we propose a novel saliency estimation model which leverages the semantic modelling power of conditional generative adversarial networks together with memory architectures which capture the subject's behavioural patterns and task dependent factors. We make contributions aiming to bridge the gap between bottom-up feature learning capabilities in modern deep learning architectures and traditional top-down hand-crafted features based methods for task specific saliency modelling. The conditional nature of the proposed framework enables us to learn contextual semantics and relationships among different tasks together, instead of learning them separately for each task. Our studies not only shed light on a novel application area for generative adversarial networks, but also emphasise the importance of task specific saliency modelling and demonstrate the plausibility of fully capturing this context via an augmented memory architecture.
Submitted 8 March, 2018;
originally announced March 2018.
-
Tracking by Prediction: A Deep Generative Model for Mutli-Person localisation and Tracking
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
Current multi-person localisation and tracking systems have an over reliance on the use of appearance models for target re-identification and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a light weight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections, typically found in a multi person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks including both static and dynamic cameras and is capable of generating outstanding performance, especially among other recently proposed deep neural network based approaches.
Submitted 8 March, 2018;
originally announced March 2018.
-
Exploring Cross-Domain Data Dependencies for Smart Homes to Improve Energy Efficiency
Authors:
Shamaila Iram,
Terrence Fernando,
May Bassanino
Abstract:
Over the past decade, the idea of smart homes has been conceived as a potential solution to counter energy crises or to at least mitigate its intensive destructive consequences in the residential building sector.
Submitted 11 October, 2017;
originally announced October 2017.
-
Tree Memory Networks for Modelling Long-term Temporal Dependencies
Authors:
Tharindu Fernando,
Simon Denman,
Aaron McFadyen,
Sridha Sridharan,
Clinton Fookes
Abstract:
In the domain of sequence modelling, Recurrent Neural Networks (RNN) have been capable of achieving impressive results in a variety of application areas including visual question answering, part-of-speech tagging and machine translation. However this success in modelling short term dependencies has not successfully transitioned to application areas such as trajectory prediction, which require capturing both short term and long term relationships. In this paper, we propose a Tree Memory Network (TMN) for modelling long term and short term relationships in sequence-to-sequence mapping problems. The proposed network architecture is composed of an input module, controller and a memory module. In contrast to related literature, which models the memory as a sequence of historical states, we model the memory as a recursive tree structure. This structure more effectively captures temporal dependencies across both short term and long term sequences using its hierarchical structure. We demonstrate the effectiveness and flexibility of the proposed TMN in two practical problems, aircraft trajectory modelling and pedestrian trajectory modelling in a surveillance setting, and in both cases we outperform the current state-of-the-art. Furthermore, we perform an in depth analysis on the evolution of the memory module content over time and provide visual evidence on how the proposed TMN is able to map both long term and short term relationships efficiently via a hierarchical structure.
Submitted 20 May, 2018; v1 submitted 12 March, 2017;
originally announced March 2017.
-
Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection
Authors:
Tharindu Fernando,
Simon Denman,
Sridha Sridharan,
Clinton Fookes
Abstract:
As humans we possess an intuitive ability for navigation which we master through years of practice; however existing approaches to model this trait for diverse tasks including monitoring pedestrian flow and detecting abnormal events have been limited by using a variety of hand-crafted features. Recent research in the area of deep-learning has demonstrated the power of learning features directly from the data; and related research in recurrent neural networks has shown exemplary results in sequence-to-sequence problems such as neural machine translation and neural image caption generation. Motivated by these approaches, we propose a novel method to predict the future motion of a pedestrian given a short history of their own and their neighbours' past behaviour. The novelty of the proposed method is the combined attention model which utilises both "soft attention" as well as "hard-wired" attention in order to map the trajectory information from the local neighbourhood to the future positions of the pedestrian of interest. We illustrate how a simple approximation of attention weights (i.e., hard-wired) can be merged together with soft attention weights in order to make our model applicable for challenging real world scenarios with hundreds of neighbours. The navigational capability of the proposed method is tested on two challenging publicly available surveillance databases where our model outperforms the current state-of-the-art methods. Additionally, we illustrate how the proposed architecture can be directly applied for the task of abnormal event detection without handcrafting the features.
Submitted 17 February, 2017;
originally announced February 2017.
-
ArchiveWeb: collaboratively extending and exploring web archive collections - How would you like to work with your collections?
Authors:
Zeon Trevor Fernando,
Ivana Marenzi,
Wolfgang Nejdl
Abstract:
Curated web archive collections contain focused digital content which is collected by archiving organizations, groups, and individuals to provide a representative sample covering specific topics and events to preserve them for future exploration and analysis. In this paper, we discuss how to best support collaborative construction and exploration of these collections through the ArchiveWeb system. ArchiveWeb has been developed using an iterative evaluation-driven design-based research approach, with considerable user feedback at all stages. The first part of this paper describes the important insights we gained from our initial requirements engineering phase during the first year of the project and the main functionalities of the current ArchiveWeb system for searching, constructing, exploring, and discussing web archive collections. The second part summarizes the feedback we received on this version from archiving organizations and libraries, as well as our corresponding plans for improving and extending the system for the next release.
Submitted 1 February, 2017;
originally announced February 2017.
-
ArchiveWeb: Collaboratively Extending and Exploring Web Archive Collections
Authors:
Zeon Trevor Fernando,
Ivana Marenzi,
Wolfgang Nejdl,
Rishita Kalyani
Abstract:
Curated web archive collections contain focused digital contents which are collected by archiving organizations to provide a representative sample covering specific topics and events to preserve them for future exploration and analysis. In this paper, we discuss how to best support collaborative construction and exploration of these collections through the ArchiveWeb system. ArchiveWeb has been developed using an iterative evaluation-driven design-based research approach, with considerable user feedback at all stages. This paper describes the functionalities of our current prototype for searching, constructing, exploring and discussing web archive collections, as well as feedback on this prototype from seven archiving organizations, and our plans for improving the next release of the system.
Submitted 1 February, 2017;
originally announced February 2017.
-
A self-tuning Firefly algorithm to tune the parameters of Ant Colony System (ACSFA)
Authors:
M. K. A. Ariyaratne,
T. G. I. Fernando,
S. Weerakoon
Abstract:
Ant colony system (ACS) is a promising approach which has been widely used in problems such as Travelling Salesman Problems (TSP), Job shop scheduling problems (JSP) and Quadratic Assignment problems (QAP). In its original implementation, parameters of the algorithm were selected by trial and error approach. Over the last few years, novel approaches have been proposed on adapting the parameters of ACS in improving its performance. The aim of this paper is to use a framework introduced for self-tuning optimization algorithms combined with the firefly algorithm (FA) to tune the parameters of the ACS solving symmetric TSP problems. The FA optimizes the problem specific parameters of ACS while the parameters of the FA are tuned by the selected framework itself. With this approach, the user neither has to work with the parameters of ACS nor the parameters of FA. Using common symmetric TSP problems we demonstrate that the framework fits well for the ACS. A detailed statistical analysis further verifies the goodness of the new ACS over the existing ACS and also of the other techniques used to tune the parameters of ACS.
Submitted 26 October, 2016;
originally announced October 2016.
-
LearnWeb-OER: Improving Accessibility of Open Educational Resources
Authors:
Jaspreet Singh,
Zeon Trevor Fernando,
Saniya Chawla
Abstract:
In addition to user-generated content, Open Educational Resources are increasingly made available on the Web by several institutions and organizations with the aim of being re-used. Nevertheless, it is still difficult for users to find appropriate resources for specific learning scenarios among the vast amount offered on the Web. Our goal is to give users the opportunity to search for authentic resources from the Web and reuse them in a learning context. The LearnWeb-OER platform enhances collaborative searching and sharing of educational resources providing specific means and facilities for education. In the following, we provide a description of the functionalities that support users in collaboratively collecting, selecting, annotating and discussing search results and learning resources.
Submitted 9 September, 2015;
originally announced September 2015.
-
Capturing, Documenting and Visualizing Search Contexts for building Multimedia Corpora
Authors:
Zeon Trevor Fernando
Abstract:
In Social Science research, multimedia documents are often collected to answer particular research questions like: "Which of the aesthetic properties of a photo are considered important on the web" or "How has Street Art developed over the past 50 years". Therefore, a researcher generally issues multiple queries to a number of search engines. This activity may span long time intervals and results in a collection which can be further analyzed. Documenting the collection building process, which includes the context of the searches carried out, is imperative for social scientists to reproduce their research. Such context documentation consists of several user actions and search attributes like: the issued queries; the results clicked and saved; the duration for which a particular result was viewed; the set of results that was displayed but neither clicked nor saved; as well as user annotations like comments or tags. In this work we will describe a search process tracking module and a search history visualization module. These modules can be integrated into keyword based search systems through a REST API which was developed to help capture, document and revisit past search contexts while building a web corpus. Finally, we detail the implementation of how the module was integrated into the LearnWeb2.0 platform - a multimedia web2.0 search and sharing application which can obtain resources from various web2.0 tools such as YouTube, Bing, Flickr, etc., using keyword search.
Submitted 12 March, 2015;
originally announced March 2015.