-
Challenges and Opportunities of NLP for HR Applications: A Discussion Paper
Authors:
Jochen L. Leidner,
Mark Stevenson
Abstract:
Over the course of the recent decade, tremendous progress has been made in the areas of machine learning and natural language processing, which opened up vast areas of potential application use cases, including hiring and human resource management. We review the use cases for text analytics in the realm of human resources/personnel management, including actually realized as well as potential but not yet implemented ones, and we analyze the opportunities and risks of these.
Submitted 13 May, 2024;
originally announced May 2024.
-
RLStop: A Reinforcement Learning Stopping Method for TAR
Authors:
Reem Bin-Hezam,
Mark Stevenson
Abstract:
We present RLStop, a novel Technology Assisted Review (TAR) stopping rule based on reinforcement learning that helps minimise the number of documents that need to be manually reviewed within TAR applications. RLStop is trained on example rankings using a reward function to identify the optimal point to stop examining documents. Experiments at a range of target recall levels on multiple benchmark datasets (CLEF e-Health, TREC Total Recall, and Reuters RCV1) demonstrated that RLStop substantially reduces the workload required to screen a document collection for relevance. RLStop outperforms a wide range of alternative approaches, achieving performance close to the maximum possible for the task under some circumstances.
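Stopping rules are judged by the recall they reach and the number of documents examined, so for a ranking with known relevance labels the best achievable ("oracle") stopping point can be computed directly and used as a natural upper bound for comparison. The sketch below assumes binary labels listed in rank order; `oracle_stop` and `recall_at` are hypothetical helper names, and this is an evaluation illustration, not RLStop itself.

```python
from typing import Sequence


def recall_at(labels: Sequence[int], stop: int) -> float:
    """Fraction of all relevant documents found in the first `stop` ranks."""
    total = sum(labels)
    return sum(labels[:stop]) / total if total else 1.0


def oracle_stop(labels: Sequence[int], target_recall: float) -> int:
    """Smallest number of documents to examine to reach `target_recall`.

    Only computable when every relevance label is known, so it serves as
    a reference point for evaluation, not as a practical stopping rule.
    """
    total = sum(labels)
    if total == 0:
        return 0
    found = 0
    for rank, rel in enumerate(labels, start=1):
        found += rel
        # Compare the ratio directly to avoid float rounding in target * total.
        if found / total >= target_recall:
            return rank
    return len(labels)
```

For example, with `labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]` and a target recall of 0.75, `oracle_stop` returns 6: three of the four relevant documents have been found after six documents.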
Submitted 7 June, 2024; v1 submitted 3 May, 2024;
originally announced May 2024.
-
Document Set Expansion with Positive-Unlabelled Learning Using Intractable Density Estimation
Authors:
Haiyang Zhang,
Qiuyi Chen,
Yuanjie Zou,
Yushan Pan,
Jia Wang,
Mark Stevenson
Abstract:
The Document Set Expansion (DSE) task involves identifying relevant documents from large collections based on a limited set of example documents. Previous research has highlighted Positive and Unlabeled (PU) learning as a promising approach for this task. However, most PU methods rely on the unrealistic assumption of knowing the class prior for positive samples in the collection. To address this limitation, this paper introduces a novel PU learning framework that utilizes intractable density estimation models. Experiments conducted on PubMed and Covid datasets in a transductive setting showcase the effectiveness of the proposed method for DSE. Code is available from https://github.com/Beautifuldog01/Document-set-expansion-puDE.
Submitted 26 March, 2024;
originally announced March 2024.
-
Document Set Expansion with Positive-Unlabeled Learning: A Density Estimation-based Approach
Authors:
Haiyang Zhang,
Qiuyi Chen,
Yuanjie Zou,
Yushan Pan,
Jia Wang,
Mark Stevenson
Abstract:
Document set expansion (DSE) aims to identify relevant documents from a large collection based on a small set of documents that are on a fine-grained topic. Previous work shows that PU learning is a promising method for this task. However, some serious issues remain unresolved, namely the typical challenges that PU methods suffer from, such as an unknown class prior and imbalanced data, and the need for transductive experimental settings. In this paper, we propose a novel PU learning framework based on density estimation, called puDE, that can handle the above issues. The advantage of puDE is that it is neither constrained by the SCAR (Selected Completely At Random) assumption nor dependent on any class prior knowledge. We demonstrate the effectiveness of the proposed method using a series of real-world datasets and conclude that our method is a better alternative for the DSE task.
Submitted 20 January, 2024;
originally announced January 2024.
-
Brodsky et al's defence does not work
Authors:
P. M. Stevenson
Abstract:
In their defence of "maximum conformality" methods, Brodsky et al make the astonishing claim that any RG transformation a'=a(1+V_1 a + ...) in QCD must have V_1 proportional to b=(33-2n_f)/6. It is well known that this is not true. I emphasize again the correctness and central importance of the Celmaster-Gonsalves relation for the prescription dependence of the Lambda parameter.
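The Celmaster-Gonsalves relation referred to above can be recovered in a few lines, assuming the normalisation in which the coupling is a = α_s/π and the β function reads μ da/dμ = -b a²(1 + c a + ...), so that b = (33-2n_f)/6 as quoted. A minimal one-loop sketch:

```latex
\begin{aligned}
\mu\frac{da}{d\mu} &= -b\,a^{2}\,(1 + c\,a + \cdots), \qquad b = \frac{33-2n_f}{6},\\
\frac{1}{a} &= b\ln\frac{\mu}{\Lambda} \quad \text{(one loop)},\\
a' &= a\,(1 + V_1 a + \cdots) \;\Rightarrow\; \frac{1}{a'} = \frac{1}{a} - V_1 + O(a),\\
b\ln\frac{\mu}{\Lambda'} &= b\ln\frac{\mu}{\Lambda} - V_1 \;\Rightarrow\; \frac{\Lambda'}{\Lambda} = \exp\!\left(\frac{V_1}{b}\right).
\end{aligned}
```

Beyond one loop, the coefficient c is unchanged by a transformation of this form, so with the usual convention for defining Λ the additional terms cancel as μ → ∞ and the relation holds exactly. The derivation fixes Λ'/Λ for any V_1 and nowhere requires V_1 to be proportional to b.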
Submitted 18 December, 2023;
originally announced December 2023.
-
Combining Counting Processes and Classification Improves a Stopping Rule for Technology Assisted Review
Authors:
Reem Bin-Hezam,
Mark Stevenson
Abstract:
Technology Assisted Review (TAR) stopping rules aim to reduce the cost of manually assessing documents for relevance by minimising the number of documents that need to be examined to ensure a desired level of recall. This paper extends an effective stopping rule using information derived from a text classifier that can be trained without the need for any additional annotation. Experiments on multiple data sets (CLEF e-Health, TREC Total Recall, TREC Legal and RCV1) showed that the proposed approach consistently improves performance and outperforms several alternative methods.
Submitted 5 December, 2023;
originally announced December 2023.
-
Stopping Methods for Technology Assisted Reviews based on Point Processes
Authors:
Mark Stevenson,
Reem Bin-Hezam
Abstract:
Technology Assisted Review (TAR), which aims to reduce the effort required to screen collections of documents for relevance, is used to develop systematic reviews of medical evidence and identify documents that must be disclosed in response to legal proceedings. Stopping methods are algorithms which determine when to stop screening documents during the TAR process, helping to ensure that workload is minimised while still achieving a high level of recall. This paper proposes a novel stopping method based on point processes, which are statistical models that can be used to represent the occurrence of random events. The approach uses rate functions to model the occurrence of relevant documents in the ranking and compares four candidates, including one that has not previously been used for this purpose (hyperbolic). Evaluation is carried out using standard datasets (CLEF e-Health, TREC Total Recall, TREC Legal), and this work is the first to explore stopping method robustness by reporting performance on a range of rankings of varying effectiveness. Results show that the proposed method achieves the desired level of recall without requiring an excessive number of documents to be examined in the majority of cases and also compares well against multiple alternative approaches.
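The idea of a point-process stopping method can be illustrated with a minimal sketch, assuming an exponentially decaying rate function as one example candidate (the paper compares four, including a hyperbolic one, and its fitting procedure may differ). The function names, the grid of decay rates and the least-squares fit of the cumulative count curve are illustrative choices, not the authors' implementation. A practical rule would also account for uncertainty in the estimate; this sketch uses the point estimate only.

```python
import math
from typing import List, Sequence, Tuple


def fit_exponential_rate(labels: Sequence[int]) -> Tuple[float, float]:
    """Fit rate(t) = a * exp(-k * t) to the relevant documents seen so far.

    The expected count in the first x ranks is (a / k) * (1 - exp(-k * x)).
    We fit this curve to the observed cumulative counts by least squares,
    grid-searching k; for fixed k the best `a` has a closed form.
    """
    if not labels:
        raise ValueError("need at least one screened document")
    cum: List[int] = []
    found = 0
    for rel in labels:
        found += rel
        cum.append(found)
    best_sse, best_a, best_k = math.inf, 0.0, 0.0
    for i in range(1, 401):
        k = i * 5e-4  # hypothetical grid of decay rates: 0.0005 .. 0.2
        basis = [(1.0 - math.exp(-k * x)) / k for x in range(1, len(labels) + 1)]
        a = sum(b * c for b, c in zip(basis, cum)) / sum(b * b for b in basis)
        sse = sum((a * b - c) ** 2 for b, c in zip(basis, cum))
        if sse < best_sse:
            best_sse, best_a, best_k = sse, a, k
    return best_a, best_k


def should_stop(labels: Sequence[int], collection_size: int,
                target_recall: float = 0.9) -> Tuple[bool, float]:
    """Stop once the relevant documents found reach `target_recall` of the
    total number predicted by the fitted rate over the whole ranking."""
    a, k = fit_exponential_rate(labels)
    est_total = a / k * (1.0 - math.exp(-k * collection_size))
    return sum(labels) >= target_recall * est_total, est_total
```

For instance, if all of the first 100 screened documents are relevant, the fitted rate barely decays, the estimated total is far above 100, and the rule says to continue screening.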
Submitted 14 November, 2023;
originally announced November 2023.
-
Spin-photon entanglement with direct photon emission in the telecom C-band
Authors:
P. Laccotripes,
T. Müller,
R. M. Stevenson,
J. Skiba-Szymanska,
D. A. Ritchie,
A. J. Shields
Abstract:
The ever-evolving demands for computational power and for a securely connected world dictate the development of quantum networks where entanglement is distributed between connected parties. Solid-state quantum emitters in the telecom C-band are a promising platform for quantum communication applications due to the minimal absorption of photons at these wavelengths, "on-demand" generation of single photon flying qubits, and ease of integration with existing network infrastructure. Here, we use an InAs/InP quantum dot to implement an optically active spin-qubit, based on a negatively charged exciton where the electron spin degeneracy is lifted using a Voigt magnetic field. We investigate the coherent interactions of the spin-qubit system under resonant excitation, demonstrating high fidelity spin initialisation and coherent control using picosecond pulses. We further use these tools to measure the coherence of a single, undisturbed electron spin in our system. Finally, we report the first demonstration of spin-photon entanglement in a solid-state system capable of direct emission into the telecom C-band.
Submitted 25 October, 2023;
originally announced October 2023.
-
Scalable Label-efficient Footpath Network Generation Using Remote Sensing Data and Self-supervised Learning
Authors:
Xinye Wanyan,
Sachith Seneviratne,
Kerry Nice,
Jason Thompson,
Marcus White,
Nano Langenheim,
Mark Stevenson
Abstract:
Footpath mapping, modeling, and analysis can provide important geospatial insights to many fields of study, including transport, health, environment and urban planning. The availability of robust Geographic Information System (GIS) layers can benefit the management of infrastructure inventories, especially at local government level with urban planners responsible for the deployment and maintenance of such infrastructure. However, many cities still lack real-time information on the location, connectivity, and width of footpaths, and/or employ costly and manual survey means to gather this information. This work designs and implements an automatic pipeline for generating footpath networks based on remote sensing images using machine learning models. The annotation of segmentation tasks, especially labeling remote sensing images with specialized requirements, is very expensive, so we aim to introduce a pipeline requiring less labeled data. Considering supervised methods require large amounts of training data, we use a self-supervised method for feature representation learning to reduce annotation requirements. Then the pre-trained model is used as the encoder of the U-Net for footpath segmentation. Based on the generated masks, the footpath polygons are extracted and converted to footpath networks which can be loaded and visualized by geographic information systems conveniently. Validation results indicate considerable consistency when compared to manually collected GIS layers. The footpath network generation pipeline proposed in this work is low-cost and extensible, and it can be applied where remote sensing images are available. Github: https://github.com/WennyXY/FootpathSeg.
Submitted 17 September, 2023;
originally announced September 2023.
-
Polarization-selective enhancement of telecom wavelength quantum dot transitions in an elliptical bullseye resonator
Authors:
Andrea Barbiero,
Ginny Shooter,
Tina Müller,
Joanna Skiba-Szymanska,
R. Mark Stevenson,
Lucy E. Goff,
David A. Ritchie,
Andrew J. Shields
Abstract:
Semiconductor quantum dots are promising candidates for the generation of nonclassical light. Coupling a quantum dot to a device capable of providing polarization-selective enhancement of optical transitions is highly beneficial for advanced functionalities such as efficient resonant driving schemes or applications based on optical cyclicity. Here, we demonstrate broadband polarization-selective enhancement by coupling a quantum dot emitting in the telecom O-band to an elliptical bullseye resonator. We report bright single-photon emission with a degree of linear polarization of 96%, Purcell factor of 3.9, and count rates up to 3 MHz. Furthermore, we present a measurement of two-photon interference without any external polarization filtering and demonstrate compatibility with compact Stirling cryocoolers by operating the device at temperatures up to 40 K. These results represent an important step towards practical integration of optimal quantum dot photon sources in deployment-ready setups.
Submitted 12 September, 2023;
originally announced September 2023.
-
Bio-SIEVE: Exploring Instruction Tuning Large Language Models for Systematic Review Automation
Authors:
Ambrose Robinson,
William Thorne,
Ben P. Wu,
Abdullah Pandor,
Munira Essat,
Mark Stevenson,
Xingyi Song
Abstract:
Medical systematic reviews can be very costly and resource intensive. We explore how Large Language Models (LLMs) can support and be trained to perform literature screening when provided with a detailed set of selection criteria. Specifically, we instruction tune LLaMA and Guanaco models to perform abstract screening for medical systematic reviews. Our best model, Bio-SIEVE, outperforms both ChatGPT and trained traditional approaches, and generalises better across medical domains. However, there remains the challenge of adapting the model to safety-first scenarios. We also explore the impact of multi-task training with Bio-SIEVE-Multi, including tasks such as PICO extraction and exclusion reasoning, but find that it is unable to match single-task Bio-SIEVE's performance. We see Bio-SIEVE as an important step towards specialising LLMs for the biomedical systematic review process and explore its future developmental opportunities. We release our models, code and a list of DOIs to reconstruct our dataset for reproducibility.
Submitted 12 August, 2023;
originally announced August 2023.
-
'Maximal conformality' does not work
Authors:
P. M. Stevenson
Abstract:
The so-called "principle of maximal conformality" is ineffective and does nothing to resolve the renormalization-scheme-dependence problem. Some essential facts about that problem are summarized. It is stressed that RG invariance is a symmetry and that any viable method for resolving the scheme-dependence problem should be formulatable in terms of the invariants of that symmetry.
Submitted 26 October, 2023; v1 submitted 9 August, 2023;
originally announced August 2023.
-
On the Security Vulnerabilities of Text-to-SQL Models
Authors:
Xutan Peng,
Yipeng Zhang,
**gfeng Yang,
Mark Stevenson
Abstract:
Although it has been demonstrated that Natural Language Processing (NLP) algorithms are vulnerable to deliberate attacks, the question of whether such weaknesses can lead to software security threats is under-explored. To bridge this gap, we conducted vulnerability tests on Text-to-SQL systems that are commonly used to create natural language interfaces to databases. We showed that the Text-to-SQL modules within six commercial applications can be manipulated to produce malicious code, potentially leading to data breaches and Denial of Service attacks. This is the first demonstration that NLP models can be exploited as attack vectors in the wild. In addition, experiments using four open-source language models verified that straightforward backdoor attacks on Text-to-SQL systems achieve a 100% success rate without affecting their performance. The aim of this work is to draw the community's attention to potential software security issues associated with NLP algorithms and encourage exploration of methods to mitigate against them.
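One commonly recommended mitigation for the risk described above is to execute model-generated SQL with no more privilege than the application needs. The sketch below, an illustration not taken from the paper, opens an SQLite database in read-only mode through Python's standard `sqlite3` module, so a generated `DROP` or `DELETE` fails instead of altering data; `run_generated_sql` is a hypothetical name.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def run_generated_sql(db_path: str, sql: str):
    """Execute model-generated SQL on a read-only SQLite connection.

    Any statement that tries to write (DROP, DELETE, UPDATE, INSERT, ...)
    raises sqlite3.OperationalError instead of modifying the database.
    """
    # SQLite URI filename with mode=ro opens the file read-only.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        return conn.execute(sql).fetchall()
```

Read-only access does not address the Denial of Service risk mentioned in the abstract: an expensive generated query can still consume resources, so limits on execution time or result size would be needed as well.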
Submitted 11 May, 2024; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Urban feature analysis from aerial remote sensing imagery using self-supervised and semi-supervised computer vision
Authors:
Sachith Seneviratne,
Jasper S. Wijnands,
Kerry Nice,
Haifeng Zhao,
Branislava Godic,
Suzanne Mavoa,
Rajith Vidanaarachchi,
Mark Stevenson,
Leandro Garcia,
Ruth F. Hunter,
Jason Thompson
Abstract:
Analysis of overhead imagery using computer vision is a problem that has received considerable attention in academic literature. Most techniques that operate in this space are both highly specialised and require expensive manual annotation of large datasets. These problems are addressed here through the development of a more generic framework, incorporating advances in representation learning which allows for more flexibility in analysing new categories of imagery with limited labeled data. First, a robust representation of an unlabeled aerial imagery dataset was created based on the momentum contrast mechanism. This was subsequently specialised for different tasks by building accurate classifiers with as few as 200 labeled images. The successful low-level detection of urban infrastructure evolution over a 10-year period from 60 million unlabeled images exemplifies the substantial potential of our approach to advance quantitative urban research.
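The momentum contrast mechanism (MoCo) mentioned above rests on two ingredients: a key encoder updated as a moving average of the query encoder, and a contrastive (InfoNCE) loss that scores a positive key against a queue of negative keys. The NumPy sketch below shows these ingredients in isolation, with encoders reduced to lists of parameter arrays; the function names and the defaults m = 0.999 and τ = 0.07 (values commonly used with MoCo) are assumptions, not the authors' training code.

```python
from typing import List

import numpy as np


def momentum_update(key_params: List[np.ndarray], query_params: List[np.ndarray],
                    m: float = 0.999) -> List[np.ndarray]:
    """Key encoder tracks the query encoder as an exponential moving average."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def info_nce_loss(q: np.ndarray, k_pos: np.ndarray, queue: np.ndarray,
                  tau: float = 0.07) -> float:
    """Contrastive loss for one query.

    q, k_pos: (d,) L2-normalised embeddings of two views of the same image.
    queue: (K, d) L2-normalised keys from other images, used as negatives.
    """
    logits = np.concatenate(([q @ k_pos], queue @ q)) / tau
    logits -= logits.max()  # numerical stability; softmax is shift-invariant
    # Negative log-probability of the positive key (index 0).
    return float(-(logits[0] - np.log(np.exp(logits).sum())))


def update_queue(queue: np.ndarray, new_keys: np.ndarray, max_size: int) -> np.ndarray:
    """Enqueue the newest keys at the front and drop the oldest ones."""
    return np.concatenate((new_keys, queue))[:max_size]
```

The loss is near zero when the query matches its positive key and is orthogonal to the negatives, and large when a negative matches the query instead.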
Submitted 16 August, 2022;
originally announced August 2022.
-
High performance single-photon sources at telecom wavelength based on broadband hybrid circular Bragg gratings
Authors:
Andrea Barbiero,
Jan Huwer,
Joanna Skiba-Szymanska,
David J. P. Ellis,
R. Mark Stevenson,
Tina Müller,
Ginny Shooter,
Lucy E. Goff,
David A. Ritchie,
Andrew J. Shields
Abstract:
Semiconductor quantum dots embedded in hybrid circular Bragg gratings are a promising platform for the efficient generation of nonclassical light. The scalable fabrication of multiple devices with similar performance is highly desirable for their practical use as sources of single and entangled photons, while the ability to operate at telecom wavelength is essential for their integration with the existing fiber infrastructure. In this work we combine the promising properties of broadband hybrid circular Bragg gratings with a membrane-transfer process performed on 3" wafer scale. We develop and study single-photon sources based on InAs/GaAs quantum dots emitting in the telecom O-band, demonstrating bright single-photon emission with Purcell factor > 5 and count rates up to 10 MHz. Furthermore, we address the question of reproducibility by benchmarking the performance of 10 devices covering a wide spectral range of 50 nm within the O-band.
Submitted 26 May, 2022;
originally announced May 2022.
-
Coherent light scattering from a telecom C-band quantum dot
Authors:
L. Wells,
T. Müller,
R. M. Stevenson,
J. Skiba-Szymanska,
D. A. Ritchie,
A. J. Shields
Abstract:
Quantum networks have the potential to transform secure communication via quantum key distribution and enable novel concepts in distributed quantum computing and sensing. Coherent quantum light generation at telecom wavelengths is fundamental for fibre-based network implementations, but Fourier-limited emission and subnatural linewidth photons have so far only been reported from systems operating in the visible to near-infrared wavelength range. Here, we use InAs/InP quantum dots to demonstrate photons with coherence times much longer than the Fourier limit at telecom wavelength. Evidence of the responsible elastic laser scattering mechanism is observed in a distinct signature in two-photon interference measurements, and is confirmed using a direct measurement of the emission coherence. Further, we show that even the inelastically scattered photons have coherence times within the error bars of the Fourier limit. Finally, we make direct use of the minimal attenuation in fibre for these photons by measuring two-photon interference after 25 km of fibre, thereby demonstrating indistinguishability of photons emitted about 100 000 excitation cycles apart.
Submitted 1 June, 2022; v1 submitted 16 May, 2022;
originally announced May 2022.
-
Self-Supervision, Remote Sensing and Abstraction: Representation Learning Across 3 Million Locations
Authors:
Sachith Seneviratne,
Kerry A. Nice,
Jasper S. Wijnands,
Mark Stevenson,
Jason Thompson
Abstract:
Self-supervision based deep learning classification approaches have received considerable attention in academic literature. However, the performance of such methods on remote sensing imagery domains remains under-explored. In this work, we explore contrastive representation learning methods on the task of imagery-based city classification, an important problem in urban computing. We use satellite and map imagery across 2 domains, 3 million locations and more than 1500 cities. We show that self-supervised methods can build a generalizable representation from as few as 200 cities, with representations achieving over 95% accuracy in unseen cities with minimal additional training. We also find that the performance discrepancy of such methods, when compared to supervised methods, induced by the domain discrepancy between natural imagery and abstract imagery is significant for remote sensing imagery. We compare all analyses against existing supervised models from academic literature and open-source our models for broader usage and further criticism.
Submitted 8 March, 2022;
originally announced March 2022.
-
Fine structure splitting analysis of cavity-enhanced telecom-wavelength InAs quantum dots grown on a GaAs(111)A vicinal substrate
Authors:
Andrea Barbiero,
Artur Tuktamyshev,
Geoffrey Pirard,
Jan Huwer,
Tina Müller,
R. Mark Stevenson,
Sergio Bietti,
Stefano Vichi,
Alexey Fedorov,
Gabriel Bester,
Stefano Sanguinetti,
Andrew J. Shields
Abstract:
The efficient generation of entangled photons at telecom wavelength is crucial for the success of many quantum communication protocols and the development of fiber-based quantum networks. Entangled light can be generated by solid-state quantum emitters with naturally low fine structure splitting, such as highly symmetric InAs quantum dots (QDs) grown on (111)-oriented surfaces. Incorporating such QDs into optical cavities is critical to achieve sufficient signal intensities for applications, but has so far presented major complications. In this work we present droplet epitaxy of telecom-wavelength InAs QDs within an optical cavity on a vicinal (2° miscut) GaAs(111)A substrate. We show a remarkable enhancement of the photon extraction efficiency compared to previous reports together with a reduction of the density that facilitates the isolation of single spectral lines. Moreover, we characterise the exciton fine structure splitting and employ numerical simulations under the framework of the empirical pseudopotential and configuration interaction methods to study the impact of the miscut on the optical properties of the QDs. We demonstrate that the presence of miscut steps influences the polarisation of the excitonic states and introduces a preferential orientation in the $C_{3v}$ symmetry of the surface.
Submitted 30 September, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
Design study for an efficient semiconductor quantum light source operating in the telecom C-band based on an electrically-driven circular Bragg grating
Authors:
Andrea Barbiero,
Jan Huwer,
Joanna Skiba-Szymanska,
Tina Müller,
R. Mark Stevenson,
Andrew J. Shields
Abstract:
The development of efficient sources of single photons and entangled photon pairs emitting in the low-loss wavelength region around 1550 nm is crucial for long-distance quantum communication. Moreover, direct fiber coupling and electrical carrier injection are highly desirable for deployment in compact and user-friendly systems integrated with the existing fiber infrastructure. Here we present a detailed design study of circular Bragg gratings etched in InP slabs and operating in the telecom C-band. These devices enable the simultaneous enhancement of the X and XX spectral lines, with collection efficiency in NA=0.65 close to 90% for the wavelength range 1520-1580 nm and Purcell factor up to 15. We also investigate the coupling into single mode fiber, which exceeds 70% in UHNA4. Finally, we propose a modified device design directly compatible with electrical carrier injection, reporting Purcell factors up to 20 and collection efficiency in NA=0.65 close to 70% for the whole telecom C-band.
Submitted 17 March, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction
Authors:
Matthew Stevenson,
Christophe Mues,
Cristián Bravo
Abstract:
LiDAR (short for "Light Detection And Ranging" or "Laser Imaging, Detection, And Ranging") technology can be used to provide detailed three-dimensional elevation maps of urban and rural landscapes. To date, airborne LiDAR imaging has been predominantly confined to the environmental and archaeological domains. However, the geographically granular and open-source nature of this data also lends itself to an array of societal, organizational and business applications where geo-demographic type data is utilised. Arguably, the complexity involved in processing this multi-dimensional data has thus far restricted its broader adoption. In this paper, we propose a series of convenient task-agnostic tile elevation embeddings to address this challenge, using recent advances from unsupervised Deep Learning. We test the potential of our embeddings by predicting seven English indices of deprivation (2019) for small geographies in the Greater London area. These indices cover a range of socio-economic outcomes and serve as a proxy for a wide variety of downstream tasks to which the embeddings can be applied. We consider the suitability of this data not just on its own but also as an auxiliary source of data in combination with demographic features, thus providing a realistic use case for the embeddings. Having trialled various model/embedding configurations, we find that our best performing embeddings lead to Root-Mean-Squared-Error (RMSE) improvements of up to 21% over using standard demographic features alone. We also demonstrate how our embedding pipeline, using Deep Learning combined with K-means clustering, produces coherent tile segments which allow the latent embedding features to be interpreted.
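The K-means step mentioned above, which groups tile embeddings into interpretable segments, can be pictured with plain Lloyd's algorithm. The sketch assumes the embeddings are already computed as rows of a NumPy array; the function name and the deterministic farthest-point initialisation (a simplification of k-means++) are illustrative choices, not the authors' pipeline.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, n_iter: int = 100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update.

    X: (n, d) array of embedding vectors. Returns (labels, centroids).
    """
    # Farthest-point initialisation: start at X[0], then repeatedly add the
    # point farthest from all centroids chosen so far.
    idx = [0]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    centroids = X[idx].astype(float)
    for _ in range(n_iter):
        # Squared distance from every point to every centroid: shape (n, k).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

In this reading, each cluster label would define a tile segment, and inspecting the tiles that share a label is what makes the latent embedding features interpretable.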
Submitted 1 August, 2022; v1 submitted 2 December, 2021;
originally announced December 2021.
-
Cross-Lingual Word Embedding Refinement by $\ell_{1}$ Norm Optimisation
Authors:
Xutan Peng,
Chenghua Lin,
Mark Stevenson
Abstract:
Cross-Lingual Word Embeddings (CLWEs) encode words from two or more languages in a shared high-dimensional space in which vectors representing words with similar meaning (regardless of language) are closely located. Existing methods for building high-quality CLWEs learn mappings that minimise the $\ell_{2}$ norm loss function. However, this optimisation objective has been demonstrated to be sensitive to outliers. Based on the more robust Manhattan norm (a.k.a. the $\ell_{1}$ norm) goodness-of-fit criterion, this paper proposes a simple post-processing step to improve CLWEs. An advantage of this approach is that it is fully agnostic to the training process of the original CLWEs and can therefore be applied widely. Extensive experiments are performed involving ten diverse languages and embeddings trained on different corpora. Evaluation results based on bilingual lexicon induction and cross-lingual transfer for natural language inference tasks show that the $\ell_{1}$ refinement substantially outperforms four state-of-the-art baselines in both supervised and unsupervised settings. It is therefore recommended that this strategy be adopted as a standard for CLWE methods.
Submitted 11 April, 2021;
originally announced April 2021.
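The robustness argument behind the $\ell_{1}$ refinement can be illustrated with a one-dimensional toy problem. The sketch below is not the paper's refinement procedure, which operates on full embedding matrices; it assumes a single scalar map $y \approx wx$ and compares the least-squares ($\ell_{2}$) slope with the $\ell_{1}$ slope. For this special case the $\ell_{1}$ minimiser is a weighted median of the ratios $y_i/x_i$ with weights $|x_i|$. The data and the function names are hypothetical.

```python
def l2_slope(xs, ys):
    """Least-squares slope for y ~ w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def l1_slope(xs, ys):
    """Slope minimising sum |y - w * x| (no intercept).

    Since |y - w*x| = |x| * |y/x - w|, the minimiser is a weighted
    median of the ratios y/x with weights |x|. Points with x == 0 do
    not depend on w and are skipped.
    """
    pairs = sorted((y / x, abs(x)) for x, y in zip(xs, ys) if x != 0)
    half = sum(w for _, w in pairs) / 2
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return ratio
    raise ValueError("need at least one point with x != 0")


# Toy data: y = 2x, except that the last point is an outlier.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 100]
print(l2_slope(xs, ys))  # pulled far away from 2 by the single outlier
print(l1_slope(xs, ys))  # 2.0
```

A single corrupted pair drags the $\ell_{2}$ slope to roughly 10.2, while the $\ell_{1}$ slope stays at the slope of the clean points, which is the behaviour the abstract appeals to.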
-
Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis
Authors:
Xutan Peng,
Guanyi Chen,
Chenghua Lin,
Mark Stevenson
Abstract:
Knowledge Graph Embeddings (KGEs) have been intensively explored in recent years due to their promise for a wide range of applications. However, existing studies focus on improving the final model performance without acknowledging the computational cost of the proposed approaches, in terms of execution time and environmental impact. This paper proposes a simple yet effective KGE framework which can reduce the training time and carbon footprint by orders of magnitude compared with state-of-the-art approaches, while producing competitive performance. We highlight three technical innovations: full batch learning via relational matrices, closed-form Orthogonal Procrustes Analysis for KGEs, and non-negative-sampling training. In addition, as the first KGE method whose entity embeddings also store full relation information, our trained models encode rich semantics and are highly interpretable. Comprehensive experiments and ablation studies involving 13 strong baselines and two standard datasets verify the effectiveness and efficiency of our algorithm.
Submitted 17 April, 2021; v1 submitted 9 April, 2021;
originally announced April 2021.
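The closed-form step mentioned above refers to the Orthogonal Procrustes problem: in general, the orthogonal $W$ minimising $\|XW - Y\|_F$ is $W = UV^\top$, where $U\Sigma V^\top$ is the SVD of $X^\top Y$. The sketch below is only an illustration of why the solution needs no iterative training, not the paper's KGE framework: to stay within the Python standard library it assumes two-dimensional points and restricts $W$ to rotations (no reflections), where the optimum reduces to a single `atan2`. The function names and data are hypothetical.

```python
import math


def procrustes_rotation_2d(src, dst):
    """Closed-form angle theta minimising sum ||R(theta) x - y||^2.

    ||R x - y||^2 = ||x||^2 + ||y||^2 - 2 * y . R x, so only the cross
    term depends on theta, and sum y . R(theta) x = cos(theta) * A +
    sin(theta) * B with A = sum(x1*y1 + x2*y2), B = sum(x1*y2 - x2*y1).
    That is maximised at theta = atan2(B, A).
    """
    a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in zip(src, dst))
    b = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in zip(src, dst))
    return math.atan2(b, a)


def rotate(point, theta):
    """Apply the 2D rotation matrix R(theta) to a point."""
    x1, x2 = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x1 - s * x2, s * x1 + c * x2)


src = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.3, -0.7)]
true_theta = 0.6
dst = [rotate(p, true_theta) for p in src]
print(procrustes_rotation_2d(src, dst))  # approximately 0.6
```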
-
UserReg: A Simple but Strong Model for Rating Prediction
Authors:
Haiyang Zhang,
Ivan Ganchev,
Nikola S. Nikolov,
Mark Stevenson
Abstract:
Collaborative filtering (CF) has achieved great success in the field of recommender systems. In recent years, many novel CF models, particularly those based on deep learning or graph techniques, have been proposed for a variety of recommendation tasks, such as rating prediction and item ranking. These newly published models usually demonstrate their performance in comparison to baselines or existing models in terms of accuracy improvements. However, others have pointed out that many newly proposed models are not as strong as expected and are outperformed by very simple baselines.
This paper proposes a simple linear model based on Matrix Factorization (MF), called UserReg, which regularizes users' latent representations with explicit feedback information for rating prediction. We compare the effectiveness of UserReg with three linear CF models that are widely used as baselines, and with a set of recently proposed complex models that are based on deep learning or graph techniques. Experimental results show that UserReg achieves overall better performance than the fine-tuned baselines considered and is highly competitive when compared with other recently proposed models. We conclude that UserReg can be used as a strong baseline for future CF research.
Submitted 15 February, 2021;
originally announced February 2021.
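UserReg is built on Matrix Factorization, but the abstract does not spell out the form of its user regulariser, so the sketch below shows only the plain MF baseline it extends: ratings are predicted as the global mean plus a dot product of user and item latent vectors, trained by stochastic gradient descent with L2 regularisation. It is not UserReg itself; the toy ratings, hyperparameters, and function names are hypothetical.

```python
import random


def train_mf(ratings, n_users, n_items, k=4, lr=0.02, reg=0.02, epochs=200, seed=0):
    """Plain MF trained with SGD: r_hat(u, i) = mu + p_u . q_i."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]  # use old values for both updates
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict(model, u, i):
    mu, P, Q = model
    return mu + sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(model, ratings):
    return (sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# (user, item, rating) triples on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
model = train_mf(ratings, n_users=4, n_items=3)
print(rmse(model, ratings))
```

Explicit-feedback regularisers such as UserReg's would add further terms to the per-rating update; the training loop above is the part such linear baselines share.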
-
Identifying safe intersection design through unsupervised feature extraction from satellite imagery
Authors:
Jasper S. Wijnands,
Haifeng Zhao,
Kerry A. Nice,
Jason Thompson,
Katherine Scully,
**gqiu Guo,
Mark Stevenson
Abstract:
The World Health Organization has listed the design of safer intersections as a key intervention to reduce global road trauma. This article presents the first study to systematically analyze the design of all intersections in a large country, based on aerial imagery and deep learning. Approximately 900,000 satellite images were downloaded for all intersections in Australia and customized computer vision techniques emphasized the road infrastructure. A deep autoencoder extracted high-level features, including the intersection's type, size, shape, lane markings, and complexity, which were used to cluster similar designs. An Australian telematics data set linked infrastructure design to driving behaviors captured during 66 million kilometers of driving. This showed more frequent hard acceleration events (per vehicle) at four- than three-way intersections, relatively low hard deceleration frequencies at T-intersections, and consistently low average speeds on roundabouts. Overall, domain-specific feature extraction enabled the identification of infrastructure improvements that could result in safer driving behaviors, potentially reducing road trauma.
Submitted 28 October, 2020;
originally announced October 2020.
-
Robustness and Reliability of Gender Bias Assessment in Word Embeddings: The Role of Base Pairs
Authors:
Haiyang Zhang,
Alison Sneyd,
Mark Stevenson
Abstract:
It has been shown that word embeddings can exhibit gender bias, and various methods have been proposed to quantify this. However, the extent to which the methods are capturing social stereotypes inherited from the data has been debated. Bias is a complex concept and there exist multiple ways to define it. Previous work has leveraged gender word pairs to measure bias and extract biased analogies. We show that the reliance on these gendered pairs has strong limitations: bias measures based on them are not robust and cannot identify common types of real-world bias, whilst analogies utilising them are unsuitable indicators of bias. In particular, the well-known analogy "man is to computer-programmer as woman is to homemaker" is due to word similarity rather than societal bias. This has important implications for work on measuring bias in embeddings and related work debiasing embeddings.
Submitted 27 October, 2020; v1 submitted 6 October, 2020;
originally announced October 2020.
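To make the dependence on base pairs concrete, the sketch below uses one common pair-based measure: the cosine between a word vector and the difference vector of a single gendered pair. It is a toy illustration, not the paper's analysis; the two-dimensional vectors are made up, and real measures typically average over several pairs or use a principal direction. Even here, swapping ("he", "she") for ("man", "woman") changes the score assigned to the same word.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def pair_bias(emb, word, pair):
    """Cosine between a word vector and the difference of one gendered base pair."""
    male, female = pair
    direction = [a - b for a, b in zip(emb[male], emb[female])]
    return cosine(emb[word], direction)


# Hypothetical toy embeddings.
emb = {
    "he": (1.0, 1.0),
    "she": (-1.0, 1.0),
    "man": (1.0, 0.5),
    "woman": (-0.9, 0.6),
    "programmer": (0.3, 1.0),
}
for pair in [("he", "she"), ("man", "woman")]:
    print(pair, round(pair_bias(emb, "programmer", pair), 3))
```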
-
Identifying Automatically Generated Headlines using Transformers
Authors:
Antonis Maronikolakis,
Hinrich Schutze,
Mark Stevenson
Abstract:
False information spread via the internet and social media influences public opinion and user activity, while generative models enable fake content to be generated faster and more cheaply than had previously been possible. In the not so distant future, identifying fake content generated by deep learning models will play a key role in protecting users from misinformation. To this end, a dataset containing human and computer-generated headlines was created and a user study indicated that humans were only able to identify the fake headlines in 47.8% of the cases. However, the most accurate automatic approach, transformers, achieved an overall accuracy of 85.7%, indicating that content generated from language models can be filtered out accurately.
Submitted 25 April, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Automatic Generation of Topic Labels
Authors:
Areej Alokaili,
Nikolaos Aletras,
Mark Stevenson
Abstract:
Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections that has many applications in information retrieval. A topic is usually represented by a list of terms ranked by their probability but, since these can be difficult to interpret, various approaches have been developed to assign descriptive labels to topics. Previous work on the automatic assignment of labels to topics has relied on a two-stage approach: (1) candidate labels are retrieved from a large pool (e.g. Wikipedia article titles); and then (2) re-ranked based on their semantic similarity to the topic terms. However, these extractive approaches can only assign candidate labels from a restricted set that may not include any suitable ones. This paper proposes using a sequence-to-sequence neural-based approach to generate labels that does not suffer from this limitation. The model is trained over a new large synthetic dataset created using distant supervision. The method is evaluated by comparing the labels it generates to ones rated by humans.
Submitted 29 May, 2020;
originally announced June 2020.
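The second stage of the extractive approach described above (re-ranking candidate labels by semantic similarity to the topic terms) can be sketched as follows. This illustrates the prior two-stage baseline, not the paper's sequence-to-sequence generator. It assumes similarity is the cosine between a label's vector and the centroid of the topic-term vectors; the tiny three-dimensional embeddings and candidate labels are hypothetical stand-ins for, e.g., Wikipedia article titles.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def rerank(topic_terms, candidates, emb):
    """Order candidate labels by cosine similarity to the topic-term centroid."""
    c = centroid([emb[t] for t in topic_terms])
    return sorted(candidates, key=lambda label: cosine(emb[label], c), reverse=True)


emb = {
    "goal": (0.9, 0.1, 0.0),
    "match": (0.8, 0.2, 0.1),
    "league": (0.85, 0.05, 0.1),
    "football": (0.9, 0.1, 0.05),
    "election": (0.05, 0.9, 0.1),
    "cooking": (0.1, 0.05, 0.95),
}
print(rerank(["goal", "match", "league"], ["cooking", "election", "football"], emb))
```

Whatever the scoring function, this stage can only return labels already in the candidate pool, which is the limitation the generative approach is meant to remove.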
-
1GHz clocked distribution of electrically generated entangled photon pairs
Authors:
Ginny Shooter,
Ziheng Xiang,
Jonathan R. A. Müller,
Joanna Skiba-Szymanska,
Jan Huwer,
Jonathan Griffiths,
Thomas Mitchell,
Matthew Anderson,
Tina Müller,
Andrey B. Krysa,
R. Mark Stevenson,
Jon Heffernan,
David A. Ritchie,
Andrew J. Shields
Abstract:
Quantum networks are essential for realising distributed quantum computation and quantum communication. Entangled photons are a key resource, with applications such as quantum key distribution, quantum relays, and quantum repeaters. All components integrated in a quantum network must be synchronised and therefore comply with a certain clock frequency. In quantum key distribution, the most mature technology, clock rates have reached and exceeded 1 GHz. Here we show the first electrically pulsed sub-Poissonian entangled photon source compatible with existing fiber networks operating at this clock rate. The entangled LED is based on InAs/InP quantum dots emitting in the main telecom window, with a multi-photon probability of less than 10% per emission cycle and a maximum entanglement fidelity of 89%. We use this device to demonstrate GHz clocked distribution of entangled qubits over an installed fiber network between two points 4.6 km apart.
Submitted 2 November, 2021; v1 submitted 30 April, 2020;
originally announced April 2020.
-
Understanding Linearity of Cross-Lingual Word Embedding Mappings
Authors:
Xutan Peng,
Mark Stevenson,
Chenghua Lin,
Chen Li
Abstract:
The technique of Cross-Lingual Word Embedding (CLWE) plays a fundamental role in tackling Natural Language Processing challenges for low-resource languages. Its dominant approaches have assumed that the relationship between embeddings could be represented by a linear mapping, but there has been no exploration of the conditions under which this assumption holds. This research gap has recently become critical, as there is evidence that relaxing mappings to be non-linear can lead to better performance in some cases. We present, for the first time, a theoretical analysis that identifies the preservation of analogies encoded in monolingual word embeddings as a necessary and sufficient condition for the ground-truth CLWE mapping between those embeddings to be linear. On a novel cross-lingual analogy dataset that covers five representative analogy categories for twelve distinct languages, we carry out experiments which provide direct empirical support for our theoretical claim. These results offer additional insight into the observations of other researchers and contribute inspiration for the development of more effective cross-lingual representation learning strategies.
Submitted 11 June, 2022; v1 submitted 2 April, 2020;
originally announced April 2020.
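The "necessary" half of this condition can be checked in a few lines, in a simplified form that assumes an analogy holds exactly as a vector offset, $\mathbf{x}_b - \mathbf{x}_a + \mathbf{x}_c = \mathbf{x}_d$, in the source space (the paper's formal statement may be more general). If the ground-truth mapping is linear, $f(\mathbf{x}) = W\mathbf{x}$, the analogy carries over to the target space:

```latex
\begin{aligned}
f(\mathbf{x}_b) - f(\mathbf{x}_a) + f(\mathbf{x}_c)
  &= W\mathbf{x}_b - W\mathbf{x}_a + W\mathbf{x}_c \\
  &= W(\mathbf{x}_b - \mathbf{x}_a + \mathbf{x}_c) \\
  &= W\mathbf{x}_d = f(\mathbf{x}_d).
\end{aligned}
```

The converse, that analogy preservation is also sufficient for the mapping to be linear, is the harder direction the abstract refers to.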
-
The value of text for small business default prediction: A deep learning approach
Authors:
Matthew Stevenson,
Christophe Mues,
Cristián Bravo
Abstract:
Compared to consumer lending, Micro, Small and Medium Enterprise (mSME) credit risk modelling is particularly challenging, as, often, the same sources of information are not available. Therefore, it is standard policy for a loan officer to provide a textual loan assessment to mitigate limited data availability. In turn, this statement is analysed by a credit expert alongside any available standard credit data. In our paper, we exploit recent advances from the field of Deep Learning and Natural Language Processing (NLP), including the BERT (Bidirectional Encoder Representations from Transformers) model, to extract information from 60 000 textual assessments provided by a lender. We consider the performance in terms of the AUC (Area Under the receiver operating characteristic Curve) and Brier Score metrics and find that the text alone is surprisingly effective for predicting default. However, when combined with traditional data, it yields no additional predictive capability, with performance dependent on the text's length. Our proposed deep learning model does, however, appear to be robust to the quality of the text and therefore suitable for partly automating the mSME lending process. We also demonstrate how the content of loan assessments influences performance, leading us to a series of recommendations on a new strategy for collecting future mSME loan assessments.
Submitted 7 July, 2021; v1 submitted 19 March, 2020;
originally announced March 2020.
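The two evaluation metrics named above can be computed directly from predicted default probabilities and observed 0/1 outcomes. The sketch below uses the textbook definitions: the Brier score is the mean squared difference between prediction and outcome (lower is better), and the AUC is the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as one half. The four predictions are made-up values, and the pairwise AUC loop is quadratic, so it is only suitable for illustration.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def auc(probs, labels):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


probs = [0.1, 0.4, 0.35, 0.8]  # hypothetical predicted default probabilities
labels = [0, 0, 1, 1]          # 1 = loan defaulted
print(brier_score(probs, labels))
print(auc(probs, labels))  # 0.75
```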
-
GHz-clocked teleportation of time-bin qubits with a telecom C-band quantum dot
Authors:
M. Anderson,
T. Müller,
J. Huwer,
J. Skiba-Szymanska,
A. B. Krysa,
R. M. Stevenson,
J. Heffernan,
D. A. Ritchie,
A. J. Shields
Abstract:
Teleportation is a fundamental concept of quantum mechanics with an important application in extending the range of quantum communication channels via quantum relay nodes. To be compatible with real-world technology such as secure quantum key distribution over fibre networks, such a relay node must operate at GHz clock rates and accept time-bin encoded qubits in the low-loss telecom band around 1550 nm. Here, we show that InAs/InP droplet epitaxy quantum dots with their sub-Poissonian emission near 1550 nm are ideally suited for the realisation of this technology. To create the necessary on-demand photon emission at GHz clock rates, we develop a flexible pulsed optical excitation scheme, and demonstrate that the fast driving conditions are compatible with a low multiphoton emission rate. We show further that, even under these driving conditions, photon pairs obtained from the biexciton cascade show an entanglement fidelity close to 90%, comparable to the value obtained under cw excitation. Using asymmetric Mach-Zehnder interferometers and our photon source, we finally construct a time-bin qubit quantum relay able to receive and send time-bin encoded photons, and demonstrate mean teleportation fidelities of $0.82\pm0.01$, exceeding the classical limit by nearly 10 standard deviations.
Submitted 20 January, 2020;
originally announced January 2020.
-
Active reset of a radiative cascade for superequilibrium entangled photon generation
Authors:
Jonathan R. A. Müller,
R. Mark Stevenson,
Joanna Skiba-Szymanska,
Ginny Shooter,
Jan Huwer,
Ian Farrer,
David A. Ritchie,
Andrew J. Shields
Abstract:
The generation rate of entangled photons emitted from cascaded few-level systems is intrinsically limited by the lifetime of the radiative transitions. Here, we overcome this limit for entangled photon pairs from quantum dots via a novel driving regime based on an active reset of the radiative cascade. We show theoretically and experimentally that this driving regime enables the generation of entangled photon pairs with higher fidelity and intensity than the optimum continuously driven equilibrium state. Finally, we electrically generate entangled photon pairs with a total fidelity of $(79.5 \pm 1.1)\%$ at a record clock rate of 1.15GHz.
Submitted 17 January, 2020;
originally announced January 2020.
-
Schrödinger-ANI: An Eight-Element Neural Network Interaction Potential with Greatly Expanded Coverage of Druglike Chemical Space
Authors:
James M. Stevenson,
Leif D. Jacobson,
Yutong Zhao,
Chuanjie Wu,
Jon Maple,
Karl Leswing,
Edward Harder,
Robert Abel
Abstract:
We have developed a neural network potential energy function for use in drug discovery, with chemical element support extended from 41% to 94% of druglike molecules based on ChEMBL. We expand on the work of Smith et al., with their highly accurate network for the elements H, C, N, O, creating a network for H, C, N, O, S, F, Cl, P. We focus particularly on the calculation of relative conformer energies, for which we show that our new potential energy function has an RMSE of 0.70 kcal/mol for prospective druglike molecule conformers, substantially better than the previous state of the art. The speed and accuracy of this model could greatly accelerate the parameterization of protein-ligand binding free energy calculations for novel druglike molecules.
Submitted 22 November, 2019;
originally announced December 2019.
-
Real-time monitoring of driver drowsiness on mobile platforms using 3D neural networks
Authors:
Jasper S. Wijnands,
Jason Thompson,
Kerry A. Nice,
Gideon D. P. A. Aschwanden,
Mark Stevenson
Abstract:
Driver drowsiness increases crash risk, leading to substantial road trauma each year. Drowsiness detection methods have received considerable attention, but few studies have investigated the implementation of a detection approach on a mobile phone. Phone applications reduce the need for specialised hardware and hence enable a cost-effective roll-out of the technology across the driving population. While it has been shown that three-dimensional (3D) operations are more suitable for spatiotemporal feature learning, current methods for drowsiness detection commonly use frame-based, multi-step approaches. However, computationally expensive techniques that achieve superior results on action recognition benchmarks (e.g. 3D convolutions, optical flow extraction) create bottlenecks for real-time, safety-critical applications on mobile devices. Here, we show how depthwise separable 3D convolutions, combined with an early fusion of spatial and temporal information, can achieve a balance between high prediction accuracy and real-time inference requirements. In particular, increased accuracy is achieved when assessment requires motion information, for example, when sunglasses conceal the eyes. Further, a custom TensorFlow-based smartphone application shows the true impact of various approaches on inference times and demonstrates the effectiveness of real-time monitoring based on out-of-sample data to alert a drowsy driver. Our model is pre-trained on ImageNet and Kinetics and fine-tuned on a publicly available Driver Drowsiness Detection dataset. Fine-tuning on large naturalistic driving datasets could further improve accuracy to obtain robust in-vehicle performance. Overall, our research is a step towards practical deep learning applications, potentially preventing micro-sleeps and reducing road trauma.
Submitted 15 October, 2019;
originally announced October 2019.
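Part of the speed-up from depthwise separable 3D convolutions comes from the reduction in weights, which a short calculation makes concrete. A standard 3D convolution with a $k \times k \times k$ kernel needs $k^3 C_{in} C_{out}$ weights; the separable version applies one $k \times k \times k$ filter per input channel ($k^3 C_{in}$) followed by a $1 \times 1 \times 1$ pointwise convolution ($C_{in} C_{out}$). The layer size below is hypothetical, a depth multiplier of 1 is assumed, and biases are ignored.

```python
def conv3d_params(k, c_in, c_out):
    """Weights in a standard 3D convolution with a k x k x k kernel (bias ignored)."""
    return k ** 3 * c_in * c_out


def separable_conv3d_params(k, c_in, c_out):
    """Depthwise k x k x k filter per input channel plus 1 x 1 x 1 pointwise conv (bias ignored)."""
    return k ** 3 * c_in + c_in * c_out


k, c_in, c_out = 3, 64, 128  # hypothetical layer size
std = conv3d_params(k, c_in, c_out)
sep = separable_conv3d_params(k, c_in, c_out)
print(std, sep, round(std / sep, 1))  # 221184 9920 22.3
```

The multiply-accumulate count per output position shrinks by the same factor, which is why the factorisation suits real-time inference on a phone; actual latency also depends on how well the runtime supports depthwise kernels.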
-
The 'Paris-end' of town? Urban typology through machine learning
Authors:
Kerry A. Nice,
Jason Thompson,
Jasper S. Wijnands,
Gideon D. P. A. Aschwanden,
Mark Stevenson
Abstract:
The confluence of recent advances in availability of geospatial information, computing power, and artificial intelligence offers new opportunities to understand how and where our cities differ or are alike. Departing from a traditional 'top-down' analysis of urban design features, this project analyses millions of images of urban form (consisting of street view, satellite imagery, and street maps) to find shared characteristics. A (novel) neural network-based framework is trained with imagery from the largest 1692 cities in the world and the resulting models are used to compare within-city locations from Melbourne and Sydney to determine the closest connections between these areas and their international comparators. This work demonstrates a new, consistent, and objective method to begin to understand the relationship between cities and their health, transport, and environmental consequences of their design. The results show specific advantages and disadvantages of using each type of imagery. Neural networks trained with map imagery will be highly influenced by the mix of roads, public transport, and green and blue space as well as the structure of these elements. The colours of natural and built features stand out as dominant characteristics in satellite imagery. The use of street view imagery will emphasise the features of a human scaled visual geography of streetscapes. Finally, and perhaps most importantly, this research also answers the age-old question, "Is there really a 'Paris-end' to your city?".
Submitted 8 October, 2019;
originally announced October 2019.
-
The Nature of Human Settlement: Building an understanding of high performance city design
Authors:
Kerry A. Nice,
Gideon D. P. A. Aschwanden,
Jasper S. Wijnands,
Jason Thompson,
Haifeng Zhao,
Mark Stevenson
Abstract:
In an impending urban age where the majority of the world's population will live in cities, it is critical that we improve our understanding of the strengths and limitations of existing city designs to ensure they are safe, clean, can deliver health co-benefits and importantly, are sustainable into the future. To enable this, a systematic and efficient means of performing inter- and intra-city comparisons based on urban form is required. Until now, methods for comparing cities have been limited by scalability, often reliant upon non-standardised local input data that can be costly and difficult to obtain. To address this, we have developed a unique approach to determine the mix, distribution, and composition of neighbourhood types in cities based on dimensions of block size and regularity, sorted by a self-organising map. We illustrate the utility of the method to provide an understanding of the underlying city morphology by overlaying spatially standardised city metrics such as air pollution and transport activity across a set of 1667 global cities with populations exceeding 300,000. The approach reveals associations between specific mixes of neighbourhood typologies and quantities of moving vehicles (r=0.97), impervious surfaces (r=0.86), and air pollution levels (aerosol optical depth r=0.58 and NO$_{2}$ r=0.57). This illustrates that the approach can identify the characteristics and neighbourhood mixes of well-performing urban areas while also producing unique 'city fingerprints' that can be used to provide new metrics, insights, and drive improvements in city design for the future.
Submitted 8 October, 2019;
originally announced October 2019.
-
Sky pixel detection in outdoor imagery using an adaptive algorithm and machine learning
Authors:
Kerry A. Nice,
Jasper S. Wijnands,
Ariane Middel,
**gcheng Wang,
Yiming Qiu,
Nan Zhao,
Jason Thompson,
Gideon D. P. A. Aschwanden,
Haifeng Zhao,
Mark Stevenson
Abstract:
Computer vision techniques enable automated detection of sky pixels in outdoor imagery. In urban climate, sky detection is an important first step in gathering information about urban morphology and sky view factors. However, obtaining accurate results remains challenging and becomes even more complex using imagery captured under a variety of lighting and weather conditions.
To address this problem, we present a new sky pixel detection system demonstrated to produce accurate results using a wide range of outdoor imagery types. Images are processed using a selection of mean-shift segmentation, K-means clustering, and Sobel filters to mark sky pixels in the scene. The algorithm for a specific image is chosen by a convolutional neural network, trained with 25,000 images from the Skyfinder data set, reaching 82% accuracy for the top three classes. This selection step allows the sky marking to follow an adaptive process and to use different techniques and parameters to best suit a particular image. An evaluation of fourteen different techniques and parameter sets shows that no single technique can perform with high accuracy across varied Skyfinder and Google Street View data sets. However, by using our adaptive process, large increases in accuracy are observed. The resulting system is shown to perform better than other published techniques.
Submitted 9 December, 2019; v1 submitted 7 October, 2019;
originally announced October 2019.
-
A tuneable telecom-wavelength entangled light emitting diode
Authors:
Z. -H. Xiang,
J. Huwer,
J. Skiba-Szymanska,
R. M. Stevenson,
D. J. P. Ellis,
I. Farrer,
M. B. Ward,
D. A. Ritchie,
A. J. Shields
Abstract:
Entangled light emitting diodes based on semiconductor quantum dots are promising devices for security-sensitive quantum network applications, thanks to their natural lack of multi-photon-pair generation. Apart from telecom wavelength emission, network integrability of these sources ideally requires electrical operation for deployment in compact systems in the field. For multiplexing of entangled photons with classical data traffic, emission in the telecom O-band and tuneability to the nearest wavelength channel in compliance with coarse wavelength division multiplexing standards (20 nm channel spacing) are highly desirable. Here we show the first fully electrically operated telecom entangled light emitting diode with wavelength tuneability of more than 25 nm, deployed in an installed fiber network. With the source tuned to 1310.00 nm, we demonstrate multiplexing of true single entangled photons with classical data traffic and achieve entanglement fidelities above 95% on an installed fiber in a city.
Submitted 26 September, 2019;
originally announced September 2019.
-
Modelling Stopping Criteria for Search Results using Poisson Processes
Authors:
Alison Sneyd,
Mark Stevenson
Abstract:
Text retrieval systems often return large sets of documents, particularly when applied to large collections. Stopping criteria can reduce the number of these documents that need to be manually evaluated for relevance by predicting when a suitable level of recall has been achieved. In this work, a novel method for determining a stopping criterion is proposed that models the rate at which relevant documents occur using a Poisson process. This method allows a user to specify both a minimum desired level of recall to achieve and a desired probability of having achieved it. We evaluate our method on a public dataset and compare it with previous techniques for determining stopping criteria.
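The sketch below illustrates one way a recall target and a confidence level can be combined. It assumes that the expected number of relevant documents still in the unexamined part of the ranking, `expected_remaining`, has already been obtained from a fitted Poisson-process rate; that fitting step is not shown. The remaining count is treated as Poisson-distributed, its upper quantile at the requested confidence is taken, and review stops once recall measured against that pessimistic total meets the target. The function names are hypothetical, and this is a simplification rather than the authors' exact procedure.

```python
import math

# Simplified illustration; names are hypothetical and the rate-fitting step
# that would produce expected_remaining is assumed, not shown.


def poisson_quantile(lam, p):
    """Smallest r such that P(X <= r) >= p for X ~ Poisson(lam), 0 < p < 1.

    Direct summation; adequate for small lam (math.exp(-lam) underflows
    to 0 for lam above roughly 745).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    r = 0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    while cdf < p:
        r += 1
        term *= lam / r    # P(X = r) from P(X = r - 1)
        cdf += term
    return r


def should_stop(found, expected_remaining, target_recall, confidence):
    """Stop if recall, measured against a pessimistic estimate of the total
    number of relevant documents, reaches target_recall."""
    if found == 0:
        return False
    upper = poisson_quantile(expected_remaining, confidence)
    return found / (found + upper) >= target_recall


# About 2 relevant documents are expected among the unexamined ones.
print(poisson_quantile(2.0, 0.95))        # 5
print(should_stop(95, 2.0, 0.9, 0.95))    # True  (95/100 = 0.95)
print(should_stop(40, 2.0, 0.9, 0.95))    # False (40/45 is about 0.89)
```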
Submitted 13 September, 2019;
originally announced September 2019.
-
Photon phase shift at the few-photon level and optical switching by a quantum dot in a microcavity
Authors:
L. M. Wells,
S. Kalliakos,
B. Villa,
D. J. P. Ellis,
R. M. Stevenson,
A. J. Bennett,
I. Farrer,
D. A. Ritchie,
A. J. Shields
Abstract:
We exploit the nonlinearity arising from the spin-photon interaction in an InAs quantum dot to demonstrate phase shifts of scattered light pulses at the single-photon level. Photon phase shifts of close to 90 degrees are achieved using a charged quantum dot in a micropillar cavity. We also demonstrate a photon phase switch by using a spin-pumping mechanism through Raman transitions in an in-plane magnetic field. The experimental findings are supported by a theoretical model which explores the dynamics of the system. Our results demonstrate the potential of quantum dot-induced nonlinearities for quantum information processing.
Submitted 18 July, 2019;
originally announced July 2019.
-
Streetscape augmentation using generative adversarial networks: insights related to health and wellbeing
Authors:
Jasper S. Wijnands,
Kerry A. Nice,
Jason Thompson,
Haifeng Zhao,
Mark Stevenson
Abstract:
Deep learning using neural networks has provided advances in image style transfer, merging the content of one image (e.g., a photo) with the style of another (e.g., a painting). Our research shows this concept can be extended to analyse the design of streetscapes in relation to health and wellbeing outcomes. An Australian population health survey (n=34,000) was used to identify the spatial distribution of health and wellbeing outcomes, including general health and social capital. For each outcome, the most and least desirable locations formed two domains. Streetscape design was sampled using around 80,000 Google Street View images per domain. Generative adversarial networks translated these images from one domain to the other, preserving the main structure of the input image, but transforming the 'style' from locations where self-reported health was bad to locations where it was good. These translations indicate that areas in Melbourne with good general health are characterised by sufficient green space and compactness of the urban environment, whilst streetscape imagery related to high social capital contained more and wider footpaths, fewer fences and more grass. Beyond identifying relationships, the method is a first step towards computer-generated design interventions that have the potential to improve population health and wellbeing.
Submitted 13 May, 2019;
originally announced May 2019.
-
Optimization for factorized quantities in perturbative QCD
Authors:
P. M. Stevenson
Abstract:
Perturbative calculations of factorized physical quantities, such as moments of structure functions, suffer from renormalization- and factorization-scheme dependence. The application of the principle of minimal sensitivity to "optimize" the scheme choices is reconsidered, correcting deficiencies in the earlier literature. The proper scheme variables, RG equations, and invariants are identified. Earlier results of Nakkagawa and Niegawa are recovered, even though their starting point is, at best, unnecessarily complicated. In particular, the optimized coefficients of the coefficient function $C$ are shown to vanish, so that $C^{\mathrm{opt}}=1$. The resulting simplifications mean that the optimization procedure is as simple as that for purely-perturbative physical quantities.
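For reference, the principle of minimal sensitivity selects the scheme at which the truncated prediction is stationary with respect to the scheme variables. Write $\mathcal{R}^{(k)}$ for the $k$-th order approximant and $\tau_1,\dots,\tau_n$ for the renormalization- and factorization-scheme variables. This is generic notation assumed here and not necessarily the paper's. The optimization conditions then read

$$\left.\frac{\partial \mathcal{R}^{(k)}}{\partial \tau_i}\right|_{\tau=\bar{\tau}} = 0, \qquad i = 1,\dots,n,$$

and the optimized prediction is $\mathcal{R}^{(k)}(\bar{\tau})$.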
Submitted 8 May, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Re-Ranking Words to Improve Interpretability of Automatically Generated Topics
Authors:
Areej Alokaili,
Nikolaos Aletras,
Mark Stevenson
Abstract:
Topic models, such as LDA, are widely used in Natural Language Processing. Making their output interpretable is an important area of research with applications to areas such as the enhancement of exploratory search interfaces and the development of interpretable machine learning models. Conventionally, topics are represented by their n most probable words; however, these representations are often difficult for humans to interpret. This paper explores the re-ranking of topic words to generate more interpretable topic representations. A range of approaches are compared and evaluated in two experiments. The first uses crowdworkers to associate topics represented by different word rankings with related documents. The second experiment is an automatic approach based on a document retrieval task applied to multiple domains. Results in both experiments demonstrate that re-ranking words improves topic interpretability and that the most effective re-ranking schemes were those which combine information about the importance of words both within topics and their relative frequency in the entire corpus. In addition, close correlation between the results of the two evaluation approaches suggests that the automatic method proposed here could be used to evaluate re-ranking methods without the need for human judgements.
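The abstract reports that the best re-ranking schemes combine a word's importance within a topic with its relative frequency in the whole corpus, but it gives no formula. One well-known scheme of that kind is the "relevance" score used in the LDAvis tool (Sievert and Shirley, 2014), $\lambda \log p(w\mid t) + (1-\lambda)\log\frac{p(w\mid t)}{p(w)}$. The sketch below implements that score as an example; it is not necessarily one of the methods compared in the paper. It assumes `topic_probs` maps words to $p(w\mid t)$ and `corpus_probs` maps words to $p(w)$. Both names and the default `lam=0.6` are illustrative choices.

```python
import math

# Example of one combined re-ranking scheme (LDAvis-style relevance);
# not necessarily a scheme evaluated in the paper. Names are illustrative.


def rerank(topic_probs, corpus_probs, lam=0.6, n=10):
    """Return the top-n words of one topic ranked by
    lam * log p(w|t) + (1 - lam) * log(p(w|t) / p(w))."""
    scores = {}
    for word, p_wt in topic_probs.items():
        if p_wt <= 0:
            continue  # log is undefined for zero probability
        lift = p_wt / corpus_probs[word]
        scores[word] = lam * math.log(p_wt) + (1 - lam) * math.log(lift)
    return sorted(scores, key=scores.get, reverse=True)[:n]


topic = {"the": 0.30, "gene": 0.20, "dna": 0.15}
corpus = {"the": 0.30, "gene": 0.01, "dna": 0.005}
print(rerank(topic, corpus, lam=1.0))  # ['the', 'gene', 'dna']
print(rerank(topic, corpus, lam=0.6))  # ['gene', 'dna', 'the']
```

Setting `lam=1.0` recovers the conventional ranking by $p(w\mid t)$. Setting `lam=0.0` ranks purely by lift, which tends to promote rare words.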
Submitted 29 March, 2019;
originally announced March 2019.
-
Topology of Hybrid Analytifications
Authors:
Thibaud Lemanissier,
Matthew Stevenson
Abstract:
We investigate the topological properties of Berkovich analytifications over hybrid fields, that is, fields equipped with the maximum of their native norm and the trivial norm. We prove that the analytification of the affine line or of a smooth projective curve over a countable Archimedean hybrid field is contractible, and show that it can be non-contractible when the field is uncountable. Further, we prove that the analytification of affine space over a non-Archimedean hybrid field or over a discrete valuation ring is contractible. As an application, we show that the Berkovich affine line over the ring of integers of a number field is contractible.
Submitted 5 March, 2019;
originally announced March 2019.
-
Quantum teleportation using highly coherent emission from telecom C-band quantum dots
Authors:
M. Anderson,
T. Müller,
J. Huwer,
J. Skiba-Szymanska,
A. B. Krysa,
R. M. Stevenson,
J. Heffernan,
D. A. Ritchie,
A. J. Shields
Abstract:
A practical way to link separate nodes in quantum networks is to send photons over the standard telecom fibre network. This requires sub-Poissonian photon sources in the telecom wavelength band around 1550 nm, where the photon coherence time has to be sufficient to enable the many interference-based technologies at the heart of quantum networks. Here, we show that droplet epitaxy InAs/InP quantum dots emitting in the telecom C-band can provide photons with coherence times exceeding 1 ns even under non-resonant excitation, more than a factor of two longer than values reported for shorter wavelength quantum dots under similar conditions. We demonstrate that these coherence times enable near-optimal interference with a C-band laser qubit, with visibilities limited only by the quantum dot multiphoton emission. Using entangled photons, we further show teleportation of such qubits in six different bases with average fidelity reaching $88.3\pm4$%. Beyond direct applications in long-distance quantum communication, the high degree of coherence in these quantum dots is promising for future spin-based telecom quantum network applications.
Submitted 8 January, 2019;
originally announced January 2019.
-
The C-Band All-Sky Survey (C-BASS): Digital backend for the northern survey
Authors:
M. A. Stevenson,
T. J. Pearson,
Michael E. Jones,
C. J. Copley,
C. Dickinson,
J. J. John,
O. G. King,
S. J. C. Muchovej,
Angela C. Taylor
Abstract:
The C-Band All-Sky Survey (C-BASS) is an all-sky full-polarization survey at a frequency of 5 GHz, designed to provide data complementary to the all-sky surveys of WMAP and Planck and future CMB B-mode polarization imaging surveys. We describe the design and performance of the digital backend used for the northern part of the survey. In particular we describe the features that efficiently implement the demodulation and filtering required to suppress contaminating signals in the time-ordered data, and the capability for real-time correction of detector non-linearity and receiver balance.
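The abstract does not detail the demodulation scheme. The sketch below is a generic illustration of synchronous (lock-in) demodulation and does not describe the C-BASS firmware. It assumes a signal that has been switched by a ±1 square-wave reference and sits on a constant offset. Multiplying by the same reference and averaging over whole switching periods recovers the switched signal and cancels the offset. `square_wave` and `demodulate` are hypothetical names.

```python
# Generic synchronous-demodulation illustration; names are hypothetical and
# this is not the C-BASS backend implementation.


def square_wave(n_samples, half_period):
    """+1/-1 switching reference that flips sign every half_period samples."""
    return [1 if (i // half_period) % 2 == 0 else -1 for i in range(n_samples)]


def demodulate(samples, reference):
    """Multiply by the reference, then average (a boxcar low-pass filter).

    Offsets that are constant over whole switching periods cancel, because
    the reference sums to zero over each full period.
    """
    if len(samples) != len(reference):
        raise ValueError("samples and reference must have equal length")
    return sum(x * r for x, r in zip(samples, reference)) / len(samples)


ref = square_wave(64, 4)                  # 8 full switching periods
samples = [2.5 * r + 10.0 for r in ref]   # switched signal plus constant offset
print(sum(samples) / len(samples))        # 10.0: plain averaging sees only the offset
print(demodulate(samples, ref))           # 2.5: demodulation recovers the signal
```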
Submitted 28 January, 2019; v1 submitted 14 November, 2018;
originally announced November 2018.
-
On the geometric P=W conjecture
Authors:
Mirko Mauri,
Enrica Mazzon,
Matthew Stevenson
Abstract:
We formulate the geometric P=W conjecture for singular character varieties. We establish it for compact Riemann surfaces of genus one, and obtain partial results in arbitrary genus. To this end, we employ non-Archimedean, birational and degeneration techniques to study the topology of the dual boundary complex of certain character varieties. We also clarify the relation between the geometric and the cohomological P=W conjectures.
Submitted 16 May, 2022; v1 submitted 28 October, 2018;
originally announced October 2018.
-
The C-Band All-Sky Survey (C-BASS): Constraining diffuse Galactic radio emission in the North Celestial Pole region
Authors:
C. Dickinson,
A. Barr,
H. C. Chiang,
C. Copley,
R. D. P. Grumitt,
S. E. Harper,
H. M. Heilgendorff,
L. R. P. Jew,
J. L. Jonas,
Michael E. Jones,
J. P. Leahy,
J. Leech,
E. M. Leitch,
S. J. C. Muchovej,
T. J. Pearson,
M. W. Peel,
A. C. S. Readhead,
J. Sievers,
M. A. Stevenson,
Angela C. Taylor
Abstract:
The C-Band All-Sky Survey (C-BASS) is a high-sensitivity all-sky radio survey at an angular resolution of 45 arcmin and a frequency of 4.7 GHz. We present a total intensity 4.7 GHz map of the North Celestial Pole (NCP) region of sky, above declination +80 deg, which is limited by source confusion at a level of ~0.6 mK rms. We apply the template-fitting (cross-correlation) technique to WMAP and Planck data, using the C-BASS map as the synchrotron template, to investigate the contribution of diffuse foreground emission at frequencies ~20-40 GHz. We quantify the anomalous microwave emission (AME) that is correlated with far-infrared dust emission. The AME amplitude does not change significantly (<10%) when using the higher-frequency C-BASS 4.7 GHz template instead of the traditional Haslam 408 MHz map as a tracer of synchrotron radiation. We measure template coefficients of $9.93\pm0.35$ and $9.52\pm0.34$ K per unit $\tau_{353}$ when using the Haslam and C-BASS synchrotron templates, respectively. The AME contributes $55\pm2\,\mu$K rms at 22.8 GHz and accounts for ~60% of the total foreground emission. Our results suggest that a harder (flatter spectrum) component of synchrotron emission is not dominant at frequencies >5 GHz; the best-fitting synchrotron temperature spectral index is $\beta=-2.91\pm0.04$ from 4.7 to 22.8 GHz and $\beta=-2.85\pm0.14$ from 22.8 to 44.1 GHz. Free-free emission is weak, contributing ~$7\,\mu$K rms (~7%) at 22.8 GHz. The best explanation for the AME is still electric dipole emission from small spinning dust grains.
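At its core, template fitting of this kind is a linear least-squares problem. The map at each pixel is modelled as a sum of templates, for example a synchrotron map and a dust map, each scaled by an unknown coefficient. The sketch below solves the normal equations for unweighted data. It also shows the two-frequency spectral index, $\beta = \ln(T_1/T_2)/\ln(\nu_1/\nu_2)$, for a brightness temperature $T \propto \nu^{\beta}$. It assumes white, uniform noise and no masking, which a real analysis would not. `fit_templates` and `spectral_index` are illustrative names, and the toy maps are synthetic.

```python
import math

# Illustrative sketch: unweighted least squares, no noise covariance or masks;
# names and toy data are hypothetical.


def fit_templates(data, templates):
    """Least-squares coefficients a minimising
    sum_p (data[p] - sum_i a[i] * templates[i][p])**2,
    via the normal equations (T T^T) a = T d and Gaussian elimination."""
    k = len(templates)
    A = [[sum(u * v for u, v in zip(templates[i], templates[j])) for j in range(k)]
         for i in range(k)]
    b = [sum(u * d for u, d in zip(templates[i], data)) for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    a = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        a[i] = (b[i] - sum(A[i][j] * a[j] for j in range(i + 1, k))) / A[i][i]
    return a


def spectral_index(t1, t2, nu1, nu2):
    """Index beta of a power law T ~ nu**beta through two (nu, T) points."""
    return math.log(t1 / t2) / math.log(nu1 / nu2)


# Toy maps (arbitrary units): the "sky" is 3 x synchrotron + 0.5 x dust.
sync = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
dust = [2.0, 0.0, 1.0, 3.0, 1.0, 2.0]
sky = [3.0 * s + 0.5 * d for s, d in zip(sync, dust)]
print(fit_templates(sky, [sync, dust]))  # approximately [3.0, 0.5]
print(spectral_index(100.0, 100.0 * (22.8 / 4.7) ** -2.91, 4.7, 22.8))  # approximately -2.91
```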
Submitted 19 February, 2019; v1 submitted 27 October, 2018;
originally announced October 2018.
-
Long-term transmission of entangled photons from single quantum dot over deployed fiber
Authors:
Zi-Heng Xiang,
Jan Huwer,
R. Mark Stevenson,
Joanna Skiba-Szymanska,
Martin B. Ward,
Ian Farrer,
David A. Ritchie,
Andrew J. Shields
Abstract:
Non-classical light sources based on a single quantum emitter are considered a core technology for multiple quantum network architectures. A large variety of sources has been developed, but the generated photons remained far from being utilized in established standard fiber networks. Here, we report a week-long transmission of polarization-entangled photons from a single InAs/GaAs quantum dot over a metropolitan network fiber. The emitted photons are in the telecommunication O-band, favored for fiber optical communication. We employ a polarization stabilization system overcoming changes of birefringence introduced by 18.23 km of installed fiber. Stable transmission of polarization-encoded entanglement with a high fidelity of 91% is achieved, facilitating the operation of sub-Poissonian quantum light sources over existing fiber networks.
Submitted 27 July, 2018;
originally announced July 2018.
-
The C-Band All-Sky Survey (C-BASS): Design and capabilities
Authors:
Michael E. Jones,
Angela C. Taylor,
Moumita Aich,
C. J. Copley,
H. Cynthia Chiang,
R. J. Davis,
C. Dickinson,
R. D. P. Grumitt,
Yaser Hafez,
Heiko M. Heilgendorff,
C. M. Holler,
M. O. Irfan,
Luke R. P. Jew,
J. J. John,
J. Jonas,
O. G. King,
J. P. Leahy,
J. Leech,
E. M. Leitch,
S. J. C. Muchovej,
T. J. Pearson,
M. W. Peel,
A. C. S. Readhead,
Jonathan Sievers,
M. A. Stevenson
, et al. (1 additional author not shown)
Abstract:
The C-Band All-Sky Survey (C-BASS) is an all-sky full-polarisation survey at a frequency of 5 GHz, designed to provide complementary data to the all-sky surveys of WMAP and Planck, and future CMB B-mode polarization imaging surveys. The observing frequency has been chosen to provide a signal that is dominated by Galactic synchrotron emission, but suffers little from Faraday rotation, so that the measured polarization directions provide a good template for higher frequency observations, and carry direct information about the Galactic magnetic field. Telescopes in both northern and southern hemispheres with matched optical performance are used to provide all-sky coverage from a ground-based experiment. A continuous-comparison radiometer and a correlation polarimeter on each telescope provide stable imaging properties such that all angular scales from the instrument resolution of 45 arcmin up to full sky are accurately measured. The northern instrument has completed its survey and the southern instrument has started observing. We expect that C-BASS data will significantly improve the component separation analysis of Planck and other CMB data, and will provide important constraints on the properties of anomalous Galactic dust and the Galactic magnetic field.
Submitted 19 July, 2018; v1 submitted 11 May, 2018;
originally announced May 2018.