-
Samplable Anonymous Aggregation for Private Federated Data Analysis
Authors:
Kunal Talwar,
Shan Wang,
Audra McMillan,
Vojta **a,
Vitaly Feldman,
Bailey Basile,
Aine Cahill,
Yi Sheng Chan,
Mike Chatzidakis,
Junye Chen,
Oliver Chick,
Mona Chitnis,
Suman Ganta,
Yusuf Goren,
Filip Granqvist,
Kristine Guo,
Frederic Jacobs,
Omid Javidbakht,
Albert Liu,
Richard Low,
Dan Mascenik,
Steve Myers,
David Park,
Wonhee Park,
Gianni Parsa,
et al. (11 additional authors not shown)
Abstract:
We revisit the problem of designing scalable protocols for private statistics and private federated learning when each device holds its private data. Our first contribution is to propose a simple primitive that allows for efficient implementation of several commonly used algorithms, and allows for privacy accounting that is close to that in the central setting without requiring the strong trust assumptions it entails. Second, we propose a system architecture that implements this primitive and perform a security analysis of the proposed system.
Submitted 27 July, 2023;
originally announced July 2023.
-
Dictionary Learning under Symmetries via Group Representations
Authors:
Subhroshekhar Ghosh,
Aaron Y. R. Low,
Yong Sheng Soh,
Zhuohang Feng,
Brendan K. Y. Tan
Abstract:
The dictionary learning problem can be viewed as a data-driven process to learn a suitable transformation so that data is sparsely represented directly from example data. In this paper, we examine the problem of learning a dictionary that is invariant under a pre-specified group of transformations. Natural settings include Cryo-EM, multi-object tracking, synchronization, pose estimation, etc. We specifically study this problem under the lens of mathematical representation theory. Leveraging the power of non-abelian Fourier analysis for functions over compact groups, we prescribe an algorithmic recipe for learning dictionaries that obey such invariances. We relate the dictionary learning problem in the physical domain, which is naturally modelled as being infinite dimensional, with the associated computational problem, which is necessarily finite dimensional. We establish that the dictionary learning problem can be effectively understood as an optimization instance over certain matrix orbitopes having a particular block-diagonal structure governed by the irreducible representations of the group of symmetries. This perspective enables us to introduce a band-limiting procedure which obtains dimensionality reduction in applications. We provide guarantees for our computational ansatz to provide a desirable dictionary learning outcome. We apply our paradigm to investigate the dictionary learning problem for the groups SO(2) and SO(3). While the SO(2)-orbitope admits an exact spectrahedral description, substantially less is understood about the SO(3)-orbitope. We describe a tractable spectrahedral outer approximation of the SO(3)-orbitope, and contribute an alternating minimization paradigm to perform optimization in this setting. We provide numerical experiments to highlight the efficacy of our approach in learning SO(3)-invariant dictionaries, both on synthetic and on real world data.
Submitted 25 July, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Autonomous Mosquito Habitat Detection Using Satellite Imagery and Convolutional Neural Networks for Disease Risk Mapping
Authors:
Sriram Elango,
Nandini Ramachandran,
Russanne Low
Abstract:
Mosquitoes are known vectors for disease transmission that cause over one million deaths globally each year. The majority of natural mosquito habitats are areas containing standing water that are challenging to detect using conventional ground-based technology on a macro scale. Contemporary approaches, such as drones, UAVs, and other aerial imaging technology are costly when implemented and are only most accurate on a finer spatial scale whereas the proposed convolutional neural network (CNN) approach can be applied for disease risk mapping and further guide preventative efforts on a more global scale. By assessing the performance of autonomous mosquito habitat detection technology, the transmission of mosquito-borne diseases can be prevented in a cost-effective manner. This approach aims to identify the spatiotemporal distribution of mosquito habitats in extensive areas that are difficult to survey using ground-based technology by employing computer vision on satellite imagery for proof of concept. The research presents an evaluation and the results of 3 different CNN models to determine their accuracy of predicting large-scale mosquito habitats. For this approach, a dataset was constructed containing a variety of geographical features. Larger land cover variables such as ponds/lakes, inlets, and rivers were utilized to classify mosquito habitats while minute sites were omitted for higher accuracy on a larger scale. Using the dataset, multiple CNN networks were trained and evaluated for accuracy of habitat prediction. Utilizing a CNN-based approach on readily available satellite imagery is cost-effective and scalable, unlike most aerial imaging technology. Testing revealed that YOLOv4 obtained greater accuracy in mosquito habitat detection for identifying large-scale mosquito habitats.
Submitted 11 March, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
An End-to-end Point of Interest (POI) Conflation Framework
Authors:
Raymond Low,
Zeynep D. Tekler,
Lynette Cheah
Abstract:
Point of interest (POI) data serves as a valuable source of semantic information for places of interest and has many geospatial applications in real estate, transportation, and urban planning. With the availability of different data sources, POI conflation serves as a valuable technique for enriching data quality and coverage by merging the POI data from multiple sources. This study proposes a novel end-to-end POI conflation framework consisting of six steps, starting with data procurement, schema standardisation, taxonomy mapping, POI matching, POI unification, and data verification. The feasibility of the proposed framework was demonstrated in a case study conducted in the eastern region of Singapore, where the POI data from five data sources was conflated to form a unified POI dataset. Based on the evaluation conducted, the resulting unified dataset was found to be more comprehensive and complete than any of the five POI data sources alone. Furthermore, the proposed approach for identifying POI matches between different data sources outperformed all baseline approaches with a matching accuracy of 97.6% with an average run time below 3 minutes when matching over 12,000 POIs to result in 8,699 unique POIs, thereby demonstrating the framework's scalability for large scale implementation in dense urban contexts.
Submitted 13 September, 2021;
originally announced September 2021.