-
The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America
Authors: Benjamin Charles Germain Lee, Jaime Mears, Eileen Jakeway, Meghan Ferriter, Chris Adams, Nathan Yarasavage, Deborah Thomas, Kate Zwaard, Daniel S. Weld
Abstract:
Chronicling America is a product of the National Digital Newspaper Program, a partnership between the Library of Congress and the National Endowment for the Humanities to digitize historic newspapers. Over 16 million pages of historic American newspapers have been digitized for Chronicling America to date, complete with high-resolution images and machine-readable METS/ALTO OCR. Of considerable interest to Chronicling America users is a semantified corpus, complete with extracted visual content and headlines. To accomplish this, we introduce a visual content recognition model trained on bounding box annotations of photographs, illustrations, maps, comics, and editorial cartoons collected as part of the Library of Congress's Beyond Words crowdsourcing initiative and augmented with additional annotations including those of headlines and advertisements. We describe our pipeline that utilizes this deep learning model to extract 7 classes of visual content: headlines, photographs, illustrations, maps, comics, editorial cartoons, and advertisements, complete with textual content such as captions derived from the METS/ALTO OCR, as well as image embeddings for fast image similarity querying. We report the results of running the pipeline on 16.3 million pages from the Chronicling America corpus and describe the resulting Newspaper Navigator dataset, the largest dataset of extracted visual content from historic newspapers ever produced. The Newspaper Navigator dataset, finetuned visual content recognition model, and all source code are placed in the public domain for unrestricted re-use.
Submitted 4 May, 2020; originally announced May 2020.
-
A Fourier (k-) space design approach for controllable photonic band and localization states in aperiodic lattices
Authors: Subhasish Chakraborty, Michael C. Parker, Robert J. Mears
Abstract:
In this paper we present a systematic study of photonic bandgap engineering using aperiodic lattices (ALs). Up to now ALs have tended to be defined by specific formulae (e.g. Fibonacci, Cantor), and theories have neglected other useful ALs along with the vast majority of non-useful (random) ALs. Here, we present a practical and efficient Fourier space-based general theory to identify all those ALs having useful band properties, which are characterized by well-defined Fourier (i.e. lattice momentum) components. Direct control of field localization comes via control of the Parseval strength competition between the different Fourier components characterizing a lattice. Real-space optimization of ALs tends to be computationally demanding. However, via our Fourier space-based simulated annealing inverse optimization algorithm, we efficiently tailor the relative strength of the AL Fourier components for precise control of photonic band and localization properties.
Submitted 27 September, 2005; originally announced September 2005.
-
Aperiodic lattices for tunable photonic bandgaps and localization
Authors: Subhasish Chakraborty, Michael C. Parker, Robert J. Mears
Abstract:
Photonic bandgap engineering using aperiodic lattices (ALs) is systematically studied. Up to now ALs have tended to be defined by specific formulae (e.g. Fibonacci, Cantor), and theories have neglected other useful ALs along with the vast majority of non-useful (random) ALs. Here we present a practical and efficient Fourier space-based general theory to identify all those ALs having useful band properties, which are characterized by well-defined Fourier (i.e. lattice momentum) components. Direct real-space optimization of ALs tends to be computationally demanding and is also difficult to generalise beyond 1D. However, via our Fourier space-based inverse optimization algorithm, we efficiently tailor the relative strength of the AL Fourier components for precise control of photonic band and localization properties.
Submitted 11 July, 2005; originally announced July 2005.