OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Authors: Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai

Abstract: Text-to-video (T2V) generation has recently garnered significant attention thanks to the large multi-modality model Sora. However, T2V generation still faces two important challenges: 1) Lacking a precise open sourced high-quality dataset. The previous popular video datasets, e.g. WebVid-10M and Panda-70M, are either with low quality or too large for most research institutions. Therefore, it is challenging but crucial to collect a precise high-quality text-video pairs for T2V generation. 2) Ignoring to fully utilize textual information. Recent T2V methods have focused on vision transformers, using a simple cross attention module for video generation, which falls short of thoroughly extracting semantic information from text prompt. To address these issues, we introduce OpenVid-1M, a precise high-quality dataset with expressive captions. This open-scenario dataset contains over 1 million text-video pairs, facilitating research on T2V generation. Furthermore, we curate 433K 1080p videos from OpenVid-1M to create OpenVidHD-0.4M, advancing high-definition video generation. Additionally, we propose a novel Multi-modal Video Diffusion Transformer (MVDiT) capable of mining both structure information from visual tokens and semantic information from text tokens. Extensive experiments and ablation studies verify the superiority of OpenVid-1M over previous datasets and the effectiveness of our MVDiT.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 15 pages, 9 figures

arXiv:2309.11160 [pdf, other]

Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation

Authors: Nian Liu, Kepan Nan, Wangbo Zhao, Yuanwei Liu, Xiwen Yao, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Junwei Han, Fahad Shahbaz Khan

Abstract: Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images. However, this task was seldom explored. In this work, based on IPMT, a state-of-the-art few-shot image segmentation method that combines external support guidance information with adaptive query guidance cues, we propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data. We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively. Frame prototypes are further used for each frame independently to handle fine-grained adaptive guidance and enable bidirectional clip-frame prototype communication. To reduce the influence of noisy memory, we propose to leverage the structural similarity relation among different predicted regions and the support for selecting reliable memory frames. Furthermore, a new segmentation loss is also proposed to enhance the category discriminability of the learned prototypes. Experimental results demonstrate that our proposed video IPMT model significantly outperforms previous models on two benchmark datasets. Code is available at https://github.com/nankepan/VIPMT.

Submitted 20 September, 2023; originally announced September 2023.

Comments: ICCV 2023

arXiv:2308.02162 [pdf, other]

Learning Referring Video Object Segmentation from Weak Annotation

Authors: Wangbo Zhao, Kepan Nan, Songyang Zhang, Kai Chen, Dahua Lin, Yang You

Abstract: Referring video object segmentation (RVOS) is a task that aims to segment the target object in all video frames based on a sentence describing the object. Although existing RVOS methods have achieved significant performance, they depend on densely-annotated datasets, which are expensive and time-consuming to obtain. In this paper, we propose a new annotation scheme that reduces the annotation effort by 8 times, while providing sufficient supervision for RVOS. Our scheme only requires a mask for the frame where the object first appears and bounding boxes for the rest of the frames. Based on this scheme, we develop a novel RVOS method that exploits weak annotations effectively. Specifically, we build a simple but effective baseline model, SimRVOS, for RVOS with weak annotation. Then, we design a cross frame segmentation module, which uses the language-guided dynamic filters from one frame to segment the target object in other frames to thoroughly leverage the valuable mask annotation and bounding boxes. Finally, we develop a bi-level contrastive learning method to enhance the pixel-level discriminative representation of the model with weak annotation. We conduct extensive experiments to show that our method achieves comparable or even superior performance to fully-supervised methods, without requiring dense mask annotations.

Submitted 14 December, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

arXiv:2301.07547 [pdf]

Quad-cascade picture of turbulence

Authors: Wei Zhao, Yanxia Shi, Yueqiang Zhu, Ming Zeng, Guangyin Jing, Keyi Nan, Yu Chen, Chen Zhang, Tianyun Zhao, Kaige Wang, Jintao Bai

Abstract: Although its ubiquitous emergence in nature and variety of systems, turbulence possesses spatio-temporal chaotic, intermittent fluctuations, and makes it impossible to be precisely predicted. Persistent attempts for almost a century have been devoted to capture the invariant laws and hidden deeply universality out of the vast disorder and chaotic nature of turbulence. The celebrated Kolmogorov -5/3 law is robust, but not comprehensive to describe the diverse turbulences, especially in the turbulence driven by external volume forces, e.g. thermal convection, electrokinetic turbulence and etc. Here, we reveal that the fluxes of kinetic energy and scalar variance must be highly coupled to establish a universal conservation law and consequently we successfully unify a much diversity of scaling laws. As an example, in a microfluidic electrokinetic turbulence, additional scaling of -5/3, -9/5 and -7/3 are experimentally found in the power spectra of concentration. With this proposed model, a full quad-cascade picture is eventually complete to unify the various scaling laws for the most complicated physical problem of turbulence.

Submitted 19 January, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

arXiv:2212.05022 [pdf, ps, other]

doi 10.1103/PhysRevLett.130.166102

Ultraslow settling kinetics of frictional cohesive powders

Authors: Kai Nan, Robert S. Hoy

Abstract: Using discrete element method simulations, we show that the settling of frictional cohesive grains under ramped-pressure compression exhibits strong history dependence and slow dynamics that are not present for grains that lack either cohesion or friction. Systems prepared by beginning with a dilute state and then ramping the pressure to a small positive value $P_{\rm final}$ over a time $τ_{\rm ramp}$ settle at packing fractions given by an inverse-logarithmic rate law, $φ_{\rm settled}(τ_{\rm ramp}) = φ_{\rm settled}(\infty) + A/[1 + B\ln(1 + τ_{\rm ramp}/τ_{\rm slow})]$. This law is analogous to the one obtained from classical tapping experiments on noncohesive grains, but crucially different in that $τ_{\rm slow}$ is set by the slow dynamics of structural void stabilization rather than the faster dynamics of bulk densification. We formulate a kinetic free-void-volume theory that predicts this $φ_{\rm settled}(τ_{\rm ramp})$, with $φ_{\rm settled}(\infty) = φ_{\rm ALP}$ and $A = φ_{\rm settled}(0) - φ_{\rm ALP}$, where $φ_{\rm ALP} \equiv .135$ is the ``adhesive loose packing'' fraction found by Liu \textit{et al.} [W.\ Liu, Y.\ Jin, S. Chen, H.\ A.\ Makse and S.\ Li, \textit{Soft Matt.} \textbf{13}, 421 (2017)].

Submitted 9 December, 2022; originally announced December 2022.

Comments: Supplemental Material available upon request

arXiv:2210.03222 [pdf, other]

doi 10.1103/PhysRevLett.129.132701

Deep underground laboratory measurement of $^{13}$C($α$,$n$)$^{16}$O in the Gamow windows of the $s$- and $i$-processes

Authors: B. Gao, T. Y. Jiao, Y. T. Li, H. Chen, W. P. Lin, Z. An, L. H. Ru, Z. C. Zhang, X. D. Tang, X. Y. Wang, N. T. Zhang, X. Fang, D. H. Xie, Y. H. Fan, L. Ma, X. Zhang, F. Bai, P. Wang, Y. X. Fan, G. Liu, H. X. Huang, Q. Wu, Y. B. Zhu, J. L. Chai, J. Q. Li , et al. (50 additional authors not shown)

Abstract: The $^{13}$C($α$,$n$)$^{16}$O reaction is the main neutron source for the slow-neutron-capture (s-) process in Asymptotic Giant Branch stars and for the intermediate (i-) process. Direct measurements at astrophysical energies in above-ground laboratories are hindered by the extremely small cross sections and vast cosmic-ray induced background. We performed the first consistent direct measurement in the range of $E_{\rm c.m.}=$0.24 MeV to 1.9 MeV using the accelerators at the China Jinping Underground Laboratory (CJPL) and Sichuan University. Our measurement covers almost the entire i-process Gamow window in which the large uncertainty of the previous experiments has been reduced from 60\% down to 15\%, eliminates the large systematic uncertainty in the extrapolation arising from the inconsistency of existing data sets, and provides a more reliable reaction rate for the studies of the s- and i-processes along with the first direct determination of the alpha strength for the near-threshold state.

Submitted 6 October, 2022; originally announced October 2022.

Journal ref: Physical Review Letters 129, 132701 (2022)

arXiv:2205.12060 [pdf, ps, other]

Unexpected ductility in semiflexible polymer glasses with $N_e = C_\infty$

Authors: Joseph D. Dietz, Kai Nan, Robert S. Hoy

Abstract: Semiflexible polymer glasses (SPGs), including those formed by the recently synthesized semiflexible conjugated polymers (SCPs), are expected to be brittle because classical formulas for their craze extension ratio $λ_{\rm craze}$ and fracture stretch $λ_{\rm frac}$ predict that systems with $N_e = C_\infty$ have $λ_{\rm craze} = λ_{\rm frac} = 1$ and hence cannot be deformed to large strains. Using molecular dynamics simulations, we show that in fact such glasses can form stable crazes with $λ_{\rm craze} \simeq N_e^{1/4} \simeq C_\infty^{1/4}$, and that they fracture at $λ_{\rm frac} = (3N_e^{1/2} - 2)^{1/2} \simeq (3C_\infty^{1/2} - 2)^{1/2}$. We argue that the classical formulas for $λ_{\rm craze}$ and $λ_{\rm frac}$ fail to describe SPGs' mechanical response because they do not account for Kuhn segments' ability to stretch during deformation.

Submitted 31 August, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: Accepted for publication in PRL

arXiv:2204.03963 [pdf]

Mixing and flow transition in an optimized electrokinetic turbulent micromixer

Authors: Keyi Nan, Yanxia Shi, Tianyun Zhao, Xiaowei Tang, Yueqiang Zhu, Kaige Wang, Jintao Bai, Wei Zhao

Abstract: Micromixer is a key element in lab on a chip for broad applications in the analysis and measurement of chemistry and engineering. Previous investigations reported electrokinetic (EK) turbulence could be realized in a Y-type micromixer with a cross-sectional dimension of 100 $μ$m order. Although the ultrafast turbulent mixing can be generated at a bulk flow Reynolds number of O(1), the micromixer has not been optimized. In this investigation, we systematically investigated the influence of electric field intensity, AC frequency, electric conductivity ratio, and channel width at the entrance on the mixing effect and transition electric Rayleigh number in the "Y" type electrokinetic micromixer. It is found the optimal mixing is realized in a 350 $μ$m wide micromixer, under 100 kHz and 1.14*10^5 V/m AC electric field, with an electric conductivity ratio of 1:3000. Under the conditions, a maximum degree of mixedness of 0.93 can be achieved at 84 $μ$m from the entrance and 100 ms. A further investigation of the critical electric field and the critical electric Rayleigh number indicates the most unstable condition of EK flow instability is inconsistent with that of the optimal mixing in EK turbulence. To predict the evolution of EK flow under high $Ra_{e}$, it is necessary to apply a computational turbulence model, instead of linear instability analysis.

Submitted 11 April, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

Comments: 9 pages, 7 figures

arXiv:2109.12654 [pdf]

doi 10.3847/1538-4357/ac12ce

The $^{59}$Fe(n, γ)$^{60}$Fe Cross Section from the Surrogate Ratio Method and Its Effect on the $^{60}$Fe Nucleosynthesis

Authors: S. Q. Yan, X. Y. Li, K. Nishio, M. Lugaro, Z. H. Li, H. Makii, M. Pignatari, Y. B. Wang, R. Orlandi, K. Hirose, K. Tsukada, P. Mohr, G. S. Li, J. G. Wang, B. S. Gao, Y. L. Han, B. Guo, Y. J. Li, Y. P. Shen, T. K. Sato, Y. Ito, F. Suzaki, J. Su, Y. Y. Yang, J. S. Wang , et al. (17 additional authors not shown)

Abstract: The long-lived $^{60}$Fe (with a half-life of 2.62 Myr) is a crucial diagnostic of active nucleosynthesis in the Milky Way galaxy and in supernovae near the solar system. The neutron-capture reaction $^{59}$Fe(n,$γ$)$^{60}$Fe on $^{59}$Fe (half-life = 44.5 days) is the key reaction for the production of $^{60}$Fe in massive stars. This reaction cross section has been previously constrained by the Coulomb dissociation experiment, which offered partial constraint on the $E$1 $γ$-ray strength function but a negligible constraint on the $M$1 and $E$2 components. In this work, for the first time, we use the surrogate ratio method to experimentally determine the $^{59}$Fe(n,$γ$)$^{60}$Fe cross sections in which all the components are included. We derived a Maxwellian-averaged cross section of 27.5 $\pm$ 3.5 mb at $kT$= 30 keV and 13.4 $\pm$ 1.7 mb at $kT$= 90 keV, roughly 10 - 20% higher than previous estimates. We analyzed the impact of our new reaction rates in nucleosynthesis models of massive stars and found that uncertainties in the production of $^{60}$Fe from the $^{59}$Fe(n,$γ$)$^{60}$Fe rate are at most of 25%. We conclude that stellar physics uncertainties now play a major role in the accurate evaluation of the stellar production of $^{60}$Fe.

Submitted 26 September, 2021; originally announced September 2021.

Comments: 9 pages with 6 figures

arXiv:2007.13942 [pdf, ps, other]

Does the Sastry transition control cavitation in simple liquids?

Authors: Caitlin M. Gish, Kai Nan, Robert S. Hoy

Abstract: We examine the Sastry (athermal cavitation) transitions for model monatomic liquids interacting via Lennard-Jones as well as shorter- and longer-ranged pair potentials. Low-temperature thermodynamically stable liquids have $ρ< ρ_S$ except when the attractive forces are long-ranged. For moderate- and short-ranged attractions, stable liquids with $ρ> ρ_S$ exist at higher temperatures; the pressures in these liquids are high, but the Sastry transition may strongly influence their cavitation under dynamic hydrostatic expansion. The temperature $T^*$ at which stable $ρ> ρ_S$ liquids emerge is $\sim 0.84ε/k_B$ for Lennard-Jones liquids; $T^*$ decreases (increases) rapidly with increasing (decreasing) pair-interaction range. In particular, for short-ranged potentials, $T^*$ is above the critical temperature. All liquids' inherent structures are isostructural (isomorphic) for densities below (above) the Sastry density $ρ_S$. Overall, our results suggest that the barriers to cavitation in most simple liquids under ambient conditions where significant cavitation is likely to occur are primarily vibrational-energetic and entropic rather than configurational-energetic. The most likely exceptions to this rule are liquids with long-ranged pair interactions, such as alkali metals.

Submitted 27 July, 2020; originally announced July 2020.

Comments: 9 pages, 6 figures

arXiv:2006.04432 [pdf, other]

AdaDeep: A Usage-Driven, Automated Deep Model Compression Framework for Enabling Ubiquitous Intelligent Mobiles

Authors: Sicong Liu, Junzhao Du, Kaiming Nan, Zimu Zhou, Atlas Wang, Yingyan Lin

Abstract: Recent breakthroughs in Deep Neural Networks (DNNs) have fueled a tremendously growing demand for bringing DNN-powered intelligence into mobile platforms. While the potential of deploying DNNs on resource-constrained platforms has been demonstrated by DNN compression techniques, the current practice suffers from two limitations: 1) merely stand-alone compression schemes are investigated even though each compression technique only suit for certain types of DNN layers; and 2) mostly compression techniques are optimized for DNNs' inference accuracy, without explicitly considering other application-driven system performance (e.g., latency and energy cost) and the varying resource availability across platforms (e.g., storage and processing capability). To this end, we propose AdaDeep, a usage-driven, automated DNN compression framework for systematically exploring the desired trade-off between performance and resource constraints, from a holistic system level. Specifically, in a layer-wise manner, AdaDeep automatically selects the most suitable combination of compression techniques and the corresponding compression hyperparameters for a given DNN. Thorough evaluations on six datasets and across twelve devices demonstrate that AdaDeep can achieve up to $18.6\times$ latency reduction, $9.8\times$ energy-efficiency improvement, and $37.3\times$ storage reduction in DNNs while incurring negligible accuracy loss. Furthermore, AdaDeep also uncovers multiple novel combinations of compression techniques.

Submitted 8 June, 2020; originally announced June 2020.

Journal ref: IEEE transactions on mobile computing, 2020

Showing 1–11 of 11 results for author: Nan, K