Large-Scale TV Dataset
Partial Video Copy Detection

The STVD-PVCD dataset supports the performance evaluation of partial video copy detection (PVCD) methods in computer vision. It is built with a TV capture protocol [1, 2] that ensures greater scalability, robust groundtruthing and control over degradations for fine-grained performance evaluation. It is the largest public dataset for this task, with nearly 83k videos totaling 10,660 hours.

The dataset is composed of a reference set and six test sets, A to F (presenting different categories and levels of degradation), together with the groundtruth. It is provided as a set of separate files.

The groundtruth is provided as a CSV file with semicolon-separated fields in the format
Ref_Video; Pos_Video; Ref_Length; Pos_Length; Start_Copy

e.g. ref_a; pos_a; 112; 842; 100
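As a minimal sketch, a groundtruth file in the format above could be read in Python as follows. The class name GroundtruthRow and the function parse_groundtruth are hypothetical names chosen for illustration, and the handling of an optional header line and the integer type of the three numeric fields are assumptions based only on the example line; the units of the length and start fields are not stated here and are not interpreted.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class GroundtruthRow:
    # Field names mirror the documented column order.
    ref_video: str
    pos_video: str
    ref_length: int   # assumption: integer, as in the example line
    pos_length: int   # assumption: integer, as in the example line
    start_copy: int   # assumption: integer, as in the example line


def parse_groundtruth(text: str) -> list[GroundtruthRow]:
    """Parse semicolon-separated groundtruth lines into typed rows."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=";", skipinitialspace=True)
    for fields in reader:
        fields = [f.strip() for f in fields]
        # Skip blank lines and a header line, if one is present (assumption).
        if not any(fields) or fields[0] == "Ref_Video":
            continue
        ref, pos, ref_len, pos_len, start = fields
        rows.append(GroundtruthRow(ref, pos, int(ref_len), int(pos_len), int(start)))
    return rows


# The example line given above.
sample = "ref_a; pos_a; 112; 842; 100\n"
rows = parse_groundtruth(sample)
print(rows[0])
```

To read a real file, pass its contents to the function, e.g. parse_groundtruth(open(path, encoding="utf-8").read()), where path points to a groundtruth file from the dataset.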

The test sets A to F are detailed in [1, 2] and summarized below.

For visualization and testing purposes, some samples (reference, positive and negative videos with the groundtruth) are given in the next table for the different test sets.

          Reference  Positive  Groundtruth  Negative
sample A  ref_a      pos_a     gth_a        neg_a
sample B  ref_b      pos_b     gth_b        neg_b
sample C  ref_c      pos_c     gth_c        neg_c
sample D  ref_d      pos_d     gth_d        neg_d
sample E  ref_e      pos_e     gth_e        neg_e
sample F  ref_f      pos_f     gth_f        neg_f

The files constituting the dataset are given below and are password-protected. The dataset is available for non-commercial research purposes. Before downloading the dataset, get the agreement (English or French version) and sign it. Then send the scanned version to Mathieu Delalandre by email. After verifying your request, we will contact you with the password to unzip the dataset.

The files constituting the dataset are given here. We first provide the files for the reference videos and the groundtruth. The test sets A to F are given in the next table (STVD is still under publication; test set F will be delivered later).

       Positive videos  Negative videos  Total duration (h)  Size (GiB)  Link
set A  3,780            12,165           1,960               458         download
set B  3,780            3,780            860                 18.6        download
set C  3,780            12,165           1,960               6.5         download
set D  3,780            12,165           1,960               20.8        download
set E  3,780            12,165           1,960               21.8        download
set F  3,780            12,165           1,960               NA          NA

NB. Our storage service at the UT delivers 3-16 MB/s for downloads (on a low-speed and a high-speed connection, respectively) under concurrent access.
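To plan a download, the transfer time can be estimated from the sizes in the table and the stated 3-16 MB/s range. The sketch below is only a rough estimate: the function name download_hours is hypothetical, MB is assumed to mean 10^6 bytes, and the rate is assumed to be sustained for the whole transfer.

```python
def download_hours(size_gib: float, rate_mb_per_s: float) -> float:
    """Hours needed to transfer size_gib at a sustained rate_mb_per_s."""
    size_bytes = size_gib * 2**30           # GiB is a binary unit (2**30 bytes)
    rate_bytes_per_s = rate_mb_per_s * 1e6  # assumption: MB means 10**6 bytes
    return size_bytes / rate_bytes_per_s / 3600


# Sizes (GiB) taken from the table of test sets; set F is not yet available.
sizes_gib = {"A": 458, "B": 18.6, "C": 6.5, "D": 20.8, "E": 21.8}

for name, size in sizes_gib.items():
    slow = download_hours(size, 3)   # low-speed end of the stated range
    fast = download_hours(size, 16)  # high-speed end of the stated range
    print(f"set {name}: {fast:.1f} h to {slow:.1f} h")
```

Under these assumptions, set A (458 GiB) dominates the total, at roughly 8.5 hours on a high-speed connection and about 45.5 hours on a low-speed one, while each of the other sets takes about two hours or less.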

As a starting point, we list here works reporting experiments on the STVD-PVCD dataset.

Set Refs
B [LVH2022], ........
C [TNF2022], [LVH2022]
D [LVH2023], ........

Please cite one of the following papers, in English [1] or French [2], if you use this dataset.

  1. V.H. Le, M. Delalandre and D. Conte. A large-scale TV dataset for partial video copy detection. International Conference on Image Analysis and Processing (ICIAP), Lecture Notes in Computer Science (LNCS), vol. 13233, pp. 388-399, 2022.
  2. V.H. Le, M. Delalandre and D. Conte. Une large base de données pour la détection de segments de vidéos TV. Journées Francophones des Jeunes Chercheurs en Vision par Ordinateur (ORASIS), 2021.