Large-Scale TV Dataset
Partial Video Copy Detection
The STVD-PVCD dataset is designed for the performance evaluation of partial video copy detection methods in computer vision. It is built from a TV capture protocol [1, 2] that ensures deeper scalability, robust groundtruthing and control of degradations for fine-grained performance evaluation. It is the largest public dataset for this task, with nearly 83k videos totalling 10,660 hours.
The dataset is composed of a reference set and six test sets, A to F (covering different categories and levels of degradation), together with the groundtruth. It is provided as separate files containing:
- short reference videos with ids,
- long positive/negative videos for testing/training,
- the groundtruth with the reference ids, timestamps and durations.
The groundtruth is provided as a CSV file with semicolon-separated fields in the format
Ref_Video; Pos_Video; Ref_Length; Pos_Length; Start_Copy
where
- Ref_Video is the label / file name of the reference video,
- Pos_Video is the label / file name of the positive video,
- Ref_Length is the length of the reference video in frames at 30 FPS,
- Pos_Length is the length of the positive video in frames at 30 FPS; the positive video is longer than the reference video it contains, that is Pos_Length > Ref_Length,
- Start_Copy is the index of the frame in the positive video at which the copy of the reference video begins.
For example, the following line states that the reference video ref_a (112 frames) appears in the positive video pos_a (842 frames) starting at frame 100:
ref_a; pos_a; 112; 842; 100
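The groundtruth format above can be read with a few lines of Python. The sketch below is a minimal parser under stated assumptions: the names `GroundTruthEntry`, `parse_groundtruth` and `load_groundtruth` are hypothetical (not part of the dataset distribution), fields are assumed to be separated by `;` with optional surrounding spaces, a header row is assumed optional, and files are assumed to be UTF-8. It converts the frame-based fields to seconds at 30 FPS, assuming the copy spans `Ref_Length` frames in the positive video, which may not hold for speed-changed copies (test set E).

```python
import csv
import io
from dataclasses import dataclass

FPS = 30  # lengths and frame indices in the groundtruth are given at 30 FPS


@dataclass
class GroundTruthEntry:
    ref_video: str   # Ref_Video: label / file name of the reference video
    pos_video: str   # Pos_Video: label / file name of the positive video
    ref_length: int  # Ref_Length: reference video length in frames
    pos_length: int  # Pos_Length: positive video length in frames
    start_copy: int  # Start_Copy: frame in the positive video where the copy starts

    def copy_interval_seconds(self) -> tuple[float, float]:
        """(start, end) of the copy inside the positive video, in seconds.

        Assumption: the copy spans ref_length frames (no speed change).
        """
        start = self.start_copy / FPS
        end = (self.start_copy + self.ref_length) / FPS
        return start, end


def parse_groundtruth(text: str) -> list[GroundTruthEntry]:
    """Parse semicolon-separated groundtruth lines (hypothetical helper)."""
    entries = []
    for row in csv.reader(io.StringIO(text), delimiter=";"):
        fields = [field.strip() for field in row]
        if not any(fields):  # skip blank lines
            continue
        if fields[0] == "Ref_Video":  # skip an optional header row
            continue
        ref, pos, ref_len, pos_len, start = fields
        entries.append(
            GroundTruthEntry(ref, pos, int(ref_len), int(pos_len), int(start))
        )
    return entries


def load_groundtruth(path: str) -> list[GroundTruthEntry]:
    """Read a groundtruth CSV file from disk (the path is up to the user)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_groundtruth(handle.read())


if __name__ == "__main__":
    sample = "ref_a; pos_a; 112; 842; 100\n"
    entry = parse_groundtruth(sample)[0]
    start, end = entry.copy_interval_seconds()
    print(f"{entry.ref_video} in {entry.pos_video}: {start:.2f}s to {end:.2f}s")
```

For the example line, this reports the copy of ref_a running from about 3.33 s to 7.07 s in pos_a. Keeping frame indices as integers and converting to seconds only on demand avoids rounding errors when comparing detections against the groundtruth.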
The test sets A to F are detailed in [1, 2] and summarized below.
- Set A: a root capture used to tune the characterization tasks.
- Set B: a "hello world" test set.
- Set C: a test set with scalability and pixel attacks.
- Set D: a test set with scalability and global transformations.
- Set E: a test set with scalability and video speeding.
- Set F: a combination of test sets C, D and E.
For visualization and testing, some samples (reference, positive and negative videos with the groundtruth) are given in the next table for the different test sets.
The files constituting the dataset are listed below and are password-protected. The dataset is available for non-commercial research purposes. Before downloading the dataset, get the agreement (English or French version) and sign it. Then, send the scanned copy to Mathieu Delalandre. After verifying your request, we will send you the password to unzip the dataset.
We first provide the files for the reference videos and the groundtruth. The test sets A to F are given in the next table (STVD is still under publication; test set F will be delivered later).
| Positive videos | Negative videos | Total duration (h) | Size (GiB) | Link |
NB: Our storage service at the UT delivers download speeds of 3 to 16 MB/s (on low- and high-speed connections, respectively), with concurrent access.
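As a rough planning aid, the quoted rates can be turned into download-time estimates. In this sketch the 100 GiB size is a hypothetical example (substitute the sizes from the table), MB/s is assumed to mean 10^6 bytes per second, and the rate is assumed to be sustained for the whole transfer; `download_hours` is an illustrative helper, not part of the dataset tools.

```python
GIB = 1024 ** 3  # bytes in one GiB (table sizes are given in GiB)
MB = 1000 ** 2   # bytes in one MB; assumes decimal megabytes for the quoted rates


def download_hours(size_gib: float, rate_mb_per_s: float) -> float:
    """Estimated transfer time in hours for size_gib at a sustained rate."""
    return size_gib * GIB / (rate_mb_per_s * MB) / 3600


if __name__ == "__main__":
    for rate in (3, 16):  # low- and high-speed connection rates quoted above
        print(f"100 GiB at {rate} MB/s: {download_hours(100, rate):.1f} h")
```

Under these assumptions, a hypothetical 100 GiB archive takes roughly 9.9 h at 3 MB/s and 1.9 h at 16 MB/s; concurrent access can make actual times longer.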
To get started, we list here works that report experiments on the STVD-PVCD dataset.
If you use this dataset, please cite one of the following papers (in English or French).
- V.H. Le, M. Delalandre and D. Conte. A Large-Scale TV Dataset for Partial Video Copy Detection. International Conference on Image Analysis and Processing (ICIAP), Lecture Notes in Computer Science (LNCS), vol. 13233, pp. 388-399, 2022.
- V.H. Le, M. Delalandre and D. Conte. Une large base de données pour la détection de segments de vidéos TV. Journées Francophones des Jeunes Chercheurs en Vision par Ordinateur (ORASIS), 2021.