ACM Multimedia System Conference Dataset Archive

This page hosts traces from the 2013–2016 ACM Multimedia Systems Conferences (ACM MMSys).

Datasets from MMSys 2012 and MMSys 2011 are available for download here.

2016

Use of the datasets in published work should be acknowledged by a full citation to the authors’ papers at the MMSys conference:

Proceedings of ACM MMSys '16, Klagenfurt am Wörthersee, Austria, May 10-13, 2016

Refer to each paper individually for copyright and licensing information.

GSET Somi: A Game-Specific Eye Tracking Dataset for Somi

In this paper, we present an eye tracking dataset of computer game players who played the side-scrolling cloud game Somi. The game was streamed in the form of video from the cloud to the player. This dataset can be used for designing and testing game-specific visual attention models. The source code of the game is also available to facilitate further modifications and adjustments. For collecting this data, male and female candidates were asked to play the game in front of a remote eye-tracking device. For each player, we recorded gaze points, video frames of the gameplay, and mouse and keyboard commands. For each video frame, a list of its game objects with their locations and sizes was also recorded. This data, synchronized with eye-tracking data, allows one to calculate the amount of attention that each object or group of objects draw from each player. As a benchmark, we also show various attention patterns could be identified among players.

Paper: GSET Somi: A Game-Specific Eye Tracking Dataset for Somi
Authors: H. Ahmadi, S. Tootaghaj, S. Mowlaei, M. Hashemi, S. Shirmohammadi
Link: https://dx.doi.org/10.1145/2910017.2910616
Data set: http://www.site.uottawa.ca/~shervin/gaze/

GeoUGV: User-Generated Mobile Video Dataset with Fine Granularity Spatial Metadata

When analyzing and processing videos, it has become increasingly important in many applications to also consider contextual information, in addition to the content. With the ubiquity of sensor-rich smartphones, acquiring a continuous stream of geo-spatial metadata that includes the location and orientation of a camera together with the video frames has become practical. However, no such detailed dataset is publicly available. In this paper we present an extensive geo-tagged video dataset named GeoUGV that has been collected as part of the MediaQ and GeoVid projects. The key features of the dataset are that each video file is accompanied by a metadata sequence of geo-tags consisting of GPS locations, compass directions, and spatial keywords at fine-grained intervals. The GeoUGV dataset has been collected by volunteer users and its statistics can be summarized as follows: 2,397 videos containing 208,976 video frames that are geo-tagged, collected by 289 users in more than 20 cities across the world over a period of 10 years (2007–2016). We hope that this dataset will be useful for researchers, scientists and practitioners alike in their work.

Paper: GeoUGV: User-Generated Mobile Video Dataset with Fine Granularity Spatial Metadata
Authors: Y. Lu, H. To, A. Alfarrarjeh, S. H. Kim, Y. Yin, R. Zimmermann, C. Shahabi
Link: https://dx.doi.org/10.1145/2910017.2910617
Data set: http://mediaq.usc.edu/dataset/ or http://geovid.org/dataset/

Comprehensive Mobile Bandwidth Traces from Vehicular Networks

Bandwidth fluctuation in mobile networks severely effects the quality of service (QoS) of bandwidth-sensitive applications such as video streaming. Using bandwidth statistics it is possible to predict the network behaviour and take proactive actions to counter network fluctuations, which in turn can improve the QoS. In this paper, we present comprehensive bandwidth datasets from extensive measurement campaigns conducted in Sydney on both 3G and 4G networks under vehicular driving conditions. A particularly distinguishing feature of our dataset is that we have collected data from repeated trips along a few routes. Thus our data can be useful to obtain statistically significant results on network performance in an urban setting. We outline the measurement methodology and present key insights obtained from the collected traces. We have made our dataset available to the wider research community.

Paper: Comprehensive Mobile Bandwidth Traces from Vehicular Networks
Authors: A. Bokani, M. Hassan, S. Kanhere, J. Yao, G. Zhong
Link: https://dx.doi.org/10.1145/2910017.2910618

Right Inflight? A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation

The dataset Right Inflight was developed to support the exploration of the match between video content and the situation in which that content is watched. Specifically, we look at videos that are suitable to be watched on an airplane, where the main assumption is that that viewers watch movies with the intent of relaxing themselves and letting time pass quickly, despite the inconvenience and discomfort of flight. The aim of the dataset is to support the development of recommender systems, as well as computer vision and multimedia retrieval algorithms capable of automatically predicting which videos are suitable for inflight consumption. Our ultimate goal is to promote a deeper understanding of how people experience video content, and of how technology can support people in finding or selecting video content that supports them in regulating their internal states in certain situations. Right Inflight consists of 318 human-annotated movies, for which we provide links to trailers, a set of pre-computed low-level visual, audio and text features as well as user ratings. The annotation was performed by crowdsourcing workers, who were asked to judge the appropriateness of movies for inflight consumption.

Paper: Right Inflight? A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation
Authors: M. Riegler, M. Larson, C. Spampinato, P. Halvorsen, M. Lux, J. Markussen, K. Pogorelov, C. Griwodz, H. Stensland
Link: https://dx.doi.org/10.1145/2910017.2910619
Data set: https://www.dropbox.com/sh/j7nunecnzfjrp2r/AAC1BAf5JEv-rGUW9h02L2X2a?dl=0

This dataset is designed to support research in the areas of information retrieval that foster new technologies for improving both the relevance and the diversification of search results with explicit focus on the social media context. The dataset consists of Creative Commons data for around 153 one-concept Flickr queries and 45,375 images for development and 139 Flickr queries (69 one-concept - 70 multi-concept) and 41,394 images for testing; metadata, Wikipedia pages and content descriptors for text and visual modalities. Data is annotated for the relevance and the diversity of the photos. An additional dataset used to train the credibility descriptors (an automatic estimation of the quality (correctness) of a particular user’s tags) provides information for ca. 685 Flickr users and metadata for more than 3.5M images. Important: much of the Information has been obtained by crawling the Internet and from Flickr. Every possible measure has been taken to ensure that the content has been released under a Creative Commons license that allow redistribution. However, the authors cannot fully guarantee that the collection contains absolutely no content without a Creative Commons license. Such content could potentially enter the collection if it was not correctly marked at the source. In what concerns the content descriptors, features are provided on an as-is basis with no guaranty of being correct. The dataset was validated during the 2015 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.

Paper: Div150Multi: A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries
Authors: B. Ionescu, A.L. Gînscă, B. Boteanu, M. Lupu, A. Popescu, H. Müller
Link: https://dx.doi.org/10.1145/2910017.2910620
Data set: http://imag.pub.ro/~bionescu/index_files/Page13288.htm

Heimdallr: A Dataset for Sport Analysis

Heimdallr is a dataset that aims to serve two different purposes. The first purpose is action recognition and pose estimation, which requires a dataset of annotated sequences of athlete skeletons. We employed a crowdsourcing platform where people around the world were asked to annotate frames and obtained more than 3000 fully annotated frames for 42 different sequences with a variety of poses and actions. The second purpose is an improved understanding of crowdworkers, and for this purpose, we collected over 10000 written feedbacks from 592 crowdworkers. This is valuable information for crowdsourcing researchers who explore algorithms for worker quality assessment. In addition to the complete dataset, we also provide the code for the application that has been used to collect the data as an open source software.

Paper: Heimdallr: A Dataset for Sport Analysis
Authors: M. Riegler, D.-T. Dang-Nguyen, B. Winther, C. Griwodz, K. Pogorelov, P. Halvorsen
Link: https://dx.doi.org/10.1145/2910017.2910621
Data set: https://bitbucket.org/mpg_code/bagadus-humanactionretrieval/src/c2e62728cacccd64de79f4ecb09410261a410644?at=master and https://www.dropbox.com/sh/05lfejrwvk65k3q/AAC9IbzNI1Ivjh7sUb1lwuEAa?dl=0

A new HD and UHD video eye tracking dataset

The emergence of UHD video format induces larger screens and involves a wider stimulated visual angle. Therefore, its effect on visual attention can be questioned since it can impact quality assessment, metrics but also the whole chain of video processing and creation. Moreover, changes in visual attention from different viewing conditions challenge visual attention models. In this paper, we present a new HD and UHD video eye tracking dataset composed of 37 high quality videos observed by more than 35 naive observers. This dataset can be used to compare viewing behavior and visual saliency in HD and UHD, as well as for any study on dynamic visual attention in videos.

Paper: A new HD and UHD video eye tracking dataset
Authors: T. Vigier, J. Rousseau, M. Perreira Da Silva, P. Le Callet
Link: https://dx.doi.org/10.1145/2910017.2910622
Data set: http://ivc.univ-nantes.fr/en/databases/HD_UHD_Eyetracking_Videos/

SMART: a Light Field image quality dataset

In this article, the design of a Light Field image dataset is presented. The availability of an image dataset is useful for design, testing, and benchmarking Light Field image processing algorithms. As first step, the image content selection criteria have been defined based on selected image quality key-attributes, i.e. spatial information, colorfulness, texture key features, depth of field, etc. Next, image scenes have been selected and captured by using the Lytro Illum Light Field camera. Performed analysis shows that the considered set of images is sufficient for addressing a wide range of attributes relevant to assess Light Field image quality.

Paper: SMART: a Light Field image quality dataset
Authors: P. Paudyal, R. Olsson, M. Sjöström, F. Battisti, M. Carli
Link: https://dx.doi.org/10.1145/2910017.2910623
Data set: https://www.comlab.uniroma3.it/SMART.html

Event discovery from single pictures is a challenging problem that has raised significant interest in the last decade. During this time, a number of interesting solutions have been proposed to tackle event discovery in still images. However, a large scale benchmarking image dataset for the evaluation and comparison of event discovery algorithms from single images is still lagging behind. To this aim, in this paper we provide a large-scale properly annotated and balanced dataset of 490,000 images, covering every aspect of 14 different types of social events, selected among the most shared ones in the social network. In the dataset we tried our best to cover every aspect of the considered social events by collecting images for the same event-types with diverse contents in terms of viewpoints, colors, group pictures vs. single portrait and outdoor vs. indoor images, where the high variability of the represented information can be effectively explored to ensure better performances in event classification. Such a large-scale collection of event-related images is intended to become a powerful support tool for the research community in multimedia analysis by providing a common benchmark for training, testing, validation and comparison of existing and novel algorithms.

Paper: USED: A Large Scale Social Event Detection Dataset
Authors: K. Ahmad, N. Conci, G. Boato, F. G. De Natale
Link: https://dx.doi.org/10.1145/2910017.2910624
Data set: https://loki.disi.unitn.it/~used/

Datasets for AVC (H.264) and HEVC (H.265) for Evaluating Dynamic Adaptive Streaming over HTTP (DASH)

In this work we present datasets for both trace-based simulation and real-time testbed evaluation of Dynamic Adaptive Streaming over HTTP (DASH). Our trace-based simulation dataset provides a means of evaluation in frameworks such as NS-2 and NS-3, while our testbed evaluation dataset offers a means of analysing the delivery of content over a physical network and associated adaptation mechanisms at the client. Our datasets are available in both H.264 and H.265 with encoding rates comparative to the representations and resolutions of content distribution providers such as Netflix, Hulu and YouTube.

The goal of our dataset is to provide researchers with a sufficiently large dataset, in both number, and duration, of clips which provides a comparison between both encoding schemes. We provide options for evaluating not only different content and genres, but also the underlying encoding metrics, such as transmission cost, segment distribution (the range of the oscillation of the segment sizes) and associated delivery issues such as jitter and re-buffering. Finally, we also offer our datasets in a header-only compressed format, which allows researchers to download the entire dataset and uncompress locally, thus ensuring that our datasets are accessible both online via remote and local servers.

Paper: Datasets for AVC (H.264) and HEVC (H.265) for Evaluating Dynamic Adaptive Streaming over HTTP (DASH)
Authors: J. Quinlan, A. Zahran, C. Sreenan
Link: https://dx.doi.org/10.1145/2910017.2910625
Data set: https://www.cs.ucc.ie/misl/research/current/ivid_dataset

2015

Use of the datasets in published work should be acknowledged by a full citation to the authors’ papers at the MMSys conference:

Proceedings of ACM MMSys '15, Portland, Oregon, March 18-20, 2015

Multi-sensor Concert Recording Dataset Including Professional and User-generated Content

We present a novel dataset for multi-view video and spatial audio. An ensemble of ten musicians from the BBC Philharmonic Orchestra performed in the orchestra’s rehearsal studio in Salford, UK, on 25th March 2014. This presented a controlled environment in which to capture a dataset that could be used to simulate a large event, whilst allowing control over the conditions and performance. The dataset consists of hundreds of video and audio clips captured during 18 takes of performances, using a broad range of professional- and consumer-grade equipment, up to 4K video and high-end spatial microphones. In addition to the audiovisual essence, sensor metadata has been captured, and ground truth annotations, in particular for temporal synchronization and spatial alignment, have been created. A part of the dataset has also been prepared for adaptive content streaming. The dataset is released under a Creative Commons Attribution Non-Commercial Share Alike license and hosted on a specifically adapted content management platform.

Paper: Multi-sensor Concert Recording Dataset Including Professional and User-generated Content
Authors: W. Bailer, C. Pike, R. Bauwens, R. Grandl, M. Matton, M. Thaler
Link: https://dx.doi.org/10.1145/2713168.2713191
Data set: https://icosole.lab.vrt.be/

In this paper we introduce a new dataset and its evaluation tools, Div150Cred, that was designed to support shared evaluation of diversification techniques in different areas of social media photo retrieval and related areas. The dataset comes with associated relevance and diversity assessments performed by human annotators. The data consists of 300 landmark locations represented via 45,375 Flickr photos, 16M photo links for around 3,000 users, metadata, Wikipedia pages and content descriptors for text and visual modalities. To facilitate distribution, only Creative Commons content was included in the dataset. The proposed dataset was validated during the 2014 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative.

Paper: Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset
Authors: B. Ionescu, A. Popescu, M. Lupu, A. Gînscă, B. Boteanu, H. Müller
Link: https://dx.doi.org/10.1145/2713168.2713192
Data set: Local copy or https://imag.pub.ro/~bionescu/index_files/Page6657.htm

A Scalable Video Coding Dataset and Toolchain for Dynamic Adaptive Streaming over HTTP

With video streaming becoming more and more popular, the number of devices that are capable of streaming videos over the Internet is growing. This leads to a heterogeneous device landscape with varying demands. Dynamic Adaptive Streaming over HTTP (DASH) offers an elegant solution to these demands. Smart adaptation logics are able to adjust the clients’ streaming quality according to several (local) parameters. Recent research indicated benefits of blending Scalable Video Coding (SVC) with DASH, especially considering Future Internet architectures. However, except for the DASH Dataset with a single SVC encoded video, no other datasets are publicly available. The contribution of this paper is two-fold. First, a DASH/SVC dataset, containing multiple videos at varying bitrates and spatial resolutions including 1080p, is presented. Second, a toolchain for multiplexing SVC encoded videos is provided, therefore making our results reproducible and allowing researchers to generate their own datasets.

Paper: A Scalable Video Coding Dataset and Toolchain for Dynamic Adaptive Streaming over HTTP
Authors: C. Kreuzberger, D. Posch, H. Hellwagner
Link: https://dx.doi.org/10.1145/2713168.2713193
Data set: https://concert.itec.aau.at/SVCDataset/

RAISE - A Raw Images Dataset for Digital Image Forensics

Digital forensics is a relatively new research area which aims at authenticating digital media by detecting possible digital forgeries. Indeed, the ever increasing availability of multimedia data on the web, coupled with the great advances reached by computer graphical tools, makes the modification of an image and the creation of visually compelling forgeries an easy task for any user. This in turns creates the need of reliable tools to validate the trustworthiness of the represented information. In such a context, we present here RAISE, a large dataset of 8156 high-resolution raw images, depicting various subjects and scenarios, properly annotated and available together with accompanying metadata. Such a wide collection of untouched and diverse data is intended to become a powerful resource for, but not limited to, forensic researchers by providing a common benchmark for a fair comparison, testing and evaluation of existing and next generation forensic algorithms. In this paper we describe how RAISE has been collected and organized, discuss how digital image forensics and many other multimedia research areas may benefit of this new publicly available benchmark dataset and test a very recent forensic technique for JPEG compression detection.

Paper: RAISE - A Raw Images Dataset for Digital Image Forensics
Authors: D. Dang-Nguyen, C. Pasquini, V. Conotter, G. Boato
Link: https://dx.doi.org/10.1145/2713168.2713194
Data set: https://mmlab.science.unitn.it/RAISE/ (350 GB)

YouTube Live and Twitch: A Tour of User-Generated Live Streaming Systems

User-Generated live video streaming systems are services that allow anybody to broadcast a video stream over the Internet. These Over-The-Top services have recently gained popularity, in particular with e-sport, and can now be seen as competitors of the traditional cable TV. In this paper, we present a dataset for further works on these systems. This dataset contains data on the two main user-generated live streaming systems: Twitch and the live service of YouTube. We got three months of traces of these services from January to April 2014. Our dataset includes, at every five minutes, the identifier of the online broadcaster, the number of people watching the stream, and various other media information. In this paper, we introduce the dataset and we make a preliminary study to show the size of the dataset and its potentials. We first show that both systems generate a significant traffic with frequent peaks at more than 1 Tbps. Thanks to more than a million unique uploaders, Twitch is in particular able to offer a rich service at anytime. Our second main observation is that the popularity of these channels is more heterogeneous than what have been observed in other services gathering user-generated content.

Paper: YouTube Live and Twitch: A Tour of User-Generated Live Streaming Systems
Authors: K. Pires, G. Simon
Link: https://dx.doi.org/10.1145/2713168.2713195
Data set: https://dash.ipv6.enstb.fr/dataset/live-sessions/

The Toulouse Vanishing Points Dataset

In this paper we present the Toulouse Vanishing Points Dataset, a public photographs database of Manhattan scenes taken with an iPad Air 1. The purpose of this dataset is the evaluation of vanishing points estimation algorithms. Its originality is the addition of Inertial Measurement Unit (IMU) data synchronized with the camera under the form of rotation matrices. Moreover, contrary to existing works which provide vanishing points of reference in the form of single points, we computed uncertainty regions.

Paper: The Toulouse Vanishing Points Dataset
Authors: V. Angladon, S. Gasparini, V. Charvillat
Link: https://dx.doi.org/10.1145/2713168.2713196
Data set: Local copy or http://ubee.enseeiht.fr/tvpd

Stanford I2V: A News Video Dataset for Query-by-Image Experiments

Reproducible research in the area of visual search depends on the availability of large annotated datasets. In this paper, we address the problem of querying a video database by images that might share some contents with one or more video clips. We present a new large dataset, called Stanford I2V. We have collected more than 3,800 hours of newscast videos and annotated more than 200 ground-truth queries. In the following, the dataset is described in detail, the collection methodology is outlined and retrieval performance for a benchmark algorithm is presented. These results may serve as a baseline for future research and provide an example of the intended use of the Stanford I2V dataset.

Paper: Stanford I2V: A News Video Dataset for Query-by-Image Experiments
Authors: A. Araujo, J. Chaves, D. Chen, R. Angst, B. Girod
Link: https://dx.doi.org/10.1145/2713168.2713197
Data set: https://purl.stanford.edu/zx935qw7203 (more information)

Data Set of Fall Events and Daily Activities from Inertial Sensors

Wearable sensors are becoming popular for remote health monitoring as technology improves and cost reduces. One area in which wearable sensors are increasingly being used is falls monitoring. The elderly, in particular are vulnerable to falls and require continuous monitoring. Indeed, many attempts, with insufficient success have been made towards accurate, robust and generic falls and Activities of Daily Living (ADL) classification. A major challenge in developing solutions for fall detection is access to sufficiently large data set. This paper presents a description of the data set and the experimental protocols designed by the authors for the simulation of falls, near-falls and ADL. Forty-two volunteers were recruited to participate in an experiment that involved a set of scripted protocols. Four types of falls (forward, backward, lateral left and right) and several ADL were simulated. This data set is intended for the evaluation of fall detection algorithms by combining daily activities and transitions from one posture to another with falls. In our prior work, machine learning based fall detection algorithms were developed and evaluated. Results showed that our algorithm was able to discriminate between falls and ADL with an F-measure of 94%.

Paper: Data Set of Fall Events and Daily Activities from Inertial Sensors
Authors: O. Ojetola, E. Gaura, J. Brusey
Link: https://dx.doi.org/10.1145/2713168.2713198
Data set: Local copy or https://cogentee.coventry.ac.uk/datasets/fall_adl_data.zip
Addendum: This addendum includes additional data from several new subjects. This additional data consists of two parts: The first part has subjects with sensors on their chest, thigh, and waist, and the second part have chest and the thigh data.

A Multi-Lens Stereoscopic Synthetic Video Dataset

This dataset paper describes a multi-lens stereoscopic synthetically generated video dataset and model. Creating a multi-lens video stream requires that the lens be placed at a spacing less than one inch. While such cameras exist on the market, they are not “professional” enough to allow for necessary things such as zoom-lens control or synchronization between cameras. This dataset provides 20 synthetic models, an associated multi-lens walkthrough, and the uncompressed video from its generation. This dataset can be used for multi-view compression research, view-interpolation, or other computer graphics related research.

Paper: A Multi-Lens Stereoscopic Synthetic Video Dataset
Authors: F. Zhang, W. Feng, F. Liu
Link: https://dx.doi.org/10.1145/2713168.2713199
Data set: (tarball) or https://maserati.cs.pdx.edu/Dataset

2014

Use of the datasets in published work should be acknowledged by a full citation to the authors’ papers at the MMSys conference:

Proceedings of ACM MMSys '14, March 19 - March 21, 2014, Singapore, Singapore

Ultra high definition HEVC DASH data set

This is a Ultra High Definition HEVC DASH dataset ranging from HD to UHD in different bit rates. This data set may be used to simulate UHD DASH services, whether on-demand or live, using real-life professional quality content.

Paper: Ultra high definition HEVC DASH data set, J. Le Feuvre, J-M. Thiesse, M. Parmentier, M. Raulet, C. Daguet
Link: https://dx.doi.org/10.1145/2557642.2563672
Data set: https://download.tsi.telecom-paristech.fr/gpac/dataset/dash/uhd

LaRED: A Large RGB-D Extensible Hand Gesture Dataset

This is a Large RGB-D Extensible hand gesture data set, recorded with an Intel’s newly-developed short range depth camera.

Paper: LaRED: a large RGB-D extensible hand gesture dataset, Yuan-Sheng Hsiao, Jordi Sanchez-Riera, Tekoing Lim, Kai-Lung Hua, Wen-Huang Cheng
Link: https://dx.doi.org/10.1145/2557642.2563669
Data set: https://mclab.citi.sinica.edu.tw/dataset/lared/lared.html

This data set, Div400, that was designed to support shared evaluation in different areas of social media photo retrieval, e.g., machine analysis (re-ranking, machine learning), human-based computation (crowdsourcing) or hybrid approaches (relevance feedback, machine-crowd integration).

Paper: Div400: a social image retrieval result diversification dataset, Bogdan Ionescu, Anca-Livia Radu, María Menéndez, Henning Müller, Adrian Popescu, Babak Loni
Link: https://dx.doi.org/10.1145/2557642.2563670
Data set: https://skulddata.cs.umass.edu/traces/mmsys/2014/user01.tar

Measuring DASH Streaming Performance from the End Users Perspective using Neubot

This data set provides data, which collected by a DASH module built on top of Neubot, an open source tool for the collection of network measurements.

Paper: Measuring DASH streaming performance from the end users perspective using neubot, Simone Basso, Antonio Servetti, Enrico Masala, Juan Carlos De Martin
Link: https://dx.doi.org/10.1145/2557642.2563671
Data set: https://skulddata.cs.umass.edu/traces/mmsys/2014/user02.tar

World-Wide Scale Geotagged Image Dataset for Automatic Image Annotation and Reverse Geotagging

This is a dataset of geotagged photos on a world-wide scale. The dataset contains a sample of more than 14 million geotagged photos crawled from Flickr with the corresponding metadata.

Paper: World-wide scale geotagged image dataset for automatic image annotation and reverse geotagging, Hatem Mousselly-Sergieh, Daniel Watzinger, Bastian Huber, Mario Döller, Elöd Egyed-Zsigmond, Harald Kosch
Link: http://dx.doi.org/10.1145/2557642.2563669
Data set: http://skulddata.cs.umass.edu/traces/mmsys/2014/user03.tar

This set consists of about 430,000 photos from Flickr together with the underlying ground truth consisting of about 21,000 social events. All the photos are accompanied by their textual metadata. The ground truth for the event groupings has been derived from event calendars on the Web that have been created collaboratively by people.

Paper: ReSEED: social event dEtection dataset, Timo Reuter, Symeon Papadopoulos, Vasilios Mezaris, Philipp Cimiano
Link: https://dx.doi.org/10.1145/2557642.2563674
Data set: https://skulddata.cs.umass.edu/traces/mmsys/2014/user04.tar

The dataset contains more than 32000 images, their context and social metadata, related to the fashion and clothing domain.

Paper: Fashion 10000: an enriched social image dataset for fashion and clothing, Babak Loni, Lei Yen Cheung, Michael Riegler, Alessandro Bozzon, Luke Gottlieb, Martha Larson
Link: https://dx.doi.org/10.1145/2557642.2563675
Data set: https://skulddata.cs.umass.edu/traces/mmsys/2014/user05.tar

The EBU MIM-SCAIE Content Set for Automatic Information Extraction on Broadcast Media

This data set that has been made available by the European Broadcasting Union (EBU). The content in the set consists of broadcast media content collected from different broadcasters around the world. This content set is made available to the research community in order to evaluate automatic information extraction tools on this broadcast media. The set also contains ground truth data and annotations for several automatic information extraction tasks.

Paper: The EBU MIM-SCAIE content set for automatic information extraction on broadcast media, Mike Matton, Alberto Messina, Werner Bailer, Jean-Pierre Évain
Link: https://dx.doi.org/10.1145/2557642.2563676
Data set: https://ebu-scaie.lab.vrt.be/mammie

Soccer Video and Player Position Dataset

This is a dataset of body-sensor traces and corresponding videos from several professional soccer games captured in late 2013 at the Alfheim Stadium in Tromsø, Norway. Player data, including field position, heading, and speed are sampled at 20Hz using the highly accurate ZXY Sport Tracking system.

Paper: Soccer video and player position dataset, Svein Arne Pettersen, Dag Johansen, Håvard Johansen, Vegard Berg-Johansen, Vamsidhar Reddy Gaddam, Asgeir Mortensen, Ragnar Langseth, Carsten Griwodz, Håkon Kvale Stensland, Pål Halvorsen
Link: https://dx.doi.org/10.1145/2557642.2563677
Data set: https://home.ifi.uio.no/paalh/dataset/alfheim/

YawDD: A Yawning Detection Dataset

YawDD provides two video datasets of drivers with various facial characteristics, to be used for designing and testing algorithms and models for yawning detection.

Paper: YawDD: a yawning detection dataset, Shabnam Abtahi, Mona Omidyeganeh, Shervin Shirmohammadi, Behnoosh Hariri
Link: https://dx.doi.org/10.1145/2557642.2563678
Data set: https://skulddata.cs.umass.edu/traces/mmsys/2014/user06.tar

2013

Use of the datasets in published work should be acknowledged by a full citation to the authors’ papers at the MMSys conference:

Proceedings of ACM MMSys '13, February 27 - March 1, 2013, Oslo, Norway

More than 160 thousand Flickr photos and their accompanying metadata, as well as a list of 149 manually selected and annotated target events, each of which is defined as a set of relevant photos.

Paper: The 2012 social event detection dataset. Symeon Papadopoulos, Emmanouil Schinas, Vasileios Mezaris, Raphaël Troncy, Ioannis Kompatsiaris,
Dataset

A Professionally Annotated and Enriched Multimodal Data Set on Popular Music

A multimodal data set of professionally annotated music, including editorial metadata about songs, albums, and artists, as well as MusicBrainz identifiers to facilitate linking to other data sets.

Paper: A professionally annotated and enriched multimodal data set on popular music. Markus Schedl, Nicola Orio, Cynthia C. S. Liem, Geoffroy Peeters.
Dataset

Commute Path Bandwidth Traces from 3G Networks: Analysis and Applications

Real-world measurements of throughput achieved at the application layer when adaptive HTTP streaming was performed over 3G networks using mobile devices.

Paper: Commute path bandwidth traces from 3G networks: analysis and applications. Haakon Riiser, Paul Vigmostad, Carsten Griwodz, Pål Halvorsen.
Dataset

Video Surveillance Online Repository (ViSOR)

An open platform for collecting, annotating, and sharing surveillance videos. Most of the included videos are annotated, based on a reference ontology which integrates hundreds of concepts, some of them coming from the LSCOM and MediaMill ontologies.

Paper: Video Surveillance Online Repository (ViSOR): an integrated framework. Roberto Vezzani, Rita Cucchiara.
Repository (external link)

A mix of general images as well as images that are focused on fashion (i.e., relevant to particular clothing items or fashion accessories). The dataset contains 4810 images and related metadata.

Paper: Fashion-focused creative commons social dataset. Babak Loni, Maria Menendez, Mihai Georgescu, Luca Galli, Claudio Massari, Ismail Sengor Altingovde, Davide Martinenghi, Mark Melenhorst, Raynor Vliegendhart, Martha Larson.
Dataset

A dataset containing comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple ‘social levels’.

Paper: Blip10000: a social video dataset containing SPUG content for tagging and retrieval. S. Schmiedeke, P. Xu, I. Ferrané, M. Eskevich, C. Kofler, M. Larson, Y. Estève, L. Lamel, G. Jones, T. Sikora.
Dataset

The Jiku Mobile Video Dataset

A dataset containing videos that could represent characteristics of mobile videos captured in realistic scenarios, consisting of videos simultaneously recorded using mobile devices by multiple users attending performance events.

Paper: The jiku mobile video dataset. Mukesh Saini, Seshadri Padmanabha Venkatagiri, Wei Tsang Ooi, Mun Choon Chan.
Full dataset download script
Selective video download page (external link)

SopCast P2P Live Streaming Traces

Logs from a very popular P2P live streaming application, the SopCast.

Paper: SopCast P2P live streaming: live session traces and analysis. Alex Borges Vieira, Ana Paula Couto da Silva, Francisco Henrique, Glauber Goncalves, Pedro de Carvalho Gomes.
Dataset

Monitoring Mobile Video Delivery to Android Devices

A dataset of wireless network behavior, geo-coordinates, and packet traces for popular streaming applications on Android certified devices, gathered in a 3G network for both HTTP and peer-to-peer video streaming applications.

Paper: Monitoring mobile video delivery to Android devices. Philipp M. Eittenberger, Michael Hamatschek, Marcel Großmann, Udo R. Krieger.
Dataset

Distributed DASH Dataset

D-DASH is a dataset of content for the Dynamic Adaptive Streaming over HTTP (DASH) standard from MPEG.

Paper: Distributed DASH dataset. Stefan Lederer, Christopher Mueller, Christian Timmerer, Cyril Concolato, Jean Le Feuvre, Karel Fliegel.
Dataset

Consumer video dataset with marked head trajectories

A dataset gathered using a handheld camcorder and a mobile phone that includes ground truth data on person head trajectories and other people marked in the background in MPEG-7-based metadata model.

Paper: Consumer video dataset with marked head trajectories. Jouni Sarvanko, Mika Rautiainen, Arto Heikkinen, Mika Ylianttila.
Dataset

ACM Multimedia System Conference Dataset Archive

2016

GSET Somi: A Game-Specific Eye Tracking Dataset for Somi

GeoUGV: User-Generated Mobile Video Dataset with Fine Granularity Spatial Metadata

Comprehensive Mobile Bandwidth Traces from Vehicular Networks

Right Inflight? A Dataset for Exploring the Automatic Prediction of Movies Suitable for a Watching Situation

Div150Multi: A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries

Heimdallr: A Dataset for Sport Analysis

A new HD and UHD video eye tracking dataset

SMART: a Light Field image quality dataset

USED: A Large Scale Social Event Detection Dataset

Datasets for AVC (H.264) and HEVC (H.265) for Evaluating Dynamic Adaptive Streaming over HTTP (DASH)

2015

Multi-sensor Concert Recording Dataset Including Professional and User-generated Content

Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset

A Scalable Video Coding Dataset and Toolchain for Dynamic Adaptive Streaming over HTTP

RAISE - A Raw Images Dataset for Digital Image Forensics

YouTube Live and Twitch: A Tour of User-Generated Live Streaming Systems

The Toulouse Vanishing Points Dataset

Stanford I2V: A News Video Dataset for Query-by-Image Experiments

Data Set of Fall Events and Daily Activities from Inertial Sensors

A Multi-Lens Stereoscopic Synthetic Video Dataset

2014

Ultra high definition HEVC DASH data set

LaRED: A Large RGB-D Extensible Hand Gesture Dataset

Div400: A Social Image Retrieval Result Diversification Dataset

Measuring DASH Streaming Performance from the End Users Perspective using Neubot

World-Wide Scale Geotagged Image Dataset for Automatic Image Annotation and Reverse Geotagging

ReSEED: Social Event dEtection Dataset

Fashion 10000: An Enriched Social Image Dataset for Fashion and Clothing

The EBU MIM-SCAIE Content Set for Automatic Information Extraction on Broadcast Media

Soccer Video and Player Position Dataset

YawDD: A Yawning Detection Dataset

2013

The 2012 Social Event Detection Dataset

A Professionally Annotated and Enriched Multimodal Data Set on Popular Music

Commute Path Bandwidth Traces from 3G Networks: Analysis and Applications

Video Surveillance Online Repository (ViSOR)

Fashion-focused Creative Commons Social dataset

Blip10000: A social Video Dataset containing SPUG Content for Tagging and Retrieval

The Jiku Mobile Video Dataset

SopCast P2P Live Streaming Traces

Monitoring Mobile Video Delivery to Android Devices

Distributed DASH Dataset

Consumer video dataset with marked head trajectories