ACM Multimedia Systems Conference Dataset Archive

This page hosts traces from the 2013-2015 ACM Multimedia Systems Conferences (ACM MMSys).

Datasets from MMSys 2012 and MMSys 2011 are available for download here.


Use of the datasets in published work should be acknowledged by a full citation to the authors' papers at the MMSys conference:

Proceedings of ACM MMSys '15, Portland, Oregon, March 18-20, 2015

Multi-sensor Concert Recording Dataset Including Professional and User-generated Content

We present a novel dataset for multi-view video and spatial audio. An ensemble of ten musicians from the BBC Philharmonic Orchestra performed in the orchestra's rehearsal studio in Salford, UK, on 25th March 2014. This presented a controlled environment in which to capture a dataset that could be used to simulate a large event, whilst allowing control over the conditions and performance. The dataset consists of hundreds of video and audio clips captured during 18 takes of performances, using a broad range of professional- and consumer-grade equipment, up to 4K video and high-end spatial microphones. In addition to the audiovisual essence, sensor metadata has been captured, and ground truth annotations, in particular for temporal synchronization and spatial alignment, have been created. A part of the dataset has also been prepared for adaptive content streaming. The dataset is released under a Creative Commons Attribution Non-Commercial Share Alike license and hosted on a specifically adapted content management platform.

Div150Cred: A Social Image Retrieval Result Diversification with User Tagging Credibility Dataset

In this paper we introduce Div150Cred, a new dataset with accompanying evaluation tools, designed to support shared evaluation of diversification techniques in social media photo retrieval and related areas. The dataset comes with associated relevance and diversity assessments performed by human annotators. The data consists of 300 landmark locations represented via 45,375 Flickr photos, 16M photo links for around 3,000 users, metadata, Wikipedia pages and content descriptors for text and visual modalities. To facilitate distribution, only Creative Commons content was included in the dataset. The proposed dataset was validated during the 2014 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative.

A Scalable Video Coding Dataset and Toolchain for Dynamic Adaptive Streaming over HTTP

With video streaming becoming more and more popular, the number of devices that are capable of streaming videos over the Internet is growing. This leads to a heterogeneous device landscape with varying demands. Dynamic Adaptive Streaming over HTTP (DASH) offers an elegant solution to these demands. Smart adaptation logics are able to adjust the clients' streaming quality according to several (local) parameters. Recent research indicated benefits of blending Scalable Video Coding (SVC) with DASH, especially considering Future Internet architectures. However, except for the DASH Dataset with a single SVC encoded video, no other datasets are publicly available. The contribution of this paper is two-fold. First, a DASH/SVC dataset, containing multiple videos at varying bitrates and spatial resolutions including 1080p, is presented. Second, a toolchain for multiplexing SVC encoded videos is provided, thereby making our results reproducible and allowing researchers to generate their own datasets.
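As an illustration of what a simple adaptation logic can look like, the following minimal Python sketch picks a representation from a measured throughput. It is not the logic used by the dataset's authors: the function name `select_bitrate`, the `safety` margin, and the bitrate ladder are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def select_bitrate(available_kbps: Sequence[int],
                   throughput_kbps: float,
                   safety: float = 0.8) -> int:
    """Return the highest bitrate that fits within a safety fraction
    of the measured throughput (hypothetical throughput-based rule)."""
    if not available_kbps:
        raise ValueError("no representations available")
    budget = throughput_kbps * safety
    candidates = [b for b in available_kbps if b <= budget]
    # Fall back to the lowest representation when nothing fits the budget.
    return max(candidates) if candidates else min(available_kbps)


# Hypothetical bitrate ladder in kbit/s; not taken from the dataset.
ladder = [500, 1000, 2500, 5000]
print(select_bitrate(ladder, throughput_kbps=3500))
```

Real clients typically combine throughput with other local parameters such as buffer level; this sketch considers throughput only.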

RAISE - A Raw Images Dataset for Digital Image Forensics

Digital forensics is a relatively new research area which aims at authenticating digital media by detecting possible digital forgeries. Indeed, the ever-increasing availability of multimedia data on the web, coupled with the great advances reached by computer graphics tools, makes the modification of an image and the creation of visually compelling forgeries an easy task for any user. This in turn creates the need for reliable tools to validate the trustworthiness of the represented information. In such a context, we present here RAISE, a large dataset of 8156 high-resolution raw images, depicting various subjects and scenarios, properly annotated and available together with accompanying metadata. Such a wide collection of untouched and diverse data is intended to become a powerful resource for forensic researchers, among others, by providing a common benchmark for a fair comparison, testing and evaluation of existing and next-generation forensic algorithms. In this paper we describe how RAISE has been collected and organized, discuss how digital image forensics and many other multimedia research areas may benefit from this new publicly available benchmark dataset, and test a very recent forensic technique for JPEG compression detection.

YouTube Live and Twitch: A Tour of User-Generated Live Streaming Systems

User-generated live video streaming systems are services that allow anybody to broadcast a video stream over the Internet. These Over-The-Top services have recently gained popularity, in particular with e-sport, and can now be seen as competitors of traditional cable TV. In this paper, we present a dataset for further work on these systems. This dataset contains data on the two main user-generated live streaming systems: Twitch and the live service of YouTube. We collected three months of traces of these services from January to April 2014. Our dataset includes, at five-minute intervals, the identifier of the online broadcaster, the number of people watching the stream, and various other media information. In this paper, we introduce the dataset and present a preliminary study to show the size of the dataset and its potential. We first show that both systems generate significant traffic with frequent peaks at more than 1 Tbps. Thanks to more than a million unique uploaders, Twitch in particular is able to offer a rich service at any time. Our second main observation is that the popularity of these channels is more heterogeneous than what has been observed in other services gathering user-generated content.

The Toulouse Vanishing Points Dataset

In this paper we present the Toulouse Vanishing Points Dataset, a public database of photographs of Manhattan scenes taken with an iPad Air 1. The purpose of this dataset is the evaluation of vanishing point estimation algorithms. Its originality is the addition of Inertial Measurement Unit (IMU) data, synchronized with the camera, in the form of rotation matrices. Moreover, contrary to existing works which provide reference vanishing points in the form of single points, we computed uncertainty regions.

Stanford I2V: A News Video Dataset for Query-by-Image Experiments

Reproducible research in the area of visual search depends on the availability of large annotated datasets. In this paper, we address the problem of querying a video database by images that might share some content with one or more video clips. We present a new large dataset, called Stanford I2V. We have collected more than 3,800 hours of newscast videos and annotated more than 200 ground-truth queries. In the following, the dataset is described in detail, the collection methodology is outlined and retrieval performance for a benchmark algorithm is presented. These results may serve as a baseline for future research and provide an example of the intended use of the Stanford I2V dataset.

Data Set of Fall Events and Daily Activities from Inertial Sensors

Wearable sensors are becoming popular for remote health monitoring as technology improves and costs fall. One area in which wearable sensors are increasingly being used is falls monitoring. The elderly, in particular, are vulnerable to falls and require continuous monitoring. Indeed, many attempts have been made, with insufficient success, towards accurate, robust and generic classification of falls and Activities of Daily Living (ADL). A major challenge in developing solutions for fall detection is access to a sufficiently large data set. This paper presents a description of the data set and the experimental protocols designed by the authors for the simulation of falls, near-falls and ADL. Forty-two volunteers were recruited to participate in an experiment that involved a set of scripted protocols. Four types of falls (forward, backward, lateral left and right) and several ADL were simulated. This data set is intended for the evaluation of fall detection algorithms by combining daily activities and transitions from one posture to another with falls. In our prior work, machine learning based fall detection algorithms were developed and evaluated. Results showed that our algorithm was able to discriminate between falls and ADL with an F-measure of 94%.
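For readers unfamiliar with the metric, the F-measure reported above combines precision and recall into a single score. The sketch below assumes the balanced variant (F1, `beta = 1`), which the abstract does not state explicitly. The function name `f_measure` and the confusion counts in the usage line are hypothetical and are not results from the paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives.

    beta = 1 gives the balanced F1 score (assumed here; the source does
    not say which variant was used).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a fall detector: 8 falls detected,
# 2 false alarms, 4 falls missed.
print(round(f_measure(tp=8, fp=2, fn=4), 3))
```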

A Multi-Lens Stereoscopic Synthetic Video Dataset

This dataset paper describes a multi-lens stereoscopic synthetically generated video dataset and model. Creating a multi-lens video stream requires that the lenses be placed at a spacing of less than one inch. While such cameras exist on the market, they are not “professional” enough to allow for necessary things such as zoom-lens control or synchronization between cameras. This dataset provides 20 synthetic models, an associated multi-lens walkthrough, and the uncompressed video from its generation. This dataset can be used for multi-view compression research, view-interpolation, or other computer graphics related research.



Proceedings of ACM MMSys '14, March 19 - March 21, 2014, Singapore, Singapore

Ultra high definition HEVC DASH data set

This is an Ultra High Definition HEVC DASH dataset ranging from HD to UHD at different bit rates. This data set may be used to simulate UHD DASH services, whether on-demand or live, using real-life professional quality content.

LaRED: A Large RGB-D Extensible Hand Gesture Dataset

This is a Large RGB-D Extensible hand gesture data set, recorded with Intel's newly developed short-range depth camera.

Div400: A Social Image Retrieval Result Diversification Dataset

This data set, Div400, was designed to support shared evaluation in different areas of social media photo retrieval, e.g., machine analysis (re-ranking, machine learning), human-based computation (crowdsourcing) or hybrid approaches (relevance feedback, machine-crowd integration).

Measuring DASH Streaming Performance from the End Users Perspective using Neubot

This data set provides data collected by a DASH module built on top of Neubot, an open source tool for the collection of network measurements.

World-Wide Scale Geotagged Image Dataset for Automatic Image Annotation and Reverse Geotagging

This is a dataset of geotagged photos on a world-wide scale. The dataset contains a sample of more than 14 million geotagged photos crawled from Flickr with the corresponding metadata.

ReSEED: Social Event dEtection Dataset

This set consists of about 430,000 photos from Flickr together with the underlying ground truth consisting of about 21,000 social events. All the photos are accompanied by their textual metadata. The ground truth for the event groupings has been derived from event calendars on the Web that have been created collaboratively by people.

Fashion 10000: An Enriched Social Image Dataset for Fashion and Clothing

The dataset contains more than 32,000 images, together with their context and social metadata, related to the fashion and clothing domain.

The EBU MIM-SCAIE Content Set for Automatic Information Extraction on Broadcast Media

This data set has been made available by the European Broadcasting Union (EBU). The content in the set consists of broadcast media content collected from different broadcasters around the world. This content set is made available to the research community in order to evaluate automatic information extraction tools on this broadcast media. The set also contains ground truth data and annotations for several automatic information extraction tasks.

Soccer Video and Player Position Dataset

This is a dataset of body-sensor traces and corresponding videos from several professional soccer games captured in late 2013 at the Alfheim Stadium in Tromsø, Norway. Player data, including field position, heading, and speed, are sampled at 20 Hz using the highly accurate ZXY Sport Tracking system.

YawDD: A Yawning Detection Dataset

YawDD provides two video datasets of drivers with various facial characteristics, to be used for designing and testing algorithms and models for yawning detection.



Proceedings of ACM MMSys '13, February 27 - March 1, 2013, Oslo, Norway

The 2012 Social Event Detection Dataset

More than 160 thousand Flickr photos and their accompanying metadata, as well as a list of 149 manually selected and annotated target events, each of which is defined as a set of relevant photos.

A Professionally Annotated and Enriched Multimodal Data Set on Popular Music

A multimodal data set of professionally annotated music, including editorial metadata about songs, albums, and artists, as well as MusicBrainz identifiers to facilitate linking to other data sets.

Commute Path Bandwidth Traces from 3G Networks: Analysis and Applications

Real-world measurements of throughput achieved at the application layer when adaptive HTTP streaming was performed over 3G networks using mobile devices.

Video Surveillance Online Repository (ViSOR)

An open platform for collecting, annotating, and sharing surveillance videos. Most of the included videos are annotated, based on a reference ontology which integrates hundreds of concepts, some of them coming from the LSCOM and MediaMill ontologies.

Fashion-focused Creative Commons Social dataset

A mix of general images as well as images that are focused on fashion (i.e., relevant to particular clothing items or fashion accessories). The dataset contains 4810 images and related metadata.

Blip10000: A social Video Dataset containing SPUG Content for Tagging and Retrieval

A dataset containing comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'.

The Jiku Mobile Video Dataset

A dataset containing videos that could represent characteristics of mobile videos captured in realistic scenarios, consisting of videos simultaneously recorded using mobile devices by multiple users attending performance events.

SopCast P2P Live Streaming Traces

Logs from SopCast, a very popular P2P live streaming application.

Monitoring Mobile Video Delivery to Android Devices

A dataset of wireless network behavior, geo-coordinates, and packet traces for popular streaming applications on Android certified devices, gathered in a 3G network for both HTTP and peer-to-peer video streaming applications.

Distributed DASH Dataset

D-DASH is a dataset of content for the Dynamic Adaptive Streaming over HTTP (DASH) standard from MPEG.

Consumer video dataset with marked head trajectories

A dataset gathered using a handheld camcorder and a mobile phone that includes ground truth data on person head trajectories, with other people in the background also marked, in an MPEG-7-based metadata model.

Page last modified on November 18, 2015, at 11:18 AM