# Overview

The Context Guided Agents Dataset was created as part of Dominik Gotojuch's PhD thesis 
[*Controlled design of human-like agents with context-guided learning for automated video game playing*](https://theses.gla.ac.uk/id/eprint/85486) [1], under the supervision of Professor Roderick Murray-Smith and Doctor John Williamson, at the School of Computing Science, University of Glasgow.

The dataset features gameplay telemetry samples from the scavenge portion of the commercial video game *60 Seconds!* [2], which were processed to include additional inference data and encoded as JSON files. The game tasks players with navigating a three-dimensional game environment to pursue time-constrained foraging tasks. The dataset contains 8,244,111 gameplay telemetry samples, each representing a single game session, from 808,659 unique players. The data was crowdsourced from players of the Steam desktop PC version of the game, playing in their natural play setups between January 2017 and May 2022.

The dataset features game setup, progression and evaluation data, as well as navigation and interaction traces of players performing foraging tasks in the 3D space of the game’s environment. 

Additional resources and support for the dataset will be provided on [the author's project site](https://gotojuch.com/contextguidedagents/) and [the project's public repository](https://github.com/viadomx/ContextGuidedAgents).

# Data

## Structure

The dataset includes processed gameplay trajectory samples and the game's design data values.

The processed gameplay telemetry samples are organised in directories indexed by unique player identifiers. Each directory contains the collection of trajectories generated by a specific player. Each JSON file found in a directory represents a single recorded gameplay trajectory. The number of trajectories per player varies, since some players played the game more than others.

The design data values are stored in dedicated JSON files, corresponding to difficulty, game type, item, and environment data. 

As part of processing, some values were compressed to trim redundant data. For example, 3D vectors were compressed into 2D vectors, since vertical positioning of objects in the game environment does not affect the gameplay.
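
The directory layout described above could be walked with a minimal sketch like the one below. It assumes that all archives were extracted into a single root directory, that each player directory sits directly under that root, and that trajectory files carry a `.json` extension. The function names `iter_player_samples` and `load_sample` are hypothetical and not part of any tooling released with the dataset.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_player_samples(root: Path) -> Iterator[Tuple[str, Path]]:
    """Yield (player directory name, trajectory file path) pairs.

    Assumption: one sub-directory per player directly under `root`,
    each holding one JSON file per recorded trajectory.
    """
    for player_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for sample_path in sorted(player_dir.glob("*.json")):
            yield player_dir.name, sample_path


def load_sample(path: Path) -> dict:
    """Parse a single trajectory file into a dictionary."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Files placed directly under the root are skipped, because only sub-directories are treated as player directories. If the design data files were extracted into a directory of their own, that directory would need to be excluded separately.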

## Access

To reduce its storage footprint, the dataset was compressed into 200 zip archives of gameplay telemetry data files (ContextGuidedAgentsDataset_001.zip, ContextGuidedAgentsDataset_002.zip, ..., ContextGuidedAgentsDataset_200.zip). Each archive contains a collection of directories, each holding all recorded game samples for a specific player. An additional zip archive featuring design data files (ContextGuidedAgentsDataset_DesignData.zip) is also included. All archives should be extracted to a single location for ease of access.

Please be aware that the extracted dataset is over 1 TB in size.
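
Extraction into a single location could be scripted roughly as follows, using Python's standard `zipfile` module. This is a sketch only: the function name `extract_archives` and both directory arguments are hypothetical, and it assumes that all downloaded archives sit in one directory and follow the `ContextGuidedAgentsDataset_*.zip` naming shown above.

```python
import zipfile
from pathlib import Path


def extract_archives(archive_dir: Path, target_dir: Path) -> int:
    """Extract every ContextGuidedAgentsDataset_*.zip into `target_dir`.

    The glob pattern matches the numbered telemetry archives as well as
    the design data archive. Returns the number of archives extracted.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = 0
    for archive in sorted(archive_dir.glob("ContextGuidedAgentsDataset_*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted += 1
    return extracted
```

Given the size of the extracted dataset, it may be worth checking the free space on the target drive (for example with `shutil.disk_usage`) before running such a script.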

### Gameplay Telemetry

Processed gameplay telemetry data samples are stored in JSON files, which contain normalised and structured game session data, as well as additional inference and scoring data. The structure of a processed data sample is as follows:

* **game**: game session information, which includes:
	* **game_id**: game identifier (string)
	* **user_id**: user identifier (string)
	* **date**: recording date (string, YYYY-MM-DD format)
	* **time**: recording time (string, HHMMSS format)
	* **game_time**: actual execution time of the game session (float)
	* **level_id**: level identifier (integer), referencing the relevant level design data
	* **finished**: flag denoting whether the game was finished (integer, 1 for a properly finished game, 0 for a game that was prematurely aborted)
	* **won**: was the game session completed successfully (boolean)
	* **paused**: was the game paused (boolean)
	* **game_type_id**: game type identifier (integer), referencing the relevant game type design data (ContextGuidedAgentsDataset_DesignData.zip/game_type_data.json)
	* **prep_time**: exploration stage time (integer)
	* **run_time**: collection stage time (integer)
	* **total_time**: game session total time, including exploration and collection times (integer)
	* **elapsed_time**: actual game session total time, which includes startup and conclusion margins (float)
	* **start_position**: the starting position of the avatar (2D vector)
	* **exit_position**: the position of the exit (2D vector)
	* **last_second_deposit**: was an automatic deposit made at the conclusion of a successful game session because the avatar’s inventory was not empty (boolean)
	* **difficulty**: identifier of the game session’s difficulty level (integer), referencing the relevant difficulty design data (ContextGuidedAgentsDataset_DesignData.zip/difficulty_data.json)
	* **default_character**: was the default character (Ted) used in the game session (boolean)
	* **extended_item_set**: was the extended item set used in the game session (boolean)
	* **items_count**: number of items collected during the entire game session (integer)
	* **items_count_early_game**: number of items collected during the first half of the game (integer)
	* **items_count_late_game**: number of items collected during the second half of the game (integer)
	* **items_value**: total weight of items collected during the game session (integer)
	* **items_value_early_game**: total weight of items collected during the first half of the game (integer)
	* **items_value_late_game**: total weight of items collected during the second half of the game (integer)
	* **items_distribution**: for each real time second of the collection stage of the game session, a flag denoting if a collection interaction was performed at a given time, or not (60 character string, each character is either 0 or 1, with the former denoting no collection, and the latter meaning that a collection was made)
	* **items_order**: items collected during the game session, listed in the order of collection (string, each character represents the type of item collected, value range: "A"-"W")
	* **items_collection_ratio**: ratio of collected items count to total number of spawned items (float)
	* **items_spawned_count**: number of items spawned for the game session (integer)
	* **items_spawned_value**: total weight of items spawned in the game session (float)
	* **deposits_count**: total number of deposits during the game session (integer)
	* **deposits_early_game**: number of deposits during the first half of the game (integer)
	* **deposits_late_game**: number of deposits during the second half of the game (integer)
	* **deposits_avg_load**: average deposit weight in the game session (float)
	* **deposits_distribution**: for each real time second of the collection stage of the game session, a flag denoting if a deposit interaction was performed at a given time, or not (60 character string, each character is either 0 or 1, with the former denoting no deposit, and the latter meaning that a deposit was made)
	* **interactions_count**: interaction count during the entire game session (integer)
	* **interactions_early_game**: interaction count during the first half of the game (integer)
	* **interactions_late_game**: interaction count during the second half of the game (integer)
	* **collisions_count_game**: number of collisions during the entire game session (integer)
	* **collisions_early_game**: number of collisions during the first half of the game (integer)
	* **collisions_late_game**: number of collisions during the second half of the game (integer)
	* **collisions_early_game_ratio**: ratio of collision count during the first half of the game to total collision count (float)
	* **collisions_late_game_ratio**: ratio of collision count during the second half of the game to total collision count (float)
	* **move_time_prep**: move time during the exploration stage (float)
	* **move_time_game**: move time during the entire game (float)
	* **move_time_early_game**: move time during the first half of the game (float)
	* **move_time_late_game**: move time during the second half of the game (float)
	* **move_time_ratio_prep**: ratio of movement time during exploration stage to total exploration time (float).
	* **move_time_ratio_early_game**: ratio of movement time during the first half of the game to total movement time during the game (float)
	* **move_time_ratio_late_game**: ratio of movement time during the second half of the game to total movement time during the game (float)
	* **move_time_ratio_game**: ratio of time spent on moving during the game to total game time (float)
	* **pause_game_time**: length of game being paused (float)
	* **pause_game_ratio**: ratio of time the game was paused to total game time (float)
	* **interaction_game_time**: time spent on interactions during the game (float)
	* **interaction_game_ratio**: ratio of time spent on interactions during the game to total game time (float)
	* **move_distance_prep**: move distance during the exploration stage (float)
	* **move_distance_game**: move distance during the collection stage (float)
	* **move_distance_early_game**: move distance during the first half of the game (float)
	* **move_distance_late_game**: move distance during the second half of the game (float)
	* **move_distance_ratio_early_game**: ratio of movement distance during the first half of the game to total movement distance (float)
	* **move_distance_ratio_late_game**: ratio of movement distance during the second half of the game to total movement distance (float)
	* **traversal_prep**: sequence of rooms visited by the player during the exploration stage (string, each character denotes the type of room visited, value range: "A"-"F")
	* **traversal_game**: sequence of rooms visited by the player during the collection stage (string, each character denotes the type of room visited, value range: "A"-"F")
	* **traversal_game_distribution**: for each real time second of the collection stage of the game session, the room in which the avatar was at the time (60 character string, each character denotes the type of room in which the avatar was at a given second, value range: "A"-"F")
	* **traversal_exploration_ratio**: ratio of movement time during the exploration stage to total exploration time (float)
	* **evaluation_completion**: completion score achieved in the game session, expressed as 0.0 if the player did not reach the exit in time, or 1.0 if they did (float)
	* **evaluation_collection**: collection score achieved in the game session, expressed as the sum of weights (values) of collected items divided by the total sum of weights of all collectable items present in the environment (float)
	* **evaluation**: full score achieved in the game session, calculated as the sum of completion and collection scores, divided by two (float)

* **items**: list of items spawned in the game level. Each entry contains the following data:

	* **local_index**: index based on the entry’s position in the list (integer).
	* **item_id**: item identifier (integer), referencing the relevant item design data.
	* **position**: position of the item (2D vector).
	* **rotation**: rotation of the item (quaternion).

* **collected**: list of items collected during a game session. Each entry contains the following data:
	* **local_index**: index based on the order of collection (integer).
	* **item_id**: item identifier (integer), referencing the relevant item design data.
	* **position**: position of the collected item (2D vector).

* **deposits**: list of deposits made by the avatar during a game session. Each entry contains the following data:
	* **local_index**: index based on the ordering of deposits (integer).
	* **timestamp**: deposit interaction timestamp (float).

* **nav**: list of an avatar’s navigation steps during a game session. Each entry contains the following data:

	* **local_index**: index based on the order of navigation steps (integer).
	* **timestamp**: navigation step timestamp (float).
	* **position**: position of the avatar during the navigation step (2D vector).
	* **rotation**: rotation of the avatar during the navigation step (quaternion).

* **collisions**: list of collisions with environment objects, caused by the avatar during a game session. Each entry contains the following data:

	* **local_index**: index based on the order of collisions (integer).
	* **timestamp**: collision timestamp (float).
	* **position**: position of the collided object (2D vector).
	* **rotation**: rotation of the collided object (quaternion).
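
Several fields of the **game** object are encoded strings or derived scores, and the following sketch shows one way they might be decoded. The helper names are hypothetical. Treating the first character of a distribution string as second 0 is an assumption, as is the presence of leading zeros in the `time` field.

```python
from datetime import datetime
from typing import List


def recording_datetime(game: dict) -> datetime:
    """Combine the `date` (YYYY-MM-DD) and `time` (HHMMSS) strings."""
    return datetime.strptime(f"{game['date']} {game['time']}", "%Y-%m-%d %H%M%S")


def active_seconds(distribution: str) -> List[int]:
    """Return the zero-based indices of characters equal to "1".

    Intended for `items_distribution` and `deposits_distribution`,
    where each character flags one second of the collection stage.
    """
    return [second for second, flag in enumerate(distribution) if flag == "1"]


def full_score(completion: float, collection: float) -> float:
    """Mean of the completion and collection scores, as documented for `evaluation`."""
    return (completion + collection) / 2.0
```

Under these assumptions, `full_score(game["evaluation_completion"], game["evaluation_collection"])` should reproduce the stored `evaluation` value up to floating point rounding, which can serve as a basic sanity check when loading samples.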

### Design Data

Numerical representations of design values from the game were transcribed to dedicated JSON files. Indexed design representations are referenced in the processed gameplay trajectory samples, providing additional design context for the telemetry recorded.

#### Difficulty Data
Identification and basic setup of each difficulty level that can be selected for a specific game. Stored as a list of difficulty levels, each represented by:
* **id**: numerical index value for identification (integer)
* **name**: game design derived difficulty level name ["EASY", "NORMAL", "HARD", "CHALLENGE_SCAVENGE"] (string)
* **preptime**: exploration time allowed for the specific difficulty level (integer)

#### Game Type Data
Identification of all game types that can be played and influence the conditions of a specific game. Stored as a list of game types, each represented by:
* **id**: index value used for identification (integer)
* **name**: game design derived game type name ["TUTORIAL", "FULL", "SCAVENGE", "SURVIVAL", "CHALLENGE_SCAVENGE"] (string)

#### Item Data
Details of each item type available to be collected in the game. Stored as a list of item types, each represented by:
* **id**: index value for identification (integer)
* **name**: game design derived item type name ["food", "water", "mom", "son", "daughter", "suitcase", "padlock", "torch", "spray", "handbook", "radio", "cards", "map", "gasmask", "checkers", "firstaid", "ammo", "axe", "rifle", "harmonica", "dad"] (string)
* **shortcode**: symbol representing the item type for inferring data, such as item collection order (character)
* **value**: weight of the item, which is considered an objective value equivalent (integer)
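
Since each item type carries a **shortcode**, a shortcode string such as the `items_order` field of a telemetry sample can be resolved against the item design data. The sketch below assumes the item design data has already been loaded as a list of dictionaries with the keys documented above. The function names are hypothetical.

```python
from typing import Dict, List


def index_by_shortcode(item_data: List[dict]) -> Dict[str, dict]:
    """Map each item's shortcode character to its design data entry."""
    return {item["shortcode"]: item for item in item_data}


def decode_items_order(items_order: str, item_data: List[dict]) -> List[str]:
    """Translate a shortcode string into item names, preserving order."""
    by_code = index_by_shortcode(item_data)
    return [by_code[code]["name"] for code in items_order]


def collected_value(items_order: str, item_data: List[dict]) -> int:
    """Sum the `value` (weight) of every item named in a shortcode string."""
    by_code = index_by_shortcode(item_data)
    return sum(by_code[code]["value"] for code in items_order)
```

A shortcode that is missing from the design data raises a `KeyError` in this sketch, which makes mismatches between a sample and the design data visible rather than silently ignored.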

#### Environment Data
Layout details for all environments found in the game. Stored as a list of game levels, each represented by:
* **id** - index value for identification (integer)
* **level_name** - game design derived level name ["level_scavenge_1", "level_scavenge_2", "level_scavenge_3", "level_scavenge_4", "level_scavenge_5", "level_scavenge_6", "level_scavenge_7", "level_scavenge_8", "level_scavenge_9", "level_scavenge_10", "level_scavenge_11", "level_scavenge_12", "level_scavenge_13", "level_scavenge_14", "level_scavenge_15", "level_scavenge_16", "level_scavenge_17", "level_scavenge_18", "level_scavenge_19", "level_scavenge_20", "level_scavenge_tutorial", "level_challenge_babysitter", "level_challenge_halloween", "level_challenge_holidays", "level_challenge_honeymoon", "level_challenge_murica", "level_challenge_siberia", "level_challenge_soup", "level_challenge_valhalla", "level_space1_1", "level_space1_2", "level_space1_3", "level_space1_4", "level_space1_5", "level_space2_0", "level_space2_1_StartPosition_MINOR", "level_space2_1_ExitPosition_MINOR", "level_space2_1_WallPlacement_MINOR", "level_space2_1_ItemPlacement_MINOR", "level_space2_1_ObstaclePlacement_MINOR", "level_space2_2_StartPosition_MEDIUM", "level_space2_2_ExitPosition_MEDIUM", "level_space2_2_WallPlacement_MEDIUM", "level_space2_2_ItemPlacement_MEDIUM", "level_space2_2_ObstaclePlacement_MEDIUM", "level_space2_3_StartPosition_MAJOR", "level_space2_3_ExitPosition_MAJOR", "level_space2_3_WallPlacement_MAJOR", "level_space2_3_ItemPlacement_MAJOR", "level_space2_3_ObstaclePlacement_MAJOR"] (string)
* **rooms** - a list of rooms found in a given level, each represented by:
	* **name** - game design derived room archetype name ["AShelter1", "AShelter2", "AShelter3", "AToilet1", "AToilet2", "AToilet3", "BBathroom1", "BBedroom1", "BKidsRoom1", "BKitchen1", "BLivingRoom1", "CKitchen1", "CBedroom1", "CBathroom1", "CKidsRoom1", "CLivingRoom1", "DKidsRoom1", "DBathroom1", "DBedroom1", "DKitchen1", "DLivingRoom1", "EBedroom1", "EBathroom1", "ELivingRoom1", "EKitchen1", "EKidsRoom1", "FBedroom1", "FBathroom1", "FBedroom1", "FKidsRoom1", "FKitchen1", "FLivingRoom1", "GHall1", "GHall2", "GHall3"] (string)
	* **class** - symbol representing the room archetype for inferring data, such as room visiting order (character) ["A", "B", "C", "D", "E", "F", "G", "X", "Y"]
	* **position** - position of a room in two dimensions (top-down perspective), identified by:
		* **x** - coordinate X (float)
		* **y** - coordinate Y (float)
	* **lower_corner** - lower corner of a room boundary in two dimensions (top-down perspective), identified by:
		* **x** - coordinate X (float)
		* **y** - coordinate Y (float)
	* **upper_corner** - upper corner of a room boundary in two dimensions (top-down perspective), identified by:
		* **x** - coordinate X (float)
		* **y** - coordinate Y (float)
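
The room boundaries above make it possible to work out which room contains a given 2D position. The sketch below assumes that room boundaries are axis-aligned rectangles spanned by `lower_corner` and `upper_corner`, and that the queried position uses the same coordinate frame as the environment data. Neither assumption is confirmed here, and `room_at` is a hypothetical helper.

```python
from typing import Optional


def room_at(level: dict, x: float, y: float) -> Optional[dict]:
    """Return the first room whose bounding rectangle contains (x, y).

    Returns None when no room of the level contains the position.
    """
    for room in level["rooms"]:
        lower, upper = room["lower_corner"], room["upper_corner"]
        # min/max guard against corners stored in either order.
        in_x = min(lower["x"], upper["x"]) <= x <= max(lower["x"], upper["x"])
        in_y = min(lower["y"], upper["y"]) <= y <= max(lower["y"], upper["y"])
        if in_x and in_y:
            return room
    return None
```

The `class` value of the returned room can then be compared against the room symbols used in the traversal fields of a telemetry sample.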

## Handling

Data in the dataset can be retrieved directly by querying the JSON files provided, with respect to their documented structure. For structured querying, additional code resources will be provided on the author's project site and the public repository of the project, referenced in the *Overview* section. 
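
As an illustration of direct querying, the sketch below aggregates the documented `evaluation` score per `difficulty` identifier over samples that have already been parsed into dictionaries. It is not the structured querying code mentioned above. The function name is hypothetical, and restricting the aggregate to properly finished sessions is a choice made for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable


def mean_evaluation_by_difficulty(samples: Iterable[dict]) -> Dict[int, float]:
    """Average the `evaluation` score per `difficulty` identifier.

    Only sessions with `finished` equal to 1 are counted.
    """
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for sample in samples:
        game = sample["game"]
        if game["finished"] != 1:
            continue
        totals[game["difficulty"]] += game["evaluation"]
        counts[game["difficulty"]] += 1
    return {difficulty: totals[difficulty] / counts[difficulty] for difficulty in counts}
```

Because the function consumes any iterable, samples can be streamed from disk one file at a time rather than loaded into memory all at once, which matters at this dataset's scale.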

Existing gameplay trajectories can be replayed and new samples can be generated in the game simulator environment. The game simulator environment will also be provided on the project's site and repository.

[1] Gotojuch, D. (2025). *Controlled design of human-like agents with context-guided learning for automated video game playing* (Doctoral dissertation, University of Glasgow).

[2] Robot Gentleman (2015). *60 Seconds!*