SpaceMining is a single-agent reinforcement learning environment that simulates asteroid mining in 2D space. The agent (a mining robot) must collect resources from asteroids, deliver them to the mothership, manage its energy consumption, and avoid moving obstacles while maximizing efficiency.
This environment was specifically designed to evaluate large language models' ability to design reward functions for unfamiliar environments. Recent studies have raised concerns that large language models may carry prior knowledge from pretraining data about standard RL environments (such as CartPole, BipedalWalker, and Ant), leading to prompt leakage and evaluation bias.
To address this issue, SpaceMining serves as a custom environment to assess true generalization capabilities on tasks free from such pretrained knowledge. This allows researchers to evaluate whether LLMs can effectively design reward functions for completely novel environments.
The GIF demonstrations showcase different agent behaviors and training outcomes, from successful resource collection to various failure modes and learning phases.
- **Health Bars**: Green-to-red bars above asteroids show remaining resources
- **Status Display**: The top-left corner shows inventory, energy, and step count
The agent is deployed in a 2D space environment (an 80x80 grid) with randomly distributed asteroids and a central mothership. The agent must:

- Collect resources from asteroids
- Deliver collected resources to the mothership
- Manage its energy consumption
- Avoid moving obstacles
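As a rough illustration of the mining interaction, a proximity check like the one below governs when the agent can mine. The helper name and the Euclidean-distance rule are assumptions for illustration; only the `mining_range` value of 8.0 comes from the environment parameters listed further down.

```python
import math

MINING_RANGE = 8.0  # from the environment parameters (mining_range)

def within_range(agent_pos, target_pos, max_dist):
    """Euclidean-distance check (assumed metric) between two 2D points."""
    dx = agent_pos[0] - target_pos[0]
    dy = agent_pos[1] - target_pos[1]
    return math.hypot(dx, dy) <= max_dist

# The agent can mine an asteroid only when it is close enough:
can_mine = within_range((10.0, 10.0), (15.0, 15.0), MINING_RANGE)  # dist ~7.07
```

The same kind of check would apply to delivering resources at the mothership, with whatever delivery radius the environment actually uses.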
The environment uses a comprehensive fitness scoring system (target: roughly 3000 points) that evaluates resource collection, energy management, efficiency, and survival time.
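A minimal sketch of what such a score could look like, combining the factors the docs mention. The weights and the linear form are made up for illustration and are not the environment's actual scoring; only the ~3000-point target comes from the documentation.

```python
def fitness(resources_delivered, energy_remaining, steps_survived,
            w_resources=25.0, w_energy=2.0, w_survival=0.5):
    """Illustrative fitness: a weighted sum of resource delivery,
    remaining energy, and survival time. Weights are assumptions."""
    return (w_resources * resources_delivered
            + w_energy * energy_remaining
            + w_survival * steps_survived)

# Under these made-up weights, a strong episode lands near the 3000 target:
score = fitness(resources_delivered=100, energy_remaining=50, steps_survived=800)
```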
The observation space includes the agent's own status (energy, inventory) and information about asteroids within its observation radius.
The action space is continuous with 3 dimensions.
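For a continuous Box-style action space, raw policy outputs are typically clamped to the valid bounds before being applied. The `[-1, 1]` bounds below are an assumption (check the environment's actual Box space); only the 3-dimensional shape comes from the docs.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp each component of a raw 3-dimensional action into the valid
    range. The [-1, 1] bounds are assumed, not confirmed by the docs."""
    if len(action) != 3:
        raise ValueError("SpaceMining expects a 3-dimensional action")
    return [max(low, min(high, a)) for a in action]

clipped = clip_action([2.5, -0.3, -7.0])  # -> [1.0, -0.3, -1.0]
```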
**Medium Difficulty**: The environment presents a balanced, medium level of challenge.
The environment is optimized for PPO and other modern RL algorithms. Here are the recommended parameters:
```yaml
# Environment Parameters
max_episode_steps: 1200
grid_size: 80
max_asteroids: 12
max_resource_per_asteroid: 40
observation_radius: 15
mining_range: 8.0
max_inventory: 100

# Training Parameters
policy: MlpPolicy
learning_rate: 0.0003
total_timesteps: 3000000
batch_size: 64
n_steps: 2048
gamma: 0.99
gae_lambda: 0.95
clip_range: 0.2
```
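Wired into stable-baselines3, the training parameters above look roughly like this. The hyperparameter values are taken from the table; the environment constructor is not shown here, since how the env is instantiated or registered is not specified in this README.

```python
# Hyperparameters taken from the recommended-parameters table above.
PPO_KWARGS = dict(
    learning_rate=0.0003,
    batch_size=64,
    n_steps=2048,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
)

def train(env, total_timesteps=3_000_000):
    """Train PPO on a SpaceMining env instance; requires stable-baselines3.
    Imported lazily so this sketch loads without the dependency installed."""
    from stable_baselines3 import PPO
    model = PPO("MlpPolicy", env, **PPO_KWARGS)
    model.learn(total_timesteps=total_timesteps)
    return model
```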
Clone the repository and install dependencies:
```bash
git clone https://github.com/Lola-jo/space_mining.git
cd space_mining
pip install gymnasium numpy pygame stable-baselines3
```
Test the environment:

```bash
python test_env.py
```
Train an agent:

```bash
python train.py
```