SpaceMining is a single-agent reinforcement learning environment that simulates asteroid mining in 2D space. The agent (a mining robot) must collect resources from asteroids, deliver them to the mothership, manage its energy consumption, and avoid moving obstacles while maximizing efficiency.
This environment was specifically designed to evaluate large language models' ability to design reward functions for unfamiliar environments. Recent studies have raised concerns that large language models may carry prior knowledge from pretraining data about standard RL environments (such as CartPole, BipedalWalker, and Ant), leading to prompt leakage and evaluation bias.
To address this issue, SpaceMining serves as a custom environment to assess true generalization capabilities on tasks free from such pretrained knowledge. This allows researchers to evaluate whether LLMs can effectively design reward functions for completely novel environments.
The GIF demonstrations showcase different agent behaviors and training outcomes, from successful resource collection to various failure modes and learning phases.
- **Health Bars**: Green-to-red bars above asteroids show remaining resources
- **Status Display**: The top-left corner shows inventory, energy, and step count
The agent is deployed in a 2D space environment (an 80x80 grid) with randomly distributed asteroids and a central mothership. The agent must:

- Collect resources from asteroids
- Deliver collected resources to the mothership
- Manage its energy consumption
- Avoid moving obstacles
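As a rough illustration of the mining interaction, a proximity check like the one below governs when the agent can mine. The helper name and the Euclidean-distance rule are assumptions for illustration; only the `mining_range` value of 8.0 comes from the environment parameters listed further down.

```python
import math

MINING_RANGE = 8.0  # from the environment parameters (mining_range)

def within_range(agent_pos, target_pos, max_dist):
    """Euclidean-distance check (assumed metric) between two 2D points."""
    dx = agent_pos[0] - target_pos[0]
    dy = agent_pos[1] - target_pos[1]
    return math.hypot(dx, dy) <= max_dist

# The agent can mine an asteroid only when it is close enough:
can_mine = within_range((10.0, 10.0), (15.0, 15.0), MINING_RANGE)  # dist ~7.07
```

The same kind of check would apply to delivering resources at the mothership, with whatever delivery radius the environment actually uses.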
The environment uses a comprehensive fitness scoring system (target: roughly 3000 points) that evaluates resource collection, energy management, efficiency, and survival time.
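A minimal sketch of what such a score could look like, combining the factors the docs mention. The weights and the linear form are made up for illustration and are not the environment's actual scoring; only the ~3000-point target comes from the documentation.

```python
def fitness(resources_delivered, energy_remaining, steps_survived,
            w_resources=25.0, w_energy=2.0, w_survival=0.5):
    """Illustrative fitness: a weighted sum of resource delivery,
    remaining energy, and survival time. Weights are assumptions."""
    return (w_resources * resources_delivered
            + w_energy * energy_remaining
            + w_survival * steps_survived)

# Under these made-up weights, a strong episode lands near the 3000 target:
score = fitness(resources_delivered=100, energy_remaining=50, steps_survived=800)
```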
The observation space includes the agent's own status (energy, inventory) and information about asteroids within its observation radius.
The action space is continuous with 3 dimensions.
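For a continuous Box-style action space, raw policy outputs are typically clamped to the valid bounds before being applied. The `[-1, 1]` bounds below are an assumption (check the environment's actual Box space); only the 3-dimensional shape comes from the docs.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp each component of a raw 3-dimensional action into the valid
    range. The [-1, 1] bounds are assumed, not confirmed by the docs."""
    if len(action) != 3:
        raise ValueError("SpaceMining expects a 3-dimensional action")
    return [max(low, min(high, a)) for a in action]

clipped = clip_action([2.5, -0.3, -7.0])  # -> [1.0, -0.3, -1.0]
```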
**Medium Difficulty**: The environment presents a balanced, medium level of challenge.
The environment is optimized for PPO and other modern RL algorithms. Here are the recommended parameters:
```yaml
# Environment Parameters
max_episode_steps: 1200
grid_size: 80
max_asteroids: 12
max_resource_per_asteroid: 40
observation_radius: 15
mining_range: 8.0
max_inventory: 100

# Training Parameters
policy: MlpPolicy
learning_rate: 0.0003
total_timesteps: 3000000
batch_size: 64
n_steps: 2048
gamma: 0.99
gae_lambda: 0.95
clip_range: 0.2
```
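Wired into stable-baselines3, the training parameters above look roughly like this. The hyperparameter values are taken from the table; the environment constructor is not shown here, since how the env is instantiated or registered is not specified in this README.

```python
# Hyperparameters taken from the recommended-parameters table above.
PPO_KWARGS = dict(
    learning_rate=0.0003,
    batch_size=64,
    n_steps=2048,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
)

def train(env, total_timesteps=3_000_000):
    """Train PPO on a SpaceMining env instance; requires stable-baselines3.
    Imported lazily so this sketch loads without the dependency installed."""
    from stable_baselines3 import PPO
    model = PPO("MlpPolicy", env, **PPO_KWARGS)
    model.learn(total_timesteps=total_timesteps)
    return model
```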
Clone the repository and install dependencies:
```bash
git clone https://github.com/Lola-jo/space_mining.git
cd space_mining
pip install gymnasium numpy pygame stable-baselines3
```
Test the environment:

```bash
python test_env.py
```
Train an agent:

```bash
python train.py
```