Question

A robotic agent operates in a warehouse modeled as an N×M grid environment. The agent starts at a predefined loading dock and must deliver packages to multiple destinations marked on the grid while avoiding dynamically placed obstacles.
Your Tasks:
- Q1. Represent the warehouse as an N×M matrix. Place packages, drop-off points, and obstacles randomly.
- Q2. Implement a goal-based agent using a search algorithm (UCS, BFS, or DFS) to deliver all packages.
- Q3. Choose a random seed to reproduce results and output the final path, costs, rewards, penalties, and success stats.

2. What is a Goal-Based Agent?
A Goal-Based Agent selects actions by considering possible future outcomes and choosing the one that leads to its goal. It relies on:
- Perception of the current state
- Defined goals (e.g., deliver packages)
- Search and planning (e.g., Uniform Cost Search)
These agents differ from purely reactive agents in that they plan ahead using search algorithms (and, in informed variants, heuristics).
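The idea can be sketched as a minimal sense–plan–act loop. This is a hypothetical skeleton for illustration (class and method names are made up here), not the implementation used later in this article:

```python
# Minimal goal-based agent skeleton (hypothetical names for illustration).
class SimpleGoalBasedAgent:
    def __init__(self, goals):
        self.goals = list(goals)   # e.g., packages still to deliver
        self.plan = []             # sequence of actions produced by search

    def step(self, percept):
        # Re-plan whenever the current plan is empty and goals remain.
        if not self.plan and self.goals:
            self.plan = self.search(percept, self.goals[0])
        if self.plan:
            return self.plan.pop(0)
        return None  # all goals achieved

    def search(self, state, goal):
        # Placeholder: in this article, Uniform Cost Search fills this role.
        return [f"move-toward-{goal}"]
```

The key property is the re-planning step: the agent chooses actions by searching toward an explicit goal rather than reacting to the current percept alone.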
3. What Are Warehouse Problems in AI?

Warehouse problems simulate real-world logistics, focusing on:
- Efficient pathfinding through a constrained grid.
- Handling obstacles and variable movement costs.
- Delivering items in predefined sequences.
- Optimizing for minimum total cost and maximum delivery reward.
Applications include robotics, inventory systems, and smart logistics.
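A toy hand-built grid makes the representation concrete (the cell labels here are chosen to match the conventions used later in this article; real grids are generated randomly in Section 7):

```python
# A tiny 3x4 example grid: " " empty, "X" obstacle,
# "P1" a package, "D1" its drop-off location.
grid = [
    [" ", " ", "P1", " "],
    [" ", "X", " ",  " "],
    [" ", " ", " ",  "D1"],
]
rows, cols = len(grid), len(grid[0])
print(rows, cols)  # 3 4
```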
4. Uniform Cost Search (UCS) – Path Planning Strategy
Uniform Cost Search (UCS) is a classic search algorithm used when the goal is to find the least-cost path in a weighted graph. Unlike greedy or depth-first search, UCS always expands the least-cost node first, making it optimal and complete as long as step costs are non-negative.
In this warehouse problem, UCS is ideal because:
- Movement has a cost of 1 unit.
- Obstacles add an extra cost of 5 units.
- The agent must find cost-efficient paths while minimizing penalty.
Key UCS Concepts Applied:
- State Space: Each cell in the grid is a node.
- Cost Function: Total cost = movement cost + obstacle penalty.
- Priority Queue (Min-Heap): Used to always expand the path with the lowest cumulative cost.
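As a standalone illustration of these concepts, here is a compact UCS over a small weighted graph (the graph and node names are invented for this example; the warehouse code in Section 7 applies the same idea with grid cells as nodes):

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Return (path, cost) of the least-cost path, or (None, inf) if unreachable."""
    frontier = [(0, [start])]          # min-heap ordered by cumulative cost
    best = {start: 0}                  # cheapest known cost to reach each node
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path, cost          # first pop of the goal is optimal
        for neighbor, step in graph.get(node, []):
            new_cost = cost + step
            if neighbor not in best or new_cost < best[neighbor]:
                best[neighbor] = new_cost
                heapq.heappush(frontier, (new_cost, path + [neighbor]))
    return None, float("inf")

# Toy graph: UCS prefers A->C->B->D (cost 4) over the direct A->B->D (cost 5).
graph = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2)],
    "B": [("D", 1)],
}
print(uniform_cost_search(graph, "A", "D"))  # (['A', 'C', 'B', 'D'], 4)
```

Because the heap always yields the lowest cumulative cost first, the first time the goal is popped its path is guaranteed optimal (given non-negative step costs).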
5. Thought Process and How to Approach the Problem
To design the solution:
- Randomly generate a warehouse grid (N×M) using a seed.
- Place packages, drop-off points, and obstacles with no overlaps.
- Use a Uniform Cost Search (UCS) algorithm to compute shortest paths:
- Obstacles incur an additional penalty of 5
- Normal movement costs 1 unit
- Create a Goal-Based Agent that:
- Moves to the next pickup point
- Picks up the package
- Plans to the respective drop location
- Delivers it, and repeats until all packages are delivered
6. Assumptions
- Grid size randomly chosen: 6 × 8
- Number of packages: 6
- Number of obstacles: 2
- Agent starts at: (0, 0)
- Penalty for stepping on an obstacle: +5 cost units (equivalently, −5 to the score)
- Reward for successful delivery: +10 units
- Agent uses Uniform Cost Search (UCS)
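Under these assumptions the final score is simple arithmetic: each of the 6 deliveries earns +10, and every unit of cost (movement plus obstacle penalties) subtracts 1. As a quick sanity check against the run shown in Section 8 (the total cost 44 is taken from that sample run):

```python
deliveries = 6
reward = deliveries * 10          # +10 per successful delivery
total_cost = 44                   # movement + obstacle cost from the sample run
score = reward - total_cost
print(reward, total_cost, score)  # 60 44 16
```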
7. Code Snippet
import random
import heapq
random.seed(54)
def generate_random_warehouse():
    """
    Generates a random warehouse grid and associated parameters, labeling:
      - Empty cells as " "
      - Obstacles as "X"
      - Packages as "P1", "P2", ...
      - Drop-off locations as "D1", "D2", ...
    Robot start is always (0, 0), which remains " ".
    Returns a dictionary containing:
      - rows, cols
      - grid (2D list of strings)
      - num_packages, num_obstacles
      - package_locations, drop_off_locations, obstacle_locations
      - robot_start (row, col)
    Conditions:
      1. 5 <= N, M <= 10
      2. 1 <= O <= 10 (number of obstacles)
      3. 2 <= P <= 6 (number of packages)
      4. Packages and drop-offs do not overlap each other or obstacles
      5. Robot start is always (0, 0) and remains free
    """
    # 1) Random warehouse dimensions: N x M
    N = random.randint(5, 10)
    M = random.randint(5, 10)
    # Initialize the grid to " " (empty space)
    grid = [[" " for _ in range(M)] for _ in range(N)]
    # Track used positions to avoid overlap; mark (0, 0) as used
    # for the robot start so nothing else is placed there
    used_positions = {(0, 0)}
    # 3) Random number of packages
    P = random.randint(2, 6)
    package_locations = []
    drop_off_locations = []
    # Place packages and drop-offs without overlap
    for i in range(P):
        # Find a free cell for the package
        while True:
            r = random.randint(0, N - 1)
            c = random.randint(0, M - 1)
            if (r, c) not in used_positions:
                grid[r][c] = f"P{i+1}"
                used_positions.add((r, c))
                package_locations.append((r, c))
                break
        # Find a free cell for the drop-off
        while True:
            r = random.randint(0, N - 1)
            c = random.randint(0, M - 1)
            if (r, c) not in used_positions:
                grid[r][c] = f"D{i+1}"
                used_positions.add((r, c))
                drop_off_locations.append((r, c))
                break
    # 2) Random number of obstacles
    O = random.randint(1, 10)
    # Place static obstacles (marked as "X")
    count_obstacles = 0
    obstacle_locations = []
    while count_obstacles < O:
        r = random.randint(0, N - 1)
        c = random.randint(0, M - 1)
        if (r, c) not in used_positions:
            grid[r][c] = "X"
            used_positions.add((r, c))
            obstacle_locations.append((r, c))
            count_obstacles += 1
    # 4) Robot start is always (0, 0) => keep as " "
    robot_start = (0, 0)
    return {
        "rows": N,
        "cols": M,
        "grid": grid,
        "num_packages": P,
        "num_obstacles": O,
        "package_locations": package_locations,
        "drop_off_locations": drop_off_locations,
        "obstacle_locations": obstacle_locations,
        "robot_start": robot_start,
    }
def show_grid(grid):
    for row in grid:
        print(row)
class GoalBasedAgent:
    def __init__(self, environment):
        self.environment = environment
        self.state = "FIND_PICKUP_POINT"
        self.location = environment["robot_start"]
        self.current_goal = (0, 0)
        self.path = []
        self.cost = 0
        self.reward = 0

    def perceive(self):
        if self.location in self.environment["obstacle_locations"]:
            return "OBSTACLE"
        elif self.location in self.environment["drop_off_locations"]:
            return "DROP POINT"
        elif self.location in self.environment["package_locations"]:
            return "PICKUP POINT"
        else:
            return "EMPTY"

    def move(self, new_location):
        self.location = new_location
        self.cost += 1

    def act(self):
        percept = self.perceive()
        if percept == "PICKUP POINT" and self.current_goal == self.location:
            print(f"Package picked from {self.location}.")
            self.environment["package_locations"].remove(self.location)
            self.state = "FIND_DROP_POINT"
        elif percept == "DROP POINT" and self.current_goal == self.location:
            print(f"Package dropped to {self.location}.")
            self.environment["drop_off_locations"].remove(self.location)
            self.reward += 10
            # If packages remain, go find another pickup; otherwise we are done
            if len(self.environment["package_locations"]) > 0:
                self.state = "FIND_PICKUP_POINT"
            else:
                self.state = "COMPLETED"
        elif percept == "OBSTACLE":
            print(f"Hit an obstacle at {self.location}. Penalty incurred: -5")
            self.cost += 5
            self.move(self.path[0])
            self.path = self.path[1:]
        else:
            # Skip any leading positions in the path that match the current location
            while len(self.path) > 0 and self.path[0] == self.location:
                self.path.pop(0)
            # If there is still something in the path, move there
            if len(self.path) > 0:
                print(f"Moving from {self.location} to {self.path[0]}")
                self.move(self.path[0])
                self.path.pop(0)
        self.compute_goal()
    def find_shortest_path(self, start, goal):
        start = tuple(start)
        goal = tuple(goal)
        directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        heap = []
        heapq.heappush(heap, (0, [start]))  # Heap element: (cumulative_cost, path)
        visited = {start: 0}
        if start == goal:
            return [start], 0
        while heap:
            cost, path = heapq.heappop(heap)
            current = path[-1]
            if current == goal:
                return path, cost
            for dx, dy in directions:
                nx, ny = current[0] + dx, current[1] + dy
                if 0 <= nx < self.environment["rows"] and 0 <= ny < self.environment["cols"]:
                    step_cost = 1  # Base movement cost
                    penalty = 0
                    if (nx, ny) in self.environment["obstacle_locations"]:
                        penalty = 5  # Additional penalty for stepping into an obstacle
                    new_cost = cost + step_cost + penalty
                    neighbor = (nx, ny)
                    if neighbor not in visited or new_cost < visited[neighbor]:
                        visited[neighbor] = new_cost
                        heapq.heappush(heap, (new_cost, path + [neighbor]))
        return None, float('inf')
    def compute_goal(self):
        if self.state == "FIND_PICKUP_POINT":
            start = self.location
            goal = self.environment["package_locations"][0]
            self.path, cost = self.find_shortest_path(start, goal)
            print(f"FIND_PICKUP_POINT: Current Location={start}, Package pickup location={goal}, Path={self.path}, Cost={cost}")
            self.current_goal = goal
            self.state = "MOVE_TO_PICKUP"
        elif self.state == "FIND_DROP_POINT":
            start = self.location
            goal = self.environment["drop_off_locations"][0]
            self.path, cost = self.find_shortest_path(start, goal)
            print(f"FIND_DROP_POINT: Package Location={start}, Package drop location={goal}, Path={self.path}, Cost={cost}")
            self.current_goal = goal
            self.state = "MOVE_TO_DROP"
# Example usage:
if __name__ == "__main__":
    # Q1: Initial Warehouse Configuration
    warehouse_data = generate_random_warehouse()
    print("Initial Warehouse Configuration\n")
    show_grid(warehouse_data["grid"])
    print("Grid Size:", warehouse_data["rows"], "x", warehouse_data["cols"])
    print("Number of Obstacles:", warehouse_data["num_obstacles"])
    print("Number of Packages:", warehouse_data["num_packages"])
    print("Package Locations:", warehouse_data["package_locations"])
    print("Drop-Off Locations:", warehouse_data["drop_off_locations"])
    print("Obstacle Locations:", warehouse_data["obstacle_locations"])
    print("Robot Start:", warehouse_data["robot_start"])
    agent = GoalBasedAgent(warehouse_data)
    while agent.state != "COMPLETED":
        agent.act()
    print(f"All deliveries are completed with total cost {agent.cost} and reward {agent.reward}")
    print(f"Final score is {agent.reward - agent.cost}")
Key functions:
- generate_random_warehouse() builds the grid and random placements.
- The GoalBasedAgent class drives the pickup/delivery cycle.
- UCS is implemented in find_shortest_path.
🧠 Pathfinding uses a min-heap and accounts for the extra obstacle cost.
8. Output
Initial Configuration:
Initial Warehouse Configuration
[' ', ' ', ' ', ' ', ' ', ' ', ' ', 'D3']
[' ', ' ', ' ', 'P3', ' ', ' ', ' ', 'P2']
[' ', ' ', ' ', ' ', ' ', ' ', 'D2', 'P1']
[' ', ' ', ' ', ' ', 'P6', ' ', 'D1', ' ']
['X', 'D4', 'D6', ' ', 'D5', 'P5', ' ', 'P4']
[' ', ' ', ' ', ' ', 'X', ' ', ' ', ' ']
Grid Size: 6 x 8
Number of Obstacles: 2
Number of Packages: 6
Package Locations: [(2, 7), (1, 7), (1, 3), (4, 7), (4, 5), (3, 4)]
Drop-Off Locations: [(3, 6), (2, 6), (0, 7), (4, 1), (4, 4), (4, 2)]
Obstacle Locations: [(5, 4), (4, 0)]
Robot Start: (0, 0)
Execution Steps:
FIND_PICKUP_POINT: Current Location=(0, 0), Package pickup location=(2, 7), Path=[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (1, 7), (2, 7)], Cost=9
Moving from (0, 0) to (0, 1)
Moving from (0, 1) to (0, 2)
Moving from (0, 2) to (0, 3)
Moving from (0, 3) to (0, 4)
Moving from (0, 4) to (0, 5)
Moving from (0, 5) to (0, 6)
Moving from (0, 6) to (0, 7)
Moving from (0, 7) to (1, 7)
Moving from (1, 7) to (2, 7)
Package picked from (2, 7).
FIND_DROP_POINT: Package Location=(2, 7), Package drop location=(3, 6), Path=[(2, 7), (2, 6), (3, 6)], Cost=2
Moving from (2, 7) to (2, 6)
Moving from (2, 6) to (3, 6)
Package dropped to (3, 6).
FIND_PICKUP_POINT: Current Location=(3, 6), Package pickup location=(1, 7), Path=[(3, 6), (2, 6), (1, 6), (1, 7)], Cost=3
Moving from (3, 6) to (2, 6)
Moving from (2, 6) to (1, 6)
Moving from (1, 6) to (1, 7)
Package picked from (1, 7).
FIND_DROP_POINT: Package Location=(1, 7), Package drop location=(2, 6), Path=[(1, 7), (1, 6), (2, 6)], Cost=2
Moving from (1, 7) to (1, 6)
Moving from (1, 6) to (2, 6)
Package dropped to (2, 6).
FIND_PICKUP_POINT: Current Location=(2, 6), Package pickup location=(1, 3), Path=[(2, 6), (1, 6), (1, 5), (1, 4), (1, 3)], Cost=4
Moving from (2, 6) to (1, 6)
Moving from (1, 6) to (1, 5)
Moving from (1, 5) to (1, 4)
Moving from (1, 4) to (1, 3)
Package picked from (1, 3).
FIND_DROP_POINT: Package Location=(1, 3), Package drop location=(0, 7), Path=[(1, 3), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7)], Cost=5
Moving from (1, 3) to (0, 3)
Moving from (0, 3) to (0, 4)
Moving from (0, 4) to (0, 5)
Moving from (0, 5) to (0, 6)
Moving from (0, 6) to (0, 7)
Package dropped to (0, 7).
FIND_PICKUP_POINT: Current Location=(0, 7), Package pickup location=(4, 7), Path=[(0, 7), (1, 7), (2, 7), (3, 7), (4, 7)], Cost=4
Moving from (0, 7) to (1, 7)
Moving from (1, 7) to (2, 7)
Moving from (2, 7) to (3, 7)
Moving from (3, 7) to (4, 7)
Package picked from (4, 7).
FIND_DROP_POINT: Package Location=(4, 7), Package drop location=(4, 1), Path=[(4, 7), (4, 6), (4, 5), (4, 4), (4, 3), (4, 2), (4, 1)], Cost=6
Moving from (4, 7) to (4, 6)
Moving from (4, 6) to (4, 5)
Moving from (4, 5) to (4, 4)
Moving from (4, 4) to (4, 3)
Moving from (4, 3) to (4, 2)
Moving from (4, 2) to (4, 1)
Package dropped to (4, 1).
FIND_PICKUP_POINT: Current Location=(4, 1), Package pickup location=(4, 5), Path=[(4, 1), (4, 2), (4, 3), (4, 4), (4, 5)], Cost=4
Moving from (4, 1) to (4, 2)
Moving from (4, 2) to (4, 3)
Moving from (4, 3) to (4, 4)
Moving from (4, 4) to (4, 5)
Package picked from (4, 5).
FIND_DROP_POINT: Package Location=(4, 5), Package drop location=(4, 4), Path=[(4, 5), (4, 4)], Cost=1
Moving from (4, 5) to (4, 4)
Package dropped to (4, 4).
FIND_PICKUP_POINT: Current Location=(4, 4), Package pickup location=(3, 4), Path=[(4, 4), (3, 4)], Cost=1
Moving from (4, 4) to (3, 4)
Package picked from (3, 4).
FIND_DROP_POINT: Package Location=(3, 4), Package drop location=(4, 2), Path=[(3, 4), (3, 3), (3, 2), (4, 2)], Cost=3
Moving from (3, 4) to (3, 3)
Moving from (3, 3) to (3, 2)
Moving from (3, 2) to (4, 2)
Package dropped to (4, 2).
Final Metrics:
All deliveries are completed with total cost 44 and reward 60
Final score is 16
✅ All deliveries completed
🟢 Total Cost: 44
💰 Total Reward: 60
🏁 Final Score: 16
9. Future Work
- Introduce dynamic obstacles that move over time.
- Implement A* or Bidirectional Search for better efficiency.
- Optimize for multiple agents working simultaneously.
- Add priority levels to packages.
- Use reinforcement learning to learn obstacle avoidance and path planning.
GitHub link: Dynamic Goal-Based Agent for Warehouse Logistics Optimization.py
Comment and let me know what can be improved in this approach!