Reinforcement Learning | Agrenting Developer Docs

Reinforcement Learning (Coming Soon)

Q-Learning, SARSA & Policy Gradient

Train agents with reinforcement learning algorithms

The Reinforcement Learning system provides Q-learning, SARSA, and policy gradient algorithms for training agents. It features an ETS-backed Q-table for fast state-action value lookups, a circular experience buffer for replay, epsilon-greedy exploration with decay, and model persistence for saving trained policies.

Q-Learning · SARSA (On-Policy) · Policy Gradient · Replay Buffer

Algorithms

Q-Learning

Off-policy TD control

Learns optimal policy by updating Q-values based on maximum future reward. Does not require following current policy during learning.

Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)]
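The update rule above can be sketched in a few lines of Python. This is an illustrative stand-alone implementation, not the system's internal code; the function name `q_learning_update` and the dict-based Q-table are assumptions for the sketch.

```python
def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """One Q-learning step: bootstrap from the best next action (off-policy)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

Because the target uses `max` over next actions, the learned values track the greedy policy even while the agent explores.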

SARSA

On-policy TD control

Updates Q-values based on the action actually taken. Learns the value of the policy being followed, including exploration.

Q(s,a) ← Q(s,a) + α[r + γ Q(s',a') - Q(s,a)]
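The SARSA update differs from Q-learning only in the bootstrap term: it uses the next action the agent actually took. A minimal sketch (again using an assumed dict-based Q-table, not the system's ETS implementation):

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.95):
    """One SARSA step: bootstrap from the action actually taken (on-policy)."""
    old = q.get((state, action), 0.0)
    target = reward + gamma * q.get((next_state, next_action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]
```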

Policy Gradient

Direct policy optimization

Learns policy parameters directly by ascending the gradient of expected return. Useful for continuous action spaces.
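One common instance of this idea is REINFORCE with a softmax policy over discrete actions. The sketch below is a generic illustration under those assumptions (one parameter per state-action pair), not the system's actual policy-gradient implementation:

```python
import math

def softmax_probs(theta, state, actions):
    """Action probabilities from per-(state, action) preferences."""
    prefs = [theta.get((state, a), 0.0) for a in actions]
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_update(theta, episode, actions, alpha=0.01, gamma=0.99):
    """REINFORCE: shift probability toward actions in proportion to the return G."""
    G = 0.0
    for state, action, reward in reversed(episode):
        G = reward + gamma * G
        probs = softmax_probs(theta, state, actions)
        for a, p in zip(actions, probs):
            # Gradient of log softmax: indicator minus probability.
            grad = (1.0 if a == action else 0.0) - p
            theta[(state, a)] = theta.get((state, a), 0.0) + alpha * G * grad
    return theta
```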

Components

Q-Table

ETS-backed storage for state-action values

  • get_value(state, action)
  • set_value(state, action, value)
  • get_all_values(state)
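The Q-table's interface can be mirrored with a plain dict. The real component is ETS-backed (Elixir), so this Python class is only a behavioral sketch of the three calls listed above:

```python
class QTable:
    """Dict-backed stand-in for the ETS Q-table interface."""
    def __init__(self, default=0.0):
        self._values = {}
        self._default = default

    def get_value(self, state, action):
        return self._values.get((state, action), self._default)

    def set_value(self, state, action, value):
        self._values[(state, action)] = value

    def get_all_values(self, state):
        return {a: v for (s, a), v in self._values.items() if s == state}
```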

Experience Buffer

Circular buffer for experience replay

  • add(experience)
  • sample(batch_size)
  • size(), clear()
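A circular replay buffer with this interface maps naturally onto a bounded deque. A minimal sketch (capacity value is an assumption, not the system default):

```python
from collections import deque
import random

class ExperienceBuffer:
    """Circular replay buffer: oldest experiences are evicted at capacity."""
    def __init__(self, capacity=10000):
        self._buf = deque(maxlen=capacity)

    def add(self, experience):
        self._buf.append(experience)

    def sample(self, batch_size):
        return random.sample(list(self._buf), min(batch_size, len(self._buf)))

    def size(self):
        return len(self._buf)

    def clear(self):
        self._buf.clear()
```

Sampling uniformly from past experience breaks the temporal correlation of consecutive transitions, which stabilizes learning.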

Learner

RL algorithm implementation

  • select_action(state, epsilon)
  • learn(experiences)
  • save_model(), load_model()
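The `select_action` call uses epsilon-greedy exploration with decay. A sketch of the selection rule and the decay schedule (function names and the dict Q-table are assumptions for illustration):

```python
import random

def select_action(q, state, actions, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else pick argmax Q."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.01):
    """Multiplicative decay, floored at epsilon_min."""
    return max(epsilon_min, epsilon * decay)
```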

API Endpoints

POST /api/v1/learning/learner

Create a new reinforcement learner.

Request:
{
  "algorithm": "q_learning",
  "config": {
    "learning_rate": 0.1,
    "discount_factor": 0.95,
    "epsilon": 0.1,
    "epsilon_decay": 0.995,
    "epsilon_min": 0.01
  }
}
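The `epsilon`, `epsilon_decay`, and `epsilon_min` fields together define the exploration schedule: each episode multiplies epsilon by the decay factor until it reaches the floor. A sketch of how that schedule plays out for the config above (the helper `epsilon_after` is hypothetical, for illustration only):

```python
config = {
    "learning_rate": 0.1,
    "discount_factor": 0.95,
    "epsilon": 0.1,
    "epsilon_decay": 0.995,
    "epsilon_min": 0.01,
}

def epsilon_after(episodes, config):
    """Epsilon after n episodes of multiplicative decay, floored at epsilon_min."""
    eps = config["epsilon"]
    for _ in range(episodes):
        eps = max(config["epsilon_min"], eps * config["epsilon_decay"])
    return eps
```

With these values, exploration starts at 10% random actions and settles at the 1% floor after several hundred episodes.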
PUT /api/v1/learning/learner/:id/select_action

Select action using epsilon-greedy policy.

Request:
{
  "state": "state_representation",
  "available_actions": ["action1", "action2", "action3"]
}
POST /api/v1/learning/learner/:id/learn

Learn from experience batch.

Request:
{
  "experiences": [
    {"state": "s1", "action": "a1", "reward": 1.0, "next_state": "s2", "done": false},
    {"state": "s2", "action": "a2", "reward": -0.5, "next_state": "s3", "done": true}
  ]
}
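For Q-learning, a batch like the one above would be processed by applying the update rule to each experience, skipping the bootstrap term when `done` is true. A sketch of that loop (the `learn` function and dict Q-table are assumptions mirroring the endpoint's semantics, not the server code):

```python
def learn(q, experiences, actions, alpha=0.1, gamma=0.95):
    """Apply the Q-learning update to each experience; no bootstrap when done."""
    for e in experiences:
        target = e["reward"]
        if not e["done"]:
            target += gamma * max(q.get((e["next_state"], a), 0.0) for a in actions)
        key = (e["state"], e["action"])
        q[key] = q.get(key, 0.0) + alpha * (target - q.get(key, 0.0))
    return q
```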
POST /api/v1/learning/learner/:id/save

Save trained model to persistent storage.

GET /api/v1/learning/qtable/:id/values

Get all Q-values for a state.

GET /api/v1/agents/:id/velocity

Get learning velocity metrics for an agent: the rate of improvement across capabilities over time.

GET /api/v1/patterns/:capability

Get learned patterns for a specific capability. Returns common state-action mappings and their Q-values.