Reinforcement Learning (Coming Soon)

Q-Learning, SARSA & Policy Gradient

Train agents with reinforcement learning algorithms

The Reinforcement Learning system provides Q-learning, SARSA, and policy gradient algorithms for training agents. It features an ETS-backed Q-table for fast state-action value lookup, an experience buffer for replay, epsilon-greedy exploration with decay, and model persistence for saving trained policies.

Q-Learning · SARSA (On-Policy) · Policy Gradient · Replay Buffer

Algorithms

Q-Learning

Off-policy TD control

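Learns the optimal policy by moving each Q-value toward the observed reward plus the discounted maximum Q-value of the next state. Because the target always uses the greedy next action, the agent does not have to follow the policy it is learning while it explores.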

Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') - Q(s,a)]
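The update rule above can be sketched in a few lines of Python. This is only an illustration, not the system's implementation: the function name q_learning_update, the dictionary keyed by (state, action), and the default of 0.0 for unseen pairs are all assumptions made for the example.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, next_actions,
                      alpha=0.1, gamma=0.95, done=False):
    """Apply one off-policy TD update and return the new Q(state, action)."""
    # Bootstrap from the best next action; a terminal step adds no future value.
    if done or not next_actions:
        best_next = 0.0
    else:
        best_next = max(q[(next_state, a)] for a in next_actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical values, chosen only to make the arithmetic easy to follow.
q = defaultdict(float)          # unseen (state, action) pairs read as 0.0
q[("s2", "a1")] = 2.0
q[("s2", "a2")] = 5.0

# Target = 1.0 + 0.95 * max(2.0, 5.0) = 5.75, so Q(s1, a1) moves from 0 to 0.575.
new_value = q_learning_update(q, "s1", "a1", 1.0, "s2", ["a1", "a2"])
```

Note that the target uses 5.0, the best value in s2, regardless of which action the agent goes on to take there.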

SARSA

On-policy TD control

Updates Q-values using the action a' that the agent actually takes in the next state s'. It therefore learns the value of the policy being followed, exploratory moves included.

Q(s,a) ← Q(s,a) + α[r + γ Q(s',a') - Q(s,a)]
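A minimal sketch of the SARSA rule, under the same illustrative assumptions as any tabular example: the function name sarsa_update and the (state, action) dictionary are hypothetical, not part of the documented API. The only difference from Q-learning is that the target uses the value of the next action actually chosen rather than the maximum.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.95, done=False):
    """Apply one on-policy TD update and return the new Q(state, action)."""
    # Use the value of the action the agent really takes next, not the best one.
    next_value = 0.0 if done else q[(next_state, next_action)]
    td_target = reward + gamma * next_value
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical values for illustration.
q = defaultdict(float)
q[("s2", "a1")] = 2.0
q[("s2", "a2")] = 5.0

# Suppose exploration picked a1 in s2: target = 1.0 + 0.95 * 2.0 = 2.9,
# so Q(s1, a1) moves from 0 to 0.29 even though a2 has the higher value.
new_value = sarsa_update(q, "s1", "a1", 1.0, "s2", "a1")
```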

Policy Gradient

Direct policy optimization

Learns policy parameters directly by gradient ascent on the expected return, rather than deriving a policy from stored Q-values. Useful for continuous action spaces, where actions cannot be enumerated in a table.
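The text does not say which policy gradient variant is used, so the following is a sketch of one common choice, REINFORCE with a softmax policy, on a hypothetical two-action bandit. It uses discrete actions only to stay short; the names softmax and reinforce_step and the reward values are assumptions for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(prefs, action, ret, lr=0.1):
    """One gradient-ascent step on log pi(action), scaled by the return."""
    probs = softmax(prefs)
    # d/d prefs[k] of log pi(action) = 1[k == action] - probs[k]
    return [p + lr * ret * ((1.0 if k == action else 0.0) - probs[k])
            for k, p in enumerate(prefs)]


rng = random.Random(0)
prefs = [0.0, 0.0]          # policy parameters: one preference per action
rewards = [0.0, 1.0]        # hypothetical bandit: only action 1 pays off

for _ in range(500):
    action = rng.choices([0, 1], weights=softmax(prefs))[0]
    prefs = reinforce_step(prefs, action, rewards[action])

final_probs = softmax(prefs)  # probability mass shifts toward action 1
```

For a continuous action space the softmax would typically be replaced by a parameterized density such as a Gaussian, with the same gradient-of-log-probability structure.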

Components

Q-Table

ETS-backed storage for state-action values

• get_value(state, action)
• set_value(state, action, value)
• get_all_values(state)
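To make the three operations concrete, here is an in-memory sketch in Python. It mirrors the method names listed above, but it is only a stand-in: the real table is ETS-backed, and the nested-dictionary layout and the default value of 0.0 for unseen pairs are assumptions.

```python
class QTable:
    """Illustrative in-memory stand-in for the ETS-backed Q-table."""

    def __init__(self, default=0.0):
        self._values = {}        # state -> {action: value}
        self._default = default  # assumed value for unseen pairs

    def get_value(self, state, action):
        return self._values.get(state, {}).get(action, self._default)

    def set_value(self, state, action, value):
        self._values.setdefault(state, {})[action] = float(value)

    def get_all_values(self, state):
        # Return a copy so callers cannot mutate the table by accident.
        return dict(self._values.get(state, {}))


table = QTable()
table.set_value("s1", "a1", 0.5)
table.set_value("s1", "a2", -0.2)
table.set_value("s2", "a1", 1.0)

s1_values = table.get_all_values("s1")   # only the actions recorded for s1
```

Grouping values by state keeps get_all_values a single lookup, which matters because action selection reads every value for the current state.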

Experience Buffer

Circular buffer for experience replay

• add(experience)
• sample(batch_size)
• size(), clear()
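A circular replay buffer with these four operations can be sketched with a bounded deque. The capacity parameter, the seed argument, and uniform sampling without replacement are assumptions for the example; the source does not specify them.

```python
import random
from collections import deque


class ExperienceBuffer:
    """Illustrative fixed-capacity buffer; the oldest entries are overwritten."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)  # drops the oldest item when full
        self._rng = random.Random(seed)

    def add(self, experience):
        self._items.append(experience)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the current size.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def size(self):
        return len(self._items)

    def clear(self):
        self._items.clear()


buffer = ExperienceBuffer(capacity=3, seed=0)
for i in range(5):
    buffer.add({"state": f"s{i}", "action": "a1", "reward": float(i),
                "next_state": f"s{i + 1}", "done": False})

batch = buffer.sample(2)   # drawn only from the three most recent experiences
```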

Learner

RL algorithm implementation

• select_action(state, epsilon)
• learn(experiences)
• save_model(), load_model()
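As an illustration of what select_action does under an epsilon-greedy policy, the sketch below picks a random action with probability epsilon and the highest-valued action otherwise. The signature differs from the listed one: passing the Q-values and the action list explicitly, and the optional rng argument, are assumptions made so the example is self-contained.

```python
import random


def select_action(q_values, actions, epsilon, rng=random):
    """Epsilon-greedy choice; q_values maps action -> value for one state."""
    if rng.random() < epsilon:
        return rng.choice(actions)                           # explore
    return max(actions, key=lambda a: q_values.get(a, 0.0))  # exploit


# Hypothetical values for a single state.
q_values = {"action1": 0.2, "action2": 0.9, "action3": -0.1}
actions = ["action1", "action2", "action3"]

greedy = select_action(q_values, actions, epsilon=0.0)   # always the best action
explored = select_action(q_values, actions, epsilon=1.0,
                         rng=random.Random(0))           # always a random action
```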

API Endpoints

POST /api/v1/learning/learner

Create a new reinforcement learner.

Request:

{
  "algorithm": "q_learning",
  "config": {
    "learning_rate": 0.1,
    "discount_factor": 0.95,
    "epsilon": 0.1,
    "epsilon_decay": 0.995,
    "epsilon_min": 0.01
  }
}
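The three epsilon fields interact as a decay schedule. The sketch below assumes multiplicative decay applied once per update and clamped at epsilon_min; the source does not state when the decay is applied (per action, per learn call, or per episode), so treat the timing as an assumption.

```python
def decay_epsilon(epsilon, decay=0.995, minimum=0.01):
    """Multiply epsilon by the decay factor, never going below the floor."""
    return max(minimum, epsilon * decay)


config = {"epsilon": 0.1, "epsilon_decay": 0.995, "epsilon_min": 0.01}

epsilon = config["epsilon"]
schedule = [epsilon]
for _ in range(1000):
    epsilon = decay_epsilon(epsilon, config["epsilon_decay"], config["epsilon_min"])
    schedule.append(epsilon)
```

With these values, exploration falls from 10% to the 1% floor after roughly 460 decay steps and stays there.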

PUT /api/v1/learning/learner/:id/select_action

Select action using epsilon-greedy policy.

Request:

{
  "state": "state_representation",
  "available_actions": ["action1", "action2", "action3"]
}

POST /api/v1/learning/learner/:id/learn

Learn from experience batch.

Request:

{
  "experiences": [
    {"state": "s1", "action": "a1", "reward": 1.0, "next_state": "s2", "done": false},
    {"state": "s2", "action": "a2", "reward": -0.5, "next_state": "s3", "done": true}
  ]
}
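To show how the done flag affects learning, the sketch below applies a Q-learning update to the request body above, using the learning rate 0.1 and discount factor 0.95 from the learner configuration example. This is one plausible reading of what the endpoint does, not its actual implementation; the learn function and the state -> {action: value} table are hypothetical.

```python
import json
from collections import defaultdict

payload = json.loads("""
{
  "experiences": [
    {"state": "s1", "action": "a1", "reward": 1.0, "next_state": "s2", "done": false},
    {"state": "s2", "action": "a2", "reward": -0.5, "next_state": "s3", "done": true}
  ]
}
""")


def learn(q, experiences, alpha=0.1, gamma=0.95):
    """Apply one Q-learning update per experience; q maps state -> {action: value}."""
    for exp in experiences:
        next_values = list(q[exp["next_state"]].values())
        # A terminal transition (done) or an unseen next state adds no future value.
        future = 0.0 if exp["done"] or not next_values else max(next_values)
        old = q[exp["state"]].get(exp["action"], 0.0)
        target = exp["reward"] + gamma * future
        q[exp["state"]][exp["action"]] = old + alpha * (target - old)
    return q


q = learn(defaultdict(dict), payload["experiences"])
```

Starting from an empty table, Q(s1, a1) becomes 0.1 and Q(s2, a2) becomes -0.05. The second experience ignores s3 entirely because done is true.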

POST /api/v1/learning/learner/:id/save

Save trained model to persistent storage.

GET /api/v1/learning/qtable/:id/values

Get all Q-values for a state.

GET /api/v1/agents/:id/velocity

Get learning velocity metrics for an agent: the rate of improvement across capabilities over time.

GET /api/v1/patterns/:capability

Get learned patterns for a specific capability. Returns common state-action mappings and their Q-values.
