Reinforcement Learning (Coming Soon)
Q-Learning, SARSA & Policy Gradient
Train agents with reinforcement learning algorithms
The Reinforcement Learning system provides Q-learning, SARSA, and policy gradient algorithms for training agents. It features an ETS-backed (Erlang Term Storage) Q-table for fast state-action value lookups, an experience buffer for replay, epsilon-greedy exploration with decay, and model persistence for saving trained policies.
Algorithms
Q-Learning
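Off-policy TD control. Learns the optimal policy by moving each Q-value toward the observed reward plus the discounted value of the best action in the next state. Because the update does not depend on which action is taken next, the agent does not have to follow its current policy while learning.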
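A minimal sketch of a single tabular Q-learning update, assuming a plain Python `defaultdict` keyed by `(state, action)` in place of the ETS table. The function name `q_learning_update` and its parameters are illustrative, with defaults borrowed from the example learner config further down:

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, next_actions,
                      done, alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    q maps (state, action) -> value; unseen pairs default to 0.0.
    next_actions must be non-empty unless done is True.
    """
    # Off-policy target: bootstrap from the best next action, regardless of
    # which action the behaviour policy will actually take next.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in next_actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)
q[("s2", "left")] = 2.0
q[("s2", "right")] = 5.0
new_value = q_learning_update(q, "s1", "right", 1.0, "s2", ["left", "right"], done=False)
print(round(new_value, 4))  # 0.1 * (1.0 + 0.95 * 5.0 - 0.0) = 0.575
```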
SARSA
On-policy TD control. Updates Q-values toward the observed reward plus the discounted value of the action actually taken in the next state. It therefore learns the value of the policy being followed, exploratory actions included.
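For comparison, here is a sketch of the SARSA update under the same assumptions (a `defaultdict` Q-table and a hypothetical `sarsa_update` helper). The only change from Q-learning is the bootstrap term: it uses the next action the policy actually chose, so the same transition produces a smaller target when that action was exploratory:

```python
from collections import defaultdict

def sarsa_update(q, state, action, reward, next_state, next_action,
                 done, alpha=0.1, gamma=0.95):
    """Apply one tabular SARSA update in place and return the new Q(s, a)."""
    # On-policy target: bootstrap from the action the policy actually chose
    # in next_state (possibly an exploratory one), not from the best action.
    next_value = 0.0 if done else q[(next_state, next_action)]
    target = reward + gamma * next_value
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)
q[("s2", "left")] = 2.0   # exploratory action actually taken next
q[("s2", "right")] = 5.0  # greedy action Q-learning would bootstrap from
new_value = sarsa_update(q, "s1", "right", 1.0, "s2", "left", done=False)
print(round(new_value, 4))  # 0.1 * (1.0 + 0.95 * 2.0) = 0.29
```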
Policy Gradient
Direct policy optimization. Learns the policy parameters directly by gradient ascent on the expected return, instead of deriving a policy from a table of Q-values. Useful for continuous action spaces.
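The idea behind policy gradient can be sketched with REINFORCE on a one-step (bandit) problem. This is an illustrative example, not the system's implementation: it assumes a discrete softmax policy for brevity, whereas continuous action spaces would typically use a parameterized distribution such as a Gaussian. The helper names and the reward setup are hypothetical:

```python
import math
import random

def softmax(prefs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a one-step (bandit) problem with a softmax policy.

    rewards[k] is the (deterministic) reward for action k.
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)           # policy parameters (action preferences)
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = rewards[action]              # episode return G (a single step here)
        # d/d theta_k of log pi(action) is 1[k == action] - pi(k);
        # step along G * grad log pi to increase expected return.
        for k in range(len(theta)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * ret * grad_log_pi
    return softmax(theta)

probs = reinforce_bandit([0.0, 1.0])
print([round(p, 3) for p in probs])  # probability mass shifts toward action 1
```

No Q-table is involved: the policy itself is the learned object, which is what makes the approach extend naturally to continuous actions.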
Components
ETS-backed storage for state-action values
- get_value(state, action)
- set_value(state, action, value)
- get_all_values(state)
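To make the Q-table interface concrete, here is a hypothetical Python stand-in that uses a nested dict instead of ETS. Only the three method names come from the list above; the nested `state -> {action: value}` layout and the `0.0` default for unvisited pairs are assumptions, since the real table's key layout is not documented here:

```python
class QTable:
    """In-memory stand-in for the ETS-backed Q-table (hypothetical sketch).

    Values are stored as state -> {action: value}, so get_all_values(state)
    is a single dictionary lookup.
    """

    def __init__(self, default=0.0):
        self._table = {}
        self._default = default

    def get_value(self, state, action):
        # Unvisited state-action pairs fall back to the default estimate.
        return self._table.get(state, {}).get(action, self._default)

    def set_value(self, state, action, value):
        self._table.setdefault(state, {})[action] = value

    def get_all_values(self, state):
        # Return a copy so callers cannot mutate the table by accident.
        return dict(self._table.get(state, {}))

table = QTable()
table.set_value("s1", "left", 0.4)
table.set_value("s1", "right", 0.9)
print(table.get_value("s1", "right"))   # 0.9
print(table.get_value("s1", "up"))      # 0.0 (never set)
print(table.get_all_values("s1"))       # {'left': 0.4, 'right': 0.9}
```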
Circular buffer for experience replay
- add(experience)
- sample(batch_size)
- size(), clear()
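A circular replay buffer with this interface can be sketched in Python with `collections.deque(maxlen=...)`, which evicts the oldest entry once capacity is reached. The `capacity` and `seed` constructor arguments are assumptions for the example, and experiences use the same dictionary shape as the learn endpoint's payload:

```python
import random
from collections import deque

class ExperienceBuffer:
    """Fixed-capacity circular replay buffer (illustrative Python sketch)."""

    def __init__(self, capacity, seed=None):
        # deque with maxlen drops the oldest item when a new one is appended
        # to a full buffer, which gives circular-buffer behaviour.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, experience):
        self._buffer.append(experience)

    def sample(self, batch_size):
        # Uniform sampling without replacement; never ask for more than we hold.
        k = min(batch_size, len(self._buffer))
        return self._rng.sample(list(self._buffer), k)

    def size(self):
        return len(self._buffer)

    def clear(self):
        self._buffer.clear()

buf = ExperienceBuffer(capacity=3, seed=42)
for i in range(5):
    buf.add({"state": f"s{i}", "action": "a", "reward": float(i),
             "next_state": f"s{i + 1}", "done": False})
print(buf.size())                                  # 3 (s0 and s1 were evicted)
print(sorted(e["state"] for e in buf.sample(2)))   # two of s2, s3, s4
```

Sampling uniformly from a window of recent transitions breaks up the correlation between consecutive steps, which is the usual motivation for replay.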
RL algorithm implementation
- select_action(state, epsilon)
- learn(experiences)
- save_model(), load_model()
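As one way to picture what `save_model()` and `load_model()` might persist, the sketch below writes the Q-values and hyperparameters to a JSON file and reads them back. The file format, the explicit `path` argument, and the nested `{state: {action: value}}` layout are all assumptions for illustration; the page does not say how trained models are actually stored:

```python
import json
import os
import tempfile

def save_model(path, q_values, config):
    """Persist Q-values and hyperparameters as JSON (hypothetical format).

    q_values is a nested dict {state: {action: value}} with string keys,
    so it serialises directly.
    """
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"config": config, "q_values": q_values}, f)

def load_model(path):
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return data["q_values"], data["config"]

config = {"learning_rate": 0.1, "discount_factor": 0.95, "epsilon": 0.05}
q_values = {"s1": {"left": 0.4, "right": 0.9}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "learner.json")
    save_model(path, q_values, config)
    restored_q, restored_config = load_model(path)

print(restored_q == q_values, restored_config == config)  # True True
```

Saving the current (decayed) epsilon together with the Q-values lets a resumed learner continue exploring at the same rate instead of restarting from the initial value.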
API Endpoints
/api/v1/learning/learner
Create a new reinforcement learner.
{
"algorithm": "q_learning",
"config": {
"learning_rate": 0.1,
"discount_factor": 0.95,
"epsilon": 0.1,
"epsilon_decay": 0.995,
"epsilon_min": 0.01
}
}
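The exploration settings in this config determine how long the learner keeps exploring. Assuming `epsilon_decay` is applied multiplicatively once per decay step and clamped at `epsilon_min` (the page does not say whether a step means an episode or an action), epsilon reaches the floor after about 460 decays, since 0.1 × 0.995ⁿ ≤ 0.01 requires n ≥ ln(0.1) / ln(0.995) ≈ 459.4:

```python
import math

def epsilon_schedule(epsilon, decay, epsilon_min, steps):
    """Multiplicative epsilon decay clamped at epsilon_min (assumed semantics)."""
    values = [epsilon]
    for _ in range(steps):
        epsilon = max(epsilon_min, epsilon * decay)
        values.append(epsilon)
    return values

# Values from the example config above.
eps, decay, eps_min = 0.1, 0.995, 0.01

# Closed form: 0.1 * 0.995**n <= 0.01  =>  n >= ln(0.1) / ln(0.995)
steps_to_floor = math.ceil(math.log(eps_min / eps) / math.log(decay))
print(steps_to_floor)  # 460

schedule = epsilon_schedule(eps, decay, eps_min, 500)
print(schedule[459] > eps_min, schedule[460] == eps_min)  # True True
```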
/api/v1/learning/learner/:id/select_action
Select action using epsilon-greedy policy.
{
"state": "state_representation",
"available_actions": ["action1", "action2", "action3"]
}
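Epsilon-greedy selection over the `available_actions` in this request might look like the following sketch. It assumes Q-values keyed by `(state, action)`, treats unseen pairs as 0.0, and breaks ties at random. The function signature is illustrative and not the endpoint's actual implementation:

```python
import random

def select_action(q_values, state, available_actions, epsilon, rng=random):
    """Epsilon-greedy choice restricted to the actions offered in the request.

    q_values maps (state, action) -> value; unseen pairs count as 0.0.
    """
    if not available_actions:
        raise ValueError("available_actions must not be empty")
    if rng.random() < epsilon:
        return rng.choice(available_actions)          # explore
    # Exploit: pick the highest-valued action, breaking ties at random.
    values = [q_values.get((state, a), 0.0) for a in available_actions]
    best = max(values)
    return rng.choice([a for a, v in zip(available_actions, values) if v == best])

q_values = {("state_representation", "action2"): 0.7,
            ("state_representation", "action3"): 0.2}
actions = ["action1", "action2", "action3"]
print(select_action(q_values, "state_representation", actions, epsilon=0.0))  # action2
```

Restricting the choice to the actions listed in the request keeps the learner from proposing moves that are invalid in the current state.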
/api/v1/learning/learner/:id/learn
Learn from experience batch.
{
"experiences": [
{"state": "s1", "action": "a1", "reward": 1.0, "next_state": "s2", "done": false},
{"state": "s2", "action": "a2", "reward": -0.5, "next_state": "s3", "done": true}
]
}
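Below is a sketch of how a Q-learning learner could consume this batch. It parses the example body and applies one update per experience, in order. Because the payload does not include the action set, the `actions` list is an assumption. The second experience has `"done": true`, so its target is just the reward, with no bootstrapped future value:

```python
import json
from collections import defaultdict

BODY = """
{
  "experiences": [
    {"state": "s1", "action": "a1", "reward": 1.0, "next_state": "s2", "done": false},
    {"state": "s2", "action": "a2", "reward": -0.5, "next_state": "s3", "done": true}
  ]
}
"""

def learn(q, experiences, actions, alpha=0.1, gamma=0.95):
    """Apply a Q-learning update for each experience, in order."""
    for exp in experiences:
        s, a = exp["state"], exp["action"]
        # A terminal transition ("done": true) has no future value to bootstrap.
        future = 0.0 if exp["done"] else max(q[(exp["next_state"], b)] for b in actions)
        target = exp["reward"] + gamma * future
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q

q = learn(defaultdict(float), json.loads(BODY)["experiences"], actions=["a1", "a2"])
print(round(q[("s1", "a1")], 4), round(q[("s2", "a2")], 4))  # 0.1 -0.05
```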
/api/v1/learning/learner/:id/save
Save trained model to persistent storage.
/api/v1/learning/qtable/:id/values
Get all Q-values for a state.
/api/v1/agents/:id/velocity
Get learning velocity metrics for an agent: its rate of improvement across capabilities over time.
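The page does not define how velocity is computed. One plausible reading, offered purely as an assumption, is the least-squares slope of a per-capability performance score over time. The `history` format and the capability names below are hypothetical:

```python
def slope(times, scores):
    """Least-squares slope of scores against times (score units per time unit)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    var = sum((t - mean_t) ** 2 for t in times)
    if var == 0:
        raise ValueError("times must not all be equal")
    return cov / var

def learning_velocity(history):
    """history: {capability: [(time, score), ...]} -> {capability: slope}."""
    return {cap: slope([t for t, _ in obs], [s for _, s in obs])
            for cap, obs in history.items()}

history = {
    "routing": [(0, 0.50), (1, 0.60), (2, 0.70)],
    "summarize": [(0, 0.80), (1, 0.80), (2, 0.80)],
}
print({k: round(v, 3) for k, v in learning_velocity(history).items()})
# {'routing': 0.1, 'summarize': 0.0}
```

A positive slope means the capability is still improving. A slope near zero means it has plateaued.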
/api/v1/patterns/:capability
Get learned patterns for a specific capability. Returns common state-action mappings and their Q-values.
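One way to derive such patterns from a Q-table is to take the greedy action for each state and rank those mappings by Q-value. The sketch below does exactly that. The ranking rule, the `top_k` parameter, and the example states and actions are assumptions, since the page does not say how patterns are selected:

```python
def learned_patterns(q_values, top_k=3):
    """Extract the greedy action per state and rank the mappings by Q-value.

    q_values: {state: {action: value}}. Returns [(state, action, value), ...].
    """
    patterns = []
    for state, actions in q_values.items():
        if not actions:
            continue  # states with no recorded actions yield no pattern
        action, value = max(actions.items(), key=lambda item: item[1])
        patterns.append((state, action, value))
    # Highest-valued mappings first.
    patterns.sort(key=lambda p: p[2], reverse=True)
    return patterns[:top_k]

q_values = {
    "queue_full": {"scale_up": 0.92, "wait": 0.10},
    "queue_empty": {"scale_down": 0.55, "wait": 0.60},
    "idle": {},
}
print(learned_patterns(q_values, top_k=2))
# [('queue_full', 'scale_up', 0.92), ('queue_empty', 'wait', 0.6)]
```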