The official multi-agent example in ns3-gym

一、Running the example

Copy the multi-agent folder from contrib/opengym/examples into the scratch directory (e.g. cp -r contrib/opengym/examples/multi-agent scratch/).
The copy is not strictly required, but running from the original location means typing a longer path,
so I always copy whatever scripts I want to run into scratch.

Then run it directly:

# Terminal 1
sudo ./waf --run multi-agent

# Terminal 2
cd scratch/multi-agent
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python ./agent1.py --start=0

# Terminal 3
cd scratch/multi-agent
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python ./agent2.py --start=0

At this point the build fails with:

In file included from ../scratch/multi-agent/mygym.cc:21:
../scratch/multi-agent/mygym.h:35:26: error: ‘Time’ has not been declared
35 | MyGymEnv (uint32_t id, Time stepTime);
| ^~~~
../scratch/multi-agent/mygym.h:51:3: error: ‘Time’ does not name a type; did you mean ‘time’?
51 | Time m_interval;
| ^~~~
| time
../scratch/multi-agent/mygym.cc: In constructor ‘ns3::MyGymEnv::MyGymEnv()’:
../scratch/multi-agent/mygym.cc:39:3: error: ‘m_interval’ was not declared in this scope; did you mean ‘internal’?
39 | m_interval = Seconds(0.1);
| ^~~~~~~~~~
| internal

After some digging, it turns out that mygym.h is missing a header; adding it makes everything build and run:

#include "ns3/core-module.h"

Each terminal then shows different output:

# Terminal running sim.cc, i.e. the environment:
AgendId: 1 MyGetActionSpace: DictSpace:
---box: BoxSpace Low: 0 High: 10 Shape: (5,) Dtype: uint32_t
---discrete: DiscreteSpace N: 5

Simulation process id: 3244537 (parent (waf shell) id: 3244274)
Waiting for Python process to connect on port: tcp://localhost:5555
Please start proper Python Gym Agent
AgendId: 1 MyGetObservation: Tuple([8, 6, 5, 4, 4], 8)
---[8, 6, 5, 4, 4]
---8
AgendId: 1 MyGetGameOver: 0
AgendId: 1 MyGetExtraInfo: testInfo|123
AgendId: 1 MyExecuteActions: Dict(box=[0, 2, 4, 0, 3], discrete=4)
---[0, 2, 4, 0, 3]
---4
AgendId: 2 MyGetObservationSpace: DictSpace:
---box: BoxSpace Low: 0 High: 10 Shape: (5,) Dtype: uint32_t
---discrete: DiscreteSpace N: 5

AgendId: 2 MyGetActionSpace: DictSpace:
---box: BoxSpace Low: 0 High: 10 Shape: (5,) Dtype: uint32_t
---discrete: DiscreteSpace N: 5

# agent1
Step: 0
---obs: ([8, 6, 5, 4, 4], 8)
---action: OrderedDict([('box', array([0, 2, 4, 0, 3], dtype=uint64)), ('discrete', 4)])

# agent2
Step: 0
---obs: ([9, 5, 4, 0, 4], 10)
---action: OrderedDict([('box', array([3, 7, 1, 9, 4], dtype=uint64)), ('discrete', 4)])

二、The multi-agent interaction logic

The official example is deliberately simple and involves no neural-network training,
which makes it easy to observe how multiple agents interact with the environment.

sim.cc

The agents share the same ns-3 simulation, so the simulation scenario only needs to be set up once; each agent then gets its own gym environment object and OpenGym interface, and each interface listens on a different port:

// OpenGym Env for agent 1
uint32_t agentId = 1;
openGymPort = 5555;
Ptr<OpenGymInterface> openGymInterface1 = CreateObject<OpenGymInterface> (openGymPort);
Ptr<MyGymEnv> myGymEnv1 = CreateObject<MyGymEnv> (agentId, Seconds(envStepTime));
myGymEnv1->SetOpenGymInterface(openGymInterface1);

// OpenGym Env for agent 2
agentId = 2;
openGymPort = 5556;
Ptr<OpenGymInterface> openGymInterface2 = CreateObject<OpenGymInterface> (openGymPort);
Ptr<MyGymEnv> myGymEnv2 = CreateObject<MyGymEnv> (agentId, Seconds(envStepTime));
myGymEnv2->SetOpenGymInterface(openGymInterface2);
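
On the Python side, each agent script attaches to the port of its own interface. Below is a minimal sketch of that agent-side setup, assuming the standard ns3gym Python API; the variable names and the way --start=0 maps to startSim are my own shorthand, not a verbatim quote of agent1.py:

# Minimal agent-side sketch (assumed ns3gym API); agent1.py would use port 5555,
# agent2.py port 5556, matching the two interfaces created in sim.cc above.
from ns3gym import ns3env

port = 5555          # 5556 for agent2
startSim = False     # --start=0: the ns-3 simulation is already running in another terminal

env = ns3env.Ns3Env(port=port, startSim=startSim)
print("Observation space: ", env.observation_space)
print("Action space: ", env.action_space)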
mygym.cc

Different agents are told apart by m_agentId; searching for that identifier shows exactly where this example differs from the single-agent ones:

// Environment / interface initialization
MyGymEnv::MyGymEnv (uint32_t id, Time stepTime)
{
  NS_LOG_FUNCTION (this);
  m_agentId = id;
  m_interval = stepTime;

  Simulator::Schedule (Seconds(0.0), &MyGymEnv::ScheduleNextStateRead, this);
}

// The observation space, action space and game-over condition are defined
// exactly as in the single-agent examples, so they are omitted here.

// Get the observation. It is a tuple of an array and a single number, both randomly generated.
Ptr<OpenGymDataContainer>
MyGymEnv::GetObservation()
{
  uint32_t nodeNum = 5;
  uint32_t low = 0.0;
  uint32_t high = 10.0;
  Ptr<UniformRandomVariable> rngInt = CreateObject<UniformRandomVariable> ();

  std::vector<uint32_t> shape = {nodeNum,};
  Ptr<OpenGymBoxContainer<uint32_t> > box = CreateObject<OpenGymBoxContainer<uint32_t> >(shape);

  // generate random data
  for (uint32_t i = 0; i<nodeNum; i++){
    uint32_t value = rngInt->GetInteger(low, high);
    box->AddValue(value);
  }

  Ptr<OpenGymDiscreteContainer> discrete = CreateObject<OpenGymDiscreteContainer>(nodeNum);
  uint32_t value = rngInt->GetInteger(low, high);
  discrete->SetValue(value);

  Ptr<OpenGymTupleContainer> data = CreateObject<OpenGymTupleContainer> ();
  data->Add(box);
  data->Add(discrete);

  // Print data from tuple
  Ptr<OpenGymBoxContainer<uint32_t> > mbox = DynamicCast<OpenGymBoxContainer<uint32_t> >(data->Get(0));
  Ptr<OpenGymDiscreteContainer> mdiscrete = DynamicCast<OpenGymDiscreteContainer>(data->Get(1));
  NS_LOG_UNCOND ("AgendId: "<< m_agentId << " MyGetObservation: " << data);
  NS_LOG_UNCOND ("---" << mbox);
  NS_LOG_UNCOND ("---" << mdiscrete);

  return data;
}

// Execute the action. The action has already been produced by the agent, so this
// function only parses it: the components have different types and arrive wrapped
// in an OpenGymDataContainer, which is cast back into the box and discrete containers.
bool
MyGymEnv::ExecuteActions(Ptr<OpenGymDataContainer> action)
{
  Ptr<OpenGymDictContainer> dict = DynamicCast<OpenGymDictContainer>(action);
  Ptr<OpenGymBoxContainer<uint32_t> > box = DynamicCast<OpenGymBoxContainer<uint32_t> >(dict->Get("box"));
  Ptr<OpenGymDiscreteContainer> discrete = DynamicCast<OpenGymDiscreteContainer>(dict->Get("discrete"));

  NS_LOG_UNCOND ("AgendId: "<< m_agentId << " MyExecuteActions: " << action);
  NS_LOG_UNCOND ("---" << box);
  NS_LOG_UNCOND ("---" << discrete);
  return true;
}
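
Seen from the agent side, the action that ExecuteActions() unpacks is simply a dictionary with a 'box' and a 'discrete' entry, exactly as printed in the terminal output above. The official agents obtain it via env.action_space.sample(); purely as an illustration (this sketch is mine, not part of the example), such an action could also be built by hand:

import numpy as np

# Hand-built action matching the Dict(box, discrete) action space shown above;
# the values mirror the sampled action from the terminal output.
action = {
    'box': np.array([0, 2, 4, 0, 3], dtype=np.uint32),
    'discrete': 4,
}
# env.step(action) would hand this to MyGymEnv::ExecuteActions on the C++ side,
# where it is cast back into an OpenGymBoxContainer and an OpenGymDiscreteContainer.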
mygym.h
// Changed in step with mygym.cc; the main addition is the agent id
public:
  MyGymEnv (uint32_t id, Time stepTime);

private:
  uint32_t m_agentId;
agent1.py / agent2.py
try:
    while True:
        obs = env.reset()
        print("Step: ", stepIdx)
        print("---obs: ", obs)

        while True:
            stepIdx += 1
            # Action generation is trivial here: just sample randomly from the given action_space
            action = env.action_space.sample()
            print("---action: ", action)

            print("Step: ", stepIdx)
            obs, reward, done, info = env.step(action)
            print("---obs, reward, done, info: ", obs, reward, done, info)

            # An input() call: you have to press Enter in this terminal before the loop continues
            input("press enter....")

            if done:
                break

        currIt += 1
        if currIt == iterationNum:
            break

except KeyboardInterrupt:
    print("Ctrl-C -> Exit")
finally:
    # close the environment when the loop finishes or on Ctrl-C
    env.close()
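
The loop also refers to stepIdx, currIt and iterationNum, which the scripts initialize further up; a plausible preamble (paraphrased, not a verbatim quote of agent1.py) would be:

# Assumed initialization preceding the loop above
stepIdx = 0
currIt = 0
iterationNum = 1    # how many reset/step episodes to run before exiting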

三、Summary

In this example the two agents never interact with each other, and their interaction with the environment is strictly sequential.

For instance, after agent1 has taken its step, you must press Enter at its prompt before agent2 gets to interact with the environment.

If you comment out the input("press enter....") line, the agents step through the environment concurrently; only the terminal output still shows agent1 first.

Real multi-agent systems are far more complex, especially around the shared environment, where states, actions and rewards all influence one another.

