We continue from the previous article, pybullet miscellany: using deep learning to fit the coordinate transformation between the camera frame and the world frame (part one). In that article we generated the paired world-frame and camera-frame coordinates of the object, handled saving and loading the data, and detected the object's contour center in the camera frame. Today's task is to use those paired coordinates to fit the transformation between the two frames with a neural network.

First, consider how the network should be structured. Treat the network as a box: the data entering the box is the object's camera-frame coordinates (x_camera, y_camera), and the data leaving the box is the predicted world-frame coordinates (x_predict, y_predict). The loss function is then very simple: loss = sum[(x_predict - x_real)^2 + (y_predict - y_real)^2]. The architecture follows naturally: a multilayer perceptron with 2 inputs and 2 outputs.

The code for the network structure is below. Each layer's weights are initialized from a normal distribution (the normal_(0, 0.1) calls), which can make training more stable. As for depth, a deeper stack does not perform much better than a 2- or 3-layer network; you can experiment yourself, or combine Ray's tune component with PyTorch to search for the best number of layers.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc0 = nn.Linear(2, 125)
        self.fc0.weight.data.normal_(0, 0.1)
        self.fc1 = nn.Linear(125, 75)
        self.fc1.weight.data.normal_(0, 0.1)  # initialization
        self.fc2 = nn.Linear(75, 50)
        self.fc2.weight.data.normal_(0, 0.1)
        self.fc3 = nn.Linear(50, 30)
        self.fc3.weight.data.normal_(0, 0.1)
        self.fc4 = nn.Linear(30, 10)
        self.fc4.weight.data.normal_(0, 0.1)
        self.fc5 = nn.Linear(10, 5)
        self.fc5.weight.data.normal_(0, 0.1)
        self.out = nn.Linear(5, 2)
        self.out.weight.data.normal_(0, 0.1)  # initialization

    def forward(self, x):
        x = F.relu(self.fc0(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        real_value = self.out(x)
        return real_value
With the network structure defined, the next step is processing the collected data. Here the data is simply split into two parts: 70% for training and 30% for testing.
camera_coor_file = open('camera_coor_saved_file', 'rb')
camera_coor = pickle.load(camera_coor_file)
camera_coor_file.close()
real_coor_file = open('real_coor_saved_file', 'rb')
real_coor = pickle.load(real_coor_file)
real_coor_file.close()
# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
camera_coor_tensor = torch.tensor(camera_coor, dtype=torch.float32).to(device)
real_coor_tensor = torch.tensor(real_coor, dtype=torch.float32).to(device)
TRAIN_LENGTH = int(len(camera_coor_tensor) * 0.7)
camera_coor_tensor_for_train = camera_coor_tensor[:TRAIN_LENGTH]
real_coor_tensor_for_train = real_coor_tensor[:TRAIN_LENGTH]
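One caveat about the slice-based split above: it keeps the original collection order, so if the samples were gathered in a spatial sweep, the test set may cover a region the training set never saw. A minimal sketch of shuffling the paired samples together before splitting (pure Python, with hypothetical stand-in data for the pickled coordinate lists):

```python
import random

# Hypothetical paired samples; in the real script these would be the
# camera_coor and real_coor lists loaded from the pickle files.
camera_coor = [(i * 0.1, i * 0.2) for i in range(10)]
real_coor = [(i * 0.3, i * 0.4) for i in range(10)]

pairs = list(zip(camera_coor, real_coor))
random.seed(0)         # fixed seed so the split is reproducible
random.shuffle(pairs)  # shuffle camera/world pairs together, keeping them aligned

train_len = int(len(pairs) * 0.7)
train_pairs, test_pairs = pairs[:train_len], pairs[train_len:]
print(len(train_pairs), len(test_pairs))  # 7 3
```

Zipping before shuffling is the important detail: it guarantees each camera-frame coordinate stays matched to its world-frame counterpart.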
Finally, set up the loss function and the training loop, and run the training.
model = Net().to(device)
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-6
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
The full code of train_coor.py is as follows:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 12 15:44:02 2021
@author: dell
"""
import pickle
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
camera_coor_file = open('camera_coor_saved_file', 'rb')
camera_coor = pickle.load(camera_coor_file)
camera_coor_file.close()
real_coor_file = open('real_coor_saved_file', 'rb')
real_coor = pickle.load(real_coor_file)
real_coor_file.close()
# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
camera_coor_tensor = torch.tensor(camera_coor, dtype=torch.float32).to(device)
real_coor_tensor = torch.tensor(real_coor, dtype=torch.float32).to(device)
TRAIN_LENGTH = int(len(camera_coor_tensor) * 0.7)
camera_coor_tensor_for_train = camera_coor_tensor[:TRAIN_LENGTH]
real_coor_tensor_for_train = real_coor_tensor[:TRAIN_LENGTH]
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc0 = nn.Linear(2, 125)
        self.fc0.weight.data.normal_(0, 0.1)
        self.fc1 = nn.Linear(125, 75)
        self.fc1.weight.data.normal_(0, 0.1)  # initialization
        self.fc2 = nn.Linear(75, 50)
        self.fc2.weight.data.normal_(0, 0.1)
        self.fc3 = nn.Linear(50, 30)
        self.fc3.weight.data.normal_(0, 0.1)
        self.fc4 = nn.Linear(30, 10)
        self.fc4.weight.data.normal_(0, 0.1)
        self.fc5 = nn.Linear(10, 5)
        self.fc5.weight.data.normal_(0, 0.1)
        self.out = nn.Linear(5, 2)
        self.out.weight.data.normal_(0, 0.1)  # initialization

    def forward(self, x):
        x = F.relu(self.fc0(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        real_value = self.out(x)
        return real_value
model = Net().to(device)
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-6
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)
for t in range(1500000):
    # Forward pass: compute predicted y by passing x to the model.
    real_pred = model(camera_coor_tensor_for_train)
    # Compute and print loss.
    loss = loss_fn(real_pred, real_coor_tensor_for_train)
    if t % 100 == 99:
        print(t, loss.item())
    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the variables it will update (which are the learnable
    # weights of the model). This is because by default, gradients are
    # accumulated in buffers (i.e. not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for details.
    optimizer.zero_grad()
    # Backward pass: compute gradient of the loss with respect to model
    # parameters.
    loss.backward()
    # Calling the step function on an Optimizer updates its parameters.
    optimizer.step()
# Specify a path
PATH = "state_dict_model.pt"
# Save
torch.save(model.state_dict(), PATH)
# Load
# =============================================================================
# model = Net()
# model.load_state_dict(torch.load(PATH))
# model.eval()
# =============================================================================
After training finishes, save the model, then load it back for inference. The inference code is largely the same as the training code; the flow is: load the data, split it, define the network structure, load the model, and run inference. eval_coor.py:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 20 11:03:05 2021
@author: dell
"""
import pickle
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import time
camera_coor_file = open('camera_coor_saved_file', 'rb')
camera_coor = pickle.load(camera_coor_file)
camera_coor_file.close()
real_coor_file = open('real_coor_saved_file', 'rb')
real_coor = pickle.load(real_coor_file)
real_coor_file.close()
# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
camera_coor_tensor = torch.tensor(camera_coor, dtype=torch.float32).to(device)
real_coor_tensor = torch.tensor(real_coor, dtype=torch.float32).to(device)
TRAIN_LENGTH = int(len(camera_coor_tensor) * 0.7)
camera_coor_tensor_for_eval = camera_coor_tensor[TRAIN_LENGTH:]
real_coor_tensor_for_eval = real_coor_tensor[TRAIN_LENGTH:]
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc0 = nn.Linear(2, 125)
        self.fc0.weight.data.normal_(0, 0.1)
        self.fc1 = nn.Linear(125, 75)
        self.fc1.weight.data.normal_(0, 0.1)  # initialization
        self.fc2 = nn.Linear(75, 50)
        self.fc2.weight.data.normal_(0, 0.1)
        self.fc3 = nn.Linear(50, 30)
        self.fc3.weight.data.normal_(0, 0.1)
        self.fc4 = nn.Linear(30, 10)
        self.fc4.weight.data.normal_(0, 0.1)
        self.fc5 = nn.Linear(10, 5)
        self.fc5.weight.data.normal_(0, 0.1)
        self.out = nn.Linear(5, 2)
        self.out.weight.data.normal_(0, 0.1)  # initialization

    def forward(self, x):
        x = F.relu(self.fc0(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.relu(self.fc5(x))
        real_value = self.out(x)
        return real_value
model = Net().to(device)
PATH = "state_dict_model.pt"
loss_fn = torch.nn.MSELoss()
# map_location keeps loading working whether the checkpoint was saved on CPU or GPU.
model.load_state_dict(torch.load(PATH, map_location=device))
model.eval()
with torch.no_grad():
    # =============================================================================
    # input = camera_coor_tensor_for_eval
    # target = real_coor_tensor_for_eval
    # output = model(input)
    # loss = loss_fn(output, target)
    # print(loss)
    # =============================================================================
    for i in range(len(camera_coor_tensor_for_eval)):
        input = camera_coor_tensor_for_eval[i]
        target = real_coor_tensor_for_eval[i]
        output = model(input)
        loss = loss_fn(output, target)
        print('real_coor=', real_coor_tensor_for_eval[i])
        print('output=', output)
        print(loss)
        time.sleep(2)
Run the inference script to see how well the training worked. Overall the accuracy is quite good. If you want higher accuracy, one option is to collect as much data as possible within the camera's visible area; another is to use Ray's tune component with PyTorch to optimize the hyperparameters, such as the number of layers, the learning rate, and the optimizer.
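Ray's tune component automates this kind of search at scale; the core idea can be sketched with a plain grid search in pure Python. Everything here is hypothetical: train_and_evaluate is a toy stand-in for "build a Net with these settings, train it, and return the test loss".

```python
import itertools

# Hypothetical stand-in for a full training run; a real version would build
# the Net with these hyperparameters, train it, and return the test loss.
def train_and_evaluate(n_layers, lr):
    return abs(n_layers - 3) * 0.1 + abs(lr - 1e-4)  # toy objective

# Candidate values for the hyperparameters mentioned in the text.
search_space = {
    "n_layers": [2, 3, 5, 7],
    "lr": [1e-6, 1e-4, 1e-2],
}

# Try every combination and keep the configuration with the lowest loss.
best_cfg, best_loss = None, float("inf")
for n_layers, lr in itertools.product(search_space["n_layers"],
                                      search_space["lr"]):
    loss = train_and_evaluate(n_layers, lr)
    if loss < best_loss:
        best_cfg, best_loss = (n_layers, lr), loss

print(best_cfg)  # (3, 0.0001)
```

Exhaustive grids get expensive quickly, which is why tune-style libraries add samplers and early stopping on top of this loop, but the contract is the same: a function from a configuration to a score, plus a search over configurations.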
You may also like: pybullet miscellany: using deep learning to fit the coordinate transformation between the camera frame and the world frame (part one)