[ATU Book - i.MX8 Series - TFLite Advanced] Hand Skeleton Detection Application

I. Overview

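Hand Skeleton Detection, also called Hand Landmarks Detection, is one of the popular research topics in deep learning. Its main purpose is to have a machine locate the joints of the hand and connect the resulting nodes. It is widely applicable to gesture recognition, for example operating a screen with gestures or checking whether the hand posture is correct when operating an instrument. The most common approach is the Hand Landmarks model architecture, which uses the lightweight MobileNet network as its backbone to keep the model small.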

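New readers who want to learn more about artificial intelligence, machine learning, and deep learning can refer to the post below:
大大通 featured post: [ATU Book - i.MX8 Series] Blog Index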

TensorFlow Lite Advanced series: article structure diagram

II. Algorithm Overview

MobileNet neural network architecture:

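This architecture again uses MobileNet as its backbone. The core idea is to split a standard convolution layer into two parts, a depthwise convolution and a pointwise convolution, which together are called a depthwise separable convolution. Computing this way greatly reduces the number of parameters and therefore speeds up computation. (Its role here is feature extraction.)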

MobileNet lightweight design concept diagram
Image and text source: LaptrinhX website
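
As a rough illustration of the parameter savings, the sketch below counts the weights of a standard convolution and of its depthwise separable counterpart. The layer sizes (3 x 3 kernel, 32 input channels, 64 output channels) are assumed for illustration and are not taken from the Hand Landmark model; biases are ignored.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed, illustrative layer sizes
k, c_in, c_out = 3, 32, 64
standard = conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print("standard conv weights :", standard)
print("separable conv weights:", separable)
print("reduction factor      :", round(standard / separable, 2))
```

For these sizes the separable form needs roughly one eighth of the weights, and the gap widens as the number of output channels grows.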

Hand Skeleton Detection

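This technique can be broadly divided into 2D and 3D landmark prediction. Put simply, the former considers only relationships in the image plane, while the latter must also account for depth. For example, when the hand forms a C shape, a purely planar approach may fail to recognize the gesture, whereas a model trained with spatial information is more likely to recognize it. Some models and algorithms therefore use the time dimension to raise the recognition rate, but that also enlarges the input data. For this reason, the Google team proposed a 2.5D prediction method that takes the wrist as the reference point and computes the relative depth of each joint, achieving a 3D-like effect from a single input image. The concept is shown in the figure below.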

Hand Skeleton Detection concept diagram
Image and text source: Paper
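
The idea of wrist-relative depth can be sketched as follows. This is only an illustration of the concept, assuming landmarks arrive as a (21, 3) array of (x, y, z) with the wrist at index 0; the helper name `to_wrist_relative` is hypothetical, and the exact output convention of the model is not specified in this article.

```python
import numpy as np


def to_wrist_relative(landmarks):
    """Return a copy of (21, 3) landmarks whose z column is measured from the wrist (index 0)."""
    pts = np.asarray(landmarks, dtype=np.float32).copy()
    pts[:, 2] -= pts[0, 2]   # x and y stay in image coordinates; only depth is re-referenced
    return pts


# Made-up landmarks: depth rises linearly from 0.5 at the wrist to 0.7 at the last joint
demo = np.zeros((21, 3), dtype=np.float32)
demo[:, 2] = np.linspace(0.5, 0.7, 21)
rel = to_wrist_relative(demo)
print(rel[0, 2], rel[20, 2])
```

After this step the wrist always has depth 0, so the remaining depths describe the hand's shape rather than its absolute distance from the camera.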

The team also trained on about 30,000 real-world images to predict the relative depth of 21 joints, as shown in the figure below.

Hand Skeleton Detection landmark index diagram
Image and text source: MediaPipe website
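
For reference, the 21 landmark indices can be written out as a lookup table. The names below follow the naming used in the MediaPipe hand landmark documentation; the `FINGERTIP_INDICES` helper is an added convenience for illustration.

```python
# Index -> landmark name, following the MediaPipe hand landmark naming
HAND_LANDMARK_NAMES = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

# Indices of the five fingertips (every name ending in "_TIP")
FINGERTIP_INDICES = [i for i, name in enumerate(HAND_LANDMARK_NAMES) if name.endswith("_TIP")]

for i, name in enumerate(HAND_LANDMARK_NAMES):
    print(i, name)
```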

III. Example Implementation

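Google provides a well-performing "Hand landmarks detection guide for Python" example together with the Hand Landmark model, and readers can follow the MediaPipe approach directly. This example reuses that model and quantizes it to run with integer arithmetic.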
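
To make "integer arithmetic" concrete: TensorFlow Lite int8 quantization relates real values to integers through an affine mapping, real = scale * (q - zero_point). The sketch below applies that mapping with NumPy. The scale and zero point are assumed values chosen for an input range of [0, 1]; they are not read from the converted model.

```python
import numpy as np


def quantize(x, scale, zero_point):
    """Map float values to int8 using q = round(x / scale) + zero_point."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize(q, scale, zero_point):
    """Recover approximate float values using real = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)


# Assumed parameters covering the range [0, 1] with 256 int8 levels
scale, zero_point = 1.0 / 255.0, -128

x = np.array([0.0, 0.25, 1.0], dtype=np.float32)
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(q, x_hat)
```

The round trip is not exact: each value can be off by up to half a quantization step, which is why calibration data that reflects real inputs matters.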

The implementation steps are as follows:

Step 1: Open Colab and set up the environment

%tensorflow_version 2.x

Step 2: Install the conversion packages

!pip install tf2onnx
!pip install onnx-tf==1.9.0
!pip install onnx==1.9.0

Step 3: Convert the TensorFlow Lite model to ONNX

! python -m tf2onnx.convert --opset 11 --tflite /root/hand_landmark.tflite --output /root/hand_landmark.onnx

Step 4: Convert ONNX to a SavedModel

! onnx-tf convert -i /root/hand_landmark.onnx -o /root/hand_landmark

Step 5: Convert to a quantized TensorFlow Lite model

import tensorflow as tf
import numpy as np

# Calibration data for post-training quantization.
# Random values in [0, 1] are used here; real hand images scaled the same way
# would give more representative activation ranges.
def representative_dataset_gen():
    for _ in range(250):
        yield [np.random.uniform(0.0, 1.0, size=(1, 256, 256, 3)).astype(np.float32)]

# Load the SavedModel produced in Step 4 and fix the input shape
model = tf.saved_model.load("/root/hand_landmark")
concrete_func = model.signatures["serving_default"]
concrete_func.inputs[0].set_shape([1, 256, 256, 3])

# Quantize the internal ops to int8 while keeping float32 input and output
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.float32
converter.inference_output_type = tf.float32
converter.representative_dataset = representative_dataset_gen
tflite_model = converter.convert()

# Save the quantized model
with open("/root/handskeleton_qunat_new.tflite", 'wb') as f:
    f.write(tflite_model)
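
Calibration generally works better with data that resembles real inputs. The sketch below shows one possible way to build a representative dataset from actual images instead of random noise. It assumes the images are already loaded as H x W x 3 uint8 RGB arrays; the helper name `make_representative_dataset` is hypothetical, and a plain NumPy nearest-neighbour resize stands in for whatever image library is actually used. Solid-colour stand-in images replace real hand photos here.

```python
import numpy as np


def make_representative_dataset(images, size=256):
    """Return a generator function that yields one normalized (1, size, size, 3) batch per image."""
    def gen():
        for img in images:
            h, w = img.shape[:2]
            # Nearest-neighbour resize using integer index arrays
            rows = np.arange(size) * h // size
            cols = np.arange(size) * w // size
            resized = img[rows[:, None], cols[None, :]]
            # Same normalization as the inference code: scale pixels to [0, 1]
            batch = resized.astype(np.float32) / 255.0
            yield [batch[np.newaxis, ...]]
    return gen


# Stand-in images; in practice these would be photos of hands
fake_images = [
    np.full((480, 640, 3), 255, dtype=np.uint8),
    np.zeros((300, 300, 3), dtype=np.uint8),
]
representative_dataset_gen = make_representative_dataset(fake_images)
samples = list(representative_dataset_gen())
print(len(samples), samples[0][0].shape, samples[0][0].dtype)
```

The returned function has the same shape as the generator in Step 5, so it could be assigned to `converter.representative_dataset` in its place.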

Step 6: Hand Skeleton Detection example (written and run on the i.MX8M Plus)

import sys

import cv2
import time
import argparse
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path='/root/handskeleton_qunat_new.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
width = input_details[0]['shape'][2]
height = input_details[0]['shape'][1]
nChannel = input_details[0]['shape'][3]

frame = cv2.imread("/root/hand.jpg")
frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
frame_resized = cv2.resize(frame_rgb, (width, height))
input_data = np.expand_dims(frame_resized, axis=0)
input_data = input_data.astype('float32')
input_data = input_data /255
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter_time_start = time.time()
interpreter.invoke()
interpreter_time_end = time.time()
print("Inference Time = ", (interpreter_time_end - interpreter_time_start)*1000 , " ms" )

feature = interpreter.get_tensor(output_details[0]['index'])[0].reshape(21, 3)
hand_detected = interpreter.get_tensor(output_details[1]['index'])[0]

# Build the output: landmark positions scaled back to the original image size
Px = []
Py = []
size_rate = [frame.shape[1]/width, frame.shape[0]/height]
for pt in feature:
    x = int(pt[0]*size_rate[0])
    y = int(pt[1]*size_rate[1])
    Px.append(x)
    Py.append(y)

# Build the output: draw the skeleton when a hand is detected
if (hand_detected) :
    # Thumb
    cv2.line(frame, (Px[0], Py[0]) , (Px[1], Py[1]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[1], Py[1]) , (Px[2], Py[2]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[2], Py[2]) , (Px[3], Py[3]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[3], Py[3]) , (Px[4], Py[4]) , (0, 255, 0), 3)

    # Index finger
    cv2.line(frame, (Px[0], Py[0]) , (Px[5], Py[5]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[5], Py[5]) , (Px[6], Py[6]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[6], Py[6]) , (Px[7], Py[7]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[7], Py[7]) , (Px[8], Py[8]) , (0, 255, 0), 3)

    # Middle finger
    cv2.line(frame, (Px[5], Py[5]) , (Px[9], Py[9]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[9], Py[9]) , (Px[10], Py[10]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[10], Py[10]) , (Px[11], Py[11]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[11], Py[11]) , (Px[12], Py[12]) , (0, 255, 0), 3)

    # Ring finger
    cv2.line(frame, (Px[9], Py[9]) , (Px[13], Py[13]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[13], Py[13]) , (Px[14], Py[14]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[14], Py[14]) , (Px[15], Py[15]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[15], Py[15]) , (Px[16], Py[16]) , (0, 255, 0), 3)

    # Little finger
    cv2.line(frame, (Px[13], Py[13]) , (Px[17], Py[17]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[17], Py[17]) , (Px[18], Py[18]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[18], Py[18]) , (Px[19], Py[19]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[19], Py[19]) , (Px[20], Py[20]) , (0, 255, 0), 3)
    cv2.line(frame, (Px[17], Py[17]) , (Px[0], Py[0]) , (0, 255, 0), 3)

    # Joints
    for i in range(len(Px)):
        cv2.circle(frame, ( Px[i] , Py[i] ), 1, (0, 0, 255), 4)

import matplotlib.pyplot as plt
plt.imshow(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
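
The per-finger drawing calls above can also be expressed as a table of index pairs, which is easier to maintain. The sketch below is one possible arrangement: the pairs are copied from the drawing calls in this example, while the helper name `skeleton_segments` is hypothetical. It returns the line endpoints only, so the drawing itself (shown in a comment) still uses `cv2.line` as before.

```python
# Index pairs taken from the drawing calls in the example above
HAND_CONNECTIONS = [
    (0, 1), (1, 2), (2, 3), (3, 4),                      # thumb
    (0, 5), (5, 6), (6, 7), (7, 8),                      # index finger
    (5, 9), (9, 10), (10, 11), (11, 12),                 # middle finger
    (9, 13), (13, 14), (14, 15), (15, 16),               # ring finger
    (13, 17), (17, 18), (18, 19), (19, 20), (17, 0),     # little finger and palm edge
]


def skeleton_segments(px, py, connections=HAND_CONNECTIONS):
    """Return ((x0, y0), (x1, y1)) endpoint pairs for every skeleton line."""
    return [((px[a], py[a]), (px[b], py[b])) for a, b in connections]


# With the variables from the example, drawing would become:
#   for p0, p1 in skeleton_segments(Px, Py):
#       cv2.line(frame, p0, p1, (0, 255, 0), 3)

# Made-up coordinates, only to show the output format
demo_px = list(range(0, 210, 10))
demo_py = list(range(21))
segments = skeleton_segments(demo_px, demo_py)
print(len(segments), segments[0])
```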

Hand Skeleton Detection results

As shown in the figure below, the positions of the hand joints are successfully detected in the image.
On the NPU of the i.MX8M Plus, the inference time is about 12.69 ms.

IV. Conclusion

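In practice, Hand Skeleton Detection usually needs to be paired with Hand Detection: once the position of the hand is detected, the cropped hand region is passed to the hand skeleton model for landmark extraction, which maximizes accuracy. The 21 detected joint positions can then feed downstream decision logic to implement applications such as gesture control. Running on the Vivante VIP8000 NPU of the i.MX8MP, inference takes about 12.69 ms per frame, or roughly 78 FPS, and the detection rate is good at a suitable distance. Because this example is a composite application, the actual time spent should be the combined time of hand detection and hand skeleton detection, roughly estimated as 10 ms + N * (12 ms), where N is the number of detected hands. The next chapter will introduce "Style Transform", one application of the GAN architecture in machine learning. Stay tuned!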
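
The rough estimate of 10 ms + N * (12 ms) and the 12.69 ms inference time quoted above can be turned into frame rates with a few lines of arithmetic. The function names below are hypothetical, and the default timings are simply the approximate figures from this article, not new measurements.

```python
def estimated_latency_ms(num_hands, detect_ms=10.0, landmark_ms=12.0):
    """Rough pipeline time: one hand-detection pass plus one landmark pass per detected hand."""
    return detect_ms + num_hands * landmark_ms


def fps_from_latency(latency_ms):
    """Frames per second that a given per-frame latency allows."""
    return 1000.0 / latency_ms


# Landmark model alone, using the measured 12.69 ms
print("landmark only:", round(fps_from_latency(12.69), 1), "FPS")

# Combined pipeline for one to three hands
for n in (1, 2, 3):
    t = estimated_latency_ms(n)
    print(n, "hand(s):", t, "ms,", round(fps_from_latency(t), 1), "FPS")
```

Under these assumptions the combined pipeline runs noticeably slower than the landmark model alone, and the frame rate drops with every additional hand in view.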

V. References

[1] SSD: Single Shot MultiBox Detector
[2] SSD-Tensorflow
[3] Single Shot MultiBox Detector (SSD) paper reading notes
[4] ssd-mobilenet v1: algorithm structure and code walkthrough
[5] Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation
[6] MediaPipe - Hand landmarks detection guide for Python

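If you have any questions about advanced TensorFlow Lite techniques, feel free to leave a comment below this post!
More advanced TensorFlow Lite articles are on the way. Stay tuned for [ATU Book - i.MX8 Series - TFLite Advanced]!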
