Efficient ROS Advanced, Chapter 5: Robot Voice Interaction, Integrating the iFlytek Chinese Speech Library with ROS to Control a Robot Car by Voice


1 Background and References
2 Main Content
- 2.1 Downloading the iFlytek Speech Library
- 2.2 robot_voice: Voice-Controlled Robot Car Example
3 Summary

1 Background and References

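Starting with this article, we study robot voice interaction over two articles. In this first one, we integrate iFlytek's Chinese speech library into ROS and use it to drive a robot car by voice. The principles of speech recognition and speech synthesis are not covered here; interested readers can look them up in dedicated articles. Note that the test environment for this article is Ubuntu 20.04 + ROS Noetic.
References:
(1) Hu Chunxu, "ROS Robot Development Practice" (《ROS机器人开发实践》), Chapter 8
(2)
(3) iFlytek speech dictation (IAT) Linux SDK documentation
(4) Efficient ROS Beginner, Chapter 2: Basic concepts and common commands, based on the turtlesim example
(5) Efficient ROS Advanced, Chapter 3: Building a robot simulation platform with Gazebo, using a differential-drive wheeled robot as an example
Index of this blog series: Efficient ROS Advanced series.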

2 Main Content

2.1 Downloading the iFlytek Speech Library

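(1) First, log in to the iFlytek Open Platform (讯飞开放平台); after registering, click Console to enter it.
(2) Then create an application and download the Linux SDK. For more detailed steps, refer to: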
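(3) Finally, you get your own personal SDK, for example mine: linux_iat1227_tts_online1227_bb839ccf.zip, where bb839ccf is my personal APPID. Below we integrate this SDK into the robot_voice example; the contents of the zip package are not covered in detail here.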
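The APPID embedded in the zip name (bb839ccf here) is the same value that later appears as `appid` in the login-params string passed to MSPLogin, and both the recognizer and the synthesizer node hard-code it. As a sketch only, the hypothetical helper below (it is not part of the iFlytek SDK) assembles such a comma-separated `key = value` string, so the APPID could be kept in one place; the format simply mirrors the params strings used in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not an iFlytek SDK function: joins key/value pairs into the
// "key = value, key = value" format used by the params strings in this article.
std::string buildParams(const std::vector<std::pair<std::string, std::string>>& kv) {
  std::string out;
  for (std::size_t i = 0; i < kv.size(); ++i) {
    assert(!kv[i].first.empty());  // every entry needs a key
    if (i > 0) out += ", ";
    out += kv[i].first + " = " + kv[i].second;
  }
  return out;
}
```

For example, `buildParams({{"appid", "bb839ccf"}, {"work_dir", "."}})` yields `appid = bb839ccf, work_dir = .`, the same login string the nodes use; replace bb839ccf with the APPID of your own SDK.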

2.2 robot_voice: Voice-Controlled Robot Car Example

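(1) The robot_voice example implements two applications. The first, covered in this article, moves a robot car by voice. Its topology is as follows:

voice_detector: performs speech recognition, converting speech into text, and acts as the client of the human_chatter service, sending that text to robot_controller.
robot_controller: the server of the human_chatter service. It receives the text commands from voice_detector and generates the matching spoken-reply text and car control command. The reply text is sent to voice_creator through the str2voice service, and the control command is sent to mbot_gazebo on the /cmd_vel topic.
voice_creator: the server of the str2voice service. It receives the reply text from robot_controller, synthesizes an audio file, and plays it.
mbot_gazebo: the robot car; it subscribes to the /cmd_vel topic and adjusts its motion accordingly.
Note: for the ROS service mechanism, see section 2.6 of my Efficient ROS Beginner series, Chapter 2: Basic concepts and common commands, based on the turtlesim example.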
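How mbot_gazebo "adjusts its motion" can be sketched under the assumption that it behaves like an ideal differential-drive (unicycle) base with no wheel slip and instantaneous velocity changes: the pose evolves as dx/dt = v cos(theta), dy/dt = v sin(theta), dtheta/dt = w, where v is linear.x and w is angular.z of the Twist on /cmd_vel. For the spin command in robot_controller.cpp (v = 0.3 m/s, w = -0.3 rad/s) this predicts a clockwise circle of radius v/|w| = 1 m, completed in 2π/0.3 ≈ 20.9 s. The plain-C++ sketch below (Pose, simulate and turningRadius are hypothetical names) integrates that model with forward-Euler steps.

```cpp
#include <cassert>
#include <cmath>

// Pose of a planar mobile base: position in metres, heading in radians.
struct Pose {
  double x = 0.0;
  double y = 0.0;
  double theta = 0.0;
};

// Integrates the ideal unicycle model under a constant Twist
// (linear_x in m/s, angular_z in rad/s) for `duration` seconds, starting at the
// origin facing +x. Assumption: no slip and no acceleration limits, unlike a real
// Gazebo model.
Pose simulate(double linear_x, double angular_z, double duration, double dt = 1e-4) {
  assert(dt > 0.0 && duration >= 0.0);
  Pose p;
  const long steps = static_cast<long>(duration / dt);
  for (long i = 0; i < steps; ++i) {
    p.x += linear_x * std::cos(p.theta) * dt;  // forward-Euler position update
    p.y += linear_x * std::sin(p.theta) * dt;
    p.theta += angular_z * dt;                 // heading changes at a constant rate
  }
  return p;
}

// Radius of the circle traced by a constant Twist: r = |v / w|.
double turningRadius(double linear_x, double angular_z) {
  assert(angular_z != 0.0);  // pure straight-line motion has no finite radius
  return std::fabs(linear_x / angular_z);
}
```

With the spin command, simulating for π/0.3 seconds ends near (0, -2), half-way around the 1 m circle, and simulating for 2π/0.3 seconds returns close to the start.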
(2) Set up the environment:

unzip linux_iat1227_tts_online1227_bb839ccf.zip
sudo cp libs/x64/libmsc.so /usr/lib/
sudo apt update
sudo apt install sox
sudo apt install libsox-fmt-all

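sox (Sound eXchange) is described by its authors as "the Swiss Army knife of audio manipulation".
It is a powerful tool for converting and processing sound files; because it is simple to use yet capable, it is widely used for processing and analyzing audio data. In this example, its play command plays the synthesized speech.
In addition, when compiling the iFlytek sample, I ran into:

linuxrec.c:12:10: fatal error: alsa/asoundlib.h: No such file or directory

The fix is:

sudo apt-get install libasound2-dev

ALSA (Advanced Linux Sound Architecture) is the Linux library for handling audio devices; linuxrec.c from the SDK uses it to record from the microphone.
(3) Create the robot_voice package and related files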

cd ~/catkin_ws/src
catkin_create_pkg robot_voice roscpp rospy std_msgs geometry_msgs message_generation message_runtime

cd robot_voice 
mkdir srv launch
touch srv/stringtovoice.srv launch/voice_control_robot.launch
touch src/voice_detector.cpp src/robot_controller.cpp src/voice_creator.cpp
mkdir ifly_voice include/ifly_voice

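Copy the relevant files from linux_iat1227_tts_online1227_bb839ccf.zip into these directories so they can be compiled: for example, the SDK headers included by the code (qisr.h, qtts.h, msp_cmn.h, formats.h, msp_errors.h, speech_recognizer.h) go into include/ifly_voice, and the recorder sources that CMakeLists.txt compiles (speech_recognizer.c, linuxrec.c) go into ifly_voice.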
（4）voice_detector.cpp

#include <ros/ros.h>
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <string>
#include "ifly_voice/qisr.h"
#include "ifly_voice/qtts.h"
#include "ifly_voice/msp_cmn.h"
#include "ifly_voice/formats.h"
#include "ifly_voice/msp_errors.h"
#include "ifly_voice/speech_recognizer.h"
#include <std_msgs/String.h>
#include <robot_voice/stringtovoice.h>

class Helper {
public:
  // Exit on Ctrl+C (SIGINT)
  static void signalHandler(int signal) {
      ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal);
      exit(0);
  }
};

class VoiceDetector {
public:
  VoiceDetector() {
    ROS_INFO("VoiceDetector constructor");
  }
  ~VoiceDetector() {
    ROS_INFO("VoiceDetector destructor");
  }

  // Log in to the iFlytek MSP service with this application's APPID
  int init() {
    int ret = MSPLogin(NULL, NULL, login_params_.c_str());
    if (MSP_SUCCESS != ret) {
      ROS_ERROR("MSPLogin failed, error code %d", ret);
      MSPLogout();
      return -1;
    }
    ROS_INFO("VoiceDetector MSP login succeeded");
    return 0;
  }

  // Result callback: the text may arrive in several slices, so append them all
  static void joinTxt(const char *result, char is_last) {
    if (result) {
      voice_txt_ += result;
    }
    if (is_last) {
      printf("voice txt : %s\n", voice_txt_.c_str());
    }
  }

  // Speech-begin callback: clear the text left over from the previous round
  static void initSpeech() {
    voice_txt_.clear();
    printf("Cache cleared, start listening...\n");
  }

  // Speech-end callback: END_REASON_VAD_DETECT means voice activity detection found the end of speech
  static void endSpeech(int reason) {
    if (reason == END_REASON_VAD_DETECT) {
      printf("\nSpeaking done\n");
    } else {
      printf("\nRecognizer error %d\n", reason);
    }
  }

  // One listening round: record from the microphone for a fixed window
  int speechOnce() {
    int ret;
    struct speech_rec iat;

    struct speech_rec_notifier recnotifier = {
      joinTxt,
      initSpeech,
      endSpeech
    };

    ret = sr_init(&iat, session_begin_params_.c_str(), SR_MIC, &recnotifier);
    if (ret) {
      ROS_ERROR("speech recognizer init failed");
      return -1;
    }

    ret = sr_start_listening(&iat);
    if (ret) {
      printf("start listen failed %d\n", ret);
    }

    /* record for 10 seconds */
    sleep(10);

    ret = sr_stop_listening(&iat);
    if (ret) {
      printf("stop listening failed %d\n", ret);
    }

    sr_uninit(&iat);
    return 0;
  }

  static std::string getVoiceTxt() {
    return voice_txt_;
  }

private:
  // Replace bb839ccf with the APPID of your own SDK
  const std::string login_params_ = "appid = bb839ccf, work_dir = .";
  // Dictation (iat) session: Mandarin Chinese, 16 kHz audio, plain-text UTF-8 results
  const std::string session_begin_params_ =
      "sub = iat, domain = iat, language = zh_cn, "
      "accent = mandarin, sample_rate = 16000, "
      "result_type = plain, result_encoding = utf8";

  static std::string voice_txt_;  // text recognized in the current round
};

std::string VoiceDetector::voice_txt_ = "";

int main(int argc, char* argv[]) {
  ros::init(argc, argv, "voice_detector");
  ros::NodeHandle nh;
  // Create the client of the human_chatter service
  ros::ServiceClient client = nh.serviceClient<robot_voice::stringtovoice>("human_chatter");

  if (signal(SIGINT, Helper::signalHandler) == SIG_ERR) {
    return -1;
  }

  VoiceDetector vd;
  if (vd.init() < 0) {
    return -1;
  }

  while (ros::ok()) {
    // One listening round
    if (vd.speechOnce() < 0) {
      return -1;
    }
    // Text recognized in this round
    std::string voice_txt = VoiceDetector::getVoiceTxt();
    if (voice_txt.empty()) {
      printf("voice_txt is empty, do not send chatter\n");
      continue;
    } else if (voice_txt.find("结束") != std::string::npos) {  // "结束" means "stop"
      break;
    }
    // Send the text to robot_controller via human_chatter; once handled, start the next round
    robot_voice::stringtovoice::Request req;
    robot_voice::stringtovoice::Response resp;
    req.data = voice_txt;
    if (client.call(req, resp)) {
      printf("send human_chatter service success: %s\n", req.data.c_str());
    } else {
      printf("failed to send human_chatter service\n");
    }
  }

  // "结束" was heard: log out of MSP and exit instead of spinning with nothing to do
  MSPLogout();
  return 0;
}
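voice_detector does not receive the recognized sentence in one piece: the result callback registered in speech_rec_notifier may fire several times with partial slices, which are appended to a static string that is cleared when each listening round begins, and main() then skips empty text, stops on 结束 ("stop"), and forwards everything else. The sketch below models only that buffering and decision logic in plain C++ so it can be exercised without a microphone or the iFlytek SDK; ResultBuffer, Action and decide are hypothetical names, not part of the SDK or the original code.

```cpp
#include <cassert>
#include <string>

// What the detector's main loop does with the text of one listening round.
enum class Action { Skip, Stop, Forward };

// Hypothetical stand-in for the static text buffer in voice_detector.cpp.
class ResultBuffer {
 public:
  // Mirrors the speech-begin callback: a new round starts with an empty buffer.
  void begin() { text_.clear(); }
  // Mirrors the result callback: append each UTF-8 slice; a null slice is ignored.
  void append(const char* slice) {
    if (slice != nullptr) text_ += slice;
  }
  const std::string& text() const { return text_; }

 private:
  std::string text_;
};

// Mirrors the decision in main(): empty text is skipped, text containing
// "结束" ("stop") ends the loop, anything else is sent to robot_controller.
Action decide(const std::string& text) {
  assert(text.size() < (1u << 20));  // sanity bound on one utterance
  if (text.empty()) return Action::Skip;
  if (text.find("结束") != std::string::npos) return Action::Stop;
  return Action::Forward;
}
```

Keeping this logic free of ROS and SDK calls is a design choice that lets it be unit-tested on any machine, while the real node still does the recording.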

（5）robot_controller.cpp

#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <string>

#include <ros/ros.h>
#include <std_msgs/String.h>
#include <geometry_msgs/Twist.h>
#include <robot_voice/stringtovoice.h>


class RobotController {
public:
  RobotController() {
    ROS_INFO("RobotController constructor");
  }

  ~RobotController() {
    ROS_INFO("RobotController destructor");
  }

  int init(ros::NodeHandle& nh) {
    cmd_pub_ = nh.advertise<geometry_msgs::Twist>("/cmd_vel", 1000);
    client_ = nh.serviceClient<robot_voice::stringtovoice>("str2voice");
    return 0;
  }

  void toDownstream(const std::string& answer_txt, float linear_x, float angular_z) {
    // Send the reply text to voice_creator via str2voice, then the velocity to mbot_gazebo via /cmd_vel
    robot_voice::stringtovoice::Request req;
    robot_voice::stringtovoice::Response resp;
    req.data = answer_txt;

    if (client_.call(req, resp)) {
      printf("send str2voice service success: %s, and pub cmd_vel\n", req.data.c_str());
      geometry_msgs::Twist msg;
      msg.linear.x = linear_x;
      msg.angular.z = angular_z;
      cmd_pub_.publish(msg);
    } else {
      ROS_ERROR("failed to send str2voice service");
    }
  }

  bool chatterCallback(robot_voice::stringtovoice::Request &req, robot_voice::stringtovoice::Response &resp) {
    printf("I received: %s\n", req.data.c_str());
    const std::string& voice_txt = req.data;
    // Check keywords in order (前 forward, 后 backward, 左 left, 右 right, 转 spin);
    // the first match decides the spoken reply and the cmd_vel command
    if (voice_txt.find("前") != std::string::npos) {
      toDownstream("小车请向前跑", 0.3, 0);
    } else if (voice_txt.find("后") != std::string::npos) {
      toDownstream("小车请向后倒", -0.3, 0);
    } else if (voice_txt.find("左") != std::string::npos) {
      toDownstream("小车请向左转", 0, 0.3);
    } else if (voice_txt.find("右") != std::string::npos) {
      toDownstream("小车请向右转", 0, -0.3);
    } else if (voice_txt.find("转") != std::string::npos) {
      toDownstream("小车请打转", 0.3, -0.3);
    }

    resp.success = true;
    return resp.success;
  }

  void start(ros::NodeHandle& nh) {
    // Advertise the human_chatter service; chatterCallback handles each request
    chatter_server_ = nh.advertiseService("human_chatter", &RobotController::chatterCallback, this);
  }

private:
  ros::ServiceServer chatter_server_;
  ros::Publisher cmd_pub_;
  ros::ServiceClient client_;
};

int main(int argc, char* argv[]) {
  ros::init(argc, argv, "robot_controller");
  ros::NodeHandle nh;

  RobotController rc;
  rc.init(nh);

  printf("This is a voice controller app for the robot. You can say: 向前, 向后, 向左, 向右, 转圈, 结束\n");
  rc.start(nh);

  ros::spin();
  return 0;
}
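Because the human_chatter callback in robot_controller.cpp is a chain of find() calls, the order of the checks matters: 向左转 contains both 左 and 转 and is treated as a left turn because 左 is tested first, while only text without any direction keyword, such as 转圈, reaches the spin command. Searching UTF-8 text byte-wise with std::string::find is safe here, since a UTF-8 encoded character never matches starting in the middle of another character. A table-driven version of the same mapping, written as a plain-C++ sketch with hypothetical names (VoiceCommand, commandTable, matchCommand) and with the ROS Twist replaced by two doubles, makes that priority explicit and testable:

```cpp
#include <cassert>
#include <string>
#include <vector>

// One voice command: the keyword to look for, the sentence to speak back,
// and the velocity to publish (linear.x in m/s, angular.z in rad/s).
struct VoiceCommand {
  std::string keyword;
  std::string reply;
  double linear_x;
  double angular_z;
};

// Same keywords, replies and velocities as the if/else chain in robot_controller.cpp;
// entries are checked in order and the first keyword found wins.
const std::vector<VoiceCommand>& commandTable() {
  static const std::vector<VoiceCommand> table = {
      {"前", "小车请向前跑", 0.3, 0.0},
      {"后", "小车请向后倒", -0.3, 0.0},
      {"左", "小车请向左转", 0.0, 0.3},
      {"右", "小车请向右转", 0.0, -0.3},
      {"转", "小车请打转", 0.3, -0.3},
  };
  return table;
}

// Returns the first matching command, or nullptr when no keyword is present
// (in that case the original callback publishes nothing).
const VoiceCommand* matchCommand(const std::string& text) {
  for (const VoiceCommand& cmd : commandTable()) {
    assert(!cmd.keyword.empty());  // an empty keyword would match everything
    if (text.find(cmd.keyword) != std::string::npos) return &cmd;
  }
  return nullptr;
}
```

Adding a new command then means adding one table row in the right priority position rather than another else-if branch.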

（6）voice_creator.cpp

#include <ros/ros.h>
#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <string>
#include "ifly_voice/qisr.h"
#include "ifly_voice/qtts.h"
#include "ifly_voice/msp_cmn.h"
#include "ifly_voice/formats.h"
#include "ifly_voice/msp_errors.h"
#include "ifly_voice/speech_recognizer.h"
#include <robot_voice/stringtovoice.h>

class Helper {
public:
  // Exit on Ctrl+C (SIGINT)
  static void signalHandler(int signal) {
      ROS_INFO("\nCaught signal %d. Exiting gracefully...\n", signal);
      exit(0);
  }
};

class VoiceCreator {
public:
  VoiceCreator() {
    ROS_INFO("VoiceCreator constructor");
  }

  ~VoiceCreator() {
    ROS_INFO("VoiceCreator destructor");
  }

  // Log in to the iFlytek MSP service with this application's APPID
  int init() {
    int ret = MSPLogin(NULL, NULL, login_params_.c_str());
    if (MSP_SUCCESS != ret) {
      ROS_ERROR("MSPLogin failed, error code %d", ret);
      MSPLogout();
      return -1;
    }

    ROS_INFO("VoiceCreator MSP login succeeded");
    return 0;
  }

  // Synthesize txt into a 16 kHz WAV file at filename_ and play it; returns MSP_SUCCESS (0) on success
  int processTxt(const std::string& txt) {
    int          ret          = -1;
    FILE*        fp           = NULL;
    const char*  session_id   = NULL;
    unsigned int audio_len    = 0;
    int          synth_status = MSP_TTS_FLAG_STILL_HAVE_DATA;
    WavePcmHdr   wav_hdr      = {
      {'R', 'I', 'F', 'F'},
      0,
      {'W', 'A', 'V', 'E'},
      {'f', 'm', 't', ' '},
      16,
      1,
      1,
      16000,
      32000,
      2,
      16,
      {'d', 'a', 't', 'a'},
      0
    };

    const char* src_text = txt.c_str();
    const char* des_path = filename_.c_str();
    const char* params = session_begin_params_.c_str();

    if (txt.empty()) {
      ROS_ERROR("text to synthesize is empty!");
      return ret;
    }

    fp = fopen(des_path, "wb");
    if (NULL == fp) {
      ROS_ERROR("open %s error", des_path);
      return ret;
    }

    /* Start synthesis */
    session_id = QTTSSessionBegin(params, &ret);
    if (MSP_SUCCESS != ret) {
      ROS_ERROR("QTTSSessionBegin failed, error code: %d", ret);
      fclose(fp);
      return ret;
    }

    ret = QTTSTextPut(session_id, src_text, (unsigned int)strlen(src_text), NULL);
    if (MSP_SUCCESS != ret) {
      ROS_ERROR("QTTSTextPut failed, error code: %d", ret);
      QTTSSessionEnd(session_id, "TextPutError");
      fclose(fp);
      return ret;
    }

    printf("Synthesizing ...\n");
    fwrite(&wav_hdr, sizeof(wav_hdr), 1, fp);  // write the WAV header (16 kHz) first; sizes are patched below
    while (1) {
      /* Fetch the synthesized audio */
      const void* data = QTTSAudioGet(session_id, &audio_len, &synth_status, &ret);
      if (MSP_SUCCESS != ret) {
        break;
      }

      if (NULL != data) {
        fwrite(data, audio_len, 1, fp);
        wav_hdr.data_size += audio_len;  // accumulate data_size
      }

      if (MSP_TTS_FLAG_DATA_END == synth_status) {
        break;
      }

      printf(">");
      usleep(150 * 1000);  // avoid busy-polling the CPU
    }
    printf("\n");

    if (MSP_SUCCESS != ret) {
      ROS_ERROR("QTTSAudioGet failed, error code: %d", ret);
      QTTSSessionEnd(session_id, "AudioGetError");
      fclose(fp);
      return ret;
    }

    /* Fix the sizes in the WAV header */
    wav_hdr.size_8 += wav_hdr.data_size + (sizeof(wav_hdr) - 8);

    /* Write the corrected values back into the header of the WAV file */
    fseek(fp, 4, SEEK_SET);                                        // size_8 is stored at offset 4
    fwrite(&wav_hdr.size_8, sizeof(wav_hdr.size_8), 1, fp);
    fseek(fp, 40, SEEK_SET);                                       // data_size is stored at offset 40
    fwrite(&wav_hdr.data_size, sizeof(wav_hdr.data_size), 1, fp);
    fclose(fp);
    fp = NULL;
    /* Synthesis finished */
    ret = QTTSSessionEnd(session_id, "Normal");
    if (MSP_SUCCESS != ret) {
      ROS_ERROR("QTTSSessionEnd failed, error code: %d", ret);
      return ret;
    }
    // Play the audio file with sox's play command
    fp = popen(play_cmd_.c_str(), "r");
    if (fp == NULL) {
      ROS_ERROR("play %s failed", filename_.c_str());
      return -1;
    }
    sleep(1);
    pclose(fp);

    return 0;
  }

  // str2voice service callback: synthesize and play the requested text
  bool speaking(robot_voice::stringtovoice::Request &req, robot_voice::stringtovoice::Response &resp) {
    int ret = processTxt(req.data);
    if (MSP_SUCCESS != ret) {
      ROS_ERROR("answer voice failed, error code: %d", ret);
      resp.success = false;
      return false;
    }
    resp.success = true;
    return resp.success;
  }

  void start(ros::NodeHandle& nh) {
    // Advertise the str2voice service
    server_ = nh.advertiseService("str2voice", &VoiceCreator::speaking, this);
  }

private:
  ros::ServiceServer server_;
  // Replace bb839ccf with the APPID of your own SDK
  const std::string login_params_ = "appid = bb839ccf, work_dir = .";
  // TTS session: voice "xiaoyan", UTF-8 text, 16 kHz output, medium speed/volume/pitch
  const std::string session_begin_params_ =
      "voice_name = xiaoyan, text_encoding = utf8, "
      "sample_rate = 16000, speed = 50, volume = 50, "
      "pitch = 50, rdn = 2";
  // Name of the synthesized audio file
  const std::string filename_ = "/tmp/tts_sample.wav";
  // Playback command (play is installed with sox)
  const std::string play_cmd_ = "play /tmp/tts_sample.wav";

  /* WAV audio header layout (44 bytes) */
  struct WavePcmHdr {
    char      riff[4];            // = "RIFF"
    int       size_8;             // = file size - 8
    char      wave[4];            // = "WAVE"
    char      fmt[4];             // = "fmt "
    int       fmt_size;           // = size of the fmt fields that follow: 16

    short int format_tag;         // = PCM: 1
    short int channels;           // = number of channels: 1
    int       samples_per_sec;    // = sample rate: 8000 | 11025 | 16000
    int       avg_bytes_per_sec;  // = bytes per second: samples_per_sec * channels * bits_per_sample / 8
    short int block_align;        // = bytes per sample frame: channels * bits_per_sample / 8
    short int bits_per_sample;    // = bits per sample: 8 | 16

    char      data[4];            // = "data"
    int       data_size;          // = length of the PCM data: file size - 44
  };

};

int main(int argc, char** argv) {
  ros::init(argc, argv, "voice_creator");
  ros::NodeHandle nh;

  if (signal(SIGINT, Helper::signalHandler) == SIG_ERR) {
    return -1;
  }

  VoiceCreator vc;
  if (vc.init() < 0) {
    return -1;
  }

  vc.start(nh);
  ros::spin();
  return 0;
}
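The 44-byte header that voice_creator writes before the PCM data, and patches once synthesis finishes, follows the canonical RIFF/WAVE layout: the RIFF chunk size at offset 4 is the file size minus 8 (that is, 36 + data_size), and data_size at offset 40 is the number of PCM bytes. For 16 kHz, 16-bit mono audio the byte rate is 16000 × 1 × 16 / 8 = 32000 and the block alignment is 2, which are exactly the values hard-coded above. Writing the in-memory struct relies on int being 4 bytes, short being 2, no padding and a little-endian CPU, which holds on the x86-64 Ubuntu setup used here but is not guaranteed in general. The standalone sketch below (makeWavHeader and its helpers are hypothetical names) builds the same header byte by byte instead:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `bytes` bytes of `value` in little-endian order, as the WAV format requires.
static void putLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
  for (int i = 0; i < bytes; ++i) {
    out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFFu));
  }
}

// Appends a four-character chunk tag such as "RIFF".
static void putTag(std::vector<std::uint8_t>& out, const char* tag) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(tag[i]));
}

// Builds a canonical 44-byte PCM WAV header for `data_size` bytes of samples.
std::vector<std::uint8_t> makeWavHeader(std::uint32_t data_size,
                                        std::uint32_t sample_rate = 16000,
                                        std::uint16_t channels = 1,
                                        std::uint16_t bits_per_sample = 16) {
  const std::uint32_t block_align = channels * bits_per_sample / 8u;  // bytes per sample frame
  const std::uint32_t byte_rate = sample_rate * block_align;          // bytes per second
  std::vector<std::uint8_t> h;
  putTag(h, "RIFF");
  putLE(h, 36u + data_size, 4);  // RIFF chunk size = file size - 8
  putTag(h, "WAVE");
  putTag(h, "fmt ");
  putLE(h, 16, 4);               // size of the fmt chunk body
  putLE(h, 1, 2);                // format tag 1 = PCM
  putLE(h, channels, 2);
  putLE(h, sample_rate, 4);
  putLE(h, byte_rate, 4);
  putLE(h, block_align, 2);
  putLE(h, bits_per_sample, 2);
  putTag(h, "data");
  putLE(h, data_size, 4);        // number of PCM bytes that follow
  assert(h.size() == 44);
  return h;
}
```

Used in place of the struct, the node would write makeWavHeader(0) first and overwrite the first 44 bytes with makeWavHeader(total_pcm_bytes) after the last audio chunk has been received.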

(6) stringtovoice.srv, voice_control_robot.launch and CMakeLists.txt
stringtovoice.srv

string data
---
bool success

voice_control_robot.launch

<launch>
  <node
      pkg="robot_voice"
      type="voice_creator"
      name="voice_creator"
      output="screen"
  />
  <node
      pkg="robot_voice"
      type="robot_controller"
      name="robot_controller"
      output="screen"
  />
  <node
      pkg="robot_voice"
      type="voice_detector"
      name="voice_detector"
      launch-prefix="bash -c 'sleep 5; $0 $@'"
      output="screen"
  />
</launch>

CMakeLists.txt

cmake_minimum_required(VERSION 3.0.2)
project(robot_voice)
add_compile_options(-std=c++11)
find_package(catkin REQUIRED COMPONENTS
  roscpp
  rospy
  std_msgs
  geometry_msgs
  message_generation
)
add_service_files(
  FILES
  stringtovoice.srv
)
generate_messages(
  DEPENDENCIES
  std_msgs
)
catkin_package(
  CATKIN_DEPENDS message_runtime roscpp rospy std_msgs geometry_msgs
)
include_directories(
  include
  ${catkin_INCLUDE_DIRS}
)
add_executable(voice_detector
  src/voice_detector.cpp
  ifly_voice/speech_recognizer.c
  ifly_voice/linuxrec.c)

add_executable(robot_controller src/robot_controller.cpp)
add_executable(voice_creator src/voice_creator.cpp)

add_dependencies(voice_detector ${PROJECT_NAME}_generate_messages_cpp)
target_link_libraries(voice_detector
  ${catkin_LIBRARIES}
  libmsc.so -ldl -lpthread -lm -lrt -lasound
)
add_dependencies(robot_controller ${PROJECT_NAME}_generate_messages_cpp)
target_link_libraries(robot_controller
  ${catkin_LIBRARIES}
)
add_dependencies(voice_creator ${PROJECT_NAME}_generate_messages_cpp)
target_link_libraries(voice_creator
  ${catkin_LIBRARIES}
  libmsc.so -ldl -pthread
)

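(7) Build and run (make sure the computer has a working network connection, since recognition and synthesis go through iFlytek's online services!)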

cd ~/catkin_ws/
catkin_make -DCATKIN_WHITELIST_PACKAGES="robot_voice;mbot_gazebo"
source devel/setup.bash
roslaunch mbot_gazebo view_mbot_gazebo.launch
# Then, in a second terminal:
cd ~/catkin_ws/
source devel/setup.bash
roslaunch robot_voice voice_control_robot.launch

[Demo: voice-controlled robot]

(8) During development and debugging, the following compiler error appeared:

internal compiler error: Illegal instruction

As a workaround, I upgraded GCC, which resolved the problem:

sudo apt-get install gcc-10
sudo apt-get install g++-10
cd /usr/bin
sudo rm gcc g++
sudo ln -s gcc-10 gcc
sudo ln -s g++-10 g++

3 Summary

The examples in this article are hosted on my GitHub: robot_voice, mbot_gazebo

August 1, 2024 • Interaction