C#操作Word模拟解析HTML标记输出带格式的文本_网页播放器

需求与困惑

应需求通过算法输出纯文本内容到 ms word 对应的替换字段中，原有的设计仅能保持模板设定的格式，如下是一个word表格，下方单元格中输出题目内容，固定格式为宋体：

但客户的需求是希望题目为黑体加粗，考察关键点为正常宋体，颜色置灰，如下图：

初期的设想是通过 word.find 对象配合扩展的格式参数，进行查找结果关键字进行替换及格式重置操作，发现无法定位精准或有效的 range ，尤其是 word.shape.textframe.textrange ，参考、搜索了一些资料，问题仍无法解决。

解决方案

目前主要针对如下两个 range 对象进行操作：

序号	对象	说明
1	word.appication.selection.range	页面选择区域范围对象（如查找到的段落高亮文字显示结果）
2	word.shape.textframe.textrange	形状对象，对象内包含文字，且查找到的文字结果范围range

基本的实现的思路如下：

一、将原始输出文本按照指定的定义进行 html 标记化，如将 “这是一段文本” 文本更改为 “<span style='font-family:黑体;font-weight:bold'>这是一段文本</span>” (html 部分使用标准的 span + style )，这样可以同时兼容标准的网页版输出。

二、对 range 的文本（text）使用正则表达式提取 html 标记间的所有查找关键字。

三、对 range 的字符集对象（word.characters）进行逐字操作，提取 html 标记的 style 属性部分，分隔各种 style 进行解析，重刷每一个字符的格式。

四、处理完格式设置，调用 range.find 对象替换掉 “多余” 的 html 标记文本，完成最终输出效果。

范例运行环境

操作系统： windows server 2019 datacenter

操作系统上安装 office ｗord 2016

数据库：microsoft sql server 2016

.net版本： .netframework4.7.1 或以上

开发工具：vs2019 c#

配置office dcom

配置方法可参照我的文章《c# 读取word表格到dataset》进行处理和配置。

设计实现

组件库引入

方法实现

processwordchars 方法基本说明如下表：

序号	参数名称	参数类型	说明
1	chars	word.characters	word.range的字符集对象

方法示例代码如下：

void processwordchars(word.characters chars)
{

  string content = chars.parent.text;
  if (content == null || content == "") { return; }
  word.find fnd = chars.parent.find;

  arraylist paras2 = new arraylist();
  paras2.add(new string[] { "<span style=", "</span>" });
  foreach (string[] p in paras2)
  {
      string pattern = string.format(@"{0}(.*?){1}", p[0], p[1]);
      system.text.regularexpressions.matchcollection matches = system.text.regularexpressions.regex.matches(content, pattern);
      foreach (system.text.regularexpressions.match match in matches)
      {
         string key = match.groups[1].value;  //提取的内容
         string vkey = key.substring(key.indexof('>') + 1); //最终有效内容
                    
         string vstyle = key.substring(1, key.length - vkey.length - 3); //截取 style 值
         string findkey = p[0] + key + "</span>";  //最终替换部分
         int fk = content.indexof(findkey);
         if (fk != -1)
         {
             for (int i = 1; i <= findkey.length; i++)
             {
                 foreach (string kv in vstyle.split(';'))
                 {
                     string[] style = kv.split(':');
                     if (style[0] == "color")
                     {
                         chars[fk + i].font.color =(word.wdcolor)colortranslator.toole(colortranslator.fromhtml(style[1]));
                                        // 获取argb值
                     }
                     else if(style[0]== "font-weight")
                     {
                         if (style[1] == "bold") {
                             chars[fk + i].font.bold=1;
                         }
                     }
                     else if (style[0] == "font-family")
                     {
                         chars[fk + i].font.name=style[1];
                     }
                 }
              }
          fnd.clearformatting();
          object findtext = findkey;
          object matchcase = false; object matchwholeword = type.missing; object matchwildcards = false; object matchsoundslike = false; object matchallwordforms = false;
          object forward = true; object wrap = word.wdfindwrap.wdfindcontinue; object format = false;
          object replacewith = vkey;
          object replace = word.wdreplace.wdreplaceall; object matchkashida = type.missing; object matchdiacritics = type.missing; object matchalefhamza = type.missing; object matchcontrol = type.missing;
          fnd.execute(ref findtext, ref matchcase, ref matchwholeword, ref matchwildcards, ref matchsoundslike, ref matchallwordforms,ref forward, ref wrap, ref format, ref replacewith, ref replace, ref matchkashida, ref matchdiacritics, ref matchalefhamza, ref matchcontrol);
          content = chars.parent.text;
         }
      }
   }
}