GA-BPSO Hybrid Optimization of Middle Infrared Spectrum Feature Band Selection of Lubricating Oil Additive Type Identification Technology
-
Abstract: To identify the types of multiple trace additives in equipment lubricating oils, a hybrid method that combines the genetic algorithm (GA) with binary particle swarm optimization (BPSO), denoted GA-BPSO, is proposed for selecting feature bands in mid-infrared spectra. First, baseline classification models for additive type identification are built with the K-nearest neighbor (KNN) and random forest (RF) algorithms. The GA-BPSO hybrid algorithm then screens feature band regions over the full spectral range, which removes interfering and uninformative data, compresses the large spectral data set and reduces the dimension of the search space. With model recognition accuracy as the evaluation criterion, the selected feature bands are used to build high-performance enhanced classification models on top of the baseline models. Three lubricating oil additives, sulfurized isobutylene (T321), alkyl diphenylamine (T534) and thiophosphate amine salt (T307), were blended into a base oil at different mixing ratios. The mid-infrared spectra of the prepared samples were collected, divided into a training set and a test set, and fed to the baseline and enhanced classification models for training and testing. The results show that GA-BPSO feature band selection reduces the effective band length for KNN to 16.4% of the original and raises its recognition accuracy from 70% to 89.58%; for RF, the effective band length is reduced to 15.8% of the original and the recognition accuracy rises from 85% to 97.5%. A comparative study shows that GA-BPSO hybrid feature band selection clearly outperforms GA or BPSO used alone: it greatly reduces the computational burden while improving the accuracy and stability of simultaneous multi-type identification.
-
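Lubricating oil is an important product widely used in the automotive, machinery and metallurgical industries, and consists mainly of base oil and additives. Additives typically account for 1%~10% of the oil and serve to improve the existing properties of the base oil or to impart new ones. There are more than ten major classes of lubricant additives and nearly ten thousand individual products. In general, the grade and quality of a lubricating oil depend largely on the type and quality of its additives [1], so the sound production and use of additives has become an important route to using resources efficiently, improving equipment performance and saving energy [2-3].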
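Because additives play such a key role, identifying and analyzing the types and contents of the additives in a lubricating oil is commonly used in practice as the basis for judging the oil's category, quality and performance. This applies to quality inspection of factory-fresh oils and to monitoring and assessing the health of in-service oils, and it is also the main means of identifying and characterizing unknown oils [4]. Traditional physicochemical test methods, however, are strongly affected by human factors and are time-consuming, error-prone and costly. Mid-infrared spectroscopy [5-6], which has emerged in recent years, combined with modern computational methods such as pattern recognition, is non-destructive, fast, high-throughput and low-cost compared with traditional methods [7-8], and is increasingly used for the qualitative and quantitative analysis of organic substances. In lubrication, however, its applications have so far focused on detecting a single component in an oil [9-10]; studies on the simultaneous identification of multiple trace additive types have not yet been reported in the literature.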
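Algorithms commonly applied to mid-infrared spectral data include partial least squares regression (PLS) [11], support vector machines (SVM) [12], K-nearest neighbors (KNN) [13] and random forests (RF) [14]. KNN and RF can handle nonlinear classification [15-18], are well suited to the automatic classification of class domains with relatively large sample sizes, and are not prone to overfitting. In particular, their multi-output capability [19-20] suggests a way to identify several additive types simultaneously, which is the problem addressed in this paper. However, such methods usually need to load all training data in advance, which occupies a large amount of memory and reduces solution efficiency. The most effective way to remove interfering and uninformative data and to reduce the dimension of the search space is to screen the feature bands of the mid-infrared spectrum. Common feature selection methods include backward interval partial least squares (BiPLS) [21], ant colony optimization (ACO) [22-23], the genetic algorithm (GA) [24] and particle swarm optimization (PSO) [25]. Liu et al. [26] used BiPLS to select near-infrared features and combined it with PLS to build prediction models for the cellulose and hemicellulose content of corn stover, but BiPLS wavelength screening is slow and its results are not very stable. Zhang et al. [27] applied a modified ACO algorithm to extract spectral features for soil total nitrogen, but ACO generally requires a long search time and is prone to stagnation. References [28-29] used GA for feature selection on near-infrared spectra, which copes well with the large number of band combinations that cannot be enumerated, but its convergence efficiency within a limited time needs improvement. Binary particle swarm optimization (BPSO), the binary version of PSO, has strong global search capability and has been widely applied to distribution network fault location and combinatorial optimization; for example, reference [30] used BPSO to optimize the distribution of conductive material for heat conduction enhancement. However, it has not yet been introduced into classification models or spectral feature selection, and its search becomes increasingly random as the iterations proceed, so it lacks local search capability in the later stages. Single optimization algorithms thus each have their own strengths in computational efficiency, global search capability, generality and simplicity.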
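In this paper, KNN and RF are used to build baseline classification models for lubricant additive type identification. A GA-BPSO hybrid method for screening feature band regions over the full spectrum is designed and implemented, and enhanced classification models are built from it. In a test case, GA-BPSO hybrid selection is compared with GA and BPSO used alone: for each of the two baseline models and its corresponding enhanced model, the changes in band range and the differences in identification results are examined to assess the band screening effect and the model accuracy, and to verify the effectiveness of the GA-BPSO selection method.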
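1. Collection of mid-infrared spectral data for equipment lubricating oils
1.1 Preparation of lubricating oil samples
The base oil used for the test samples was PAO-10, and three common equipment lubricant additives, T321, T534 and T307, were selected [31]. T321 is a sulfur-containing additive prepared from sulfur or sulfur monochloride and isobutylene; it has a high sulfur content, good extreme-pressure performance, good oil solubility and a light color. T534 is an alkyl diphenylamine, a high-performance high-temperature antioxidant. T307 is a thiophosphate amine salt with excellent extreme-pressure and anti-wear properties, oxidation and corrosion resistance, and thermal stability. These three additives are mostly used to formulate industrial equipment lubricants such as vehicle gear oils, industrial gear oils and turbine oils.
Sample preparation: each additive was dosed at a mass fraction of 1%, and the additives were added to the base oil in every possible combination, giving 8 kinds of samples, which were numbered uniformly as shown in Figure 1.
Ten spectra were collected for each kind of sample and split into a training set and a test set at a ratio of 3:1.
The distribution of the training set data is shown in Figure 2.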
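The sample design above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the bit order (T321, T534, T307) of the three-digit labels is an assumption inferred from the column order of the result tables, and the name `sample_kinds` is hypothetical.

```python
from itertools import product

# Assumed bit order of the 3-digit class label: (T321, T534, T307).
ADDITIVES = ("T321", "T534", "T307")

def sample_kinds():
    """Enumerate all 2**3 = 8 present/absent combinations of the additives."""
    kinds = {}
    for bits in product((0, 1), repeat=len(ADDITIVES)):
        label = "".join(str(b) for b in bits)
        present = [name for name, b in zip(ADDITIVES, bits) if b]
        # No additive at all means the pure base oil (PAO-10).
        kinds[label] = "+".join(present) if present else "PAO"
    return kinds

kinds = sample_kinds()
n_spectra = len(kinds) * 10      # 10 spectra per kind of sample
n_train = n_spectra * 3 // 4     # 3:1 split between training and test
n_test = n_spectra - n_train
print(kinds)
print(n_spectra, n_train, n_test)
```

With 8 kinds of samples and 10 spectra each, the 3:1 split gives 60 training spectra and 20 test spectra, which agrees with the training set size and test set size quoted later in the text.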
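1.2 Instrument and test method
Spectra were collected with a Thermo Scientific Nicolet iS5 Fourier transform infrared spectrometer with a spectral range of 7 800~350 cm−1 and KBr (potassium bromide) windows. Acquisition settings: 16 scans, resolution 4 cm−1, data spacing 1.928 cm−1 (scan speed 0.10~2 cm/s). Each sample was reloaded and measured 10 times to simulate the operator-induced error that arises when different people collect infrared spectra.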
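1.3 Spectral data preprocessing
Min-max normalization, also called deviation standardization, is a linear transformation of the raw data that maps the processed values into the range 0~1:
$$Y_i=\frac{X_i-X_{\min}}{X_{\max}-X_{\min}} \tag{1}$$
where X_max is the maximum of the sample data, X_min is the minimum of the sample data, and Y_i is the value mapped onto the interval [0,1].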
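Min-max normalization subtracts the minimum and divides by the range, (X − X_min)/(X_max − X_min). A minimal sketch follows, assuming the scaling is applied to one list of values at a time; the text does not say whether the minimum and maximum are taken per spectrum or per wavenumber, and the input values are made up.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal has no range; return zeros instead of dividing by 0.
        return [0.0 for _ in values]
    span = hi - lo
    return [(v - lo) / span for v in values]

spectrum = [2.0, 4.0, 6.0, 10.0]   # made-up intensities
scaled = min_max_scale(spectrum)
print(scaled)                      # [0.0, 0.25, 0.5, 1.0]
```

The guard for a constant input is a design choice of this sketch, not something described in the text.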
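2. Construction of classification models for lubricant additive type identification
2.1 KNN/RF baseline classification models for type identification
The core idea of KNN is as follows: given a sample set whose class labels are all known, a test sample is assigned to the class that holds the majority among its K nearest neighbors in feature space. The choice of K and the distance measure are therefore the key elements of the algorithm. After testing, the parameter n_neighbors was set to 5 in this work.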
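The majority-vote rule can be written out directly. The sketch below is a plain-Python illustration with K = 5; the Euclidean distance is an assumption (the text does not name the metric), the toy points and labels are made up, and `knn_predict` is a hypothetical helper rather than the implementation used in the study.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=5):
    """Label `query` by majority vote among its k nearest training points."""
    # Pair every training point's distance to the query with its label.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D "spectra" for two classes.
train_x = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1),
           (0.9, 1.0), (1.0, 0.9), (0.8, 0.9)]
train_y = ["000", "000", "000", "100", "100", "100"]

print(knn_predict(train_x, train_y, (0.15, 0.05)))  # 000
print(knn_predict(train_x, train_y, (0.95, 0.95)))  # 100
```

With k = 5 and three points per class, each query sees three neighbors from the nearer class and two from the other, so the vote is decided 3 to 2.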
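RF classifies data with multiple decision trees, and randomness is its core. First, each training sample set is drawn at random, with replacement, from the original training data and has the same size as the original. Second, a random subset of the features is used when each decision tree is built. These two sources of randomness keep the correlation among the trees low and thereby improve the accuracy of the model. The trees can also be trained in parallel, so the method is relatively efficient. In this work the training set contains 60 samples with 1 868 features each, and a random forest of 117 trees is trained as follows:
(1) From the original sample set of size n, draw k training sets with the bootstrap method, each containing n samples;
(2) Train on each of the k training sets to build k decision tree models. To increase the diversity among the trees, the features of each drawn training set are also sampled at random. The parameters n_estimators and max_depth were tuned by grid search, giving n_estimators=117 and max_depth=7;
(3) Combine the results of the k decision trees to form the final result.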
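The random ingredients of steps (1) to (3) can be sketched without committing to any particular tree implementation. The helpers below are hypothetical and cover only the sampling and voting parts; training the trees themselves is left out, and the feature subset size of 43 (about the square root of 1 868) is an assumption, since the text does not state it.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Step (1): draw n sample indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def feature_subset(n_features, m, rng):
    """Step (2): pick m distinct feature indices at random for one tree."""
    return sorted(rng.sample(range(n_features), m))

def majority_vote(tree_predictions):
    """Step (3): combine the per-tree labels into one final label."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = bootstrap_indices(60, rng)        # 60 training spectra
cols = feature_subset(1868, 43, rng)     # assumed subset size, ~sqrt(1868)
final = majority_vote(["010", "011", "010"])
print(len(rows), len(set(rows)), len(cols), final)
```

Because the bootstrap draws with replacement, some spectra appear more than once in `rows` and others not at all, which is what makes the individual trees differ from one another.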
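The full-band training spectra preprocessed as described in Section 1.3 were fed to the KNN and RF baseline classification models; in both cases the output was set to the type label of one of the 8 possible combinations of the three additives.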
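2.2 Principal component extraction by PCA
PCA maps n-dimensional features onto a k-dimensional space (k<n) whose k features are new, mutually orthogonal features. It compresses the data while retaining as much of the original information as possible, thereby reducing dimensionality and simplifying the model. The procedure is:
(1) Input the data set X={X1, X2, X3,…, Xn}, which is to be reduced to k dimensions;
(2) Center all samples:
$$X_{i}=X_{i}-\frac{1}{n} \sum_{j=1}^{n} X_{j} \tag{2}$$
(3) Compute the covariance matrix XX^T of the samples, perform an eigenvalue decomposition of XX^T and compute the contribution rate of each eigenvalue; take the eigenvectors W1, W2, W3,…, Wk corresponding to the k eigenvalues selected by the cumulative-contribution-rate threshold (given as 1);
(4) Output the projection matrix formed by the selected eigenvectors:
$$W=\left(W_{1}, W_{2}, W_{3}, \cdots, W_{k}\right) \tag{3}$$
2.3 GA-BPSO hybrid optimization for feature band screening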
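The mid-infrared region (the fundamental vibration region, 4 000~400 cm−1) contains the fundamental vibration frequencies of the chemical bonds of most organic and inorganic compounds and is therefore an important region for compound identification. Identifying a substance from its spectrum hinges on finding the feature band regions associated with that substance. Irrelevant, uninformative bands should be excluded as early as possible, and spectral data usually also contain bands that are too strongly correlated with one another; screening the bands to remove such highly correlated, interfering bands can greatly improve identification accuracy and working efficiency. Based on an analysis of the respective strengths of GA and BPSO, two bio-inspired optimization algorithms, in screening effective band regions, this paper proposes and designs a GA-BPSO hybrid feature band screening method and its implementation scheme.
GA is a class of randomized search methods derived from the laws of biological evolution. A number of randomly generated initial individuals form the original population; selection, crossover and mutation produce new populations; a fitness function evaluates the individuals, and those with high fitness are passed on to the next generation until the termination condition is met and the optimized solution is output. The ranges of the GA operating parameters are listed in Table 1.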
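Table 1. Range of GA parameters

| Population size | Termination generation | Crossover probability | Mutation probability |
|---|---|---|---|
| 2~100 | 2~200 | 0.4~0.99 | 0.000 1~0.1 |

The best chromosome found by GA is an n-bit 0-1 code: if a gene is 1, the corresponding band is included in modeling; if it is 0, the band is excluded. After min-max normalization, the whole spectrum (1 868 data points) is divided into 25 subintervals, the first 7 with 74 data points each and the last 18 with 75 data points each. The 25 subintervals are encoded as binary genes, one gene per band. After testing, the GA parameters were set to a population size of 25, a maximum of 50 generations, a crossover probability of 0.6 and a mutation probability of 0.01. The training-set spectral bands represented by each chromosome are fed to the KNN and RF baseline classification models for training, and the accuracy obtained during testing serves as the fitness function for interval screening, which yields the optimized combination of band intervals.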
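The interval encoding and the GA loop can be sketched as follows. This is an illustration under stated assumptions: tournament selection, one-point crossover and elitism are choices made here (the text only names selection, crossover and mutation), and `toy_fitness` is a made-up stand-in for the classifier accuracy that the study uses as fitness.

```python
import random

SIZES = [74] * 7 + [75] * 18   # 25 subintervals, 1 868 points in total

def interval_ranges(sizes=SIZES):
    """Return (start, stop) index pairs for each subinterval."""
    ranges, start = [], 0
    for size in sizes:
        ranges.append((start, start + size))
        start += size
    return ranges

def decode(chromosome, ranges):
    """Map a 0/1 chromosome (one gene per subinterval) to point indices."""
    idx = []
    for gene, (start, stop) in zip(chromosome, ranges):
        if gene:
            idx.extend(range(start, stop))
    return idx

def tournament(pop, fitness, rng):
    """Assumed selection scheme: the fitter of two random individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def run_ga(fitness, n_genes=25, pop_size=25, generations=50,
           p_cross=0.6, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]                       # elitism: keep the best
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)[:]
            p2 = tournament(pop, fitness, rng)[:]
            if rng.random() < p_cross:             # one-point crossover
                cut = rng.randrange(1, n_genes)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):
                for g in range(n_genes):           # bit-flip mutation
                    if rng.random() < p_mut:
                        child[g] ^= 1
                children.append(child)
        pop = children[:pop_size]
        best = max(pop, key=fitness)
    return best

ranges = interval_ranges()
informative = {2, 5, 9}        # hypothetical "useful" subintervals (1-based)

def toy_fitness(chromosome):
    """Stand-in fitness: count genes that agree with the hypothetical target."""
    return sum((g == 1) == ((k + 1) in informative)
               for k, g in enumerate(chromosome))

best = run_ga(toy_fitness)
print(best, len(decode(best, ranges)))
```

In the study itself the fitness call would train the KNN or RF model on the decoded points and return its accuracy, which makes each evaluation far more expensive than in this toy example.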
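Feature selection is essentially the choice of a suitable 0/1 string whose length is the number of features N in the original data set: 0 means a feature is not selected and 1 means it is selected. The simplest and most exhaustive way to find a suitable string is brute-force enumeration, that is, evaluating every possible assignment once; with 2^N possibilities this is infeasible. BPSO builds on discrete particle swarm optimization: each particle's position is binary-coded, a sigmoid function maps the velocity into the interval [0,1], and the mapped velocity gives the probability that each position bit takes the value 1. The velocity and position update formulas of BPSO are
$$v_{id}(t+1)=w v_{id}(t)+c_{1} r_{1}\left(P_{id}-x_{id}(t)\right)+c_{2} r_{2}\left(P_{g}-x_{id}(t)\right) \tag{4}$$
$$x_{id}(t+1)=\begin{cases} 1, & \text{if } r<\operatorname{sigmoid}\left(v_{id}(t)\right) \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
$$\operatorname{sigmoid}\left(v_{id}(t)\right)=\frac{1}{1+\exp\left(-v_{id}(t)\right)} \tag{6}$$
where v_id is the velocity of the i-th particle in dimension d of the solution space; x_id is the position of the i-th particle in dimension d; w is the inertia coefficient; c1 and c2 are the learning factors, both non-negative; r, r1 and r2 are random values distributed in [0,1]; P_id is the personal best position; and P_g is the global best position.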
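The update rules (4) to (6) translate into a short loop. The sketch below is one possible reading, not the authors' implementation: it samples each bit from the freshly updated velocity, as in the usual BPSO formulation, it clamps velocities to ±`v_max` (an assumption, not mentioned in the text), and `toy_fitness` with its `INFORMATIVE` set is a made-up stand-in for classifier accuracy.

```python
import math
import random

def sigmoid(v):
    """Eq. (6): squash a velocity into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def bpso(fitness, dim, n_particles=30, iters=50, w=0.5, c1=2.0, c2=2.0,
         v_max=6.0, seed=0):
    rng = random.Random(seed)
    x = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (4): inertia + pull toward personal and global bests.
                vel = (w * v[i][d]
                       + c1 * r1 * (pbest[i][d] - x[i][d])
                       + c2 * r2 * (gbest[d] - x[i][d]))
                v[i][d] = max(-v_max, min(v_max, vel))   # assumed clamp
                # Eq. (5): the bit becomes 1 with probability sigmoid(v).
                x[i][d] = 1 if rng.random() < sigmoid(v[i][d]) else 0
            fit = fitness(x[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = x[i][:], fit
    return gbest, gbest_fit

INFORMATIVE = {2, 5, 7}    # hypothetical useful feature indices

def toy_fitness(mask):
    """Reward selecting the informative bits, lightly penalise mask size."""
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.05 * sum(mask)

best, best_fit = bpso(toy_fitness, dim=20)
print(best, best_fit)
```

The defaults mirror the settings quoted in the text (30 particles, 50 iterations, w = 0.5, c1 = c2 = 2); only the particle dimension is shrunk to 20 to keep the toy run fast.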
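Binary coding of BPSO for spectral feature variable selection: each particle corresponds to a binary code in which 1 means selected and 0 means not selected. Since the spectra of the test samples have 1 868 wavenumber points in total, each particle is a string of 1 868 zeros and ones. During optimization the position of each particle keeps changing, and the corresponding subset of wavenumbers changes with it. The swarm is initialized, and the parameters chosen after testing are listed in Table 2.

Table 2. Parameter settings of BPSO

| Number of particles | Particle dimension | Inertia weight | Learning factors | Maximum number of iterations |
|---|---|---|---|---|
| 30 | 1 868 | 0.5 | c1=c2=2 | 50 |

The training-set spectral bands represented by each particle are fed to the KNN and RF baseline classification models for training. With the overall classification accuracy as the evaluation function, the fitness values, personal bests and global best of the particles are updated, and the loop iterates until the maximum number of iterations is reached, giving the spectral band screening result.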
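Clearly, BPSO operates on all wavenumber points of the sample spectra, so each particle is 1 868 bits long and contains a great deal of redundancy. To address this, this paper proposes and designs a method that combines GA and BPSO for feature band selection: GA first screens out the effective combination of spectral intervals, and BPSO then selects feature wavenumber points within those intervals, so that the particle length is shortened as much as possible. The parameter settings for the two models are listed in Table 3.

Table 3. Parameter settings of GA-BPSO

| Algorithm | Number of particles | Particle dimension | Inertia weight | Learning factors | Maximum number of iterations |
|---|---|---|---|---|---|
| GA-KNN | 30 | 671 | 0.5 | c1=c2=2 | 50 |
| GA-RF | 30 | 598 | 0.5 | c1=c2=2 | 50 |

The training-set spectral bands represented by these much shorter particles are fed to the GA-KNN and GA-RF classification models for training. With the overall classification accuracy as the evaluation function, the fitness values, personal bests and global best of the particles are updated, and the loop iterates until the maximum number of iterations is reached, giving the GA-BPSO spectral band screening result.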
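The reduction in particle length follows directly from the interval sizes. The sketch below assumes the subintervals are numbered 1 to 25 in index order, with the first 7 holding 74 points and the remaining 18 holding 75 points, and takes the selected subinterval numbers from Table 8; the helper names are hypothetical.

```python
SIZES = [74] * 7 + [75] * 18   # 25 subintervals, 1 868 points in total

def points_in_intervals(selected, sizes=SIZES):
    """0-based indices of all points in the 1-based subintervals `selected`."""
    starts = [sum(sizes[:i]) for i in range(len(sizes))]
    idx = []
    for k in sorted(selected):
        idx.extend(range(starts[k - 1], starts[k - 1] + sizes[k - 1]))
    return idx

def to_full_indices(candidate_idx, particle):
    """Map a BPSO particle over the reduced space back to original indices."""
    return [i for i, bit in zip(candidate_idx, particle) if bit]

knn_intervals = [2, 4, 5, 6, 9, 10, 14, 22, 24]   # Table 8, GA-BPSO-KNN
rf_intervals = [2, 5, 8, 9, 11, 20, 22, 25]       # Table 8, GA-BPSO-RF

knn_candidates = points_in_intervals(knn_intervals)
rf_candidates = points_in_intervals(rf_intervals)
print(len(knn_candidates), len(rf_candidates))    # 671 598

# A made-up particle that keeps every second candidate point.
particle = [i % 2 for i in range(len(knn_candidates))]
kept = to_full_indices(knn_candidates, particle)
print(len(kept), kept[:3])
```

The two counts reproduce the particle dimensions of 671 and 598 given in Table 3, so the second-stage BPSO searches a space roughly one third the size of the full 1 868-point spectrum.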
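2.4 Construction of the KNN/RF enhanced classification models for type identification
Starting from the KNN/RF baseline classification models, GA-BPSO hybrid optimization is used to screen the feature bands of the lubricating oil mid-infrared spectra and build enhanced classification models for additive type identification; the workflow is shown in Figure 3.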
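3. Test results and analysis of the two kinds of models
3.1 Min-max normalization of the raw spectral data
The raw spectral data were first min-max normalized; the result is shown in Figure 4.
3.2 Test results of the KNN/RF baseline classification models
Figure 5 shows the type identification results of the two baseline models for the three additives. The marker "o" denotes the actual additive type of each test sample and "*" the type predicted by the model, so coinciding markers indicate a correct prediction. Table 4 lists the type identification accuracy of the two baseline models.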
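Table 4. Type identification accuracy of the KNN/RF baseline models

| Algorithm | PAO | T307 | T534 | T534+T307 | T321 | T321+T307 | T321+T534 | T321+T534+T307 |
|---|---|---|---|---|---|---|---|---|
| KNN | 0 | 50% | 100% | 100% | 100% | 100% | 0 | 33% |
| RF | 100% | 100% | 60% | 50% | 100% | 100% | 100% | 100% |

Figure 6 shows the confusion matrices of the type identification results of the two baseline models after PCA dimensionality reduction. Tables 5 and 6 give the corresponding classification metrics.

Table 5. Classification accuracy of PCA-KNN

| Category | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 000 | 0.00 | 0.00 | 0.00 | 1 |
| 001 | 1.00 | 0.50 | 0.67 | 2 |
| 010 | 1.00 | 1.00 | 1.00 | 5 |
| 011 | 0.50 | 1.00 | 0.67 | 2 |
| 100 | 1.00 | 1.00 | 1.00 | 2 |
| 101 | 0.75 | 1.00 | 0.86 | 3 |
| 110 | 0.00 | 0.00 | 0.00 | 2 |
| 111 | 0.50 | 0.33 | 0.40 | 3 |
| Avg/total | 0.59 | 0.60 | 0.57 | 20 |

Table 6. Classification accuracy of PCA-RF

| Category | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 000 | 1.00 | 1.00 | 1.00 | 1 |
| 001 | 1.00 | 1.00 | 1.00 | 2 |
| 010 | 1.00 | 0.80 | 0.89 | 5 |
| 011 | 1.00 | 0.50 | 0.67 | 2 |
| 100 | 0.67 | 1.00 | 0.80 | 2 |
| 101 | 1.00 | 1.00 | 1.00 | 3 |
| 110 | 1.00 | 1.00 | 1.00 | 2 |
| 111 | 0.75 | 1.00 | 0.86 | 3 |
| Avg/total | 0.93 | 0.91 | 0.90 | 20 |

3.3 Screening results of the feature band intervals of the raw spectra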
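3.3.1 Feature band intervals selected by GA alone
The feature band intervals selected by GA alone are shown in Figure 7. As Figure 7(a) shows, 6 intervals were selected for the KNN baseline model: 541.899 2~682.677 2 cm−1, 827.312 3~1 253.503 cm−1, 1 542.773~1 830.115 cm−1, 2 265.948~2 408.655 cm−1, 3 423.028~3 565.735 cm−1 and 3 712.298~3 855.005 cm−1; the selected band length is 35.9% of the original band length. As Figure 7(b) shows, 7 intervals were selected for the RF baseline model: 541.899 2~682.677 2 cm−1, 970.018 7~1 110.797 cm−1, 1 398.138~1 685.48 cm−1, 1 832.043~1 974.75 cm−1, 3 133.758~3 276.465 cm−1, 3 423.028~3 565.735 cm−1 and 3 865.933~3 999.64 cm−1; the selected band length is 32.0% of the original band length.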
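3.3.2 Feature band intervals selected by BPSO alone
The feature band screening results of BPSO alone are listed in Table 7. For the KNN baseline model, the original 1 868 spectral data points were reduced to 918, so the selected band length is 49.1% of the original. For the RF baseline model, the original spectral data points were reduced to 884, so the selected band length is 47.3% of the original.

Table 7. Screening results of single BPSO feature band interval

| Algorithm | Optimized wavelengths | Ratio to the original wavelengths |
|---|---|---|
| BPSO-KNN | 918 | 49.1% |
| BPSO-RF | 884 | 47.3% |

3.3.3 Feature band intervals selected by the GA-BPSO hybrid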
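The feature band screening results of GA-BPSO hybrid optimization are listed in Table 8. For the KNN baseline model, the spectral data points were reduced from the original 1 868 to 307, so the selected band length is 16.4% of the original. For the RF baseline model, they were reduced to 295, which is 15.8% of the original band length.

Table 8. Screening results of GA-BPSO mixed feature band interval

| Algorithm | Selected subintervals | Wavelengths | Ratio to the original wavelengths |
|---|---|---|---|
| GA-BPSO-KNN | 2, 4, 5, 6, 9, 10, 14, 22, 24 | 307 | 16.4% |
| GA-BPSO-RF | 2, 5, 8, 9, 11, 20, 22, 25 | 295 | 15.8% |

3.4 Test results of the KNN/RF enhanced classification models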
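3.4.1 Test results of the KNN/RF enhanced classification models with GA selection alone
With the interval combination selected by GA alone as input, the type identification results output by the two enhanced models are shown in Figure 8, and the type identification accuracies are listed in Table 9.

Table 9. Type identification accuracy of the two enhanced models using feature band intervals selected by GA alone

| Algorithm | PAO | T307 | T534 | T534+T307 | T321 | T321+T307 | T321+T534 | T321+T534+T307 |
|---|---|---|---|---|---|---|---|---|
| GA-KNN | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 67% |
| GA-RF | 100% | 100% | 60% | 100% | 100% | 100% | 100% | 100% |

3.4.2 Test results of the KNN/RF enhanced classification models with BPSO selection alone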
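With the band combination selected by BPSO alone as input, the type identification results output by the two enhanced models are shown in Figure 9, and the type identification accuracies are listed in Table 10.

Table 10. Type identification accuracy of the two enhanced models using feature band intervals selected by BPSO alone

| Algorithm | PAO | T307 | T534 | T534+T307 | T321 | T321+T307 | T321+T534 | T321+T534+T307 |
|---|---|---|---|---|---|---|---|---|
| BPSO-KNN | 100% | 100% | 100% | 100% | 100% | 100% | 0 | 33% |
| BPSO-RF | 100% | 100% | 80% | 50% | 100% | 100% | 100% | 100% |

3.4.3 Test results of the KNN/RF enhanced classification models with GA-BPSO hybrid selection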
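With the band combination selected by the GA-BPSO hybrid as input, the type identification results output by the two enhanced models are shown in Figure 10, and the type identification accuracies are listed in Table 11.

Table 11. Type identification accuracy of the two enhanced models using feature band intervals selected by the GA-BPSO hybrid

| Algorithm | PAO | T307 | T534 | T534+T307 | T321 | T321+T307 | T321+T534 | T321+T534+T307 |
|---|---|---|---|---|---|---|---|---|
| GA-BPSO-KNN | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 67% |
| GA-BPSO-RF | 100% | 100% | 80% | 100% | 100% | 100% | 100% | 100% |

3.5 Overall analysis of the test results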
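Combining the above results, the KNN/RF baseline models (full spectrum) and the enhanced models (GA-selected, BPSO-selected and GA-BPSO-selected bands) were used to identify the types of the 20 unknown samples in the test set; the identification accuracies are compared in Figure 11.
Figure 11 shows that: 1) the enhanced classification models that use band selection all achieve higher identification accuracy than the baseline classification models. The GA-BPSO hybrid method gives the highest accuracy for lubricant additive type identification, and because it uses the fewest spectral wavelength variables in the model, it can markedly improve working efficiency; 2) with the same selection method, the RF model achieves higher type identification accuracy than the KNN model. GA-BPSO-RF obtained the best test results.
A further analysis of GA-BPSO-RF was therefore carried out. According to results obtained by other physicochemical methods [32], T321 shows a C-S-C vibration at 657 cm−1 and a C-S (aromatic ring) vibration at 1 178 cm−1; T534 shows an N-H stretching vibration at 1 500 cm−1; and T307 shows a P-N vibration at 930~1 110 cm−1 and a P-S-N vibration at 2 150 cm−1. These vibration positions correspond well to the GA-RF selected bands listed in Section 3.3.1, and the further selection of feature wavelength variables by BPSO brings the final effective band intervals closer to, and clustered around, the true vibration positions. The additive type identification accuracies of the GA-BPSO-RF enhanced model and the RF baseline model are compared in Figure 12.
Figure 12 clearly shows that after GA-BPSO hybrid band selection, the identification accuracy for T534 and T534+T307 rises from 60% and 50% to 80% and 100%, respectively, and the overall classification accuracy rises from 85% to 97.5%.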
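An overall accuracy can be derived from per-class accuracies in two common ways. The sketch below is a worked check under one assumption, namely that the class supports listed in Tables 5 and 6 describe the same 20 test spectra; the per-class values are taken from the RF row of Table 4 and the GA-BPSO-RF row of Table 11.

```python
# Test spectra per class (000, 001, ..., 111), taken from Tables 5 and 6.
SUPPORT = [1, 2, 5, 2, 2, 3, 2, 3]

def macro_average(per_class):
    """Unweighted mean of the per-class accuracies."""
    return sum(per_class) / len(per_class)

def weighted_average(per_class, support=SUPPORT):
    """Mean weighted by the number of test spectra in each class."""
    return sum(a * s for a, s in zip(per_class, support)) / sum(support)

rf_base = [100, 100, 60, 50, 100, 100, 100, 100]      # Table 4, RF
rf_hybrid = [100, 100, 80, 100, 100, 100, 100, 100]   # Table 11, GA-BPSO-RF

print(weighted_average(rf_base), macro_average(rf_base))      # 85.0 88.75
print(weighted_average(rf_hybrid), macro_average(rf_hybrid))  # 95.0 97.5
```

Under this assumption, the 85% baseline figure coincides with the support-weighted mean and the 97.5% figure with the unweighted mean of the per-class accuracies. The two conventions give different values whenever the classes are unevenly represented, so it helps to state which one a reported overall accuracy uses.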
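4. Conclusions
a. The baseline classification model built with RF can effectively identify multiple types of trace additives in equipment lubricating oils simultaneously, and it is simpler, faster and easier to implement than traditional physicochemical analysis.
b. GA-BPSO feature band screening effectively shortens the effective band length of the KNN/RF baseline classification models and raises their identification accuracy, clearly outperforming GA or BPSO used alone; it greatly reduces the computational burden while improving the accuracy and stability of simultaneous multi-type identification.
c. The GA-BPSO hybrid method gives the best feature band screening. The KNN/RF enhanced classification models built from its selection perform markedly better than the corresponding baseline models, and GA-BPSO-RF outperforms GA-BPSO-KNN, so it can serve as an efficient means of simultaneously identifying multiple trace lubricant additive types.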
-
[1] Wang Xinbo, Zhang Yafei, Yin Zhongwei, et al. Experimental research on tribological properties of liquid phase exfoliated graphene as an additive in SAE 10W-30 lubricating oil[J]. Tribology International, 2019, 135: 29–37. doi: 10.1016/j.triboint.2019.02.030
[2] Vrček A, Hultqvist T, Baubet Y, et al. Micro-pitting damage of bearing steel surfaces under mixed lubrication conditions: effects of roughness, hardness and ZDDP additive[J]. Tribology International, 2019, 138: 239–249. doi: 10.1016/j.triboint.2019.05.038
[3] Li Xiaowei, Xu Xiaowei, Zhou Yong, et al. Insights into friction dependence of carbon nanoparticles as oil-based lubricant additive at amorphous carbon interface[J]. Carbon, 2019, 150: 465–474. doi: 10.1016/j.carbon.2019.05.050
[4] Yang Chun, Yang Zeyu, Zhang Gong, et al. Characterization and differentiation of chemical fingerprints of virgin and used lubricating oils for identification of contamination or adulteration sources[J]. Fuel, 2016, 163: 271–281. doi: 10.1016/j.fuel.2015.09.070
[5] Jin Yongliang, Duan Haitao, Wei Lei, et al. Online infrared spectra detection of lubricating oil during friction process at high temperature[J]. Industrial Lubrication and Tribology, 2018, 70(7): 1294–1302. doi: 10.1108/ilt-09-2017-0251
[6] Rammal A, Perrin E, Vrabie V, et al. Selection of discriminant mid-infrared wavenumbers by combining a naïve Bayesian classifier and a genetic algorithm: Application to the evaluation of lignocellulosic biomass biodegradation[J]. Mathematical Biosciences, 2017, 289: 153–161. doi: 10.1016/j.mbs.2017.05.002
[7] Ding Jianhua, Fang Jianhua, Chen Boshui, et al. Improved biodegradability and tribological performances of mineral lubricating oil by two synthetic nitrogenous heterocyclic additives[J]. Industrial Lubrication and Tribology, 2019, 71(4): 578–585. doi: 10.1108/ilt-06-2018-0216
[8] Caneca A R, Pimentel M F, Galvão R K H, et al. Assessment of infrared spectroscopy and multivariate techniques for monitoring the service condition of diesel-engine lubricating oils[J]. Talanta, 2006, 70(2): 344–352. doi: 10.1016/j.talanta.2006.02.054
[9] Ng E P, Mintova S. Quantitative moisture measurements in lubricating oils by FTIR spectroscopy combined with solvent extraction approach[J]. Microchemical Journal, 2011, 98(2): 177–185. doi: 10.1016/j.microc.2011.01.006
[10] Li Xiaohe, Feng Xin, Xia Yanqiu. IR spectra of grease optimization based on cuckoo search[J]. Spectroscopy and Spectral Analysis, 2017, 37(12): 3703–3708. doi: 10.1016/j.aca.2012.08.028
[11] Li Zhe, Feng Jinchao, Liu Pengyu, et al. Improving the spectral measurement accuracy based on temperature distribution and spectra-temperature relationship[J]. Infrared Physics & Technology, 2018, 90: 87–94. doi: 10.1016/j.infrared.2018.02.007
[12] Teye E, Elliott C, Sam-Amoah L K, et al. Rapid and nondestructive fraud detection of palm oil adulteration with Sudan dyes using portable NIR spectroscopic techniques[J]. Food Additives & Contaminants Part A, 2019, 36(11): 1589–1596. doi: 10.1080/19440049.2019.1658905
[13] Ling Yun, Lu Wei, Song Aiguo, et al. Sampling head-rock contact identification for regolith sampling in space[J]. Aerospace Science and Technology, 2013, 31(1): 108–114. doi: 10.1016/j.ast.2013.10.003
[14] de Santana F B, Neto W B, Poppi R J. Random forest as one-class classifier and infrared spectroscopy for food adulteration detection[J]. Food Chemistry, 2019, 293: 323–332. doi: 10.1016/j.foodchem.2019.04.073
[15] Xie Chuanqi, Yang Ce, He Yong. Hyperspectral imaging for classification of healthy and gray mold diseased tomato leaves with different infection severities[J]. Computers and Electronics in Agriculture, 2017, 135: 154–162. doi: 10.1016/j.compag.2016.12.015
[16] Xia Ji'an, Cao Hongxin, Yang Yuwang, et al. Detection of waterlogging stress based on hyperspectral images of oilseed rape leaves (Brassica napus L.)[J]. Computers and Electronics in Agriculture, 2019, 159: 59–68. doi: 10.1016/j.compag.2019.02.022
[17] Ozigis M S, Kaduk J D, Jarvis C H. Mapping terrestrial oil spill impact using machine learning random forest and Landsat 8 OLI imagery: a case site within the Niger Delta region of Nigeria[J]. Environmental Science and Pollution Research, 2019, 26(4): 3621–3635. doi: 10.1007/s11356-018-3824-y
[18] Georgouli K, Del Rincon J M, Koidis A. Continuous statistical modelling for rapid detection of adulteration of extra virgin olive oil using mid infrared and Raman spectroscopic data[J]. Food Chemistry, 2017, 217: 735–742. doi: 10.1016/j.foodchem.2016.09.011
[19] de Santana F B, de Souza A M, Poppi R J. Green methodology for soil organic matter analysis using a national near infrared spectral library in tandem with learning machine[J]. Science of the Total Environment, 2019, 658: 895–900. doi: 10.1016/j.scitotenv.2018.12.263
[20] Goh K M, Maulidiani M, Rudiyanto R, et al. Rapid assessment of total MCPD esters in palm-based cooking oil using ATR-FTIR application and chemometric analysis[J]. Talanta, 2019, 198: 215–223. doi: 10.1016/j.talanta.2019.01.111
[21] Li Xiaoli, Sun Chanjun, Luo Liubin, et al. Determination of tea polyphenols content by infrared spectroscopy coupled with iPLS and random frog techniques[J]. Computers and Electronics in Agriculture, 2015, 112: 28–35. doi: 10.1016/j.compag.2015.01.005
[22] Zhang Fudong, Liu Jie, Lin Jun, et al. Detection of oil yield from oil shale based on near-infrared spectroscopy combined with wavelet transform and least squares support vector machines[J]. Infrared Physics & Technology, 2019, 97: 224–228. doi: 10.1016/j.infrared.2018.12.036
[23] Hu Leqian, Yin Chunling, Ma Shuai, et al. Rapid detection of three quality parameters and classification of wine based on Vis-NIR spectroscopy with wavelength selection by ACO and CARS algorithms[J]. Spectrochimica Acta Part A:Molecular and Biomolecular Spectroscopy, 2018, 205: 574–581. doi: 10.1016/j.saa.2018.07.054
[24] Yun Yonghuan, Bin Jun, Liu Dongli, et al. A hybrid variable selection strategy based on continuous shrinkage of variable space in multivariate calibration[J]. Analytica Chimica Acta, 2019, 1058: 58–69. doi: 10.1016/j.aca.2019.01.022
[25] Duan Xiaoli, Wang Mingquan. Quantitative analysis of multi-component gases in underground by improved PSO-SVM algorithm[J]. Spectroscopy and Spectral Analysis, 2019, 39(9): 2883–2888.
[26] Liu Jinming, Chu Xiaodong, Wang Zhi, et al. Optimization of characteristic wavelength variables of near infrared spectroscopy for detecting contents of cellulose and hemicellulose in corn stover[J]. Spectroscopy and Spectral Analysis, 2018, 39(3): 743–750. doi: 10.3964/j.issn.1000-0593(2019)03-0743-08
[27] Zhang Yao, Li Minzan, Zheng Lihua, et al. Spectral features extraction for estimation of soil total nitrogen content based on modified ant colony optimization algorithm[J]. Geoderma, 2019, 333: 23–34. doi: 10.1016/j.geoderma.2018.07.004
[28] Arakawa M, Yamashita Y, Funatsu K. Genetic algorithm-based wavelength selection method for spectral calibration[J]. Journal of Chemometrics, 2011, 25(1): 10–19. doi: 10.1002/cem.1339
[29] Jiang Hui, Xu Weidong, Chen Quansheng. Comparison of algorithms for wavelength variables selection from near-infrared (NIR) spectra for quantitative monitoring of yeast (Saccharomyces cerevisiae) cultivations[J]. Spectrochimica Acta Part A:Molecular and Biomolecular Spectroscopy, 2019, 214: 366–371. doi: 10.1016/j.saa.2019.02.038
[30] Cai Hongwei, Li Xue, Xie Chungang, et al. Area-to-point heat conduction enhancement using binary particle swarm optimization[J]. Applied Thermal Engineering, 2019, 155: 449–460. doi: 10.1016/j.applthermaleng.2019.04.017
[31] Xia Yanqiu, Xu Dayi, Feng Xin, et al. Identification and content prediction of lubricating oil additives based on extreme learning machine[J]. Tribology, 2020, 40(1): 97–106. doi: 10.16078/j.tribology.2019107
[32] Shen Jiawei, Cheng Jia, Jiang Haizhen, et al. Study on the relationship between extreme pressure properties and component contents of high pressured sulfurized isobutylene[J]. Industrial Lubrication and Tribology, 2018, 70(3): 527–531. doi: 10.1108/ilt-07-2017-0187