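This paper proposes a Chinese font recognition method based on empirical mode decomposition (EMD). After studying and comparing a large number of Chinese fonts, eight basic strokes that reflect the essential characteristics of Chinese fonts are selected. Using these eight strokes as templates, stroke information is randomly sampled from Chinese document image blocks to form stroke feature sequences. Each stroke feature sequence is decomposed by EMD and its high-frequency energy is extracted; combined with the average gray level of the document image block, this yields a 9-dimensional feature for font recognition.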
This paper presents a novel approach to Chinese font recognition based on empirical mode decomposition (EMD). By analyzing and comparing a large number of Chinese characters, 8 basic strokes are selected to characterize the structural attributes of Chinese fonts. Using these strokes as templates, a stroke feature sequence is computed for each basic stroke in each text block. Each sequence is decomposed by EMD, and its first two intrinsic mode functions (IMFs), which carry the highest frequencies, are used to compute the stroke energy, defined as the energy of the two IMFs averaged over the length of the sequence; this gives one energy value for each of the 8 basic strokes. To distinguish bold fonts from their regular counterparts, the average gray level of the pixels in the text block is computed and appended to the feature vector, forming a 9-dimensional feature. Finally, a minimum distance classifier is used to recognize the fonts. Experiments show encouraging recognition rates.
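The feature construction and classification steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the IMFs are assumed to have been computed already by some EMD routine, the stroke energy is read as the sum of squared samples of the first two IMFs divided by the sequence length, the distance is assumed to be Euclidean, and the names `stroke_energy`, `font_feature`, `classify`, and `prototypes` are hypothetical.

```python
import math


def stroke_energy(imfs, n_high=2):
    """Energy of the first n_high IMFs, averaged over the sequence length.

    imfs: list of IMFs for one stroke feature sequence, ordered from the
    highest frequency to the lowest (the usual EMD output order).
    Assumption: energy is the sum of squared samples.
    """
    if not imfs:
        return 0.0
    length = len(imfs[0])
    total = sum(v * v for imf in imfs[:n_high] for v in imf)
    return total / length


def font_feature(imfs_per_stroke, mean_gray):
    """One energy per basic stroke (8 expected) plus the mean gray level."""
    return [stroke_energy(imfs) for imfs in imfs_per_stroke] + [mean_gray]


def classify(feature, prototypes):
    """Minimum distance classifier: label of the nearest prototype.

    prototypes: dict mapping a font label to a reference feature vector
    (assumed here to be a per-font mean; Euclidean distance is assumed).
    """
    return min(prototypes, key=lambda label: math.dist(feature, prototypes[label]))
```

With 8 strokes the feature vector has 9 entries, and a bold font and its regular counterpart with similar stroke energies would be separated mainly by the last entry, the mean gray level.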