This paper proposes a method that combines multiple-template matching with confidence analysis for filtering illegal content in Chinese document images on the Internet. The method overcomes the low speed of traditional OCR and reduces the sensitivity to font and noise that affects approaches based on image feature matching. Changing the keyword search strategy effectively reduces the amount of computation and increases recognition speed. Experimental results demonstrate the effectiveness of the proposed method.
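The idea of matching a character image against several templates per character and attaching a confidence measure to the result can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the confidence definition, or any thresholds, so everything below is an assumption made for illustration: normalized cross-correlation stands in for the similarity score, the margin between the best and second-best character stands in for the confidence, and the names `ncc`, `match_character`, `is_reliable`, `min_score`, and `min_margin` are hypothetical.

```python
import numpy as np


def ncc(patch, template):
    """Normalized cross-correlation of two equal-sized arrays, in [-1, 1]."""
    p = patch.astype(float) - patch.mean()
    t = template.astype(float) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    # A flat patch or template carries no shape information; score it as 0.
    return float((p * t).sum() / denom) if denom > 0 else 0.0


def match_character(patch, templates):
    """Match one character patch against several templates per character.

    templates maps a character label to a list of arrays, e.g. one per font
    (assumed layout). Returns (best_label, best_score, margin), where margin
    is the gap to the runner-up label and serves as an assumed confidence.
    """
    # Each label is scored by its best-matching template variant.
    scores = {
        label: max(ncc(patch, t) for t in variants)
        for label, variants in templates.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return best_label, best_score, best_score - runner_up


def is_reliable(best_score, margin, min_score=0.6, min_margin=0.2):
    """Accept a match only if it is both strong and unambiguous.

    The threshold values are hypothetical, not taken from the paper.
    """
    return best_score >= min_score and margin >= min_margin
```

Under these assumptions, using several templates per character addresses font variation, while the margin test rejects ambiguous matches caused by noise. A keyword search could then restrict `templates` to the characters that occur in the keyword list instead of recognizing every character, which is one plausible way to reduce computation; the abstract does not state how the search strategy was actually changed.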