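When matching and filtering Chinese and English special symbols, regular expressions are simple and convenient for small amounts of text. On large texts, however, their performance can become a bottleneck that makes them impractical. In that case, character-by-character replacement through a lookup table (strtr in PHP) is generally the fastest approach.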
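The performance claim can be checked with a rough timing sketch along the following lines. The helper name time_call, the small punctuation set, and the synthetic input are all assumptions made for illustration. Actual timings depend on the PHP version, the table size, and the input, so none are quoted here.

```php
<?php
// Hypothetical helper: run a callable once and return [result, elapsed seconds].
function time_call(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}

// Small illustrative punctuation set, not the full table from this article.
$table = [',' => '', '.' => '', '!' => '', '?' => '',
          ',' => '', '。' => '', '!' => '', '?' => ''];
$pattern = '/[,.!?,。!?]/u';

// Synthetic input: one mixed Chinese/English sentence repeated many times.
$text = str_repeat('Hello, world! 你好,世界。', 20000);

[$a, $tStrtr] = time_call(function () use ($text, $table) {
    return strtr($text, $table);
});
[$b, $tRegex] = time_call(function () use ($text, $pattern) {
    return preg_replace($pattern, '', $text);
});

printf("strtr: %.4fs, preg_replace: %.4fs, same output: %s\n",
       $tStrtr, $tRegex, $a === $b ? 'yes' : 'no');
```

Both calls should produce the same string, so the comparison measures only the cost of the two techniques.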
The PHP implementations of Chinese and English character filtering below are for reference only.
1. Converting Chinese special characters and filtering English characters in PHP
PHP code
/*
 * Convert full-width letters and digits to half-width,
 * and strip Chinese and English punctuation.
 * Assumes this file and $str are both UTF-8 encoded.
 */
function clear_punctuation($str)
{
    $arr = array(
        // Full-width digits and letters => half-width
        '0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4',
        '5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9',
        'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E',
        'F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J',
        'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O',
        'P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',
        'U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y',
        'Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd',
        'e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i',
        'j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n',
        'o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's',
        't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x',
        'y' => 'y', 'z' => 'z',
        // Chinese (full-width) punctuation and symbols => removed
        '(' => '', ')' => '', '〔' => '', '〕' => '', '【' => '',
        '】' => '', '〖' => '', '〗' => '', '“' => '', '”' => '',
        '‘' => '', '’' => '', '{' => '', '}' => '', '《' => '',
        '》' => '',
        '%' => '', '+' => '', '—' => '', '-' => '', '~' => '',
        ':' => '', '。' => '', '、' => '', ',' => '',
        ';' => '', '?' => '', '!' => '', '…' => '', '‖' => '',
        '|' => '', '〃' => '',
        ' ' => '', '$' => '', '@' => '', '#' => '', '^' => '', '&' => '', '*' => '',
        // English (half-width) punctuation and symbols => removed
        '(' => '', ')' => '', '[' => '', ']' => '', '`' => '', '{' => '', '~' => '',
        '}' => '', '<' => '', '>' => '', '%' => '', '+' => '', '-' => '', ':' => '',
        '.' => '', ',' => '', ';' => '', '?' => '', '!' => '', '|' => '', '$' => '',
        '@' => '', '#' => '', '^' => '', '&' => '', '*' => '', '\\' => '', '"' => '',
        '\'' => '', '=' => '', '/' => '', ' ' => ''
    );
    // strtr() with an array replaces every key it finds in a single pass.
    return strtr($str, $arr);
}
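Writing every removal as a separate `'x' => ''` entry is easy to get wrong, especially for the quote and backslash characters. As a minimal sketch, the removal part of such a table could be built from a plain string of characters instead. The function name strip_chars and its parameters are hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper: delete every character listed in $chars from $str.
function strip_chars(string $str, string $chars): string
{
    // Split the list into single characters; the /u flag keeps
    // multi-byte UTF-8 characters whole.
    $keys = preg_split('//u', $chars, -1, PREG_SPLIT_NO_EMPTY);
    if ($keys === false || $keys === []) {
        return $str; // invalid UTF-8 or nothing to remove
    }
    // Map each character to the empty string, then replace in one pass.
    return strtr($str, array_fill_keys($keys, ''));
}

echo strip_chars('你好,世界!(PHP)', ',。!?()()'), "\n"; // 你好世界PHP
```

The replacement itself is still done by strtr, so the approach stays the same as in the hand-written table. Only the way the table is built changes.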
2. Converting Chinese double-byte (full-width) characters to English (half-width) characters in PHP
PHP code
/*
 * Convert Chinese (full-width) characters to English (half-width) characters.
 * Assumes this file and $str are both UTF-8 encoded.
 */
function make_semiangle($str)
{
    $arr = array(
        // Full-width digits and letters
        '0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4',
        '5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9',
        'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E',
        'F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J',
        'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O',
        'P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',
        'U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y',
        'Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd',
        'e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i',
        'j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n',
        'o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's',
        't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x',
        'y' => 'y', 'z' => 'z',
        // Brackets and quotation marks (each key listed once; in a PHP
        // array literal a repeated key silently overrides the earlier one)
        '(' => '(', ')' => ')', '〔' => '[', '〕' => ']', '【' => '[',
        '】' => ']', '〖' => '[', '〗' => ']', '“' => '"', '”' => '"',
        '‘' => '`', '’' => '`', '{' => '{', '}' => '}', '《' => '<',
        '》' => '>',
        // Other punctuation and symbols
        '%' => '%', '+' => '+', '—' => '-', '-' => '-', '~' => '-',
        ':' => ':', '。' => '.', '、' => ',', ',' => ',',
        ';' => ';', '?' => '?', '!' => '!', '…' => '-', '‖' => '|',
        '|' => '|', '〃' => '"',
        ' ' => ' ', '$' => '$', '@' => '@', '#' => '#', '^' => '^', '&' => '&', '*' => '*'
    );
    return strtr($str, $arr);
}
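The letter, digit, and ASCII-symbol rows of the table above follow a regular pattern: the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above their ASCII counterparts U+0021 to U+007E. As a sketch, that part of the table could be generated rather than typed. The function name build_fullwidth_map is hypothetical, and the code assumes the mbstring extension (mb_chr needs PHP 7.2 or later). CJK-only punctuation such as 。 、 【 】 《 》 has no such offset and would still need explicit entries.

```php
<?php
// Hypothetical helper: build a full-width => half-width table
// for U+FF01..U+FF5E plus the ideographic space U+3000.
function build_fullwidth_map(): array
{
    $map = [];
    for ($code = 0xFF01; $code <= 0xFF5E; $code++) {
        // Each full-width form is its ASCII counterpart plus 0xFEE0.
        $map[mb_chr($code, 'UTF-8')] = chr($code - 0xFEE0);
    }
    $map[mb_chr(0x3000, 'UTF-8')] = ' ';
    return $map;
}

$map = build_fullwidth_map();
echo strtr('PHP5.3(测试)', $map), "\n"; // PHP5.3(测试)
```

The generated array can be merged with hand-written entries for the remaining CJK punctuation before it is passed to strtr.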
3. Filtering English punctuation with a PHP regular expression
PHP code
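// One character class covering English punctuation. The single-quoted string
// avoids variable interpolation; -, [, ], / and \ are escaped inside the class.
$pattern = '/[ \'.,:;*?~`!@#$%^&+=\-)(<>{}\[\]\/\\\\"|]/';
$content = preg_replace($pattern, '', $content); // strip English punctuation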
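When Chinese punctuation has to be removed by regular expression as well, listing every character becomes unwieldy. One possible alternative, sketched here on the assumption that the input is UTF-8, is to use PCRE's Unicode property classes. The function name strip_punctuation_unicode is hypothetical. Note that this is broader than the pattern above, because it removes punctuation and symbols from every script.

```php
<?php
// Hypothetical helper: strip punctuation, symbols, and space separators.
function strip_punctuation_unicode(string $content): string
{
    // \p{P} = punctuation, \p{S} = symbols ($ + ^ ~ | ...),
    // \p{Z} = separators such as the space and the ideographic space.
    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    $result = preg_replace('/[\p{P}\p{S}\p{Z}]+/u', '', $content);
    // preg_replace returns null on error, for example on invalid UTF-8.
    return $result === null ? $content : $result;
}

echo strip_punctuation_unicode('Hello, 世界!(test)'), "\n"; // Hello世界test
```

Tabs and newlines are control characters rather than separators, so they are left in place by this pattern.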
Character filtering turns out to be quite interesting... and a real mental workout.