A First Look at Regular Expressions (Java String regex, Grok)
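Preface

What is a regular expression? Different websites explain it slightly differently; here I quote the Wikipedia version: In theoretical computer science and formal language theory, a regular expression (sometimes called a rational expression) is a sequence of characters that define a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations. In short: a sequence of characters that defines a search pattern.

Many programming languages have built-in support for regex (short for regular expression). The algorithms behind it were written by experts; the rest of us only need to learn how to use them. The syntax differs slightly from language to language. I first learned regular expressions through Java.

A few useful sites: a Chinese-language site for learning Java regular expressions.

Talk is cheap, show me the code.

Regex in String

String has four methods that take a regex: matches(), split(), replaceFirst(), and replaceAll().

package regextest;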
public class RegexTestStrings
{
    public final static String EXAMPLE_TEST =
            "This is my small example string which I'm going to use for pattern matching .";

    public static void main(String[] args)
    {
        // Does the string start with a "word" character (\w)?
        // matches() must match the whole string, hence the trailing .*
        System.out.println(EXAMPLE_TEST.matches("\\w.*"));
        // Split the string on runs of whitespace; returns the resulting String array
        String[] splitString = EXAMPLE_TEST.split("\\s+");
        System.out.println(splitString.length);
        for (String string : splitString)
        {
            System.out.println(string);
        }
        // Replace only the FIRST match of the regex "\\s+" with "才"
        System.out.println(EXAMPLE_TEST.replaceFirst("\\s+", "才"));
        // Replace EVERY match of the regex "\\s+" with "才"
        System.out.println(EXAMPLE_TEST.replaceAll("\\s+", "才"));
    }
}
Output:
true
15
This
is
my
small
example
string
which
I'm
going
to
use
for
pattern
matching
.
This才is my small example string which I'm going to use for pattern matching .
This才is才my才small才example才string才which才I'm才going才to才use才for才pattern才matching才.
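The first call above works only because of the trailing `.*`. String.matches() succeeds only when the regex matches the entire string. The sketch below compares it with Matcher.find(), which looks for a match anywhere in the text. The class and method names are illustrative and not from the original. The regex `\w` has to be written as `"\\w"` inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind
{
    // String.matches() succeeds only if the regex matches the WHOLE string
    static boolean wholeStringMatches(String text, String regex)
    {
        return text.matches(regex);
    }

    // Matcher.find() returns the first substring that matches, anywhere in the text
    static String firstFound(String text, String regex)
    {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args)
    {
        String text = "This is my small example string";
        System.out.println(wholeStringMatches(text, "\\w"));   // false: \w alone covers one character only
        System.out.println(wholeStringMatches(text, "\\w.*")); // true: .* consumes the rest of the string
        System.out.println(firstFound(text, "\\w"));           // T
    }
}
```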
java.util.regex

Import java.util.regex.Matcher and java.util.regex.Pattern. These two classes provide many more methods to work with.

package regextest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main(String[] args)
    {
        String line = "The price for iPhone is 5288,which is a little expensive.";
        // Extract the only number in the string. Parentheses create groups;
        // inside [...], ^ means "not", so [^\d] is any non-digit and \d+ is one or more digits
        String regex = "(.*[^\\d])(\\d+)(.*)";
        // Create the Pattern object
        Pattern pattern = Pattern.compile(regex);
        // Create the Matcher object
        Matcher matcher = pattern.matcher(line);
        if (matcher.find())
        {
            // group(2) is the second pair of parentheses: the digits
            System.out.println("Found value: " + matcher.group(2));
        }
        else
        {
            System.out.println("NO MATCH");
        }
    }
}
Output:
Found value: 5288
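find() already searches inside the string, so the surrounding `.*` groups are not strictly needed. Below is a minimal sketch that pulls out the same number with just `\d+`. The class and method names are illustrative. It also uses a named group, `(?<price>...)`, which java.util.regex has supported since Java 7. With a named group the value is read by name rather than by position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupPrice
{
    // (?<price>...) is a named capturing group; \d+ means "one or more digits"
    private static final Pattern PRICE = Pattern.compile("(?<price>\\d+)");

    // Returns the first run of digits in the line, or null if there is none
    static String extractPrice(String line)
    {
        Matcher matcher = PRICE.matcher(line);
        // find() scans for the first matching substring, so no leading/trailing .* is needed
        return matcher.find() ? matcher.group("price") : null;
    }

    public static void main(String[] args)
    {
        String line = "The price for iPhone is 5288,which is a little expensive.";
        String price = extractPrice(line);
        System.out.println(price != null ? "Found value: " + price : "NO MATCH");
    }
}
```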
Grok: a more powerful regex

Grok builds on Matcher and Pattern. It imports many additional packages and extends them, so it has more methods to call and is more powerful:

import com.google.code.regexp.Matcher;
import com.google.code.regexp.Pattern;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.commons.lang3.StringUtils;
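Judging from the compile() call in the example below, a Grok expression is ordinary regex with `%{NAME}` references to named patterns. Patterns can also be built from other patterns, as `grok.addPattern("fromIP", "%{IPV4}")` does. The following toy sketch shows that expansion idea using only java.util.regex. It is not Grok's actual implementation. The class MiniGrok and all of its pattern definitions are hypothetical and much simpler than Grok's real pattern file. Each `%{NAME}` becomes a Java named group `(?<NAME>...)`, which needs Java 7+ and names made only of letters and digits.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy illustration of Grok-style %{NAME} expansion (not the real library)
public class MiniGrok
{
    // Matches a %{NAME} reference inside an expression
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\{(\\w+)\\}");

    // Hypothetical pattern library: name -> regex (which may itself contain %{...})
    private final Map<String, String> library = new HashMap<>();

    public void addPattern(String name, String regex)
    {
        library.put(name, regex);
    }

    // Replaces every %{NAME} with (?<NAME>...), expanding nested references recursively.
    // Java group names may contain only letters and digits, and no cycle detection is done.
    public String expand(String expression)
    {
        Matcher m = PLACEHOLDER.matcher(expression);
        StringBuffer sb = new StringBuffer();
        while (m.find())
        {
            String name = m.group(1);
            String body = library.get(name);
            if (body == null)
            {
                throw new IllegalArgumentException("unknown pattern: " + name);
            }
            String replacement = "(?<" + name + ">" + expand(body) + ")";
            // quoteReplacement keeps backslashes and $ in the regex literal
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Matches the whole line; returns the top-level %{NAME} captures in order,
    // or an empty map when the line does not match
    public Map<String, String> match(String expression, String line)
    {
        Matcher m = Pattern.compile(expand(expression)).matcher(line);
        Map<String, String> captures = new LinkedHashMap<>();
        if (!m.matches())
        {
            return captures;
        }
        Matcher names = PLACEHOLDER.matcher(expression);
        while (names.find())
        {
            String name = names.group(1);
            captures.put(name, m.group(name));
        }
        return captures;
    }

    public static void main(String[] args)
    {
        MiniGrok grok = new MiniGrok();
        // Simplified stand-ins for Grok's built-in patterns (assumptions, not the real definitions)
        grok.addPattern("MONTH", "[A-Z][a-z]{2}");
        grok.addPattern("MONTHDAY", "\\d{1,2}");
        grok.addPattern("TIME", "\\d{2}:\\d{2}:\\d{2}");
        grok.addPattern("YEAR", "\\d{4}");
        grok.addPattern("IPV4", "\\d{1,3}(?:\\.\\d{1,3}){3}");
        // A new pattern built from an existing one
        grok.addPattern("fromIP", "%{IPV4}");

        String line = "Mon Nov 9 06:47:33 2015; UDP; eth1; 461 bytes; from 88.150.240.169:tag-pm to 123.40.222.170:sip";
        System.out.println(grok.match(
                ".*%{MONTH}\\s+%{MONTHDAY}\\s+%{TIME}\\s+%{YEAR}.*from %{fromIP}.* to 123.40.222.170:sip", line));
    }
}
```

The literal `from ` before `%{fromIP}` anchors the IP. This toy IPV4 pattern has no digit boundary check, so without the anchor the greedy `.*` could hand it only `8.150.240.169`.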
One website defines Grok as follows: Java Grok program is a great tool for parsing log data and program output. You can match any number of complex patterns on any number of inputs (processes and files) and have custom reactions.

A simple example: read data from a log file and extract two pieces of information, the timestamp and the source IP.

Input:
Mon Nov 9 06:47:33 2015; UDP; eth1; 461 bytes; from 88.150.240.169:tag-pm to 123.40.222.170:sip
Mon Nov 9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:49208 to 123.40.222.170:sip
Mon Nov 9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:54159 to 123.40.222.170:sip
Mon Nov 9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:53640 to 123.40.222.170:sip
Mon Nov 9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:52483 t
package com.yz.utils.grok.api;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

public class GrokTest
{
    public static void main(String[] args)
    {
        // try-with-resources closes all three streams automatically, even when an exception is thrown
        try (
                // Read raw bytes from a file on the file system
                FileInputStream fiStream = new FileInputStream("C:dev1javagrokjavagrokiptraf_eth1_15.06.11");
                // InputStreamReader is the bridge from byte streams to character streams
                InputStreamReader iStreamReader = new InputStreamReader(fiStream);
                // BufferedReader wraps the InputStreamReader for better performance:
                // BufferedReader buffers its input, InputStreamReader does not
                BufferedReader bReader = new BufferedReader(iStreamReader))
        {
            Grok grok = new Grok();
            // Grok ships with many ready-made patterns that can be used directly,
            // and existing patterns can be combined into new ones
            grok.addPatternFromFile("c:dev1cloudshieldpatternspatterns");
            grok.addPattern("fromIP", "%{IPV4}");
            // Compile a pattern; the whitespace (\s+) tripped me up here
            grok.compile(".*%{MONTH}\\s+%{MONTHDAY}\\s+%{TIME}\\s+%{YEAR}.*%{fromIP}.* to 123.40.222.170:sip");
            String line;
            Match match;
            while ((line = bReader.readLine()) != null) // mind the parentheses here; they tripped me up once
            {
                match = grok.match(line);
                match.captures();
                if (!match.isNull())
                {
                    System.out.print(match.toMap().get("YEAR").toString() + " ");
                    System.out.print(match.toMap().get("MONTH").toString() + " ");
                    System.out.print(match.toMap().get("MONTHDAY").toString() + " ");
                    System.out.print(match.toMap().get("TIME").toString() + " ");
                    System.out.print(match.toMap().get("fromIP").toString() + "\n");
                }
                else
                {
                    System.out.println("NO MATCH");
                }
            }
        }
        catch (FileNotFoundException fnfe)
        {
            System.out.println("file not found exception");
            fnfe.printStackTrace();
        }
        catch (IOException ioe)
        {
            System.out.println("input/output exception");
            ioe.printStackTrace();
        }
        catch (Exception e)
        {
            System.out.println("unknown exception");
            e.printStackTrace();
        }
    }
}
Output:
2015 Nov 9 06:47:33 88.150.240.169
2015 Nov 9 06:47:34 88.150.240.169
2015 Nov 9 06:47:34 88.150.240.169
2015 Nov 9 06:47:34 88.150.240.169
NO MATCH
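For comparison, the same extraction can be sketched with only java.util.regex. Here the sub-patterns are written by hand instead of being loaded from Grok's pattern file. They are simplified assumptions, not Grok's real definitions, and the class LogLineParser is hypothetical. A StringReader holding two of the sample lines stands in for the log file; passing a FileReader instead would read a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser
{
    // Hand-written equivalent of the Grok expression, using named groups (Java 7+)
    private static final Pattern LINE = Pattern.compile(
            "\\w{3}\\s+(?<month>\\w{3})\\s+(?<day>\\d{1,2})\\s+(?<time>\\d{2}:\\d{2}:\\d{2})\\s+(?<year>\\d{4});"
            + ".*from (?<fromIP>\\d{1,3}(?:\\.\\d{1,3}){3}):\\S+ to 123\\.40\\.222\\.170:sip");

    // Returns "year month day time ip" for a matching line, or null if the whole line does not match
    static String parse(String line)
    {
        Matcher m = LINE.matcher(line);
        if (!m.matches())
        {
            return null;
        }
        return String.join(" ", m.group("year"), m.group("month"), m.group("day"),
                m.group("time"), m.group("fromIP"));
    }

    // Reads every line from the source; try-with-resources closes the reader even on errors
    static void printAll(Reader source) throws IOException
    {
        try (BufferedReader reader = new BufferedReader(source))
        {
            String line;
            while ((line = reader.readLine()) != null)
            {
                String parsed = parse(line);
                System.out.println(parsed != null ? parsed : "NO MATCH");
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        String sample = "Mon Nov 9 06:47:33 2015; UDP; eth1; 461 bytes; from 88.150.240.169:tag-pm to 123.40.222.170:sip\n"
                + "Mon Nov 9 06:47:34 2015; UDP; eth1; 463 bytes; from 88.150.240.169:52483 t\n";
        printAll(new StringReader(sample));
    }
}
```

The second sample line is the truncated one from the input. It ends before "to 123.40.222.170:sip", so it prints NO MATCH, just as in the Grok run.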
(Editor: 李大同)