Printing only the matched pattern: an application of regular expressions
First, the requirement. The file below is a firewall log that contains a field such as service=http proto=6. How can I print just the service=http part? The service value may contain several spaces, and the number of spaces is not known in advance, but the field that follows it is always proto. How can awk pattern matching print only the service field?

[dsadm@dataStage test]$ more sedonly.txt
2011-09-30 00:00:20 Local0.Notice 10.2.0.254 ns50: NetScreen device_id=0019022004000299 [Root]system-notification-00257(traffic): start_time="2011-09-30 00:01:05" duration=15 policy_id=103 service=http proto=6 src zone=Trust dst zone=Untrust action=Permit sent=2683 rcvd=766 src=10.100.1.43 dst=119.188.11.3 src_port=4048 dst_port=80 src-xlated ip=218.206.244.202 port=4679 dst-xlated ip=119.188.11.3 port=80 session_id=61727 reason=Close - AGE OUT<000>
2011-09-30 00:00:20 Local0.Notice 10.2.0.254 ns50: NetScreen device_id=0019022004000299 [Root]system-notification-00257(traffic): start_time="2011-09-30 00:01:05" duration=15 policy_id=103 service=NETBIOS (NS) proto=17 src zone=Trust dst zone=Untrust action=Permit sent=2674 rcvd=766 src=10.100.1.43 dst=119.188.11.3 src_port=4045 dst_port=137 src-xlated ip=218.206.244.202 port=15311 dst-xlated ip=119.188.11.3 port=137 session_id=62271 reason=Close - AGE OUT<000>
2011-09-30 00:00:20 Local0.Notice 10.2.0.254 ns50: NetScreen device_id=0019022004000299 [Root]system-notification-00257(traffic): start_time="2011-09-30 00:01:05" duration=15 policy_id=103 service=VDO Live (tcp) proto=6 src zone=Trust dst zone=Untrust action=Permit sent=2645 rcvd=766 src=10.100.1.43 dst=119.188.11.3 src_port=4044 dst_port=7001 src-xlated ip=218.206.244.202 port=14295 dst-xlated ip=119.188.11.3 port=7001 session_id=59240 reason=Close - AGE OUT<000>
[dsadm@dataStage test]$

Solutions:

[dsadm@dataStage test]$ grep -Po 'service=.*(?= proto=)' sedonly.txt
service=http
service=NETBIOS (NS)
service=VDO Live (tcp)
[dsadm@dataStage test]$ sed -s 's/^.*\(service=.*\) proto=.*$/\1/' sedonly.txt
service=http
service=NETBIOS (NS)
service=VDO Live (tcp)
[dsadm@dataStage test]$ awk -F 'proto|service' '{print "service"$2}' sedonly.txt
service=http
service=NETBIOS (NS)
service=VDO Live (tcp)
[dsadm@dataStage test]$
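For reuse, the sed and awk commands above can be wrapped as small shell functions. This is a minimal sketch assuming a POSIX sh with sed and awk; the function names service_sed and service_awk and the sample lines are hypothetical. It differs from the transcript in two small ways: sed gets -n and the p flag, so lines without a service field print nothing, and the awk version trims the blank that print "service"$2 leaves in front of proto.

```shell
#!/bin/sh
# Print only the "service=..." field of NetScreen-style log lines.
# Assumption: each relevant line has one "service=" and the next field is " proto=".

# sed: capture from "service=" up to the blank before "proto=".
# -n turns off automatic printing; the p flag prints only lines that were substituted.
service_sed() {
    sed -n 's/^.*\(service=.*\) proto=.*$/\1/p' "$@"
}

# awk: split on the two keywords, rebuild the "service" prefix, trim trailing blanks.
# NF > 2 skips lines with fewer than two keyword matches (no service/proto pair).
service_awk() {
    awk -F 'proto|service' 'NF > 2 { f = $2; sub(/[ \t]+$/, "", f); print "service" f }' "$@"
}

# Hypothetical input, abbreviated from sedonly.txt.
sample='ns50: NetScreen policy_id=103 service=http proto=6 src zone=Trust
ns50: NetScreen policy_id=103 service=NETBIOS (NS) proto=17 src zone=Trust
ns50: NetScreen policy_id=103 service=VDO Live (tcp) proto=6 src zone=Trust'

printf '%s\n' "$sample" | service_sed
printf '%s\n' "$sample" | service_awk
```

Both functions also accept file names, for example service_sed sedonly.txt.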
Reposted from: http://bbs.chinaunix.net/thread-4132203-1-1.html
Below is a requirement of my own. The file looks like this (only the beginning is shown):

[dsadm@dataStage findjob]$ more alljob.xml
<?xml version="1.0" encoding="utf-8"?><FindQuerySessionAsyncStateSerialiser xmlns:ibm="http://www.ibm.com/" clientInstallPath_="D:IBM_IISClientsClassic" generatedDate_="2014年5月29日" generatedTime_="11:06:48" serverName_="DATASTAGE" serverVersion_="8.7"><criteria_><caseInsensitive_>1</caseInsensitive_><createdAfter_ /><createdBefore_ /><createdByUser_ /><DependsOnObjects /><description_ /><findWithinLastResultSet_>0</findWithinLastResultSet_><lastModifiedAfter_ /><lastModifiedBefore_ /><lastModifiedByUser_ /><name_>*</name_><nameDescriptionMatchMode_>NameOrDescription</nameDescriptionMatchMode_><repositoryName_>lscrm</repositoryName_><folder_>&;/folder_><Types><string>Parallel Jobs</string></Types><WhereUsedObjects /></criteria_><Results><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>CT_ENT_DIST_MAXLNBAL</displayName_><folderPath_>JobsCRM_03_ENTCRM_0303_ENT_CTCRM_030303_ENT_CT_DIST</folderPath_><isTopLevel_>1</isTopLevel_><id_>CT_ENT_DIST_MAXLNBAL</id_><platformType_ /><reposID_>c2e76d84.43058877.2174cfdoj.l4f87r0.76hjj8.unm720lidv156as11jdb5</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>CopyOfIFSI_CURTRAN</displayName_><folderPath_>作业 001_ODS 0011_ODS_账户信息 0012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>CopyOfIFSI_CURTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174ce5cg.e9a93n8.dq7mt3.rilur196dttfpvk1ipaj6</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>CopyOfIFSI_DEPTRAN</displayName_><folderPath_>作业 001_ODS 0011_ODS_账户信息 0012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>CopyOfIFSI_DEPTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174cesld.fcdckp0.c4dm26.ogq04coo9cs4681ed5me0</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_CARDTRAN</displayName_><folderPath_>作业 001_ODS 0011_ODS_账户信息 0012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_CARDTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b296p.aipqg68.3gs1oe.6id3oi6ifaunehhjd59tl</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_CURTRAN</displayName_><folderPath_>作业 001_ODS 0011_ODS_账户信息 0012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_CURTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b2970.2r9jn6g.cqvmdf.r4521aevg2eh084hd8pgv</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_DEPTRAN</displayName_><folderPath_>作业 001_ODS 0011_ODS_账户信息 0012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_DEPTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b2975.fj9jtmg.e8e747.3j81nbfj2eob0vlonomg5</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_INTBANKTRAN</displayName_><folderPath_>作业 001_ODS 0011_ODS_账户信息 0012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_INTBANKTRAN</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b2979.5tf23gg.4f527i.niesna07c63s112uhkt15</reposID_><reposManagerID_>DATASTAGE:lscrm</reposManagerID_><subType_>3</subType_><typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_></ReposObjectSerialiser><ReposObjectSerialiser><className_>CJobDefn</className_><displayName_>IFSI_INTBANKTRAN_PAYFEE</displayName_><folderPath_>作业 001_ODS 0011_ODS_账户信息 0012_ODS_账户交易信息</folderPath_><isTopLevel_>1</isTopLevel_><id_>IFSI_INTBANKTRAN_PAYFEE</id_><platformType_ /><reposID_>c2e76d84.43058877.2174b2979.sng5q9g.67533

I want to extract the text inside each <id_>...</id_> element, for example IFSI_INTBANKTRAN_PAYFEE from <id_>IFSI_INTBANKTRAN_PAYFEE</id_>. There are about two hundred of them in the file.

My attempts:

[dsadm@dataStage findjob]$ sed -s 's/^.*\(<id_>.*<\/id_>\).*$/\1/g' alljob.xml
This returns only one.

[dsadm@dataStage findjob]$ awk -F '<id_>|</id_>' '{print $2}' alljob.xml

[dsadm@dataStage findjob]$ sed -s 's/^.*<id_>\(.*\)<\/id_>.*$/\1/g' alljob.xml
Still only one. Why??????????????
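Both sed and awk read their input one line at a time, and alljob.xml is a single line. So print $2 can only print the first id, and a substitution that spans the whole line from ^.* to .*$ rewrites that one line exactly once, keeping a single id. Without changing the file's layout, awk can loop over all fields instead. This is a minimal sketch, assuming <id_> elements are never nested and every <id_> has a matching </id_>; the function name ids_from_xml and the sample line are hypothetical.

```shell
#!/bin/sh
# Print every <id_> value, even when the whole XML document sits on one line.
# With FS set to the regex '<id_>|</id_>', fields alternate between the
# surrounding markup (odd fields) and the id values (even fields).
ids_from_xml() {
    awk -F '<id_>|</id_>' '{ for (i = 2; i <= NF; i += 2) print $i }' "$@"
}

# Hypothetical one-line excerpt in the shape of alljob.xml.
printf '%s\n' '<Results><ReposObjectSerialiser><id_>CT_ENT_DIST_MAXLNBAL</id_><platformType_ /></ReposObjectSerialiser><ReposObjectSerialiser><id_>CopyOfIFSI_CURTRAN</id_><platformType_ /></ReposObjectSerialiser></Results>' | ids_from_xml
```

Called as ids_from_xml alljob.xml, it should print one id per line. On a file with one element per line, lines without <id_> have a single field and print nothing, so no blank lines appear. With GNU grep built with PCRE support, grep -Po '(?<=<id_>).*?(?=</id_>)' alljob.xml is another option: the non-greedy .*? stops at the first </id_>, and -o prints each match on its own line, whereas a greedy '<id_>.*</id_>' would run from the first <id_> to the last </id_> of the one-line file.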
Next, I changed the file to standard XML layout, with one element per line:

[dsadm@dataStage findjob]$ more all.xml
<?xml version="1.0" encoding="utf-8"?>
<FindQuerySessionAsyncStateSerialiser xmlns:ibm="http://www.ibm.com/" clientInstallPath_="D:IBM_IISClientsClassic" generatedDate_="2014年5月29日" generatedTime_="11:06:48" serverName_="DATASTAGE" serverVersion_="8.7">
<criteria_>
<caseInsensitive_>1</caseInsensitive_>
<createdAfter_ />
<createdBefore_ />
<createdByUser_ />
<DependsOnObjects />
<description_ />
<findWithinLastResultSet_>0</findWithinLastResultSet_>
<lastModifiedAfter_ />
<lastModifiedBefore_ />
<lastModifiedByUser_ />
<name_>*</name_>
<nameDescriptionMatchMode_>NameOrDescription</nameDescriptionMatchMode_>
<repositoryName_>lscrm</repositoryName_>
<folder_>&;/folder_>
<Types>
<string>Parallel Jobs</string>
</Types>
<WhereUsedObjects />
</criteria_>
<Results>
<ReposObjectSerialiser>
<className_>CJobDefn</className_>
<displayName_>CT_ENT_DIST_MAXLNBAL</displayName_>
<folderPath_>JobsCRM_03_ENTCRM_0303_ENT_CTCRM_030303_ENT_CT_DIST</folderPath_>
<isTopLevel_>1</isTopLevel_>
<id_>CT_ENT_DIST_MAXLNBAL</id_>
<platformType_ />
<reposID_>c2e76d84.43058877.2174cfdoj.l4f87r0.76hjj8.unm720lidv156as11jdb5</reposID_>
<reposManagerID_>DATASTAGE:lscrm</reposManagerID_>
<subType_>3</subType_>
<typeDefinitionDisplayName_>Parallel Job</typeDefinitionDisplayName_>
</ReposObjectSerialiser>
<ReposObjectSerialiser>

With awk -F '<id_>|</id_>' '{print $2}' all.xml, the ids are now printed, but separated by many blank lines (every line without <id_> prints an empty $2). Remove the blank lines:

awk -F '<id_>|</id_>' '{print $2}' all.xml | sed '/^$/d'

That works.

With sed:

[dsadm@dataStage findjob]$ sed -n 's/<id_>\(.*\)<\/id_>/\1/p' all.xml | wc -l

Note: without -n and p, every line is printed as usual, with the matching lines replaced. With -n alone, nothing is printed. Only with both -n and p did it print exactly what I wanted!!
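The -n / p behaviour described above can be checked on a tiny input. This sketch assumes a POSIX sed; the two input lines and the variable name ID_SED are hypothetical, chosen to mirror the one-element-per-line layout of all.xml.

```shell
#!/bin/sh
# Compare sed's default printing, -n alone, and -n with the p flag.
# Hypothetical two-line excerpt in the layout of all.xml.
input='<isTopLevel_>1</isTopLevel_>
<id_>IFSI_CURTRAN</id_>'

# Keep only the text between the tags; the slash in </id_> is escaped.
ID_SED='s/<id_>\(.*\)<\/id_>/\1/'

printf '%s\n' '--- default: every line printed, matching line rewritten'
printf '%s\n' "$input" | sed "$ID_SED"
printf '%s\n' '--- -n only: automatic printing off, nothing printed'
printf '%s\n' "$input" | sed -n "$ID_SED"
printf '%s\n' '--- -n and p: only lines where the substitution succeeded'
printf '%s\n' "$input" | sed -n "${ID_SED}p"
```

The p flag is part of the s command, so it prints the pattern space only when that substitution actually replaced something; -n just suppresses the automatic print at the end of each cycle.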
With grep:

grep -Po '<id_>.*</id_>' all.xml

The output is:
<id_>SCORE_PLAN_ZB</id_>
<id_>SPECIAL_SHOP</id_>
<id_>REPORT01_SCORE_MSOURCE</id_>
<id_>REPORT02_SCORE_QSOURCE</id_>
<id_>REPORT03_SCORE_YSOURCE</id_>
<id_>REPORT11_SCORE_MCARDORG</id_>
<id_>REPORT12_SCORE_QCARDORG</id_>
<id_>REPORT13_SCORE_YCARDORG</id_>
<id_>REPORT21_SCORE_MCUSTORG</id_>
<id_>REPORT22_SCORE_QCUSTORG</id_>
<id_>REPORT23_SCORE_YCUSTORG</id_>
<id_>REPORT41_SCORE_PART</id_>
<id_>REPORT51_SCORE_CONVERGIFT</id_>

To strip the tags, I modified it as follows:

[dsadm@dataStage findjob]$ grep -Po '<id_>.*</id_>' all.xml |sed 's/<id_>//'|sed 's/</id_>//'
sed: -e expression #1, char 10: unknown option to `s'
[dsadm@dataStage findjob]$ grep -Po '<id_>.*</id_>' all.xml |sed 's/<id_>//'|sed 's/<\/id_>//'
CT_ENT_DIST_MAXLNBAL
CopyOfIFSI_CURTRAN
CopyOfIFSI_DEPTRAN
IFSI_CARDTRAN
IFSI_CURTRAN
IFSI_DEPTRAN

The first attempt fails because the unescaped / in </id_> ends the s command early, so sed reads the leftover characters as flags; escaping it as <\/id_> fixes the command.
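Besides escaping the slash, sed accepts other characters as the s delimiter, and both substitutions can run in one sed process instead of two piped ones. A minimal sketch assuming a POSIX sed; the function names strip_id_escaped and strip_id_pipe are hypothetical.

```shell
#!/bin/sh
# Strip the <id_> and </id_> tags from lines such as those printed by grep -Po.

# Option 1: keep "/" as the delimiter and escape the slash inside the pattern.
strip_id_escaped() {
    sed 's/<id_>//; s/<\/id_>//'
}

# Option 2: use "|" as the delimiter, which does not occur in the pattern,
# so </id_> can be written without escaping.
strip_id_pipe() {
    sed 's|<id_>||; s|</id_>||'
}

printf '%s\n' '<id_>IFSI_DEPTRAN</id_>' | strip_id_escaped
printf '%s\n' '<id_>IFSI_DEPTRAN</id_>' | strip_id_pipe
```

Used in the pipeline, this would read grep -Po '<id_>.*</id_>' all.xml | strip_id_pipe. Picking a delimiter that never appears in the pattern tends to be easier to read than escaping every slash, especially for XML closing tags or file paths.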
(Editor: 李大同)