Delphi – HTML source code from TWebBrowser – how to detect the stream encoding?
Based on this question:
How can I get HTML source code from TWebBrowser
If I run this code with an HTML page that uses a Unicode code page, the result is gibberish, because TStringStream is not Unicode-aware in D7. The page might be UTF-8 encoded or use some other (ANSI) code page.

How can I detect whether a TStream / IPersistStreamInit is Unicode / UTF-8 / ANSI?
How can I always return the correct WideString result from this function?

    function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;

If I replace TStringStream with TMemoryStream and save the TMemoryStream to a file, everything is fine. The file can be Unicode / UTF-8 / ANSI. But I always want to return the stream as a WideString:

    function GetWebBrowserHTML(const WebBrowser: TWebBrowser): WideString;
    var
      // LStream: TStringStream;
      LStream: TMemoryStream;
      Stream: IStream;
      LPersistStreamInit: IPersistStreamInit;
    begin
      if not Assigned(WebBrowser.Document) then
        Exit;
      // LStream := TStringStream.Create('');
      LStream := TMemoryStream.Create;
      try
        LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
        Stream := TStreamAdapter.Create(LStream, soReference);
        LPersistStreamInit.Save(Stream, True);
        // Result := LStream.DataString;
        LStream.SaveToFile('c:\test\test.txt'); // test only - file is ok
        Result := ??? // WideString
      finally
        LStream.Free();
      end;
    end;

EDIT: I found this article, "How to load and save documents in TWebBrowser in a Delphi-like way", which does exactly what I need. But it only works with Delphi Unicode compilers (D2009+). Read the Conclusion section:
The magic is apparently in the TEncoding class (TEncoding.GetBufferEncoding). But D7 does not have TEncoding. Any ideas?

Solution
I use GpTextStream to handle the conversion (it should work with all Delphi versions):
    // Required units: Windows, SysUtils, Classes, ActiveX, MSHTML, SHDocVw, GpTextStream

    // Maps an HTML charset name (e.g. 'utf-8', 'windows-1252', 'iso-8859-1')
    // to a Windows code page; returns 0 if the charset is not recognized.
    function GetCodePageFromHTMLCharSet(Charset: WideString): Word;
    const
      WIN_CHARSET = 'windows-';
      ISO_CHARSET = 'iso-';
    var
      S: string;
    begin
      Result := 0;
      if Charset = 'unicode' then
        Result := CP_UNICODE
      else if Charset = 'utf-8' then
        Result := CP_UTF8
      else if Pos(WIN_CHARSET, Charset) <> 0 then
      begin
        S := Copy(Charset, Length(WIN_CHARSET) + 1, Maxint);
        Result := StrToIntDef(S, 0);
      end
      else if Pos(ISO_CHARSET, Charset) <> 0 then // ISO-8859 (e.g. iso-8859-1: => 28591)
      begin
        S := Copy(Charset, Length(ISO_CHARSET) + 1, Maxint);
        S := Copy(S, Pos('-', S) + 1, 2);
        if S = '15' then // ISO-8859-15 (Latin 9)
          Result := 28605
        else
          Result := StrToIntDef('2859' + S, 0);
      end;
    end;

    function GetWebBrowserHTML(WebBrowser: TWebBrowser): WideString;
    var
      LStream: TMemoryStream;
      Stream: IStream;
      LPersistStreamInit: IPersistStreamInit;
      TextStream: TGpTextStream;
      Charset: WideString;
      Buf: WideString;
      CodePage: Word;
      N: Integer;
    begin
      Result := '';
      if not Assigned(WebBrowser.Document) then
        Exit;
      LStream := TMemoryStream.Create;
      try
        // Save the raw document bytes into the memory stream
        LPersistStreamInit := WebBrowser.Document as IPersistStreamInit;
        Stream := TStreamAdapter.Create(LStream, soReference);
        if Failed(LPersistStreamInit.Save(Stream, True)) then
          Exit;
        // Ask the document for its charset and map it to a code page
        Charset := (WebBrowser.Document as IHTMLDocument2).charset;
        CodePage := GetCodePageFromHTMLCharSet(Charset);
        // Reserve one WideChar per byte, then trim to what was actually read
        N := LStream.Size;
        SetLength(Buf, N);
        TextStream := TGpTextStream.Create(LStream, tsaccRead, [], CodePage);
        try
          N := TextStream.Read(Buf[1], N * SizeOf(WideChar)) div SizeOf(WideChar);
          SetLength(Buf, N);
          Result := Buf;
        finally
          TextStream.Free;
        end;
      finally
        LStream.Free();
      end;
    end;
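If adding GpTextStream is not an option, the same conversion can probably be done in D7 with the Windows API alone. The sketch below is not part of the original answer: the unit name and the functions DetectBOM, BufferToWideString and StreamToWideString are hypothetical names chosen for illustration. It assumes the saved stream may start with a byte order mark (the source does not say whether IPersistStreamInit.Save writes one), handles only the UTF-8 and UTF-16LE marks, and otherwise falls back to a caller-supplied code page passed to MultiByteToWideChar.

```pascal
unit HtmlStreamDecode; // hypothetical unit name

interface

uses
  Windows, Classes;

// Decodes the whole memory stream to a WideString. A BOM, if present, wins;
// otherwise FallbackCodePage is used (0 = CP_ACP, the system ANSI code page).
function StreamToWideString(Stream: TMemoryStream;
  FallbackCodePage: Word): WideString;

implementation

const
  CP_UTF16LE = 1200; // Windows code page identifier for UTF-16 little endian

// Returns the code page implied by a BOM at the start of the stream
// (0 if there is none) and the BOM length in bytes.
function DetectBOM(Stream: TMemoryStream; out BomLen: Integer): Word;
var
  P: PAnsiChar;
begin
  Result := 0;
  BomLen := 0;
  P := Stream.Memory;
  if (Stream.Size >= 3) and (Ord(P[0]) = $EF) and (Ord(P[1]) = $BB) and
    (Ord(P[2]) = $BF) then
  begin
    Result := CP_UTF8;
    BomLen := 3;
  end
  else if (Stream.Size >= 2) and (Ord(P[0]) = $FF) and (Ord(P[1]) = $FE) then
  begin
    Result := CP_UTF16LE;
    BomLen := 2;
  end;
end;

// Converts Size bytes at Data, encoded in CodePage, to a WideString.
function BufferToWideString(Data: PAnsiChar; Size: Integer;
  CodePage: Word): WideString;
var
  Len: Integer;
begin
  Result := '';
  if Size <= 0 then
    Exit;
  if CodePage = CP_UTF16LE then
  begin
    // Already UTF-16LE: copy the bytes; MultiByteToWideChar rejects 1200
    SetLength(Result, Size div SizeOf(WideChar));
    if Length(Result) > 0 then
      Move(Data^, Result[1], Length(Result) * SizeOf(WideChar));
  end
  else
  begin
    // First call returns the required length, second call converts
    Len := MultiByteToWideChar(CodePage, 0, Data, Size, nil, 0);
    SetLength(Result, Len);
    if Len > 0 then
      MultiByteToWideChar(CodePage, 0, Data, Size, PWideChar(Result), Len);
  end;
end;

function StreamToWideString(Stream: TMemoryStream;
  FallbackCodePage: Word): WideString;
var
  BomLen: Integer;
  CodePage: Word;
begin
  CodePage := DetectBOM(Stream, BomLen);
  if CodePage = 0 then
    CodePage := FallbackCodePage;
  Result := BufferToWideString(PAnsiChar(Stream.Memory) + BomLen,
    Integer(Stream.Size) - BomLen, CodePage);
end;

end.
```

Under these assumptions, the TGpTextStream part of GetWebBrowserHTML could be replaced by a single call such as `Result := StreamToWideString(LStream, GetCodePageFromHTMLCharSet(Charset));`. An unrecognized charset yields 0, which MultiByteToWideChar treats as the system ANSI code page. UTF-16 big endian input is not handled here.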