Question: Capture a webpage as an image in C#, ensuring JavaScript-rendered elements are visible


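I'm trying to capture the following page using standard C# .NET code. I've looked at the various approaches people use, most of which involve instantiating a browser object and using a draw-to-bitmap method. However, none of these pick up the contents of the chart on this page: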

http://www.highcharts.com/demo/combo-dual-axes

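Perhaps the JavaScript isn't getting time to run, but adding Thread.Sleep(x) hasn't helped.

This commercial component captures it correctly, but I'd rather avoid an extra dependency in my project, and paying $150, when the other solutions are so close!

Has anyone found a solution that renders it correctly?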


11488
2017-10-18 06:21



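Linked related question: How to fix an opacity bug with DrawToBitmap on a WebBrowser control? - Noseratio
Thought I'd add a comment here, since I still get notified about this from time to time. In the end there were so many edge cases with formatting and so on that I decided I needed a PDF version. I ended up using the EssentialObjects PDF control, which works well for this purpose. - Glinkot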


Answers:


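You may have already tried IECapt. I think it is the right way to go. I created a modified version of it that uses a timer instead of Thread.Sleep, and it captures your site as expected.

- - - EDIT - - -

Here is the ugly source. Just add a reference to Microsoft HTML Object Library.

Here is the usage: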

HtmlCapture capture = new HtmlCapture(@"c:\temp\myimg.png");
capture.HtmlImageCapture += new HtmlCapture.HtmlCaptureEvent(capture_HtmlImageCapture);
capture.Create("http://www.highcharts.com/demo/combo-dual-axes");

void capture_HtmlImageCapture(object sender, Uri url)
{
    this.Close();
}

File1

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;


namespace MyIECapt
{
    public class HtmlCapture
    {
        private WebBrowser web;
        private Timer tready;
        private Rectangle screen;
        private Size? imgsize = null;

        //an event that triggers when the html document is captured
        public delegate void HtmlCaptureEvent(object sender, Uri url);

        public event HtmlCaptureEvent HtmlImageCapture;

        string fileName = "";

        //class constructor
        public HtmlCapture(string fileName)
        {
            this.fileName = fileName;

            //initialise the webbrowser and the timer
            web = new WebBrowser();
            tready = new Timer();
            tready.Interval = 2000;
            screen = Screen.PrimaryScreen.Bounds;
            //set the webbrowser width and height
            web.Width = 1024; //screen.Width;
            web.Height = 768; // screen.Height;
            //suppress script errors and hide scroll bars
            web.ScriptErrorsSuppressed = true;
            web.ScrollBarsEnabled = false;
            //attached events
            web.Navigating +=
              new WebBrowserNavigatingEventHandler(web_Navigating);
            web.DocumentCompleted += new
              WebBrowserDocumentCompletedEventHandler(web_DocumentCompleted);
            tready.Tick += new EventHandler(tready_Tick);
        }


        public void Create(string url)
        {
            imgsize = null;
            web.Navigate(url);
        }

        public void Create(string url, Size imgsz)
        {
            this.imgsize = imgsz;
            web.Navigate(url);
        }



        void web_DocumentCompleted(object sender,
                 WebBrowserDocumentCompletedEventArgs e)
        {
            //start the timer
            tready.Start();
        }

        void web_Navigating(object sender, WebBrowserNavigatingEventArgs e)
        {
            //stop the timer   
            tready.Stop();
        }



        void tready_Tick(object sender, EventArgs e)
        {
            try
            {
                //stop the timer
                tready.Stop();

                mshtml.IHTMLDocument2 docs2 = (mshtml.IHTMLDocument2)web.Document.DomDocument;
                mshtml.IHTMLDocument3 docs3 = (mshtml.IHTMLDocument3)web.Document.DomDocument;
                mshtml.IHTMLElement2 body2 = (mshtml.IHTMLElement2)docs2.body;
                mshtml.IHTMLElement2 root2 = (mshtml.IHTMLElement2)docs3.documentElement;

                // Determine dimensions for the image; we could add minWidth here
                // to ensure that we get closer to the minimal width (the width
                // computed might be a few pixels less than what we want).
                int width = Math.Max(body2.scrollWidth, root2.scrollWidth);
                int height = Math.Max(root2.scrollHeight, body2.scrollHeight);

                //get the size of the document's body
                Rectangle docRectangle = new Rectangle(0, 0, width, height);

                web.Width = docRectangle.Width;
                web.Height = docRectangle.Height;

                //if the imgsize is null, the size of the image will 
                //be the same as the size of webbrowser object
                //otherwise  set the image size to imgsize
                Rectangle imgRectangle;
                if (imgsize == null) imgRectangle = docRectangle;
                else imgRectangle = new Rectangle() { Location = new Point(0, 0), Size = imgsize.Value };

                //create a bitmap object 
                Bitmap bitmap = new Bitmap(imgRectangle.Width, imgRectangle.Height);
                //get the viewobject of the WebBrowser
                IViewObject ivo = web.Document.DomDocument as IViewObject;

                using (Graphics g = Graphics.FromImage(bitmap))
                {
                    //get the handle to the device context and draw
                    IntPtr hdc = g.GetHdc();
                    ivo.Draw(1, -1, IntPtr.Zero, IntPtr.Zero,
                             IntPtr.Zero, hdc, ref imgRectangle,
                             ref docRectangle, IntPtr.Zero, 0);
                    g.ReleaseHdc(hdc);
                }
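                //save the captured image to disk, then release the bitmap
                bitmap.Save(fileName);
                bitmap.Dispose();
            }
            catch 
            {
                //errors are deliberately swallowed here, so a failed capture produces no file
                //System.Diagnostics.Process.GetCurrentProcess().Kill();
            }
            //invoke the HtmlImageCapture event (raised even if the capture failed)
            if(HtmlImageCapture!=null) HtmlImageCapture(this, web.Url);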
        }
    }
}

And File2

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Drawing;
using System.Runtime.InteropServices;

namespace MyIECapt
{
    [ComVisible(true), ComImport()]
    [GuidAttribute("0000010d-0000-0000-C000-000000000046")]
    [InterfaceTypeAttribute(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IViewObject
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int Draw(
            [MarshalAs(UnmanagedType.U4)] UInt32 dwDrawAspect,
            int lindex,
            IntPtr pvAspect,
            [In] IntPtr ptd,
            IntPtr hdcTargetDev,
            IntPtr hdcDraw,
            [MarshalAs(UnmanagedType.Struct)] ref Rectangle lprcBounds,
            [MarshalAs(UnmanagedType.Struct)] ref Rectangle lprcWBounds,
            IntPtr pfnContinue,
            [MarshalAs(UnmanagedType.U4)] UInt32 dwContinue);
        [PreserveSig]
        int GetColorSet([In, MarshalAs(UnmanagedType.U4)] int dwDrawAspect,
           int lindex, IntPtr pvAspect, [In] IntPtr ptd,
            IntPtr hicTargetDev, [Out] IntPtr ppColorSet);
        [PreserveSig]
        int Freeze([In, MarshalAs(UnmanagedType.U4)] int dwDrawAspect,
                        int lindex, IntPtr pvAspect, [Out] IntPtr pdwFreeze);
        [PreserveSig]
        int Unfreeze([In, MarshalAs(UnmanagedType.U4)] int dwFreeze);
    }
}

11
2017-10-18 06:36



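Thanks L.B. I tried that earlier, and the command-line version works well. However, it requires a system COM DLL, so when I tried to put it in a project I couldn't get it to compile for love or money! Calling the command line is another option, but it's not as satisfying as calling a method. Are you running it in a project or just executing the command line? If the former, perhaps you could share a snippet showing how you got the DLL working, or link to an example? Many thanks! - Glinkot
I've updated the answer. - L.B
Thanks very much L.B, I appreciate it! - Glinkot
I'm trying to use your code, putting everything into a DLL, and I'm facing this error: ActiveX control '8856f961-340a-11d0-a96b-00c04fd705a2' cannot be instantiated because the current thread is not in a single-threaded apartment. It happens here: public HtmlCapture(string fileName){this.fileName = fileName; //initialise the webbrowser and the timer web = new WebBrowser(); //error - Academy of Programmer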
Why be so rude when you could just say it? - Alon M


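Thread.Sleep will simply suspend the thread your web browser is running on. How do you expect it to render anything while it is suspended? :)

Instead, you need to allow the thread to process work. You can achieve this with a combination of Thread.Sleep(0) and Application.DoEvents(), with something like the following: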

DateTime finish = DateTime.Now.AddSeconds(3);
while (DateTime.Now < finish) {
    Application.DoEvents();
    Thread.Sleep(0);
}
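
As a variation on the loop above, here is a minimal sketch that pumps messages until a condition holds or a timeout elapses, instead of always waiting a fixed 3 seconds. The MessagePump class and WaitUntil method are hypothetical names made up for illustration, not part of any library. Note that a document reporting "complete" does not guarantee that script-driven charts have finished drawing, so some extra delay may still be needed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

static class MessagePump
{
    // Hypothetical helper: keeps the UI thread processing messages until
    // isDone() returns true, or gives up once the timeout has elapsed.
    public static bool WaitUntil(Func<bool> isDone, TimeSpan timeout)
    {
        Stopwatch sw = Stopwatch.StartNew();
        while (!isDone())
        {
            if (sw.Elapsed > timeout)
                return false;          // timed out before the condition held
            Application.DoEvents();    // let the browser control do its work
            Thread.Sleep(10);          // yield briefly to avoid spinning the CPU
        }
        return true;
    }
}
```

It could be called along the lines of MessagePump.WaitUntil(() => web.ReadyState == WebBrowserReadyState.Complete, TimeSpan.FromSeconds(10)), where web is the WebBrowser instance.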

0
2017-10-18 08:06



Thanks very much, I suspect that was a factor! - Glinkot


@L.B, thanks for your help!

Just an FYI for anyone who wants to run this in a class library: WebBrowser needs a single-threaded apartment (STA), so do something like this:

  var t = new Thread(InitAndDo); //InitAndDo would have your code creating the webbrowser object etc...
  t.SetApartmentState(ApartmentState.STA);
  t.Start();

Then here is the gotcha: after the Navigate call, add this line of code so that you get the navigation-completed event:

    web.Navigate(Url);
    Application.Run();
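
Putting the two pieces together, a rough sketch of how this might look with the HtmlCapture class from L.B's answer. The StaCaptureSketch class and CaptureOnStaThread method are hypothetical names, and the sketch assumes HtmlCapture is compiled into the same project. Application.ExitThread() is used to end the message loop once the capture event fires, so that the thread can finish.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

static class StaCaptureSketch
{
    // Hypothetical helper: runs one capture on a dedicated STA thread
    // and blocks the caller until the capture event has fired.
    public static void CaptureOnStaThread(string url, string outputPath)
    {
        Thread t = new Thread(() =>
        {
            // The WebBrowser inside HtmlCapture must be created on the STA thread
            MyIECapt.HtmlCapture capture = new MyIECapt.HtmlCapture(outputPath);

            // Stop this thread's message loop once the capture has been attempted
            capture.HtmlImageCapture += (sender, uri) => Application.ExitThread();

            capture.Create(url);

            // Start a message loop so DocumentCompleted and the timer Tick can fire
            Application.Run();
        });
        t.SetApartmentState(ApartmentState.STA); // must be set before Start()
        t.Start();
        t.Join();
    }
}
```

Since HtmlCapture raises its event even when the capture fails, the caller would probably want to check that the output file exists afterwards.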

0
2017-11-18 13:39





I created a NuGet package for this: https://github.com/dcumin39/RenderHighCharts/wiki


0
2017-08-21 18:46