Question: Rust vs. C performance


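I wanted to learn a bit about Rust tasks, so I wrote a Monte Carlo computation of PI. Now my puzzle is why the single-threaded C version is 4 times faster than the 4-way threaded Rust version. Clearly I am doing something wrong, or my mental performance model is way off.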

Here is the C version:

#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

#define PI 3.1415926535897932

double monte_carlo_pi(int nparts)
{
    int i, in=0;
    double x, y;
    srand(getpid());

    for (i=0; i<nparts; i++) {
        x = (double)rand()/(double)RAND_MAX;
        y = (double)rand()/(double)RAND_MAX;

        if (x*x + y*y < 1.0) {
            in++;
        }
    }

    return in/(double)nparts * 4.0;
}

int main(int argc, char **argv)
{
    int nparts;
    double mc_pi;

    nparts = atoi(argv[1]);
    mc_pi = monte_carlo_pi(nparts);
    printf("computed: %f error: %f\n", mc_pi, mc_pi - PI);
}

The Rust version is not a line-by-line port:

use std::rand;
use std::rand::distributions::{IndependentSample,Range};

fn monte_carlo_pi(nparts: uint ) -> uint {
    let between = Range::new(0f64,1f64);
    let mut rng = rand::task_rng();
    let mut in_circle = 0u;
    for _ in range(0u, nparts) {
        let a = between.ind_sample(&mut rng);
        let b = between.ind_sample(&mut rng);

        if a*a + b*b <= 1.0 {
            in_circle += 1;
        }
    }
    in_circle
}

fn main() {
    let (tx, rx) = channel();

    let ntasks = 4u;
    let nparts = 100000000u; /* I haven't learned how to parse cmnd line args yet!*/
    for _ in range(0u, ntasks) {
        let child_tx = tx.clone();
        spawn(proc() {
            child_tx.send(monte_carlo_pi(nparts/ntasks));
        });
    }

    let result = rx.recv() + rx.recv() + rx.recv() + rx.recv();

    println!("pi is {}", (result as f64)/(nparts as f64)*4.0);
}

Build and time the C version:

$ clang -O2 mc-pi.c -o mc-pi-c; time ./mc-pi-c 100000000
computed: 3.141700 error: 0.000108
./mc-pi-c 100000000  1.68s user 0.00s system 99% cpu 1.683 total

Build and time the Rust version:

$ rustc -v      
rustc 0.12.0-nightly (740905042 2014-09-29 23:52:21 +0000)
$ rustc --opt-level 2 --debuginfo 0 mc-pi.rs -o mc-pi-rust; time ./mc-pi-rust  
pi is 3.141327
./mc-pi-rust  2.40s user 24.56s system 352% cpu 7.654 total

2017-10-09 14:04



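Don't compile with debug symbols. - AndyG
"the single-threaded C version is 4 times slower than the 4-way threaded Rust version". The numbers you posted seem to say the opposite. - user703016
@RobLatham The bottleneck here is probably the random number generator. Try rand::XorShiftRng::new_unseeded() instead of rand::task_rng() for a faster random generator. - Dogbert
You can create a randomly seeded XorShiftRng with e.g. let mut rng: XorShiftRng = rand::random(); (the speedup comes from the change of algorithm, not from the lack of seeding). - huon
Now that's more like it. On a 4-core system, the four-way threaded Rust version is 4.5 times faster than the single-threaded C version. If Dogbert wants to write this up I'll accept his answer; otherwise I'll self-answer in a few days. - Rob Latham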


Answer:


As Dogbert observed, the bottleneck was the random number generator. Here is one that is fast and seeded differently on each thread:

fn monte_carlo_pi(id: u32, nparts: uint ) -> uint {
    ...
    let mut rng: XorShiftRng = SeedableRng::from_seed([id,id,id,id]);
    ...
}
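
For readers on current Rust: the code in this thread targets rustc 0.12 (task_rng, spawn(proc() ...), uint), and none of that compiles today. Below is a minimal sketch of the same idea using only the standard library. It is not the answerer's code. The XorShift128 struct and the estimate_pi function are hypothetical names made up for this illustration, and the generator is a hand-rolled xorshift128 standing in for the old XorShiftRng, so treat it as an assumption about the approach rather than the library's implementation. Thread ids start at 1 because an all-zero seed would leave a xorshift generator stuck at zero.

```rust
use std::sync::mpsc;
use std::thread;

/// Hand-rolled xorshift128 generator (illustrative stand-in for the old
/// `XorShiftRng`; not the `rand` crate's API).
struct XorShift128 {
    x: u32,
    y: u32,
    z: u32,
    w: u32,
}

impl XorShift128 {
    /// Builds a generator from a seed that must not be all zeros.
    fn from_seed(seed: [u32; 4]) -> Self {
        assert!(seed.iter().any(|&s| s != 0), "seed must not be all zeros");
        let mut rng = XorShift128 { x: seed[0], y: seed[1], z: seed[2], w: seed[3] };
        // Small seeds give small early outputs, so discard a few rounds.
        for _ in 0..16 {
            rng.next_u32();
        }
        rng
    }

    /// One xorshift128 step: a handful of shifts and XORs, no locking, no syscalls.
    fn next_u32(&mut self) -> u32 {
        let t = self.x ^ (self.x << 11);
        self.x = self.y;
        self.y = self.z;
        self.z = self.w;
        self.w = self.w ^ (self.w >> 19) ^ (t ^ (t >> 8));
        self.w
    }

    /// Uniform f64 in [0, 1).
    fn next_f64(&mut self) -> f64 {
        f64::from(self.next_u32()) / 4294967296.0
    }
}

/// Counts how many of `nparts` random points fall inside the unit quarter circle.
fn monte_carlo_pi(id: u32, nparts: u64) -> u64 {
    // Per-thread seed derived from the thread id, as in the answer above.
    let mut rng = XorShift128::from_seed([id, id, id, id]);
    let mut in_circle = 0u64;
    for _ in 0..nparts {
        let a = rng.next_f64();
        let b = rng.next_f64();
        if a * a + b * b <= 1.0 {
            in_circle += 1;
        }
    }
    in_circle
}

/// Splits the work over `nthreads` threads and combines the hit counts.
fn estimate_pi(nthreads: u32, nparts: u64) -> f64 {
    assert!(nthreads > 0, "need at least one thread");
    let (tx, rx) = mpsc::channel();
    let per_thread = nparts / u64::from(nthreads);
    for id in 1..=nthreads {
        let child_tx = tx.clone();
        thread::spawn(move || {
            child_tx.send(monte_carlo_pi(id, per_thread)).unwrap();
        });
    }
    // Drop the original sender so the receiving iterator ends once all workers finish.
    drop(tx);
    let hits: u64 = rx.iter().sum();
    4.0 * hits as f64 / (per_thread * u64::from(nthreads)) as f64
}

fn main() {
    // The question used 100000000 samples; a smaller count keeps this sketch quick.
    println!("pi is {}", estimate_pi(4, 10_000_000));
}
```

Seeding every word of the state with the same small id mirrors the answer but is a weak way to separate streams. For anything beyond a benchmark like this one, seeding from the operating system or using a maintained generator from the rand crate is probably the safer choice.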

2017-10-18 01:42