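Question: Linux high kernel CPU usage on memory initialization

I have a problem with high CPU consumption by the Linux kernel while bootstrapping my Java applications on a server. The problem occurs only in production; on dev servers everything is light-speed.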

upd9: There are two questions about this issue:

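How to fix it? - Nominal Animal suggested syncing and dropping everything, and it really helps. sudo sh -c 'sync ; echo 3 > /proc/sys/vm/drop_caches' works. upd12: but in fact sync alone is enough.
Why does this happen? - This is still open for me. I understand that flushing dirty pages to disk consumes kernel CPU and IO time, and that is normal. But what is strange: why does even a single-threaded application written in "C" load ALL cores at 100% in kernel space?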

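Based on ref-upd10 and ref-upd11, I have an idea that echo 3 > /proc/sys/vm/drop_caches is not required to fix my problem with slow memory allocation. It should be enough to run sync before starting the memory-consuming application. I will probably try this in production tomorrow and post the results here.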

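upd10: Lots of FS cache pages case:

I executed cat 10GB.fiel > /dev/null, then
ran sync to be sure there were no dirty pages (cat /proc/meminfo |grep ^Dirty displayed 184kb).
Checking cat /proc/meminfo |grep ^Cached I got: 4GB cached.
Running the C test program I got normal performance (about 50ms to initialize 32MB of allocated data).
Cached memory dropped to 900MB.
Test summary: I think it is no problem for Linux to reclaim pages used as FS cache into allocated memory.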

upd11: Lots of dirty pages case.

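I ran my HowMongoDdWorks example with the read part commented out, and after some time
/proc/meminfo said that 2.8GB is Dirty and 3.6GB is Cached.
I stopped HowMongoDdWorks and ran the C test program.
Here is part of the result:

init 15, time 0.00s
x 0 [try 1/part 0] time 1.11s
x 1 [try 2/part 0] time 0.04s
x 0 [try 1/part 1] time 1.04s
x 1 [try 2/part 1] time 0.05s
x 0 [try 1/part 2] time 0.42s
x 1 [try 2/part 2] time 0.04s

Test summary: lots of dirty pages significantly slow down the first access to allocated memory (to be fair, this starts to happen only when total application memory becomes comparable to the whole OS memory, i.e. if you have 8 of 16 GB free, it is no problem to allocate 1GB; the slowdown starts from 3GB or so).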

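Now I have managed to reproduce this situation in my dev environment, so here are new details.

Dev machine configuration:

Linux 2.6.32-220.13.1.el6.x86_64 - Scientific Linux release 6.1 (Carbon)
RAM: 15.55 GB
CPU: 1 X Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz (4 threads) (physical)

I am 99.9% sure that the problem is caused by a large amount of dirty pages in the FS cache. Here is the application that creates lots of dirty pages: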

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Random;

/**
 * @author dmitry.mamonov
 *         Created: 10/2/12 2:53 PM
 */
public class HowMongoDdWorks{
    public static void main(String[] args) throws IOException {
        final long length = 10L*1024L*1024L*1024L;
        final int pageSize = 4*1024;
        final int lengthPages = (int) (length/pageSize);
        final byte[] buffer = new byte[pageSize];
        final Random random = new Random();
        System.out.println("Init file");
        final RandomAccessFile raf = new RandomAccessFile("random.file","rw");
        raf.setLength(length);
        int written = 0;
        int readed = 0;
        System.out.println("Test started");
        while(true){
            { //write.
                random.nextBytes(buffer);
                final long randomPageLocation = (long)random.nextInt(lengthPages)*(long)pageSize;
                raf.seek(randomPageLocation);
                raf.write(buffer);
                written++;
            }
            { //read.
                random.nextBytes(buffer);
                final long randomPageLocation = (long)random.nextInt(lengthPages)*(long)pageSize;
                raf.seek(randomPageLocation);
                raf.read(buffer);
                readed++;
            }
            if (written % 1024==0 || readed%1024==0){
                System.out.printf("W %10d R %10d pages\n", written, readed);
            }

        }
    }
}

And here is the test application, which causes HI (up to 100% on all cores) CPU load in kernel space (same as below, but I will copy it once again).

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main(void){
   clock_t last = clock(); //remember the time (CPU time used by this process)
   for(int i=0;i<16;i++){ //repeat test several times
      int size = 256 * 1024 * 1024;
      int size4=size/4;
      int* buffer = malloc(size); //allocate 256MB of memory (never freed, so usage grows each iteration)
      if(buffer==NULL){ perror("malloc"); return 1; }
      for(int k=0;k<2;k++){ //initialize allocated memory twice
          for(int j=0;j<size4;j++){
              //memory initialization (if I skip this step my test ends in 0.000s)
              buffer[j]=k;
          }
          //printing stat
          printf("x [%d] %.2f\n",k+1, (clock()-last)/(double)CLOCKS_PER_SEC);
          last = clock();
      }
   }
   return 0;
}

While the HowMongoDdWorks program above is running, the C test program shows results like this:

x [1] 0.23
x [2] 0.19
x [1] 0.24
x [2] 0.19
x [1] 1.30 -- first initialization takes significantly longer
x [2] 0.19 -- than the second one (6x slower)
x [1] 10.94 -- and sometimes it is 50x slower!!!
x [2] 0.19
x [1] 1.10
x [2] 0.21
x [1] 1.52
x [2] 0.19
x [1] 0.94
x [2] 0.21
x [1] 2.36
x [2] 0.20
x [1] 3.20
x [2] 0.20 -- and the results are totally unstable
...

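Everything below this line I keep only for history.

upd1: both dev and production systems are big enough for this test. upd7: it is not paging, at least I have not seen any storage IO activity during the problem time.

dev ~ 4 cores, 16 GB RAM, ~8 GB free
production ~ 12 cores, 24 GB RAM, ~16 GB free (from 8 to 10 GB is under FS cache, but it makes no difference: same results even if all 16 GB are totally free); this machine also has some CPU load, but not too high, ~10%.

upd8(ref): For a new test case and a potential explanation, see the tail.

Here is my test case (I also tested java and python, but "c" should be the clearest):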

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

int main(void){
   clock_t last = clock(); //remember the time (CPU time used by this process)
   for(int i=0;i<16;i++){ //repeat test several times
      int size = 256 * 1024 * 1024;
      int size4=size/4;
      int* buffer = malloc(size); //allocate 256MB of memory (never freed, so usage grows each iteration)
      if(buffer==NULL){ perror("malloc"); return 1; }
      for(int k=0;k<2;k++){ //initialize allocated memory twice
          for(int j=0;j<size4;j++){
              //memory initialization (if I skip this step my test ends in 0.000s)
              buffer[j]=k;
          }
          //printing stat
          printf("x [%d] %.2f\n",k+1, (clock()-last)/(double)CLOCKS_PER_SEC);
          last = clock();
      }
   }
   return 0;
}

Output on dev machine (partial):

x [1] 0.13 -- first initialization takes a bit longer
x [2] 0.12 -- than the second one, but the difference is not significant.
x [1] 0.13
x [2] 0.12
x [1] 0.15
x [2] 0.11
x [1] 0.14
x [2] 0.12
x [1] 0.14
x [2] 0.12
x [1] 0.13
x [2] 0.12
x [1] 0.14
x [2] 0.11
x [1] 0.14
x [2] 0.12 -- and the results are quite stable
...

Output on production machine (partial):

x [1] 0.23
x [2] 0.19
x [1] 0.24
x [2] 0.19
x [1] 1.30 -- first initialization takes significantly longer
x [2] 0.19 -- than the second one (6x slower)
