A First Taste of .NET 4: Parallel Computing Efficiency Greatly Improved

Author: Elvin Chen 2010-04-21 09:23:09


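The parallel computing improvements in .NET 4 are exciting; after all, they point to where development technology is heading. This article introduces them.

With the official release of Visual Studio 2010, all of .NET 4's features are now available to us. This article focuses on trying out parallel computing. 51CTO also recommends its feature "The F# Functional Programming Language" for a fuller picture of the different parts of .NET 4.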

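We all know how much CPU performance matters, but clock speeds have become harder and harder to raise. With vertical scaling constrained, horizontal scaling is inevitable: core counts keep growing. Yet making use of multiple cores through parallel computing has always been a hard programming problem. Leaving the bigger issues aside and looking only at writing the code, most programmers have painful memories of it: multithreaded programs are long, complicated to write, and error-prone, and there is little guarantee that they will actually run as efficiently as hoped.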

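To improve this situation, .NET 4.0 introduces the TPL (Task Parallel Library). MSDN describes the TPL as follows:

The Task Parallel Library (TPL) is designed to make it much easier to write managed code that can automatically use multiple processors. Using the library, you can conveniently express potential parallelism in existing sequential code, where the exposed parallel tasks will be run concurrently on all available processors. Usually this results in significant speedups.

In short, the TPL provides a set of class libraries that make parallel code simpler and more convenient to write.

That is easy to say, so let's look at an example: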

void ThreadpoolMatrixMult(int size, double[,] m1, double[,] m2,
    double[,] result)
{
    int N = size;
    int P = 2 * Environment.ProcessorCount; // assume twice the procs for
                                            // good work distribution
    int Chunk = N / P;                      // size of a work chunk
    AutoResetEvent signal = new AutoResetEvent(false);
    int counter = P;                        // use a counter to reduce
                                            // kernel transitions
    for (int c = 0; c < P; c++) {           // for each chunk
        ThreadPool.QueueUserWorkItem(delegate(Object o)
        {
            int lc = (int)o;
            for (int i = lc * Chunk;        // iterate through a work chunk
                 i < (lc + 1 == P ? N : (lc + 1) * Chunk); // respect upper
                                                           // bound
                 i++) {
                // original inner loop body
                for (int j = 0; j < size; j++) {
                    result[i, j] = 0;
                    for (int k = 0; k < size; k++) {
                        result[i, j] += m1[i, k] * m2[k, j];
                    }
                }
            }
            if (Interlocked.Decrement(ref counter) == 0) { // use efficient
                                                           // interlocked
                                                           // instructions
                signal.Set();  // and kernel transition only when done
            }
        }, c);
    }
    signal.WaitOne();
}

Familiar code, and irritating to look at, isn't it? After switching to the TPL, the code above can be reduced to:

void ParMatrixMult(int size, double[,] m1, double[,] m2, double[,] result)
{
    Parallel.For(0, size, delegate(int i) {
        for (int j = 0; j < size; j++) {
            result[i, j] = 0;
            for (int k = 0; k < size; k++) {
                result[i, j] += m1[i, k] * m2[k, j];
            }
        }
    });
}

Much nicer, isn't it? For details, see the MSDN article "Optimize Managed Code for Multi-Core Machines".
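
As a small variation on the TPL version above, here is a minimal sketch that uses lambda syntax and a `ParallelOptions` argument to cap the number of workers. The class name `MatrixSketch` and the `maxWorkers` parameter are made up for illustration and do not come from the MSDN article; `-1` is the TPL's own "no limit" value.

```csharp
using System;
using System.Threading.Tasks;

static class MatrixSketch
{
    // Multiplies two square matrices, parallelizing over the rows of the result.
    // maxWorkers is a hypothetical knob (not in the article); -1 means "no limit".
    public static double[,] Multiply(double[,] m1, double[,] m2, int maxWorkers = -1)
    {
        int size = m1.GetLength(0);
        var result = new double[size, size];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        Parallel.For(0, size, options, i =>
        {
            for (int j = 0; j < size; j++)
            {
                double sum = 0; // accumulate locally, write the shared array once
                for (int k = 0; k < size; k++)
                {
                    sum += m1[i, k] * m2[k, j];
                }
                result[i, j] = sum;
            }
        });
        return result;
    }
}
```

Calling `MatrixSketch.Multiply(a, b, 2)` would restrict the loop to at most two concurrent workers, which can be handy when comparing against a dual-core machine. Each row `i` is written by exactly one iteration, so no locking is needed.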

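After installing the release version of VS2010, I wrote some code to test how well the TPL actually works.

The code is simple: it takes one string and compares it against every string in a collection, using the LevenshteinDistance algorithm to measure string similarity. It first runs the traditional sequential code and records the time, then runs the TPL-based parallel code and records the time, and finally compares the two.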

using System;
using System.Diagnostics;
using System.Text;
using System.Threading.Tasks;

namespace ParallelLevenshteinDistance
{
    class Program
    {
        static void Main(string[] args)
        {
            Stopwatch sw;

            int length;
            int count;
            string[] strlist;
            int[] steps;
            string comparestring;

            Console.WriteLine("Input string length:");
            length = int.Parse(Console.ReadLine());

            Console.WriteLine("Input string list count:");
            count = int.Parse(Console.ReadLine());

            comparestring = GenerateRandomString(length);
            strlist = new string[count];
            steps = new int[count];

            // prepare string[] for comparison
            Parallel.For(0, count, delegate(int i)
            {
                strlist[i] = GenerateRandomString(length);
            });

            Console.WriteLine("{0}Computing...{0}", Environment.NewLine);

            // sequential comparison
            sw = Stopwatch.StartNew();
            for (int i = 0; i < count; i++)
            {
                steps[i] = LevenshteinDistance(comparestring, strlist[i]);
            }
            sw.Stop();
            Console.WriteLine("[Sequential] Elapsed:");
            Console.WriteLine(sw.Elapsed.ToString());

            // parallel comparison
            sw = Stopwatch.StartNew();
            Parallel.For(0, count, delegate(int i)
            {
                steps[i] = LevenshteinDistance(comparestring, strlist[i]);
            });
            sw.Stop();
            Console.WriteLine("[Parallel] Elapsed:");
            Console.WriteLine(sw.Elapsed.ToString());

            Console.ReadLine();
        }

        private static string GenerateRandomString(int length)
        {
            // Seed from a GUID rather than DateTime.Now.Ticks: calls made in
            // parallel within the same clock tick would otherwise share a seed
            // and produce identical strings.
            Random r = new Random(Guid.NewGuid().GetHashCode());
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++)
            {
                int c = r.Next(97, 123); // 'a' (97) through 'z' (122)
                sb.Append((char)c);
            }
            return sb.ToString();
        }

        private static int LevenshteinDistance(string str1, string str2)
        {
            // distance matrix contains one extra row and column for the seed values
            int[,] scratchDistanceMatrix = new int[str1.Length + 1, str2.Length + 1];
            for (int i = 0; i <= str1.Length; i++) scratchDistanceMatrix[i, 0] = i;
            for (int j = 0; j <= str2.Length; j++) scratchDistanceMatrix[0, j] = j;

            for (int i = 1; i <= str1.Length; i++)
            {
                int str1Index = i - 1;
                for (int j = 1; j <= str2.Length; j++)
                {
                    int str2Index = j - 1;
                    int cost = (str1[str1Index] == str2[str2Index]) ? 0 : 1;

                    // i and j start at 1, so the neighbouring cells always exist
                    int deletion = scratchDistanceMatrix[i - 1, j] + 1;
                    int insertion = scratchDistanceMatrix[i, j - 1] + 1;
                    int substitution = scratchDistanceMatrix[i - 1, j - 1] + cost;

                    scratchDistanceMatrix[i, j] = Math.Min(Math.Min(deletion, insertion), substitution);

                    // Check for transposition of two adjacent characters
                    if (i > 1 && j > 1
                        && (str1[str1Index] == str2[str2Index - 1])
                        && (str1[str1Index - 1] == str2[str2Index]))
                    {
                        scratchDistanceMatrix[i, j] = Math.Min(
                            scratchDistanceMatrix[i, j],
                            scratchDistanceMatrix[i - 2, j - 2] + cost);
                    }
                }
            }

            // the distance is the bottom right element
            return scratchDistanceMatrix[str1.Length, str2.Length];
        }
    }
}

Only the simplest method, Parallel.For, is used here, and the code is simple and casual, but the results are still worth a look.
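
Parallel.For is not the only entry point in .NET 4. As a rough sketch of an alternative, the same "compare one string against many" step could be written with PLINQ. The names `PlinqSketch` and `ComputeAll` are invented for this example, and the distance function is passed in as a delegate so the block stands on its own; it is not what the test program in this article does.

```csharp
using System;
using System.Linq;

static class PlinqSketch
{
    // Applies distance(target, candidate) to every candidate in parallel.
    // AsOrdered() keeps the results in the same order as the input array.
    public static int[] ComputeAll(string target, string[] candidates,
                                   Func<string, string, int> distance)
    {
        return candidates
            .AsParallel()
            .AsOrdered()
            .Select(s => distance(target, s))
            .ToArray();
    }
}
```

Inside the test program this would be called as `steps = PlinqSketch.ComputeAll(comparestring, strlist, LevenshteinDistance);`. Whether it is faster or slower than Parallel.For on a given machine would have to be measured; no such measurement was made here.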

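I rounded up quite a few test machines; hardware enthusiasts may find one that interests them :P

Intel Core i7 920 (4 physical cores, 8 logical cores, 2.66 GHz) + DDR3 1600 @ 7-7-7-24

AMD Athlon II X4 630 (4 physical cores, 2.8 GHz) + DDR3 1600 @ 8-8-8-24

AMD Athlon II X2 240 (2 physical cores, 2.8 GHz) + DDR2 667

Intel Core E5300 (2 physical cores, 2.33 GHz) + DDR2 667

Intel Atom N270 (1 physical core, 2 logical cores, 1.6 GHz) + DDR2 667

I also ran it in VMware Workstation, in one XP and one Win7 virtual machine, both hosted on the i7 machine above with 2 cores allocated to each.

The program was set to 1000 characters per string and 1000 strings in total.

The program was run 3 times on each machine and the results averaged, giving the following table: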

CPU	Cores / threads / clock	Sequential time (s)	Parallel time (s)	Sequential/parallel (%)
Intel Core i7 920	4 cores, 8 threads, 2.66 GHz	55.132634	14.645687	376.44%
AMD Athlon II X4 630	4 cores, 4 threads, 2.8 GHz	58.10592	17.152494	338.76%
AMD Athlon II X2 240	2 cores, 2 threads, 2.8 GHz	66.159735	32.293972	204.87%
Intel E5300	2 cores, 2 threads, 2.33 GHz	70.827157	38.50654	183.94%
Intel Atom N270	1 core, 2 threads, 1.6 GHz	208.47852	157.27869	132.55%
VM Win7 (2 logical cores)	2 cores, 2 threads	56.965068	33.069084	172.26%
VM XP (2 logical cores)	2 cores, 2 threads	59.799399	35.35805	169.13%
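
The averages in the table were taken over 3 runs by hand. A small helper along the following lines could automate that. It is only a sketch: the name `TimingSketch.AverageSeconds` is made up, and the untimed warm-up call is an assumption added here to keep JIT compilation out of the first measurement; the article does not say whether the original runs did anything similar.

```csharp
using System;
using System.Diagnostics;

static class TimingSketch
{
    // Runs work once untimed (warm-up), then performs `runs` timed
    // repetitions and returns the mean elapsed time in seconds.
    public static double AverageSeconds(Action work, int runs)
    {
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");
        work(); // warm-up so JIT compilation is not charged to the first run
        double total = 0;
        for (int r = 0; r < runs; r++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            work();
            sw.Stop();
            total += sw.Elapsed.TotalSeconds;
        }
        return total / runs;
    }
}
```

For example, the sequential loop could be wrapped in a lambda and passed as `TimingSketch.AverageSeconds(() => { /* sequential loop */ }, 3)`, and the parallel loop measured the same way.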

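As the table shows, parallel execution is much faster on every multi-core processor. Even on the Atom N270, a single core that Hyper-Threading presents as 2 logical cores, the sequential time is 132.55% of the parallel time, which means the run time drops by about 24.6%. On the A240 (Athlon II X2 240), the ratio is 204.87%, more than double on only 2 cores?! And although both have 4 cores, the i7 920, helped by Hyper-Threading, gains noticeably more from parallel execution than the A630 (Athlon II X4 630). Finally, might the VM tests suggest, to some extent, that Win7 schedules multiple cores better than XP (pure speculation)? Incidentally, on the same i7 hardware the sequential run takes 55.133 seconds on the host OS (Win7) and 56.965 seconds in the Win7 VM, a difference of about 3%.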
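
The last column of the table is a ratio of two times, which is easy to confuse with the share of time saved. The sketch below separates the two and adds a per-core efficiency figure. The class and method names are invented for illustration, and dividing by the logical core count is just one possible convention.

```csharp
using System;

static class SpeedupSketch
{
    // Speedup: how many times faster the parallel run is (the table's last column as a ratio).
    public static double Speedup(double sequentialSeconds, double parallelSeconds)
    {
        return sequentialSeconds / parallelSeconds;
    }

    // Fraction of the sequential wall-clock time that the parallel run saves.
    public static double TimeSaved(double sequentialSeconds, double parallelSeconds)
    {
        return 1.0 - parallelSeconds / sequentialSeconds;
    }

    // Speedup divided by the number of logical cores (1.0 would be perfect scaling).
    public static double Efficiency(double sequentialSeconds, double parallelSeconds, int logicalCores)
    {
        return Speedup(sequentialSeconds, parallelSeconds) / logicalCores;
    }
}
```

With the Atom N270 row, `Speedup(208.47852, 157.27869)` is about 1.3255 while `TimeSaved` for the same pair is about 0.2456, so a 132.55% ratio corresponds to roughly a 24.6% shorter run. For the i7 920, `Efficiency(55.132634, 14.645687, 8)` is about 0.47 per logical core, or about 0.94 if the 4 physical cores are counted instead.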

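In addition, on the more powerful i7 processor, I increased the program's 2 variables and tested again, and the ratio improved further, probably because the cost of creating, managing, and destroying threads is amortized over more work. For example, with 2000 characters per string and 2000 strings in total, sequential and parallel execution took 07:20.9679066 and 01:47.7059225 respectively, a time ratio of 409.42%.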
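
Another way to spread scheduling overhead over more work, besides enlarging the input, is to hand each worker a contiguous range of indices rather than a single index. The sketch below does that with `Partitioner.Create` from `System.Collections.Concurrent`. The names `ChunkingSketch` and `ComputeInChunks` are hypothetical, and this was not part of the measurements above. With each comparison here taking tens of milliseconds it may make little difference; it tends to matter more when each iteration is cheap.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ChunkingSketch
{
    // Fills results[i] = work(i) for i in [0, count). The loop body delegate
    // is invoked once per index range instead of once per index.
    public static int[] ComputeInChunks(int count, Func<int, int> work)
    {
        var results = new int[count];
        if (count == 0) return results; // Partitioner.Create rejects an empty range

        Parallel.ForEach(Partitioner.Create(0, count), range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive
            for (int i = range.Item1; i < range.Item2; i++)
            {
                results[i] = work(i);
            }
        });
        return results;
    }
}
```

In the test program the call might look like `steps = ChunkingSketch.ComputeInChunks(count, i => LevenshteinDistance(comparestring, strlist[i]));`.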

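Screenshots taken during the run show that in the sequential part of the test, memory usage is fairly steady while most CPU cores sit relatively idle. In the parallel part, all cores are put to work as expected, and memory usage starts to fluctuate noticeably. The screenshots were taken with 2000 characters per string and 2000 strings in total.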
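
One plausible reason for the memory swings is that every call to LevenshteinDistance allocates a full matrix: for two 2000-character strings that is 2001 x 2001 ints, roughly 16 MB, and in the parallel part several such matrices are alive at once. This is an inference and was not verified against the screenshots. As a sketch of how the footprint could be reduced, the version below keeps only two rows. Note the swap in technique: it computes the plain Levenshtein distance and omits the transposition check used in the test program, so its results can differ from that program's. The class name `TwoRowLevenshtein` is made up for this example.

```csharp
using System;

static class TwoRowLevenshtein
{
    // Plain Levenshtein distance (insert, delete, substitute; no transposition),
    // computed with two rows instead of a full (n+1) x (m+1) matrix.
    public static int Distance(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j; // row for the empty prefix of a

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // turning the first i characters of a into "" takes i deletions
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
                int deletion = previous[j] + 1;
                int insertion = current[j - 1] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(deletion, insertion), substitution);
            }
            // the row just filled becomes "previous" for the next iteration
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }
}
```

With 2000-character inputs, each call then allocates two arrays of 2001 ints (about 16 KB in total) instead of about 16 MB.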

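This is only a very narrow and simple test, meant to give an intuitive feel for the library. Microsoft already provides a good set of examples for reference: Samples for Parallel Programming with the .NET Framework 4.

Original title: A Quick Try of Parallel Computing in .NET 4.0

Link: http://www.cnblogs.com/Elvin/archive/2010/04/20/1716258.html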


Editor: Peng Fan. Source: cnblogs
