Learning Coding Parallelization (Was Tim's Erlang Exercise - Round V)
>>> Updated Oct 20:
After cleaned up my 4-CPU linux box, the result for the 1M records file is about 7.7 sec, for the 5M records file is about 38 sec.
========
>>> Updated Oct 16:
After testing my code on different machines, I found that disk/io performed varyingly, for some very large files, reading file in parallel may cause longer elapsed time (typically on non-server machine, which is not equipped for fast disk/io). So, I added another version tbray4b.erl, in this version, only reading file is not parallalized, all other code is the same. If you'd like to have a test on your machine, please try both.
========
Well, I think I've learned a lot from doing Tim's exercise, not only the List vs Binary in Erlang, but also computing in parallel. Coding Concurrency is farely easy in Erlang, but coding Parallelization is not only about the Languages, it's also a real question.
I wrote tbray3.erl in The Erlang Way (Was Tim Bray's Erlang Exercise - Round IV) and got a fairly good result by far on my 2-core MacBook. But things always are a bit complex. As Steve pointed in the comment, when he tried tbray3.erl on his 8-core linux box:
"I ran it in a loop 10 times, and the best time I saw was 13.872 sec, and user/CPU time was only 16.150 sec, so it’s apparently not using the multiple cores very well."
I also encoutered this issue on my 4-CPU Intel Xeon CPU 2.80GHz debian box, it runs even worse (8.420s) than my 2-core MacBook (4.483s).
I thought about my code a while, and found that my code seems spawning too many processes for scan_chunk, as the scan_chunk's performance has been improved a lot, each process will finish its task very quickly, too quick to the file reading, the inceasing CPUs have no much chance to play the game, the cycled 'reading'-'spawning scan process' is actually almost sequential now, there has been very few simultaneously alive scanning processes. I think I finally meet the file reading bound.
But wait, as I claimed before, that reading file to memory is very fast in Erlang, for a 200M log file, it takes less than 800ms. The time elapsed for tbray3.erl is about 4900ms, far away from 800ms, why I say the file reading is the bound now?
The problem here is: since I suspect the performance of traversing binary byte by byte, I choose to convert binary to list to scan the world. Per my testing results, list is better than binary when is not too longer, in many cases, not longer than several KBytes. And, to make the code clear and readable, I also choose splitting big binary when read file in the meanwhile, so, I have to read file in pieces of no longer than n KBytes. For a very big file, the reading procedure is broken to several ten-thousands steps, which finally cause the whole file reading time elapsed is bit long. That's bad.
So, I decide to write another version, which will read file in parallel (Round III), and split each chunk on lastest new-line (Round II), scan the words using pattern match (Round IV), and yes, I'll use binary instead of list this time, try to solve the worse performance of binary-traverse by parallel, on multiple cores.
The result is interesting, it's the first time I achieved around 10 sec in my 2-core MacBook when use binary match only, and it's also the first time, on my dummy 4-CPU Intel Xeon CPU 2.80GHz debian box, I got better result (7.700 sec) than my MacBook.
>>> Updated Oct 15:
Steve run the code on his 8-core 2.33 GHz Intel Xeon Linux box, with the best time was 4.920 sec:
"the best time I saw for your newest version was 4.920 sec on my 8-core Linux box. Fast! However, user time was only 14.751 sec, so I’m not sure it’s using all the cores that well. Perhaps you’re getting down to where I/O is becoming a more significant factor."
Please see Steve's One More Erlang Wide Finder and his widefinder attempts.
========
Result on 2.0GHz 2-core MacBook:
$ time erl -smp -noshell -run tbray4_bin start o1000k.ap 20 -s erlang halt 8900 : 2006/09/29/Dynamic-IDE 2000 : 2006/07/28/Open-Data 1300 : 2003/07/25/NotGaming 800 : 2003/10/16/Debbie 800 : 2003/09/18/NXML 800 : 2006/01/31/Data-Protection 700 : 2003/06/23/SamsPie 600 : 2006/09/11/Making-Markup 600 : 2003/02/04/Construction 600 : 2005/11/03/Cars-and-Office-Suites Time: 10527.50 ms real 0m10.910s user 0m13.927s sys 0m6.413s
Result on 4-CPU Intel Xeon CPU 2.80GHz debian box,:
# When process number is set to 20: time erl -smp -noshell -run tbray4_bin start o1000k.ap 20 -s erlang halt real 0m7.700s user 0m25.750s sys 0m3.552s # When process number is set to 1: $ time erl -smp -noshell -run tbray4_bin start o1000k.ap 1 -s erlang halt real 0m22.035s user 0m21.525s sys 0m0.512s # On a 940M 5 million lines log file: time erl -smp -noshell -run tbray4_bin start o5000k.ap 100 -s erlang halt 44500 : 2006/09/29/Dynamic-IDE 10000 : 2006/07/28/Open-Data 6500 : 2003/07/25/NotGaming 4000 : 2003/10/16/Debbie 4000 : 2003/09/18/NXML 4000 : 2006/01/31/Data-Protection 3500 : 2003/06/23/SamsPie 3000 : 2006/09/11/Making-Markup 3000 : 2003/02/04/Construction 3000 : 2005/11/03/Cars-and-Office-Suites Time: 37512.76 ms real 0m37.669s user 2m6.296s sys 0m17.789s
On the 4-CPU linux box, comparing the elapsed time between ProcNum = 20 and ProcNum = 1, the elapsed time of parallelized one was only 35% of un-parallelized one, speedup about 185%. The ratio was almost the same as my pread_file.erl testing on the same machine.
It's actually a combination of code in my four previous blogs. Although the performance is not so good as tbray3.erl on my MacBook, but I'm happy that this version is a fully parallelized one, from reading file, scanning words etc. it should scale better than all my previous versions.
-module(tbray4). -compile([native]). -export([start/1, start/2]). -include_lib("kernel/include/file.hrl"). start([FileName, ProcNum]) when is_list(ProcNum) -> start(FileName, list_to_integer(ProcNum)). start(FileName, ProcNum) -> Start = now(), Main = self(), Counter = spawn(fun () -> count_loop(Main) end), Collector = spawn(fun () -> collect_loop(Counter) end), pread_file(FileName, ProcNum, Collector), %% don't terminate, wait here, until all tasks done. receive stop -> io:format("Time: ~10.2f ms~n", [timer:now_diff(now(), Start) / 1000]) end. pread_file(FileName, ProcNum, Collector) -> ChunkSize = get_chunk_size(FileName, ProcNum), pread_file_1(FileName, ChunkSize, ProcNum, Collector). pread_file_1(FileName, ChunkSize, ProcNum, Collector) -> [spawn(fun () -> Length = if I == ProcNum - 1 -> ChunkSize * 2; %% lastest chuck true -> ChunkSize end, {ok, File} = file:open(FileName, [raw, binary]), {ok, Bin} = file:pread(File, ChunkSize * I, Length), file:close(File), {Data, Tail} = split_on_last_newline(Bin), Collector ! {seq, I, Data, Tail} end) || I <- lists:seq(0, ProcNum - 1)], Collector ! {chunk_num, ProcNum}. collect_loop(Counter) -> collect_loop_1([], <<>>, -1, Counter). collect_loop_1(Chunks, PrevTail, LastSeq, Counter) -> receive {chunk_num, ChunkNum} -> Counter ! {chunk_num, ChunkNum}, collect_loop_1(Chunks, PrevTail, LastSeq, Counter); {seq, I, Data, Tail} -> SortedChunks = lists:keysort(1, [{I, Data, Tail} | Chunks]), {Chunks1, PrevTail1, LastSeq1} = process_chunks(SortedChunks, [], PrevTail, LastSeq, Counter), collect_loop_1(Chunks1, PrevTail1, LastSeq1, Counter) end. count_loop(Main) -> count_loop_1(Main, dict:new(), undefined, 0). count_loop_1(Main, Dict, ChunkNum, ChunkNum) -> print_result(Dict), Main ! stop; count_loop_1(Main, Dict, ChunkNum, ProcessedNum) -> receive {chunk_num, ChunkNumX} -> count_loop_1(Main, Dict, ChunkNumX, ProcessedNum); {dict, DictX} -> Dict1 = dict:merge(fun (_, V1, V2) -> V1 + V2 end, Dict, DictX), count_loop_1(Main, Dict1, ChunkNum, ProcessedNum + 1) end. process_chunks([], ChunkBuf, PrevTail, LastSeq, _) -> {ChunkBuf, PrevTail, LastSeq}; process_chunks([{I, Data, Tail}=Chunk|T], ChunkBuf, PrevTail, LastSeq, Counter) -> case LastSeq + 1 of I -> spawn(fun () -> Counter ! {dict, scan_chunk(<<PrevTail/binary, Data/binary>>)} end), process_chunks(T, ChunkBuf, Tail, I, Counter); _ -> process_chunks(T, [Chunk | ChunkBuf], PrevTail, LastSeq, Counter) end. print_result(Dict) -> SortedList = lists:reverse(lists:keysort(2, dict:to_list(Dict))), [io:format("~b\t: ~s~n", [V, K]) || {K, V} <- lists:sublist(SortedList, 10)]. get_chunk_size(FileName, ProcNum) -> {ok, #file_info{size=Size}} = file:read_file_info(FileName), Size div ProcNum. split_on_last_newline(Bin) -> split_on_last_newline_1(Bin, size(Bin)). split_on_last_newline_1(Bin, Offset) when Offset > 0 -> case Bin of <<Data:Offset/binary,$\n,Tail/binary>> -> {Data, Tail}; _ -> split_on_last_newline_1(Bin, Offset - 1) end; split_on_last_newline_1(Bin, _) -> {Bin, <<>>}. scan_chunk(Bin) -> scan_chunk_1(Bin, 0, dict:new()). scan_chunk_1(Bin, Offset, Dict) when Offset =< size(Bin) - 34 -> case Bin of <<_:Offset/binary,"GET /ongoing/When/",_,_,_,$x,$/,Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/,Rest/binary>> -> case match_until_space_newline(Rest, 0) of {Rest1, <<>>} -> scan_chunk_1(Rest1, 0, Dict); {Rest1, Word} -> Key = <<Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/, Word/binary>>, scan_chunk_1(Rest1, 0, dict:update_counter(Key, 1, Dict)) end; _ -> scan_chunk_1(Bin, Offset + 1, Dict) end; scan_chunk_1(_, _, Dict) -> Dict. match_until_space_newline(Bin, Offset) when Offset < size(Bin) -> case Bin of <<Word:Offset/binary,$ ,Rest/binary>> -> {Rest, Word}; <<_:Offset/binary,$.,Rest/binary>> -> {Rest, <<>>}; <<_:Offset/binary,10,Rest/binary>> -> {Rest, <<>>}; _ -> match_until_space_newline(Bin, Offset + 1) end; match_until_space_newline(_, _) -> {<<>>, <<>>}.
>>> Updated Oct 16:
After testing my code on different machines, I found that disk/io performed varyingly, for some very large files, reading file in parallel may cause longer elapsed time (typically on non-server machine, which is not equipped for fast disk/io). So, I wrote another version: tbray4b.erl, in this version, only reading file is not parallalized, all other code is the same. Here's a result for this version on a 940M file with 5 million lines, with ProcNum set to 200 and 400)
# On 2-core MacBook: $ time erl -smp -noshell -run tbray4b start o5000k.ap 200 -s erlang halt real 0m50.498s user 0m49.746s sys 0m11.979s # On 4-cpu linux box: $ time erl -smp -noshell -run tbray4b start o5000k.ap 400 -s erlang halt real 1m2.136s user 1m59.907s sys 0m7.960s
The code: tbray4b.erl
-module(tbray4b). -compile([native]). -export([start/1, start/2]). -include_lib("kernel/include/file.hrl"). start([FileName, ProcNum]) when is_list(ProcNum) -> start(FileName, list_to_integer(ProcNum)). start(FileName, ProcNum) -> Start = now(), Main = self(), Counter = spawn(fun () -> count_loop(Main) end), Collector = spawn(fun () -> collect_loop(Counter) end), read_file(FileName, ProcNum, Collector), %% don't terminate, wait here, until all tasks done. receive stop -> io:format("Time: ~10.2f ms~n", [timer:now_diff(now(), Start) / 1000]) end. read_file(FileName, ProcNum, Collector) -> ChunkSize = get_chunk_size(FileName, ProcNum), {ok, File} = file:open(FileName, [raw, binary]), read_file_1(File, ChunkSize, 0, Collector). read_file_1(File, ChunkSize, I, Collector) -> case file:read(File, ChunkSize) of eof -> file:close(File), Collector ! {chunk_num, I}; {ok, Bin} -> spawn(fun () -> {Data, Tail} = split_on_last_newline(Bin), Collector ! {seq, I, Data, Tail} end), read_file_1(File, ChunkSize, I + 1, Collector) end. collect_loop(Counter) -> collect_loop_1([], <<>>, -1, Counter). collect_loop_1(Chunks, PrevTail, LastSeq, Counter) -> receive {chunk_num, ChunkNum} -> Counter ! {chunk_num, ChunkNum}, collect_loop_1(Chunks, PrevTail, LastSeq, Counter); {seq, I, Data, Tail} -> SortedChunks = lists:keysort(1, [{I, Data, Tail} | Chunks]), {Chunks1, PrevTail1, LastSeq1} = process_chunks(SortedChunks, [], PrevTail, LastSeq, Counter), collect_loop_1(Chunks1, PrevTail1, LastSeq1, Counter) end. count_loop(Main) -> count_loop_1(Main, dict:new(), undefined, 0). count_loop_1(Main, Dict, ChunkNum, ChunkNum) -> print_result(Dict), Main ! stop; count_loop_1(Main, Dict, ChunkNum, ProcessedNum) -> receive {chunk_num, ChunkNumX} -> count_loop_1(Main, Dict, ChunkNumX, ProcessedNum); {dict, DictX} -> Dict1 = dict:merge(fun (_, V1, V2) -> V1 + V2 end, Dict, DictX), count_loop_1(Main, Dict1, ChunkNum, ProcessedNum + 1) end. process_chunks([], ChunkBuf, PrevTail, LastSeq, _) -> {ChunkBuf, PrevTail, LastSeq}; process_chunks([{I, Data, Tail}=Chunk|T], ChunkBuf, PrevTail, LastSeq, Counter) -> case LastSeq + 1 of I -> spawn(fun () -> Counter ! {dict, scan_chunk(<<PrevTail/binary, Data/binary>>)} end), process_chunks(T, ChunkBuf, Tail, I, Counter); _ -> process_chunks(T, [Chunk | ChunkBuf], PrevTail, LastSeq, Counter) end. print_result(Dict) -> SortedList = lists:reverse(lists:keysort(2, dict:to_list(Dict))), [io:format("~b\t: ~s~n", [V, K]) || {K, V} <- lists:sublist(SortedList, 10)]. get_chunk_size(FileName, ProcNum) -> {ok, #file_info{size=Size}} = file:read_file_info(FileName), Size div ProcNum. split_on_last_newline(Bin) -> split_on_last_newline_1(Bin, size(Bin)). split_on_last_newline_1(Bin, Offset) when Offset > 0 -> case Bin of <<Data:Offset/binary,$\n,Tail/binary>> -> {Data, Tail}; _ -> split_on_last_newline_1(Bin, Offset - 1) end; split_on_last_newline_1(Bin, _) -> {Bin, <<>>}. scan_chunk(Bin) -> scan_chunk_1(Bin, 0, dict:new()). scan_chunk_1(Bin, Offset, Dict) when Offset =< size(Bin) - 34 -> case Bin of <<_:Offset/binary,"GET /ongoing/When/",_,_,_,$x,$/,Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/,Rest/binary>> -> case match_until_space_newline(Rest, 0) of {Rest1, <<>>} -> scan_chunk_1(Rest1, 0, Dict); {Rest1, Word} -> Key = <<Y1,Y2,Y3,Y4,$/,M1,M2,$/,D1,D2,$/, Word/binary>>, scan_chunk_1(Rest1, 0, dict:update_counter(Key, 1, Dict)) end; _ -> scan_chunk_1(Bin, Offset + 1, Dict) end; scan_chunk_1(_, _, Dict) -> Dict. match_until_space_newline(Bin, Offset) when Offset < size(Bin) -> case Bin of <<Word:Offset/binary,$ ,Rest/binary>> -> {Rest, Word}; <<_:Offset/binary,$.,Rest/binary>> -> {Rest, <<>>}; <<_:Offset/binary,10,Rest/binary>> -> {Rest, <<>>}; _ -> match_until_space_newline(Bin, Offset + 1) end; match_until_space_newline(_, _) -> {<<>>, <<>>}.
=======

rss
Comments
No one can deny the shopping at the coach factory outlet is satisfactory. For the low prices and good quality.Over the years, coach factory online has added a multitude of new handbag shapes, styles and materials to their collection. However, the highest care is taken that every Coach handbags is both aesthetically beautiful and functional.Many fashionable women match with practical Coach Purses which will make the street shopping become relaxed,and make every person can enjoy more diversiform combination in coach factory outlet online.
http://www.ccoachfactoryoutlet.com
Don't feel upset, there will be a great conversion to this kind of situation because there are a large number of louis vuitton sale now!louis vuitton outlet,welcome to buy urban louis vuitton on our online shop.discount price is our special offer, durability and high quality is our promise.louis vuitton Outlets offer famous classic brand for LV,Channel, with perfect service.So become to the VIP soon.They offer more new styles,like LV purses,LV wallets etc. And are tested by product quality monitoring center .
http://www.louisvuittonoutletsaleo.com
Turn your attention to such discount coach sneakers for women from coach outlet online, you will find something unique and special of such authentic coach for sale at coach factory outlet store online.coach outlet store sells goods that are constructed to meet the highest standards of quality and functionality.You can trust it 100 percent.coach outlet has become a popular shopping experience for consumers around the world, and a desirable distribution channel for manufacturer's and retailers.
http://www.coachoutletonlinesl.com
any louis vuitton outlet New backpack features usa a function grownup overall look. Varied piece allows that it is captivated me within the approve and also further than your body.It’s induced by way of severe hardworking liver diseases, and also the most familiar factors behind constant louis vuitton bags outlet hard working liver disorder usually are excessive drinking and also liver disease Chemical.Ladies have an ardent love for the Louis Vuitton handbags outlet because Louis Vuitton enjoys a worldwide reputation of high quality and fashionable designs.
http://www.louisvuittonoutletbagsc.com
coach outlet store online marketed properly all greater compared to earth and earn cozy praise from customers. They are made from the finest leather and fabric.coach outlet store have Coach handbags,Coach Shoulder Bags,Coach Briefcases and so on,these bags are so perfectly reproduced,you won't even be able to tell the difference!Coach Outlet Online Store would dynamically change your overall styles right away. The amazing knack about the unique coach handbag is that it would never disappoint your individual styles at all. Rather, it would instantly change your ultimate fashions in a remarkable manner.
http://www.coachoutletstoreonlineeo.com
Our online store offers you discounted Designer louis vuitton replica wallet at present. You could find them in desirable quality and price. If you don't mind high class louis vuitton uk, have a good time here.These is the first time your visit our louis vuit Louis vuitton online shop ,welcome.
http://www.louisvuittonukk.org.uk
http://www.raybansunglassesoutletes.com/ ray bans http://www.raybansunglassoutlet.com/ Rayban Sunglass http://www.hoganshoesoutletstore.com/ hogan shoes http://www.hervelegeroutletes.com/ herve leger uk sale http://www.outlethervelegers.com/ herve leger red http://www.ralphlaurenoutlet-uk.com/ ralph lauren uk http://www.poloralphlaurenoutletes.com/ ralph lauren france
<a href="http://www.shoesadida.com/">Adidas Shoes on Sale</a> is good, but there is a high heel machine so thick waterproof walk not tired feet;<a href="http://www.shoesadida.com/adidas-adizero-c-1.html">Adidas Running Shoes</a> is very light, so long walk also don't feel heavy; The modelling of mouth fish is show small feet, no matter wear pants are good <a href="http://www.shoesadida.com/adidas-climacool-c-8.html">Adidas ClimaCool? </a> or, I like Adidas.
When I was a kid in dozens of pieces of Silver Star gel nails football shoes, put brushes a day job to play wild ball, http://finland.footballsoccers.com it was my most happy time. Now even put on thousands of shoes, wore the jerseys of love, in real play on a football field, why not find back the previous feeling it. I can't go back, back to that brush high marks as the norm by the way, get a few more age of cow's side also have a lot of time. http://finland.footballsoccers.com/lotto-soccer-cleats-c-1014.html
wanted to sexy points of http://fi.nikeseu.com, can wear Shang Staerk of lace perspective Hat shirt; wanted to go noble route http://fi.nikeseu.com/air-jordan-c-801.html, that on selected Celine of produced 's, stereo trim plus Flash silk mass fabric http://fi.nikeseu.com/Jalkapallokengät-c-841.html.
Welcome!Coach Outlet Store are loved by many people,when you walk in the street, you could see many people take Coach Shoulder Bags styles.
Coach Factory http://coachfactoryoutlet.1uxury--best.com
Coach Factory Outlet http://www.coachfactoryoutletcd.com
Coach Outlet Store http://www.coachoutletstoreusac.com
Coach Outlet http://www.coachoutletonlinecss.com