Learning Coding Binary (Was Tim's Erlang Exercise - Round VI)

>>> Updated Nov 1:
Tim tested tbray5.erl on T5120, for his 971,538,252 bytes of data in 4,625,236 lines log file, got:

real    0m20.74s
user    3m51.33s
sys     0m8.00s

The result was what I guessed, since the elapsed time of my code was 3 times of Anders' on my machine. I'm glad that Erlang performs linearly on different machines/os.

My code not the fastest. I did not apply Boyer-Moore searching, thus scan_chunk_1/4 has to test/skip binary 1byte by 1byte when not exactly matched. Anyway, this code shows how to code binary efficiently, and demos the performance of traversing binary byte by byte (the performance is not so bad now, right?). And also, it's what I want: a balance between simple, readable and speed.

Another approach for lazy man is something binary pattern match hacking, we can modify scan_chunk_1/4 to:

scan_chunk_1(Bin, DataL, S, Dict) when S < DataL - 34 ->
    Offset = 
      case Bin of
          <<_:S/binary,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> -> 
              34;
          <<_:S/binary,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              35;
          <<_:S/binary,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              36;
          <<_:S/binary,_,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              37;
          <<_:S/binary,_,_,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              38;
          <<_:S/binary,_,_,_,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              39;
          <<_:S/binary,_,_,_,_,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              40;
          <<_:S/binary,_,_,_,_,_,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              41;
          <<_:S/binary,_,_,_,_,_,_,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              42;
          <<_:S/binary,_,_,_,_,_,_,_,_,_,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
              43;
          _ -> undefined
      end,
    case Offset of
        undefined -> scan_chunk_1(Bin, DataL, S + 10, Dict);
        _ ->
            case match_until_space_newline(Bin, S + Offset) of
                {true, E} ->
                    Skip = S + Offset - 12, L = E - Skip,
                    <<_:Skip/binary,Key:L/binary,_/binary>> = Bin,
                    scan_chunk_1(Bin, DataL, E + 1, dict:update_counter(Key, 1, Dict));
                {false, E} -> 
                    scan_chunk_1(Bin, DataL, E + 1, Dict)
            end
    end;
scan_chunk_1(_, _, _, Dict) -> Dict.

The elapsed time dropped to 1.424 sec immediatley vs 2.792 sec before, speedup about 100%, on my 4-CPU linux box.

If you are patient, you can copy-paste 100... such lines :-) (in this case, I'd rather to pick Boyer-Moore), and the elapsed time will drop a little bit more, but not much after 10 lines or so.
========

>>> Updated Oct 29:
Pihis updated his WideFinder, applied guildline II, and beat Ruby on his one-core box. And he also modified Anders's code by removing all un-necessary remaining binary bindings (which cause un-necessay sub-binary splitting), then, Steve tested the refined code, and got 0.567s on his famous 8-CPU linux box. Now, we may reach the real Disk/IO bound, should we try parallelized file reading? but, I've tired of more widefinders. BTW, May we have regexped pattern match in syntax level? The End.
========

>>> Updated Oct 26:
Code cleanup
========

Binary usaully is more efficent than List in Erlang.

The memory size of Binary is 3 to 6 words plus Data itself, the Data can be allocated / deallocated in global heap, so the Data can be shared over function calls, shared over processes when do message passing (on the same node), without copying. That's why heap size affects a lot on binary.

The memory size of List is 1 word per element + the size of each element, and List is always copying between function calls, on message passing.

In my previous blogs about Tim's exercise, I suspected the performance of Binary traverse. But, with more advices, experience, it seems binary can work very efficient as an ideal Dataset processing struct.

But, there are some guidelines for efficient binary in Erlang, I'll try to give out here, which I learned from the exercise and experts.

I. Don't split a binary unless the split binaries are what you exactly want

Splitting/combining binaries is expensive, so when you want to get values from a binary at some offsets:

Do

<<_:Offset/binary, C, _/binary>> = Bin,
io:format("Char at Offset: ~p", [C]).

Do Not *

<<_:Offset/binary, C/binary, _/binary>> = Bin,
io:format("Char at Offset: ~p", [C]).

* This may be good in R12B

And when you want to split a binary to get Head or Tail only:

Do

<<Head:Offset/binary,_/binary>> = Bin.

Do Not

{Head, _} = split_binary(Bin, Offset).

II. Calculate the final offsets first, then split it when you've got the exactly offsets

When you traverse binary to test the bytes/bits, calculate and collect the final offsets first, don't split binary (bind named Var to sub-binary) at that time. When you've got all the exactly offsets, split what you want finally:

Do

get_nth_word(Bin, N) ->
    Offsets = calc_word_offsets(Bin, 0, [0]),
    S = element(N, Offsets),
    E = element(N + 1, Offsets),
    L = E - S,
    <<_:S/binary,Word:L/binary,_/binary>> = Bin,
    io:format("nth Word: ~p", [Word]).

calc_word_offsets(Bin, Offset, Acc) when Offset < size(Bin) ->
    case Bin of
        <<_:Offset/binary,$ ,_/binary>> ->
            calc_word_offsets(Bin, Offset + 1, [Offset + 1 | Acc]);
        _ ->
            calc_word_offsets(Bin, Offset + 1, Acc)
    end;
calc_word_offsets(_, _, Acc) -> list_to_tuple(lists:reverse(Acc)).

Bin = <<"This is a binary test">>,
get_nth_word(Bin, 4). %  <<"binary ">>

Do Not

get_nth_word_bad(Bin, N) ->
    Words = split_words(Bin, 0, []),
    Word = element(N, Words),
    io:format("nth Word: ~p", [Word]).

split_words(Bin, Offset, Acc) ->
    case Bin of
        <<Word:Offset/binary,$ ,Rest/binary>> ->
            split_words(Rest, 0, [Word | Acc]);
        <<_:Offset/binary,_,_/binary>> ->
            split_words(Bin, Offset + 1, Acc);
        _ -> list_to_tuple(lists:reverse([Bin | Acc]))
    end.

Bin = <<"This is a binary test">>,
get_nth_word_bad(Bin, 4). %  <<"binary">>

III. Use "+h Size" option or [{min_heap_size, Size}] with spawn_opt

This is very important for binary performance. It's somehow a Key Number for binary performance. With this option set properly, the binary performs very well, otherwise, worse.

IV. Others

  • Don't forget to compile to native by adding "-compile([native])." in your code.
  • Maybe "+A Size" to set the number of threads in async thread pool also helps a bit when do IO.

Practice

Steve and Anders have pushed widefinder in Erlang to 1.1 sec on 8-CPU linux box. Their code took about 1.9 sec on my 4-CPU box. Then, how about a concise version?

According to above guide, based on my previous code and Per's dict code, Luke's spawn_worker, I rewrote a concise and straightforward tbray5.erl (less than 80 LOC), without any extra c-drived modules for Tim's exercise , and got about 2.972 sec for 1 milli lines log file, and 15.695 sec for 5 milli lines, vs no-parallelized Ruby's 4.161 sec and 20.768 sec on my 2.80Ghz 4-CPU Intel Xeon linux box:

BTW, using ets instead of dict is almost the same.

$ erlc -smp tbray5.erl
$ time erl +h 8192 -smp -noshell -run tbray5 start o1000k.ap -s erlang halt

real    0m2.972s
user    0m9.685s
sys     0m0.748s

$ time erl +h 8192 -smp -noshell -run tbray5 start o5000k.ap -s erlang halt

real    0m15.695s
user    0m53.551s
sys     0m4.268s

On 2.0GHz 2-core MacBook (Ruby code took 2.447 sec):

$ time erl +h 8192 -smp -noshell -run tbray5 start o1000k.ap -s erlang halt

real    0m3.034s
user    0m4.853s
sys     0m0.872s

The Code: tbray5.erl

-module(tbray5).
-compile([native]).
-export([start/1]).

-define(BUFFER_SIZE, (1024 * 10000)).

start(FileName) ->
    Dicts = [wait_result(Worker) || Worker <- read_file(FileName)],
    print_result(merge_dicts(Dicts)).
              
read_file(FileName) ->
    {ok, File} = file:open(FileName, [raw, binary]),
    read_file_1(File, 0, []).            
read_file_1(File, Offset, Workers) ->
    case file:pread(File, Offset, ?BUFFER_SIZE) of
        eof ->
            file:close(File),
            Workers;
        {ok, Bin} ->
            DataL = split_on_last_newline(Bin),
            Worker = spawn_worker(self(), fun scan_chunk/1, {Bin, DataL}),
            read_file_1(File, Offset + DataL + 1, [Worker | Workers])
    end.

split_on_last_newline(Bin) -> split_on_last_newline_1(Bin, size(Bin)).   
split_on_last_newline_1(Bin, S) when S > 0 ->
    case Bin of
        <<_:S/binary,$\n,_/binary>> -> S;
        _ -> split_on_last_newline_1(Bin, S - 1)
    end;
split_on_last_newline_1(_, S) -> S.

scan_chunk({Bin, DataL}) -> scan_chunk_1(Bin, DataL, 0, dict:new()).
scan_chunk_1(Bin, DataL, S, Dict) when S < DataL - 34 ->
    case Bin of
        <<_:S/binary,"GET /ongoing/When/",_,_,_,$x,$/,_,_,_,_,$/,_,_,$/,_,_,$/,_/binary>> ->
            case match_until_space_newline(Bin, S + 34) of
                {true, E} ->
                    Skip = S + 23, L = E - Skip,
                    <<_:Skip/binary,Key:L/binary,_/binary>> = Bin,
                    scan_chunk_1(Bin, DataL, E + 1, dict:update_counter(Key, 1, Dict));
                {false, E} ->
                    scan_chunk_1(Bin, DataL, E + 1, Dict)
            end;
        _ -> scan_chunk_1(Bin, DataL, S + 1, Dict)
    end;
scan_chunk_1(_, _, _, Dict) -> Dict.

match_until_space_newline(Bin, S) when S < size(Bin) ->
    case Bin of
        <<_:S/binary,10,_/binary>> -> {false, S};
        <<_:S/binary,$.,_/binary>> -> {false, S};
        <<_:S/binary,_,$ ,_/binary>> -> {true, S + 1};
        _ -> match_until_space_newline(Bin, S + 1)
    end;
match_until_space_newline(_, S) -> {false, S}.
    
spawn_worker(Parent, F, A) ->
    erlang:spawn_monitor(fun() -> Parent ! {self(), F(A)} end).

wait_result({Pid, Ref}) ->
    receive
        {'DOWN', Ref, _, _, normal} -> receive {Pid, Result} -> Result end;
        {'DOWN', Ref, _, _, Reason} -> exit(Reason)
    end.
    
merge_dicts([D1,D2|Rest]) ->
    merge_dicts([dict:merge(fun(_, V1, V2) -> V1 + V2 end, D1, D2) | Rest]);
merge_dicts([D]) -> D.

print_result(Dict) ->
    SortedList = lists:reverse(lists:keysort(2, dict:to_list(Dict))),
    [io:format("~b\t: ~p~n", [V, K]) || {K, V} <- lists:sublist(SortedList, 10)].

Comments

1. Hynek (Pichi) Vychodil -- 2007-10-29 16:00

Hi Deng, Your suggestions was very useful and important for my <a href="http://pichis-blog.blogspot.com/2007/10/faster-than-ruby-but-scalable.html">last</a> version. It's big improvement. With Boyer-Moore algorithm it is killer implementation.

You seen in erlang-question anyway :-)

Pichi

2. Caoyuan -- 2007-10-29 16:00

Hi Pichi, I've updated the blog head to include your experience.

3. Web Server Hosting -- 2012-03-12 05:55

This is a very important issue and this slide is very helpful. I would mention that many of us readers actually are definitely blessed to live in a fantastic website with very many special  professionals with beneficial points.<a href="http://www.webhostings.in" rel="dofollow">Web Server Hosting </a>

4. eugenedrewer66@… -- 2012-03-29 07:04

I really loved how you have created your site, it's simple, neat, simple to get around and very easy on the eyes. Can you let me know which theme or designer did you use.sameer thapar

5. coach factory outlet -- 2012-03-31 06:03

No one can deny the shopping at the coach factory outlet is satisfactory. For the low prices and good quality.Over the years, coach factory online has added a multitude of new handbag shapes, styles and materials to their collection. However, the highest care is taken that every Coach handbags is both aesthetically beautiful and functional.Many fashionable women match with practical Coach Purses which will make the street shopping become relaxed,and make every person can enjoy more diversiform combination in coach factory outlet online.

http://www.ccoachfactoryoutlet.com

6. louis vuitton sale -- 2012-03-31 06:03

Don't feel upset, there will be a great conversion to this kind of situation because there are a large number of louis vuitton sale now!louis vuitton outlet,welcome to buy urban louis vuitton on our online shop.discount price is our special offer, durability and high quality is our promise.louis vuitton Outlets offer famous classic brand for LV,Channel, with perfect service.So become to the VIP soon.They offer more new styles,like LV purses,LV wallets etc. And are tested by product quality monitoring center .

http://www.louisvuittonoutletsaleo.com

7. coach outlet online -- 2012-03-31 06:03

Turn your attention to such discount coach sneakers for women from coach outlet online, you will find something unique and special of such authentic coach for sale at coach factory outlet store online.coach outlet store sells goods that are constructed to meet the highest standards of quality and functionality.You can trust it 100 percent.coach outlet has become a popular shopping experience for consumers around the world, and a desirable distribution channel for manufacturer's and retailers.

http://www.coachoutletonlinesl.com

8. coach outlet store online -- 2012-03-31 06:17

coach outlet store online marketed properly all greater compared to earth and earn cozy praise from customers. They are made from the finest leather and fabric.coach outlet store have Coach handbags,Coach Shoulder Bags,Coach Briefcases and so on,these bags are so perfectly reproduced,you won't even be able to tell the difference!Coach Outlet Online Store would dynamically change your overall styles right away. The amazing knack about the unique coach handbag is that it would never disappoint your individual styles at all. Rather, it would instantly change your ultimate fashions in a remarkable manner.

http://www.coachoutletstoreonlineeo.com

9. louis vuitton uk -- 2012-03-31 06:22

Our online store offers you discounted Designer louis vuitton replica wallet at present. You could find them in desirable quality and price. If you don't mind high class louis vuitton uk, have a good time here.These is the first time your visit our louis vuit Louis vuitton online shop ,welcome.

http://www.louisvuittonukk.org.uk

10. louis vuitton outlet -- 2012-03-31 06:22

any louis vuitton outlet New backpack features usa a function grownup overall look. Varied piece allows that it is captivated me within the approve and also further than your body.It’s induced by way of severe hardworking liver diseases, and also the most familiar factors behind constant louis vuitton bags outlet hard working liver disorder usually are excessive drinking and also liver disease Chemical.Ladies have an ardent love for the Louis Vuitton handbags outlet because Louis Vuitton enjoys a worldwide reputation of high quality and fashionable designs.

http://www.louisvuittonoutletbagsc.com

11. anonymous -- 2012-04-09 08:50

Thanks a lot for providing the great info is visible in this blog that to using the great services in this blog. This is very much happy much enjoyed for the great technology is visible in this blog that to sharing the great services in this blog Link:md5 encryption

12. michalehinkey55@… -- 2012-04-13 09:48

I really loved this kind of post, if I have your permission may i duplicate this post to my blog and share it with other people as well. Naturally I'm going to give the original credits to you only.stop snoring aids

13. christian louboutin uk -- 2012-04-18 01:35
14. louis vuitton uk -- 2012-04-18 01:45
15. mulberry sale -- 2012-04-18 01:53
16. dgdfg -- 2012-04-24 04:31
17. kokok -- 2012-04-24 04:32

replica tissot watches replica watches joins the iconic models of the brand With a <a href=http://www.havewatches.org/>fake watches</a> Watches Ladies Watches 0 Items in cart It was fake watches purpose Below is a list and brief description of.

Add New Comment