Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Building distributed applications with riak_core

Building distributed applications with riak_core

Presents https://github.com/mrallen1/udon - an example application which demonstrates how to write a distributed application using riak_core, including handoff implementation. From Erlang Factory SF 2015

Avatar for Jade Allen

Jade Allen

March 26, 2015
Tweet

More Decks by Jade Allen

Other Decks in Programming

Transcript

  1. About  me • Sys  admin  and  software  developer  for  almost

      20  years  now. • Work  at  Basho  Technologies  (purveyors  of  fine   databasses) • Coding  Erlang for  about  3  years  now. • Very  interested  in  distributed  systems.
  2. What  is  riak_core? • A  modular  distributed  systems  library •

    Provides  consistent  hashing  functions • Focus  on  your  application  not  on  the  plumbing   of  a  distributed  application.
  3. Concepts:  Consistent  Hash • Limits  reshuffling  of  keys  when  hash

     table   data  structure  is  rebalanced. • riak_core uses  consistent  hashing  to   determine  where  to  store  data  on  a   primary  replica  as  well  as  fallbacks  if  the   primary  is  offline.
  4. Concepts:  Virtual  Node • AKA  "vnode" • An  Erlang process

     which  represents  a  slice  of   the  ring  hash  space. • Number  of  data  replicas  is  tunable  (by  default,   there  are  3)
  5. What  you  get  "out  of  the  box" • "Physical"  cluster

     state  management,   • Ring  state  management,   • Vnode placement,   • Vnode replication,   • Cluster  and  ring  state  gossip  protocols, • Consistent  hashing  utilities   • Handoff  activities • Covering  set  callbacks  (optional)
  6. Contrast  Distributed  Erlang • Sits  at  a  lower  level.  Necessary

     but  not   sufficient  to  build  robust  distributed  apps. • Have  to  manage  a  lot  of  tricky  things  with  a  lot   of  non-­‐intuitive  failure  modes. – Node  failure – Vnode placement  updates
  7. Is  it  a  good  fit? Riak_core makes  certain  assumptions  about

     the   application  you  want  to  build  on  top  it. – It  expects  you  to  have  a  "key"  that  links  to  a  blob   of  data. – The  key  (or  rather  its  chash)  determines  its   primary  vnode and  adjacent  replicas. – The  data  itself  is  opaque  and  has  application   context. – Presumably  you  want  behavior  semantics  that  are   different  than  riak_kv with  your  data.
  8. riak_core gaps • riak_core has  a  lot  of  capabilities  but

     also   some  gaps  you  should  know  about. – Cluster  membership  is  controlled  by  a  human – Yes,  even  when  a  node  failure  has  been  (correctly)   detected  by  the  cluster  manager. – vnode distribution  around  the  ring  is  sometimes   suboptimal  (this  is  not  always  easy  to  fix) • You  should  know  that  Basho  uses/tests/ships   its  own  OTP  (but  17.X  works  fine;  YMMV)
  9. Example  App:  Udon • Distributed  static  file  web  server •

    Why? • Key  =  URL-­‐like  path • Data  =  static  file  at  that  URL
  10. Getting  started $ git clone https://github.com/basho/rebar_riak_core $ cd rebar_riak_core; make

    install; cd .. $ mkdir foo; cd foo $ rebar create template=riak_core appid=foo
  11. udon/ !"" LICENSE !"" Makefile !"" README.md !"" rebar !""

    rebar.config !"" rel # !"" files # # !"" app.config # # !"" erl # # !"" nodetool # # !"" udon # # !"" udon-admin # # $"" vm.args # !"" gen_dev # !"" reltool.config # !"" vars # # $"" dev_vars.config.src # $"" vars.config $"" src !"" udon.app.src !"" udon.erl !"" udon.hrl !"" udon_app.erl !"" udon_console.erl !"" udon_node_event_handler.erl !"" udon_ring_event_handler.erl !"" udon_sup.erl $"" udon_vnode.erl
  12. Write  an  API • ping/0  – you  get  this  for

     free! • store/2  (Path,  Data) • fetch/1  (Path) • redirect/2  (OldPath,  NewPath)  – not   implemented • serve_file_version/2  (Path,  Version)  – also  not   implemented
  13. How  does  this  even  work? • Take  the  path  and

     hash  it  using  MD5  (object   name),  make  some  metadata • Object  name  is  fed  into  the  CHASH  function   for  vnode placement • In  the  vnode,  update  some  metadata  about   the  object  (file  version) • Store  the  metadata  as  (object.meta)  and  the   data  as  (object.version)  on  disk
  14. !"" 0 # !"" 098f6bcd4621d373cade4e832627b4f6.1 # $"" 098f6bcd4621d373cade4e832627b4f6.meta !"" 1004782375664995756265033322492444576013453623296

    !"" 1027618338748291114361965898003636498195577569280 ... !"" 1415829711164312202009819681693899175291684651008 # !"" 098f6bcd4621d373cade4e832627b4f6.1 # $"" 098f6bcd4621d373cade4e832627b4f6.meta !"" 1438665674247607560106752257205091097473808596992 # !"" 098f6bcd4621d373cade4e832627b4f6.1 # $"" 098f6bcd4621d373cade4e832627b4f6.meta ... !"" 91343852333181432387730302044767688728495783936 !"" 913438523331814323877303020447676887284957839360 # !"" 5a105e8b9d40e1329780d62ea2265d8a.1 # $"" 5a105e8b9d40e1329780d62ea2265d8a.meta !"" 936274486415109681974235595958868809467081785344 !"" 959110449498405040071168171470060731649205731328 $"" 981946412581700398168100746981252653831329677312 Udon on-­‐disk  layout
  15. Fetch  implementation fetch(Path) -> PHash = path_to_hash(Path), Idx = riak_core_util:chash_key(?KEY(PHash)),

    %% TODO: Get results from more than one node [{Node, _Type}] = riak_core_apl:get_primary_apl( Idx, 1, udon), riak_core_vnode_master:sync_spawn_command( Node, {fetch, PHash}, udon_vnode_master).
  16. Fetch  Vnode Implementation handle_command({fetch, PHash}, _Sender, State) -> MetaPath =

    make_metadata_path(State, PHash), Res = case filelib:is_regular(MetaPath) of true -> MD = get_metadata(State, PHash), get_data(State, MD); % returns {ok, Data} false -> not_found end, {reply, {Res, filename:join([ make_base_path(State), make_filename(PHash)])}, State};
  17. Store  implementation store(Path, Data) -> N = 3, % number

    of nodes to contact W = 3, % number of writes Timeout = 5000, % millisecs PHash = path_to_hash(Path), PRec = #file{ request_path = Path, path_md5 = PHash, csum = erlang:adler32(Data) }, {ok, ReqId} = udon_op_fsm:op( N, W, {store, PRec, Data}, ?KEY(PHash)), wait_for_reqid(ReqId, Timeout).
  18. Store  Vnode Implementation handle_command({RequestId, {store, R, Data}}, _Sender, State) ->

    MetaPath = make_metadata_path(State, R), NewVersion = case filelib:is_regular(MetaPath) of true -> OldMD = get_metadata(State, R), OldMD#file.version + 1; false -> 1 end, {MetaResult, DataResult, Loc} = store(State, R#file{version=NewVersion}, Data), {reply, {RequestId, {MetaResult, DataResult, Loc}}, State};
  19. Handoffs • There  are  4  distinct  types  of  handoffs  in

      riak_core:   – ownership,   – hinted,   – repair  and, – resize
  20. Implementing  handoff • At  a  high  level  what  are  the

     goals  of  a  handoff   implementation? – Per  vnode,  get  the  objects  associated  with  that   vnode. – Per  object,  find/compute  the  data  needed  to   transfer  object  state  to  another  node – Serialize  the  object  state  and  send  it – At  the  receiving  end,  deserialize and  store  state
  21. Implementing  udon handoff Per  vnode,  get  objects: get_all_objects(State) -> [

    strip_meta(F) || F <- filelib:wildcard("*.meta", make_base_path(State)) ]. Reminder:  Base  path  is  “udon_data/$partition_id”
  22. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  23. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  24. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  25. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  26. Implementing  udon handoff Per  object,  find  data,  serialize  and  send

     it: Do = fun(Object, AccIn) -> MPath = path_from_object(Base, Object, ".meta"), Meta = get_metadata(MPath), %% TODO: Get all file versions {ok, LatestFile} = get_data(State, Meta), VisitFun( ?KEY(Meta#file.path_md5), {Meta, LatestFile}, AccIn), end,
  27. Implementing  udon handoff handle_handoff_command( ?FOLD_REQ{foldfun=VisitFun, acc0=Acc0}, _Sender, State) -> AllObjects

    = get_all_objects(State), Base = make_base_path(State), Do = fun(Object, AccIn) -> %% Already seen this... end, Final = lists:foldl(Do, Acc0, AllObjects), {reply, Final, State};
  28. The  mysterious  VisitFun • riak_core_handoff_sender:visit_ item/3 • Does  the  work

     of  calling  your  serialization   callback  (encode_handoff_item/2)  with  the   {Bucket,  Key},  Data  as  parameters • Sends  the  serialized  data  to  the  receiver
  29. Receiving  handoff  data handle_handoff_data(Data, State) -> {Meta, Blob} = binary_to_term(Data),

    R = case Meta#file.csum =:= erlang:adler32(Blob) of true -> store(State, Meta, Blob), ok; false -> {error, file_checksum_differs} end, {reply, R, State}.
  30. Recent  riak_core resources • Riak_core in  small  bytes:   http://marianoguerra.github.io/presentations/riak-­‐core-­‐

    small-­‐bytes-­‐berlin-­‐efl-­‐2014.html • YouTube  video  of  above  talk:  https://youtu.be/rJXM33EieGQ • FlavioDB project:  https://github.com/marianoguerra/flaviodb • Udon project:  https://github.com/mrallen1/udon • Older  resources  are  still  useful  from  a  conceptual   understanding  but  often  are  “bitrotted”  from  a  running  code   perspective.