How to use AWS Lambda in Document Processing Pipeline

The full text is here:

https://gist.github.com/suzuken/6033c20a3a3c9e0f5354b88f405240f5

Kenta Suzuki

April 22, 2016

Transcript

  1. How to use AWS Lambda in Document Processing Pipeline @suzu_v

    VOYAGE GROUP 2016/04/22 at AWS Tokyo Office
  2. About me
    • Suzuken, https://github.com/suzuken, @suzu_v
    • I am a Gopher, but today I will talk about Java
    • I work as a software engineer on ad delivery and analytics infrastructure at http://fluct.jp
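  3. Components
    • Web crawler (EC2 / Go): a daemon that fetches the content of a given URL
    • Body text extractor (Lambda / Java8): extracts the part of a page estimated to be the main text
    • Chosen because we wanted to use Lucene / Kuromoji
    • Classifier (EC2 / Go): handles categorization and similar tasks, based on the body text and other information obtained from the page
    • Document store (EC2 / Elasticsearch): stores crawled and classified content and makes it searchable
    • API (EC2 + ELB / Go): an internal HTTP API that returns classification results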
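  4. Architecture: we rely heavily on Kinesis Stream
    • At peak, the crawler fetches content at ~100MB/s
    • The content is inserted directly into a Kinesis Stream with PutRecords
    • The crawler is written in Go (with aws-sdk-go) and also handles write retries and buffering
    • Idempotency is ensured by Elasticsearch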
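
The buffering mentioned on this slide can be sketched as follows. The real crawler is written in Go and its code is not shown in the deck, so this is only an illustration in Java under stated assumptions: the class name `PutRecordsBatcher` and its parameters are hypothetical, and the limits are passed in rather than hard-coded because they should be taken from the current PutRecords documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups payloads into batches that fit one PutRecords call. */
final class PutRecordsBatcher {

    /**
     * Splits payloads into consecutive batches, each holding at most maxRecords
     * entries and at most maxBytes of payload data. A single payload larger than
     * maxBytes ends up alone in its own batch. Partition key bytes are not
     * counted here, which is a simplification.
     */
    static List<List<byte[]>> batch(List<byte[]> payloads, int maxRecords, int maxBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentBytes = 0;
        for (byte[] payload : payloads) {
            boolean full = current.size() == maxRecords
                    || currentBytes + payload.length > maxBytes;
            if (full && !current.isEmpty()) {
                // Close the current batch and start a new one.
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(payload);
            currentBytes += payload.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each returned batch would then become the record list of one PutRecords request, for example `PutRecordsBatcher.batch(payloads, 500, 5 * 1024 * 1024)` if those are the limits that apply.
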
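  5. Why Lambda?
    • Integration with Kinesis Stream is simple
    • For verification, you only need to create a new Lambda Function with a click, so experimenting is easy
    • The data of a Kinesis Stream stays in its shards, so testing with the same data is also easy
    • Testing in Production (Data)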
  6. Implementation example in Java
    Create a POJO that mirrors the Kinesis record format:

    import java.io.Serializable;

    public class KinesisMessageModel implements Serializable {
        public String id;
        public String url;
        public String body;
        public String title;
        public String description;
        // ...
    }

    see: Example: Using POJOs for Handler Input/Output (Java) - AWS Lambda
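
Before a record can be mapped onto the POJO above, its payload has to be read out of the `ByteBuffer` that Kinesis hands to the handler. The deck does not show this step, so the following is a minimal sketch that assumes the payload is UTF-8 encoded JSON; the class name `PayloadDecoder` is hypothetical, and the JSON-to-POJO mapping itself is left to a JSON library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: turns a record payload into text for a JSON mapper. */
final class PayloadDecoder {

    /** Decodes the remaining bytes as UTF-8 without moving the caller's buffer position. */
    static String toUtf8String(ByteBuffer data) {
        // duplicate() shares the underlying bytes but has its own position and limit,
        // so the original buffer can still be read afterwards.
        return StandardCharsets.UTF_8.decode(data.duplicate()).toString();
    }
}
```

The resulting string could then be passed to a JSON mapper to populate `KinesisMessageModel`.
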
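  7. Process the data and pass it to the next Kinesis Stream

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.util.ArrayList;
    import java.util.List;
    import com.amazonaws.services.kinesis.model.PutRecordsRequest;
    import com.amazonaws.services.kinesis.model.PutRecordsRequestEntry;
    import com.amazonaws.services.kinesis.model.PutRecordsResult;
    import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
    import com.amazonaws.services.lambda.runtime.events.KinesisEvent.KinesisEventRecord;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class Boiler {
        // The fields kinesis and kinesisOutputStreamName and the helpers
        // getPutRecordsRequest() and toClass() are not shown on the slide.

        // Handler that receives data from the Kinesis Stream
        public void recordHandler(KinesisEvent event) throws IOException {
            PutRecordsRequest putRecordsRequest = getPutRecordsRequest(this.kinesisOutputStreamName);
            List<PutRecordsRequestEntry> putRecordsRequestEntryList = new ArrayList<>();
            ObjectMapper mapper = new ObjectMapper();

            // One event contains multiple records; the number is configurable via batch size.
            for (KinesisEventRecord rec : event.getRecords()) {
                KinesisMessageModel record = toClass(rec);
                PutRecordsRequestEntry putRecordsRequestEntry = new PutRecordsRequestEntry();

                // Process the record (in practice, body text extraction happens here).
                // ByteBuffer.wrap() takes a byte[], so serialize to bytes rather than to a String.
                ByteBuffer data = ByteBuffer.wrap(mapper.writeValueAsBytes(record));
                putRecordsRequestEntry.setData(data);
                putRecordsRequestEntry.setPartitionKey(record.getSomeKey());
                putRecordsRequestEntryList.add(putRecordsRequestEntry);
            }

            // Assemble the PutRecords request for the next Kinesis Stream and send it
            putRecordsRequest.setRecords(putRecordsRequestEntryList);
            PutRecordsResult putRecordsResult = this.kinesis.putRecords(putRecordsRequest);
        }
    }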
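
The handler above does not inspect the `PutRecordsResult`. A PutRecords call can succeed as a whole while individual records are rejected, and the result entries come back in the same order as the request entries, with an error code set on the rejected ones. The following is a minimal sketch of picking out the entries to resend, written against plain lists so it stays independent of the SDK; the class and method names are hypothetical, and how the deck's author handles this case is not stated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: selects the request entries that need to be sent again. */
final class FailedEntrySelector {

    /**
     * Returns the request entries whose result at the same index carries an error code.
     * errorCodes.get(i) == null means entry i was accepted.
     */
    static <T> List<T> failedEntries(List<T> requestEntries, List<String> errorCodes) {
        if (requestEntries.size() != errorCodes.size()) {
            throw new IllegalArgumentException("request and result sizes differ");
        }
        List<T> failed = new ArrayList<>();
        for (int i = 0; i < requestEntries.size(); i++) {
            if (errorCodes.get(i) != null) {
                failed.add(requestEntries.get(i));
            }
        }
        return failed;
    }
}
```

In the handler, the error codes would come from calling `getErrorCode()` on each entry of `putRecordsResult.getRecords()`, and the returned entries would go into a follow-up PutRecords request, ideally with a bounded number of attempts and a delay between them.
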
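  8. Impressions of the Java implementation
    • For writing something quickly, node.js is easier
    • For Java there is no blueprint, and you cannot quickly try things out from the Lambda Console
    • Packaging is done with Maven; the Maven Shade Plugin builds an uber jar
    • uber jar: a jar that bundles all dependency libraries and the like into a single jar
    • The dictionary for morphological analysis is also included in the jar
    Lambda Function Handler (Java) - AWS Lambda
    Apache Maven Shade Plugin – Introduction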
  9. Creating the Lambda function: aws-cli

    aws lambda create-function \
      --region ap-northeast-1 \
      --function-name my-lambda-function \
      --code S3Bucket=mybucket,S3Key=path/to/my.jar \
      --role arn:aws:iam::999999999999:role/lambda_kinesis_rw \
      --runtime java8 \
      --handler com.your.app.Handler::recordHandler \
      --description "my kinesis stream!" \
      --timeout 15 \
      --memory-size 512

    aws lambda create-event-source-mapping \
      --event-source-arn arn:aws:kinesis:ap-northeast-1:999999999999:stream/your-stream \
      --function-name my-lambda-function \
      --enabled \
      --batch-size 100 \
      --starting-position TRIM_HORIZON
  10. Deployment: aws-cli
    • Pull Request -> merge -> build (on Travis CI) -> S3
    • Travis CI builds the uber jar
    • After that, the change is applied with update-function-code

    aws lambda update-function-code \
      --function-name my-lambda-function \
      --s3-bucket mybucket \
      --s3-key path/to/my.jar
  11. Logging in Lambda
    • We use Log4j
    • Error logs and the like can be viewed in CloudWatch Logs
    • When a bug does not reproduce locally, look at CloudWatch Logs
    Accessing Amazon CloudWatch Logs for AWS Lambda - AWS Lambda
    Logging (Java) - AWS Lambda