Architecting 3D content: what video structuring can teach us about the metaverse SEO [an experiment

The metaverse is coming, and it is about time to get ready. Applying computer vision logic to video structure helps with content distribution on YouTube Shorts, provided the content is carefully developed using computer vision principles. More than 90% of the likes and views can come from YouTube search features, because the algorithm comprehends video content better with the help of an advanced 3D optimization strategy. In this session, Emilija will show you what video structure teaches us about 3D content & metaverse SEO, so that you are well prepared for what is to come in the near future.

Emilija Gjorgjevska

March 21, 2023

Transcript

  1. The experiment: Swiss raclette servings from different angles, in different setups with different ingredients, incorporated in one video.
  2. What shaped me as an SEO: research-backed & business-focused computer science engineer, specialising in content engineering & end-to-end SEO. [Image labels: studied, worked, published in, visited]
  3. What shaped me as an SEO: worked in entertainment and learned creative approaches from award-winning journalists, video editors & producers, famous faces, and marketing directors since 2010, thanks to Macedonian Idol, Dragi Nedelchevski, Igor Tomeski, and Samir Ljuma.
  4. My imperative is that marketing is the GENEROUS act of helping other people achieve their goals and an opportunity to SERVE.
  5. Excerpt(s). Summary: more engagement means better ranking. Evidence: “...means for determining a user engagement value for the media item based on at least one user shares of the media item, user indications of interest in the media item, user comments on the media item…”
  6. Excerpt(s). Summary: more engagement means better ranking. Evidence: “...A score for a media item is computed by determining a plurality of positive user actions associated with the media, combining a plurality of score contributions from the plurality of positive user actions to determine a value for the score, and applying an exponential decay to the value for the score. The media items are ranked based on the scores…”
  7. Excerpt(s). Summary: embedding videos can lead to better ranking. Evidence: “...A user may thereby be presented with relevant search results, ranked by or including content sharing data, for example how often the item of content has been shared, and possibly in conjunction with other ranking data (internal and/or external reference).”
  8. What if I ignore classic SEO tips + these G-patents for video ranking optimization?
  9. What could be so special about a video that has 700+ views and 20+ likes on YouTube to date?
  10. 18

  11. Excerpt(s). Summary: demonstrated ability to detect and predict 3D shapes. Evidence: “...The core novelty of our method is a fast, single-pass architecture that both detects objects in 3D and estimates their shapes…Thus our model is able to extract shapes without access to groundtruth shape information in the target dataset.”
  12. My real video talk - verified by Yandex leak too! Lacking: no synonyms for Swiss raclette, no tags, no comments, no shares, no description, no ad or email campaign, no user signals, no social signals, no language settings, no captions, no start and end screens, undefined location/geography, no established niche YouTube channel fan base (core audience), never had a history of channel advertising in any way, no embeds, no schema markup, no backlinks, no link text on YouTube, no link depth, URL length and slug were defined by Google, my channel & personal region did not overlap with Switzerland, no tagged products, not even in a playlist!
  13. My real video talk. Implemented: short video (filtered multiple public Instagram Swiss raclette videos combined together), verified host (Google), simple filename (raclette.mp4), good objects, quality title (longer one combining topical entities = lemmas), no prohibited content, had a YT channel with some history already in place, was nearby in Germany during the video’s lifetime. Approach: apply computer science knowledge to video.
  14. Can we deconstruct videos by using computer vision? Yes, computer vision technology can be used to deconstruct videos. Computer vision algorithms can analyze video frames to extract and process visual information, such as object detection, image segmentation, optical flow, etc.
  15. Popular computer vision algorithms for video deconstruction: 1. Object Detection: YOLO, Faster R-CNN, RetinaNet. 2. Image Segmentation: Mask R-CNN, U-Net, DeepLabv3+. 3. Optical Flow: Farneback, Lucas-Kanade. 4. Action Recognition: Two-Stream Convolutional Networks, Temporal Segment Networks (TSN), 3D Convolutional Neural Networks (3D-CNN). 5. General Video Analysis: Keyframe Extraction, Video Summarization. 6. Object Tracking: KCF, Deep SORT, GOTURN.
  16. 1. Optical character recognition (OCR), or extracting written text from video (could be books, notes, shops’ names). 2. Object categorization, or organizing objects by their look, shape, texture (items, people, animals, stuff). 3. Automatic speech recognition, or what is said during the video. 4. Audio or other relevant sounds that can help grasp the topic of the video to match better (example: water, forest...), also in scene understanding. 5. Sentiment understanding, or emotions during the video. 6. Safe search classification and which color scheme is used. 7. Even certain movements that people make, like the “what’s the time” gesture.
  17. ...a paradox! Assume that everything can be detected and identified. Assume that everything can be misinterpreted or inappropriately tagged or classified. Google has a lot of data and engineering resources that we cannot get or implement but we have a… V/S
  18. 3D schema markup: “The experiments we performed are clearly indicating that even when we use advanced algorithms like YOLO, there’s still a space for the objects to be incorrectly labeled in visual environments like videos and metaverse spaces (virtual reality and augmented reality platforms). Having this in mind, we need to find a more structured way of providing 3D information to search engines.” - Emilija Gjorgjevska, WordLift’s blog, https://wordlift.io/blog/en/metaverse-seo/
  19. Everything matters. Beware: sometimes the story behind the video won’t allow optimizing for objects etc. in the video. User signals like comments, likes, subscribers, and so on matter a lot. However, without a solid basis that focuses on how the video is created in the first place, the reach to other audiences is limited.
  20. Connection to the metaverse: “You’ll own the things you create, build out and earn in the metaverse. Even more importantly, you will be able to monetise it. Creators will be highly incentivised to be present and create in this space.”
  21. Connection to the metaverse: “We can now do generative AI for images. We can do it for videos. At the rate that it’s moving, you’ll do it for entire villages; 3D villages and landscapes and cities and so on. You’ll be able to assemble an example of an image and generate an entire 3D world.”
  22. 1. Harder to reverse engineer. 2. Harder to replicate. 3. Usually overlooked. 4. Usually lacking strategy.
  23. ...questions? *All images are found on the Internet, except the YT channel one, and are not used for commercial purposes.
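
The patent excerpt in item 6 of the transcript describes a score built by combining contributions from positive user actions, applying an exponential decay, and ranking media items by the result. The sketch below is one possible reading of that description, not the patented method: the action weights, the 30-day half-life, and the names `ACTION_WEIGHTS`, `MediaItem`, `engagement_score`, and `rank` are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical weights per positive user action (not taken from the patent).
ACTION_WEIGHTS = {"like": 1.0, "comment": 2.0, "share": 3.0}


@dataclass
class MediaItem:
    title: str
    actions: dict      # action name -> count, e.g. {"like": 20}
    age_days: float    # time since the actions were recorded


def engagement_score(item, half_life_days=30.0):
    """Combine weighted action counts, then apply exponential decay.

    The half-life is an assumption: after `half_life_days` the score halves.
    """
    raw = sum(
        ACTION_WEIGHTS.get(name, 0.0) * count
        for name, count in item.actions.items()
    )
    decay = math.exp(-math.log(2) * item.age_days / half_life_days)
    return raw * decay


def rank(items, half_life_days=30.0):
    """Return items ordered from highest to lowest decayed score."""
    return sorted(
        items,
        key=lambda item: engagement_score(item, half_life_days),
        reverse=True,
    )


if __name__ == "__main__":
    videos = [
        MediaItem("older video", {"like": 100, "share": 5}, age_days=300),
        MediaItem("recent video", {"like": 20, "comment": 3}, age_days=2),
    ]
    for video in rank(videos):
        print(video.title, round(engagement_score(video), 3))
```

Under these assumed weights, a recent video with modest engagement can outrank an older one with far more raw likes, which is the behaviour the decay term is meant to produce.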