
R/Pharma RStudio Connect Admin Workshop

kellobri
November 01, 2021


RStudio Connect Admin Training Workshop (Virtual) for R/Pharma 2021


Transcript

  1. Workshop Agenda • Part 1: Get Set Up for Success

    • Part 2: The Admin Experience • Part 3: Preview of things to come
  2. What is the purpose of RStudio Connect?

    1. Publishing: Deploy R or Python content via a variety of methods: push-button, CLI, git-backed, or Server API.
    2. Execution: Publish static content or source code-backed items. Set up documents to run on a schedule, and control resource allocation to interactive applications and APIs.
    3. Management: Control the metadata associated with a content item. Publish new versions in place, roll forward/backward. Access scheduled report history. Add organizational tags. Track the usage metrics.
    4. Distribution: Add viewers and collaborators. Set a vanity URL. Schedule a custom email to be sent on success criteria.
  3. Content Basics

    All the work you do in R & Python: “Data Products”
    • Applications
      ◦ Shiny
      ◦ Dash, Streamlit, Bokeh
    • Documents
      ◦ R Markdown
      ◦ Jupyter Notebooks
      ◦ Static content: sites, plots, graphs
    • Pins
    • Web APIs (RSC Standard & Enterprise)
      ◦ Plumber
      ◦ Flask, FastAPI
      ◦ Tableau Analytic Extensions: plumbertableau, fastAPItableau
    • Models
  4. User Roles

    • Administrators
      ◦ Have all privileges, but must explicitly grant themselves content access
      ◦ Actions are audited
      ◦ Special access to the Admin tab and certain content settings
    • Publishers
      ◦ Can upload new content items
    • Viewers
      ◦ Can see content items

    Content Privileges
    • Content Collaborators
      ◦ Can publish new versions
      ◦ Manage settings
      ◦ Download source bundles (code)
    • Content Viewers
      ◦ Can only see and interact with the content itself
  5. RStudio Connect Overview • Demo of Publishing Mechanisms ◦ Push-button

    (RStudio IDE, Jupyter Notebooks) ◦ Git-backed (manifest generation) ◦ Programmatic (Azure DevOps example) • Demo of Application Permissions Management ◦ Users ◦ Groups • Demo of Admin Dashboard Functionality ◦ Metrics ◦ Process listing (updated) ◦ Tags ◦ Audit Logs ◦ Unpublished Content ◦ Scheduled Content Calendar
  6. Supported Linux Distributions

    • RHEL/CentOS 7 & 8*
    • Ubuntu 18.04 LTS & 20.04 LTS
    • SLES 12 SP5
    • SLES 15 SP2 / openSUSE 15.2

    *Distributions such as Rocky Linux and AlmaLinux can be used as long as they stay 1:1 binary compatible with RHEL 8. CentOS Stream is not supported by RStudio.
  7. RStudio Connect & Docker https://github.com/rstudio/rstudio-docker-products RStudio products are designed to

    live on long-running Linux servers. RStudio products are entirely compatible with treating a container like the underlying Linux server to better encapsulate dependencies and diminish server statefulness. In this model, each RStudio product is placed in its own long-running container and treated as a standalone instance of the product. Multiple containers can be load-balanced and treated as a cluster. These containers can be managed by a Kubernetes cluster, should you wish. There are some specific considerations for running RStudio products in containers, which are detailed in this article.
  8. Types of Evaluations 1. Not very useful to Admins: RStudio

    Hosted Evaluation https://www.rstudio.com/products/connect/evaluation/ 2. Useful but you have to DIY: 45-day Evaluation Key https://www.rstudio.com/products/connect/download-commercial/
  9. Authentication Decision Making

    Authentication Provider Type        | User Self-Registration     | Authorization via Groups                                | Current-User Execution
    ------------------------------------+----------------------------+---------------------------------------------------------+-----------------------
    Password (built-in)                 | Yes (can also be disabled) | Groups must be managed locally in Connect               | No
    PAM                                 | No                         | Groups must be managed locally in Connect               | Yes (per-app basis)
    LDAP/AD                             | No                         | Groups can come from the Provider or be local to Connect | No
    SAML                                | No                         | Groups can come from the Provider or be local to Connect | No
    OIDC - Google                       | No                         | Groups must be managed locally in Connect               | No
    OIDC - Others                       | No                         | Groups can come from the Provider or be local to Connect | No
    Proxied through an external service | No                         | Groups can come from the Provider or be local to Connect | No
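
One way to use the comparison above is to encode it as data and filter on the capabilities you need. The sketch below is only an illustration: the dictionary layout, the shortened provider labels, and the `candidates` helper are assumptions made for this example and are not part of RStudio Connect.

```python
# Rows transcribed from the comparison above.
# Tuple layout: (self_registration, groups_from_provider, current_user_execution)
PROVIDERS = {
    "Password": (True, False, False),
    "PAM": (False, False, True),
    "LDAP/AD": (False, True, False),
    "SAML": (False, True, False),
    "OIDC - Google": (False, False, False),
    "OIDC - Others": (False, True, False),
    "Proxied": (False, True, False),
}


def candidates(need_provider_groups=False, need_current_user_execution=False):
    """Return provider labels that satisfy the requested capabilities.

    Hypothetical helper for planning discussions only.
    """
    matches = []
    for name, (_self_reg, provider_groups, current_user) in PROVIDERS.items():
        if need_provider_groups and not provider_groups:
            continue
        if need_current_user_execution and not current_user:
            continue
        matches.append(name)
    return matches
```

Note that no provider in the comparison offers both provider-managed groups and current-user execution, so asking for both returns an empty list.
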
  10. The Key to Configuration: Publisher Relations The most successful RStudio

    Connect Installations require open dialog between admins and publishers. • Do you know who your Publishers are? • Do your Publishers know who you are? • Do you know what types of content they will be publishing? • Do you know what versions of R and Python they'll need? • Do you know how they plan to connect to data sources? This is also your chance to make any Dev/SecOps policies and expectations known.
  11. More Publisher Discussion Topics Connect Access Restrictions • Will you

    place limits on allowed viewership? (all, logged_in, acl) ◦ Applications.MostPermissiveAccessType ◦ Applications.AdminMostPermissiveAccessType Vanity URL Management • Will Publishers be allowed to set vanity URLs for content? ◦ Authorization.PublishersCanManageVanities Content Organization: Tags • Work with your Publishers to set up a Tag Schema for content organization
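
To make the idea of a "most permissive" ceiling concrete, the sketch below orders the three access types named above from most restrictive to most open and checks a requested type against a limit. The `is_allowed` helper and the ordering list are assumptions for illustration; they mirror the intent of Applications.MostPermissiveAccessType rather than Connect's actual implementation.

```python
# Access types from the slide, ordered from most restrictive to most open.
ACCESS_ORDER = ["acl", "logged_in", "all"]


def is_allowed(requested: str, most_permissive: str) -> bool:
    """Return True if `requested` is no more open than `most_permissive`.

    Hypothetical helper: illustrates the ceiling concept only.
    """
    return ACCESS_ORDER.index(requested) <= ACCESS_ORDER.index(most_permissive)
```

For example, with a ceiling of `logged_in`, a publisher asking for `all` would be refused while `acl` would be accepted.
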
  12. Scaling Applications

    • R is single threaded
    • Load balance between processes
    • Application owners can set Minimum and Maximum processes
    • Use Scheduler.MinProcessesLimit to cap resources if this becomes a problem
    • Scheduler.MaxProcessesLimit is also available
  13. Application Timeouts

    • The maximum amount of time to wait for an app to start: Scheduler.InitTimeout = 60s
    • The minimum time to keep a worker process alive after it goes idle: Scheduler.IdleTimeout = 5s

    After the last user disconnects from a process, RStudio Connect waits 5s before that process is reaped. You might want to increase Scheduler.IdleTimeout if you have a process that is resource-intensive to start up.
  14. Report Concurrency

    • Applications.ScheduleConcurrency (default: 2)
    • Maximum number of scheduled reports to run in parallel
    • Setting this to zero will disable scheduled execution

    This lets you control (throttle) scheduled content execution. If all your publishers schedule reports to run at midnight, Connect will iterate through them as quickly as possible.
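
The throttling effect can be seen with a small simulation. The sketch below is a simplified model, not Connect's scheduler: it assumes every report becomes due at time 0, runs for a known duration, and starts as soon as one of `concurrency` slots is free. The function name and the time units are illustrative.

```python
import heapq


def simulate_schedule(durations, concurrency=2):
    """Return each report's finish time when at most `concurrency` run at once.

    Illustrative model only; all reports are assumed to be due at time 0.
    """
    if concurrency <= 0:
        # Mirrors the slide: a limit of zero disables scheduled execution.
        return []
    running = []  # min-heap holding the finish time of each busy slot
    finish_times = []
    for duration in durations:
        if len(running) < concurrency:
            start = 0  # a slot is still free at time 0
        else:
            start = heapq.heappop(running)  # wait for the earliest slot
        end = start + duration
        heapq.heappush(running, end)
        finish_times.append(end)
    return finish_times
```

With four ten-minute reports due together, a limit of 2 finishes them in two waves, while a limit of 4 finishes them all in one.
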
  15. Disk Usage Resource Management Things Connect stores on disk: •

    Content bundles (uploaded compressed bundles from users) • Unzipped bundles for running applications • Package cache ◦ One copy of each version of each package specific to the R (or Python) version • Metrics (RAM and CPU usage) • R/Python process information/logs
  16. Content Bundle Retention Throttle the number of bundles retained for

    each content item • Applications.BundleRetentionLimit (default 0, which retains everything) If you experience problems with large bundles: • Ask publishers not to package large sets of data in the content bundle and provision data on the server separately
  17. Process Information Retention • Maximum number of jobs preserved on

    disk for any one application: ◦ Jobs.MaxCompleted (default: 1000) • Maximum age of a completed job retained on disk: ◦ Jobs.OldestCompleted (default: 30d) On-disk job metadata is removed if either the MaxCompleted or OldestCompleted restrictions are violated. Adjust this retention window based on your auditing requirements.
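
The "either limit" rule can be sketched as follows. This is an illustration of the retention logic as described on the slide, assuming a list of completion timestamps for one application; the function name and signature are hypothetical and do not reflect how Connect implements pruning internally.

```python
from datetime import datetime, timedelta


def jobs_to_remove(completed_at, now, max_completed=1000,
                   oldest_completed=timedelta(days=30)):
    """Return completion timestamps whose job metadata would be pruned.

    A job is pruned if it is older than `oldest_completed`, or if it falls
    outside the `max_completed` most recent jobs. Illustrative only.
    """
    newest_first = sorted(completed_at, reverse=True)
    removed = []
    for position, finished in enumerate(newest_first):
        too_many = position >= max_completed
        too_old = now - finished > oldest_completed
        if too_many or too_old:
            removed.append(finished)
    return removed
```

Lowering either value shortens the retention window, so a strict auditing requirement may call for raising both.
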
  18. How will your publishers deploy to Connect? Three ways to

    publish content to RStudio Connect:
  19. Importance of an Environment Management Strategy Environment management takes work.

    Here are some cases where the reward is worth the effort: • When you are working on a long-term project, and need to safely upgrade packages. • In cases where you and your team need to collaborate on the same project, using a common source of truth. • If you need to validate and control the packages you’re using. • When you are ready to deploy a data product to production, such as a Shiny app, R Markdown document, or plumber API.
  20. Private Packages Many organizations find value in hosting their own

    package repository. Hosting an internal repository allows organizations to: • Share and version their internal packages • Access and govern packages from external sources • Audit package use
  21. Validated Environment Management Recommended Exercises: ❏ Review the curated resources

    and recommendations for Using R for Validated Work ❏ Can you recreate your environment? ❏ Can you trust the things in your environment? ❏ Learn about the Validated Environment Strategy ❏ Learn about Internal Package Repositories
  22. Reproducibility & Environment Strategy Maps To select a strategy, you

    need to answer two questions: • Who is responsible for managing the environment? • How open is the environment?
  23. Custom Branding • Replace the RStudio logo and favicon with

    your own. • Direct logged-in users to a landing page of your choice when they first enter RStudio Connect. • Generate custom content landing pages with R code using connectwidgets. • Customize what anonymous and logged-out users see when they visit your server. • Control email settings such as sender display name, “from” address, sender address headers, and subject prefix. • Hide the Documentation tab from viewers.
  24. Custom Landing Pages Create a custom landing page that all

    anonymous or logged-out users will see. Workbook Exercise: Use the Server.LandingDir configuration setting to specify the path to a directory that contains index.html and all assets (CSS, images, javascript, etc.)
  25. Other Types of Custom Landing Pages

    Landing Pages for Logged-in Users
    • Server.RootRedirect (Default: the Server.DashboardPath value): The URL logged-in users will be redirected to when visiting the public URL used to access the server.
    • Server.DashboardPath (Default: "/connect"): The URL path name to be used where RStudio Connect's dashboard is hosted.

    One option for creating a custom landing page is to make a content showcase with the connectwidgets R package.
  26. Unsupported Customizations (November 2021) • RStudio Connect dashboard color palette

    • Hiding Tags from viewers • Removal of footer text that says “Powered by RStudio Connect” • Removal of RStudio copyright information
  27. Special Considerations for Consultancies (External Users)

    • Branding and Landing Page Customization
    • Managing multiple clients
      ◦ User Isolation: Authorization.ViewersCanOnlySeeThemselves, Server.HideEmailAddresses
      ◦ Viewer Restrictions: Server.ViewerKiosk. When enabled, users with the viewer role will not be allowed to submit permission requests for content access or to request elevated role privileges.
    • Multiple authentication providers
      ◦ Federated authentication: RStudio Connect will authenticate against an external identity provider (usually via SAML), and the provider will federate identity management to all the different authentication providers.
  28. Golden Rules of RStudio Connect Configuration ❏ Check your configuration

    file: Is Server.Address set? ❏ Verify your email server configuration: Send a test email ❏ Maintain an open dialog with your publisher users ❏ Before you start publishing content: ❏ Make an informed decision about your authentication provider ❏ Make an informed decision about your package repository ❏ Life is better with Package Manager or an Internal Repository
  29. License Management

    • RStudio Connect uses the license-manager to determine if a valid license is available:
        sudo /opt/rstudio-connect/bin/license-manager status
    • The Connect dashboard will display a notification to admins and publishers when the license is within 15 days of expiration.
    • You can disable this with Licensing.ExpirationUIWarning
  30. User Management • Adding Users ◦ Accounts can be either

    created / pre-provisioned or auto-registered. Details and capabilities differ by authentication provider. ◦ Example: Server API driven user provisioning • Locking Users ◦ Forbids login and publishing ◦ Removes user from your license count ◦ Example: Server API documentation • Removing Users ◦ Last resort option ◦ Could require content ownership migration
  31. Group Management • Local Groups ◦ Manage through the UI:

    “People” tab ◦ Manage with the RStudio Connect Server API ◦ Disable local group support with: Authorization.UserGroups (existing groups must be removed) • Remote Groups ◦ Management is the responsibility of the external authentication provider ◦ Group memberships are locally synchronized through successful login events Note! Having a mix of Local and Remote groups on your server is not recommended. Migrate completely from one mode to the other when making a change.
  32. RStudio Connect API Keys • Programmatically access content on RStudio

    Connect and use the Server API • API Keys are associated with users, not content Resources: • Server API documentation • Server API Cookbook
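
As a minimal sketch of programmatic access, the snippet below builds (but does not send) an authenticated Server API request using only the Python standard library. The server URL and key are placeholders, `build_request` is a hypothetical helper, and `v1/content` is used only as an example path; check the Server API documentation for the endpoints available in your Connect version.

```python
import urllib.request


def build_request(server_url: str, api_key: str, path: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated Server API request."""
    url = server_url.rstrip("/") + "/__api__/" + path.lstrip("/")
    # The API key travels in an Authorization header using the "Key" scheme.
    return urllib.request.Request(url, headers={"Authorization": f"Key {api_key}"})


# Placeholder server URL and key, for illustration only.
request = build_request("https://connect.example.com", "my-api-key", "v1/content")
```

Passing `request` to `urllib.request.urlopen` would perform the call. Because a key acts with its owner's permissions, keep it out of source control, for example by reading it from an environment variable.
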
  33. Setting up Programmatic Deployments DEMO: Azure DevOps Pipelines for content

    deployments Additional Resources: • Publishing Methods Explained • Publishing to RStudio Connect with Github Actions
  34. Making Announcements RStudio Connect provides several methods for posting custom

    HTML messages to the User Interface: • Server.PublicWarning - Visible on the unauthenticated landing page • Server.LoggedInWarning - Visible above recent content when logged in ◦ Useful for things like scheduling maintenance windows
  35. End of Support for Python 2 (January 2022) Starting January

    2022, RStudio Connect will no longer support Python 2. Factors that have gone into our decision include the following: • Python 3 is now widely adopted and is the actively-developed version of the Python language. • In January 2021, the pip 21.0 release officially dropped support for Python 2. • A large number of projects pledged to drop support for Python 2 in 2020 including TensorFlow, scikit-learn, Apache Spark, pandas, XGBoost, NumPy, Bokeh, Matplotlib, IPython, and Jupyter notebook.
  36. Other Server API Project Ideas

    • Build a report to examine access control list details for each content item on your RStudio Connect server (Example)
    • Audit all the unpublished (orphaned) content items on your RStudio Connect server (Example)
    • Audit all the vanity URLs currently in use on your RStudio Connect server (Example)
    • Audit all the tags currently in use on the server, and list all the tagged content items (Example)
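
A first step toward an access report might look like the sketch below, which groups content items by their access type. It works on a list of dictionaries standing in for a Server API response; the `access_type` and `title` field names and the sample records are assumptions for illustration and should be checked against the Server API documentation for your Connect version.

```python
from collections import defaultdict


def group_by_access_type(content_items):
    """Map each access type to the titles of the content items using it."""
    report = defaultdict(list)
    for item in content_items:
        # `access_type` and `title` are assumed field names; verify them
        # against the Server API documentation before relying on this.
        report[item.get("access_type", "unknown")].append(item.get("title"))
    return dict(report)


# Hypothetical sample standing in for a Server API response.
sample = [
    {"title": "Sales dashboard", "access_type": "acl"},
    {"title": "Public docs", "access_type": "all"},
    {"title": "Team report", "access_type": "acl"},
]
```

Calling `group_by_access_type(sample)` yields one list of titles per access type, which makes unexpectedly open content easy to spot.
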
  37. Content Usage Data & Tracking Example Shiny Applications: • Records

    information about each visit and the length of that visit Other Content: • Records information about each visit: user, timestamp, content rendering info
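
Visit records of this kind can be summarized with a few lines of Python. The sketch below totals visit length per content item; it assumes each record is a dictionary with `content_guid`, `started`, and `ended` fields holding ISO 8601 timestamps without a timezone suffix. Those field names and formats are assumptions for this example, so confirm them against the Server API documentation.

```python
from datetime import datetime


def visit_seconds(usage_records):
    """Total visit length in seconds per content item.

    Records with no `ended` value (sessions still open) are skipped.
    """
    totals = {}
    for record in usage_records:
        if not record.get("ended"):
            continue  # session has not finished yet
        started = datetime.fromisoformat(record["started"])
        ended = datetime.fromisoformat(record["ended"])
        guid = record["content_guid"]
        totals[guid] = totals.get(guid, 0.0) + (ended - started).total_seconds()
    return totals
```
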
  38. Managing RStudio Connect Upgrades • RStudio Connect versions are supported

    for 18 months • We recommend upgrading at least once a year. • Most upgrades should require less than five minutes unless breaking changes have occurred in the interim and require configuration adjustments. • Consult the release notes before undergoing an upgrade.
  39. Performing an Upgrade

    Download and run the installation script. The installation script works across all supported Linux distributions, validates the GPG key of the downloaded package, and includes support for offline use.

    Example:
        curl -Lo rsc-installer.sh https://cdn.rstudio.com/connect/installer/installer-v1.9.5.sh
        sudo -E bash ./rsc-installer.sh 2021.10.0
  40. RStudio Product Support

    Submit a Support Ticket: https://support.rstudio.com/hc/en-us/requests/new

    Generate a server diagnostic report: If you are on RStudio Connect version 1.7.2 or later, run the following command on the server and send us the output:
        sudo /opt/rstudio-connect/scripts/run-diagnostics.sh /path/to/output/dir
  41. RStudio Connect Investments

    Short Term Future

    Vision: Data scientists own the publication, execution, management, and distribution of their work in a safe and sophisticated manner, fully sanctioned by their IT admins.

    Strategic Goals: Increase the types of content available to share, improve content discovery and management, and facilitate production deployments.

    Early 2022: Administrators can enable remote content execution on a Kubernetes back-end while maintaining easy self-serve publishing.
    • Publishers are able to drive viewer engagement on their work
    • Publishers can manage process automation workflows
    • Feature parity for Python users
    • Extend Cloud Native capabilities to ease integrations
    • Improvements to Docker-friendly installation

    October 2021: BI Integration: Extend Tableau dashboards with R, Shiny and Python
  42. Invitation to the Beta Program for Off-Host Execution

    • Begins in December, runs until the GA launch in early 2022
    • Beta will not have feature parity with RStudio Connect local execution

    Sign-up form requirements:
    • A Kubernetes cluster where you have full cluster-admin privileges
    • A PostgreSQL database that meets Connect’s requirements
    • An NFS server that meets Connect’s shared storage requirements
    • Willingness to provide feedback on the installation/configuration process
    • Publishers who are willing to provide feedback