Essential toolset for analytics

Self-service, fail-safe exploratory environment for collaborative data science workflow

Customizable authentication
Secured access to personal exploratory environment
Easy to use Web Interface
Leverage the power of analytical tools
Collaboration capabilities. Multiple Git repositories support
Control Cloud Services Usage
Amazon Web Services
Amazon Elastic MapReduce (EMR)
Jupyter Notebook
Apache Zeppelin
Python
MongoDB. Cross-platform document-oriented database program
TensorFlow. An open source machine learning framework
RStudio

Problems We Solve

Data access problem

  • Access to full datasets is on a “need to know” basis
  • Access to production data is limited
  • Data access is governed by corporate/BU/department policies

Problems with tools

  • Installation of tools is difficult and expensive (lack of ops & security skills)
  • Latest & greatest tool versions aren’t available in the shared environment
  • Lack of collaboration: everyone works in their own silo

Increased IT handholding

  • Data scientists are bound to local machines, due to the cost of IT support
  • Lack of self-service leads to a stream of requests to ops teams
  • Resource allocation monitoring and security control in shared environments

Lack of security

  • Local data copies are not secure (data scientists get copies of data on their local machines)
  • Cloud security policies for data protection are difficult to implement
  • Cutting-edge tools require enterprise hardening (tools have to be secured)

Our Solution

Self-service and customization

  • Self-provisioning of the environment
  • Plug-in proprietary components; change tool versions
  • No dedicated IT Support is involved

Secured and fail-safe

  • Break it and simply re-provision a fresh environment
  • Provides a client’s security perimeter
  • Security policies and access controls

Collaborative workflow

  • Collaboration between data science team members
  • Collaboration between data science and engineering teams
  • Easy path from R&D to Production

Exploratory environment

  • Best open source data tools
  • Horizontally scalable compute
  • Easy access to data and metadata
  • Training / workshops

Flexible deployment architecture

  • Public cloud: Azure & AWS
  • No platform or vendor lock-in

Features

Customizable authentication
Secured access to environment
Easy to use Web Interface
Leverage the power of analytical tools
Data visualization with data science libraries
Manage analytical environment
Install libraries and dependencies
Multiple Git repositories support
Control Cloud Services Usage

Customizable authentication

  • OpenLDAP
  • Active Directory
  • Cloud Identity and Access Management service user validation

Secured access to personal exploratory environment

  • Gateway node for accessing the analytical environment through a secured tunnel
  • Private key for identity control and encryption

Easy to use Web Interface

  • Single-click life-cycle management of analytical tools and computational resources
  • Dashboard and in-grid filtering
  • Built-in “how-to” pages

Leverage the power of analytical tools

  • Jupyter
  • Zeppelin
  • RStudio
  • TensorFlow
  • Deep Learning
  • Add computational power to your jobs by deploying an elastic Spark cluster on memory-, storage-, compute-, or GPU-optimized instances

Run your analytics effectively

  • Python v2/v3, Scala, Apache SparkR support
  • Data visualization with pre-provisioned data science libraries
  • Cloud Storage compatibility

Manage analytical environment

  • Start / Stop / Terminate notebooks
  • Add / Remove computational resources
  • Enrich the analytical experience by installing libraries and dependencies

Collaboration capabilities

  • Shared codebase
  • Shared Storage
  • Shared Wiki
  • DLab Web UI for repositories access management
  • Built-in Ungit for easy-to-use source collaboration

Control Cloud Services Usage

  • In-grid billing functionality
  • Detailed billing report
  • Admin mode to see all users

You can download DLab from GitHub

The DLab source code is free to use!

Apache License

Version 2.0, January 2004
