NetApp DataSense on AWS

Over the last few years, demands and expectations around data security have increased dramatically. The main drivers are local data privacy regulations like the EU GDPR, which require awareness of sensitive data and an overview of potential risks.

Amazon launched its Macie service in 2017, has added new capabilities lately, and does a great job with data on S3. But how do you include your EFS/FSx file shares and RDS databases to eliminate blind spots? Meet NetApp Cloud DataSense.

NetApp Cloud DataSense

NetApp Cloud DataSense offers a SaaS dashboard for data compliance (covering various standards), governance, and inventory. It can be deployed on AWS or even on-premises to scan and classify data, and it never sends file contents back.

The service supports not only S3 but also NFS- and CIFS-based file shares like EFS and FSx. It can also scan OneDrive and, soon, SharePoint Online and Exchange Online. This combination creates a central point for raising awareness of data risks, identifying sensitive data, and finding entries on specific people (EU Right to be Forgotten).

And the price? Apart from the EC2 instances it needs, the service is free for the first 1 TB of data and continues at moderate pricing beyond that.

So let’s see how to deploy this.

Step 1: NetApp Cloud Manager

As a SaaS solution, you first have to sign up for an account at NetApp Cloud Central - without any associated cost. The Cloud Manager interface offers access to DataSense and all the other NetApp services, including data replication and backup solutions.

Cloud Manager Canvas

Step 2: Connector Setup

Next, you have to deploy a connector instance that bridges the gap between your AWS account/VPC and the NetApp SaaS service. With a working connector, you can deploy services such as DataSense from the Cloud Manager SaaS interface.

There are two distinct deployment models:

AWS Marketplace:

You create an Instance Profile with the needed privileges and launch the connector instance via the Marketplace (a sketch of this setup follows below). This deployment model has the advantage of not hardcoding access keys but using automatically rotated credentials instead. It will also limit your access to DataSense later, requiring VPN connections into your VPC.

Cloud Manager:

By creating an IAM user with a matching policy and adding its access keys to the Cloud Manager SaaS interface, all deployments can happen automatically. You will also have direct access to DataSense in the SaaS portal.

As always, please review the linked policies, check that they are the most recent version, and make sure you agree to their (broad) privileges.
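For the Marketplace model, the Instance Profile can be prepared with a few IAM calls. Below is a minimal boto3 sketch; the role/profile names and the local policy file path are placeholders, and the actual permissions must come from the Connector policy linked in the NetApp documentation:

```python
# Hypothetical sketch: prepare the IAM role and Instance Profile for the
# Connector before launching it from the AWS Marketplace. Names and the
# policy file are placeholders, not the authoritative NetApp setup.
import json
import boto3

iam = boto3.client("iam")

# Trust policy allowing EC2 to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

iam.create_role(
    RoleName="netapp-cloudmanager-connector",  # hypothetical name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the Connector policy published by NetApp (downloaded beforehand)
with open("netapp-connector-policy.json") as f:  # placeholder path
    iam.put_role_policy(
        RoleName="netapp-cloudmanager-connector",
        PolicyName="netapp-connector-permissions",
        PolicyDocument=f.read(),
    )

# Wrap the role in an Instance Profile so it can be attached to EC2
iam.create_instance_profile(InstanceProfileName="netapp-cloudmanager-connector")
iam.add_role_to_instance_profile(
    InstanceProfileName="netapp-cloudmanager-connector",
    RoleName="netapp-cloudmanager-connector",
)
```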

Instance sizes start at t3.xlarge, which makes running the Connector reasonably cheap. As soon as the Connector is ready, you will automatically see your S3 buckets pop up in the Canvas.
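If the buckets do not show up, a quick sanity check is to verify from within the Connector instance that the Instance Profile is actually attached. A purely illustrative Python sketch using the instance metadata service (IMDSv2):

```python
# Illustrative check from within the Connector instance: ask the instance
# metadata service (IMDSv2) which IAM role is attached. An HTTP 404 means
# no Instance Profile is associated with the instance.
import urllib.request

# IMDSv2 requires requesting a session token first
token_request = urllib.request.Request(
    "http://169.254.169.254/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "300"},
)
token = urllib.request.urlopen(token_request).read().decode()

# List the role name provided through the Instance Profile
role_request = urllib.request.Request(
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    headers={"X-aws-ec2-metadata-token": token},
)
print(urllib.request.urlopen(role_request).read().decode())
```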

You could also add your Cloud Volumes Service for AWS, FSx for ONTAP, ONTAP Select or even on-premises storage systems/shares for data replication and backup. But that’s beyond the scope of this document.

Other NetApp services

Cloud DataSense Deployment

Now that we have our AWS account integrated with the NetApp solutions, we can deploy additional products such as DataSense.

We need to create an EC2 instance role first, which is then extended with policies granting access to the individual AWS services. If you want to scan S3 buckets, you will need a variation of the provided S3 policy (see the sketch below).
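To illustrate the shape of such an extension, here is a minimal boto3 sketch adding read-only S3 access to a hypothetical DataSense instance role. The role name and the statement list are assumptions; the policy provided by NetApp remains the authoritative source:

```python
# Illustrative sketch only: extend the DataSense instance role with S3 read
# access so buckets can be scanned. All names here are assumptions.
import json
import boto3

iam = boto3.client("iam")

s3_scan_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "s3:ListAllMyBuckets",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:GetObject",
        ],
        "Resource": "*",
    }],
}

iam.put_role_policy(
    RoleName="netapp-datasense-instance",  # hypothetical role name
    PolicyName="datasense-s3-scan",
    PolicyDocument=json.dumps(s3_scan_policy),
)
```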

To keep this scan traffic off your NAT Gateways (and avoid their data processing charges), the clear recommendation is to create an S3 Gateway Endpoint for your VPC.
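Creating the endpoint is a single API call. A minimal sketch, assuming placeholder VPC and route table IDs and an example region:

```python
# A minimal sketch for creating an S3 Gateway Endpoint. The VPC ID, route
# table ID, and region below are placeholders from an assumed environment.
# Gateway endpoints are free and route S3 traffic past the NAT Gateway.
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")  # assumed region

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",            # placeholder
    ServiceName="com.amazonaws.eu-west-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],  # placeholder
)
```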

Deploying via NetApp Cloud Manager is now a wizard-led task initiated from the Canvas under the “Data Sense” menu.

To enable the functionality of DataSense, you finally need to target it at your discovered S3 buckets and classify them as either “Map” (only work with document metadata) or “Map + Classify” (inspect actual contents, e.g., for PHI/PII detection).

Any additional data sources can be added in guided steps via the various wizards under “Data Sense / Configuration”.

Adding Data Source

Instance Size

By default, the DataSense instance runs on an m5.2xlarge, which is the recommended size. For smaller environments, though, eight vCPUs and 16 GB of memory are enough and supported. In these cases, c5a.2xlarge or t3a.2xlarge instance types might be sufficient and save 60% of the cost.

Working with Data

In the next post related to NetApp Cloud DataSense, I will go into the various ways of scanning and inspecting your data across different storage providers.
