If you have a question about Glean's support for AWS that is not covered below, please reach out to Glean Support or your designated Glean contact.
General
Is AWS region _____ supported?
Generally yes. We now support any region that offers all of our required services.
Certain regions have limitations. For example, as of early April 2024, VPC endpoints for specific managed services may be available only in us-east-1 and us-west-2.
We do not currently support GovCloud regions, and we have no immediate time frame for supporting them.
LLM Support
Can we choose our LLM?
Yes, you can use any LLM that Glean supports. Glean offers the following options:
- Anthropic Claude (via Bedrock - recommended)
- Anthropic Claude (BYOK)
- GPT (BYOK Azure OpenAI)
- GPT (BYOK OpenAI)
- GPT (Glean account key; additional charges apply)
- Gemini (BYOK)
Security
What access to the AWS account is required from Glean?
Glean requires access from:
- The central Glean project which orchestrates setup and release deployments.
- The Glean AWS account which hosts the images.
More information:
- Glean AWS Account Access and Deployment Model (Trust Portal Access Required)
- Glean Architecture on AWS (Trust Portal Access Required)
Why does Glean request the customer to create an admin role?
In some situations, the Glean on-call engineer needs admin-level access to remediate or mitigate an escalation. To get it, the engineer must obtain approval from Glean leadership to use an internal, Glean-side admin GCP service account, which can then be used for federated access to the AWS-side IAM admin role.
Will NAF and WAF be managed by Glean?
Yes.
Which WAF are you using?
We’re using AWS WAF natively: https://aws.amazon.com/waf/
Does WAF log to CloudWatch?
Yes. Logging to CloudWatch is enabled by default for all requests except denied requests.
Do you apply data protection filters on CloudWatch logs?
Not currently. We do not apply AWS data masking to our logs, because masking would make them unusable in important support and debugging situations.
What’s the path of incoming webhooks?
Incoming webhooks pass through the following, in order:
1. The WAF (you can add rules such as IP restrictions).
2. The Application Load Balancer.
3. The Kubernetes (EKS) cluster.
The authentication scheme depends on the specific vendor's API.
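Because the WAF sits in front of the whole application, an IP restriction meant only for webhooks is typically scoped to the webhook path. Otherwise it would also block ordinary user traffic. The sketch below builds such a rule in the JSON shape used by AWS WAFv2, combining a URI-path match with a negated IP-set reference. It is illustrative only. The function names, the IP-set ARN, the CIDR ranges, and the path prefix are all hypothetical placeholders, because this FAQ does not document Glean's webhook paths.

```python
import ipaddress


def normalize_cidrs(cidrs: list[str]) -> list[str]:
    """Validate CIDR strings before adding them to a WAFv2 IP set.

    strict=True raises ValueError for entries with host bits set (e.g. "203.0.113.5/24").
    """
    return [str(ipaddress.ip_network(cidr, strict=True)) for cidr in cidrs]


def build_webhook_ip_rule(ip_set_arn: str, path_prefix: str, priority: int,
                          name: str = "webhook-ip-allowlist") -> dict:
    """WAFv2 rule (JSON shape): block requests to path_prefix from IPs outside the IP set."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "AndStatement": {
                "Statements": [
                    # Only consider requests whose URI path starts with the webhook prefix...
                    {
                        "ByteMatchStatement": {
                            "SearchString": path_prefix,
                            "FieldToMatch": {"UriPath": {}},
                            "TextTransformations": [{"Priority": 0, "Type": "NONE"}],
                            "PositionalConstraint": "STARTS_WITH",
                        }
                    },
                    # ...and whose source IP is NOT in the allow-listed IP set.
                    {
                        "NotStatement": {
                            "Statement": {"IPSetReferenceStatement": {"ARN": ip_set_arn}}
                        }
                    },
                ]
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": name,
        },
    }


# Hypothetical values, for illustration only.
vendor_cidrs = normalize_cidrs(["192.0.2.0/24", "198.51.100.17/32"])
rule = build_webhook_ip_rule(
    "arn:aws:wafv2:us-east-1:123456789012:regional/ipset/webhook-vendors/EXAMPLE-ID",
    path_prefix="/hypothetical-webhook-path/",
    priority=10,
)
```

The vendor CIDRs would go into the referenced IP set. The rule dict has the shape of one entry in a web ACL's `Rules` list. Since the WAF is managed by Glean, raise any rule change with Glean Support first.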
Can we attach custom security groups to one of the managed services?
Please share the details with our support team, who can discuss this further.
Does Glean provide any Intrusion Detection capabilities on AWS?
Glean recommends that customers use AWS GuardDuty for intrusion detection (IDS) capabilities on AWS.
- More information: AWS GuardDuty and Glean
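GuardDuty is a regional service, so it must be enabled in each region where the Glean deployment runs. The sketch below shows one way to make sure a region has an enabled detector. It assumes `client` behaves like `boto3.client("guardduty")` and uses that client's `list_detectors`, `get_detector`, `update_detector`, and `create_detector` calls. The function name is hypothetical, and this is not Glean tooling.

```python
def ensure_guardduty_enabled(client) -> str:
    """Return the region's GuardDuty detector ID, enabling or creating it if needed.

    `client` is expected to behave like boto3.client("guardduty").
    """
    detector_ids = client.list_detectors().get("DetectorIds", [])
    if detector_ids:
        # GuardDuty keeps one detector per account per region; reuse it.
        detector_id = detector_ids[0]
        if client.get_detector(DetectorId=detector_id).get("Status") != "ENABLED":
            # An existing but suspended detector is switched back on.
            client.update_detector(DetectorId=detector_id, Enable=True)
        return detector_id
    # No detector yet: create one that is enabled immediately.
    return client.create_detector(Enable=True)["DetectorId"]
```

In practice you would call it as `ensure_guardduty_enabled(boto3.client("guardduty", region_name="us-east-1"))`, substituting the region of your Glean AWS account.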
Networking
What are the network requirements?
Glean sets up and deploys all infrastructure, including VPC components, in an empty AWS account that the customer owns. The customer does not need to do anything proactively for networking.
Compute
What OS are the EC2 instances running on and where do the AMIs come from?
EKS nodes generally run Amazon Linux 2, using the default AWS-provided AMIs.
Some standalone EC2 instances run a Glean AMI built on Ubuntu 20.04 LTS (Focal).
Will the operating systems be patched by Glean, or is that a customer responsibility?
Glean will handle the patching and maintenance of all compute instances. This is automated by our internal systems.
Cost & Resourcing
How do the infrastructure costs compare between AWS and GCP self-hosting options?
We currently estimate that AWS infrastructure costs are 1.5x those of GCP. Glean continues to work on reducing costs on both platforms.
How do we appropriately size our Glean instance?
Glean sizes all of the infrastructure dynamically, based on many factors specific to the customer's corpus.
Can you give me an estimate of the cost of the AWS resources?
Can you give me an estimate of:
1. How much data is transferred out of the AWS account per day?
2. The number and sizes of instances across all services (e.g. EC2, RDS, EKS, S3, SageMaker)?
All of this can vary depending on the characteristics of your Glean deployment. To answer this question, please reach out to your Glean contact with the following information:
- Number of employees in your organization
- Number of documents in your corpus
- The data sources to be connected, and ideally the number of docs per data source.
These are only the high-level factors. Many other nuances affect how much data needs to be stored and processed. We can provide estimates based on comparable deployments, but any numbers Glean provides should be treated as estimates only.
What GPU instance types are typically needed?
Our SageMaker training jobs require ml.g4dn.* instance types (primarily ml.g4dn.xlarge). We run about 1-4 training jobs a day, with runtimes ranging from 30 minutes to a few hours.
However, none of the instances we explicitly create, e.g. on the EKS cluster, require GPUs.
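The figures above can be turned into a rough range of daily GPU training usage. The arithmetic below is a sketch under two stated assumptions. First, each training job runs on a single instance. Second, "a few hours" is read as 3 hours. Neither assumption comes from Glean. The function name is hypothetical.

```python
def daily_training_instance_hours(jobs_per_day: int, hours_per_job: float) -> float:
    """Training instance-hours per day, ASSUMING each job runs on a single instance."""
    if jobs_per_day < 0 or hours_per_job < 0:
        raise ValueError("jobs_per_day and hours_per_job must be non-negative")
    return jobs_per_day * hours_per_job


# Bounds from the answer above: 1-4 jobs/day, 30 minutes to "a few hours" per job.
# ASSUMPTION: "a few hours" is taken as 3 hours; adjust to your own observations.
low = daily_training_instance_hours(1, 0.5)
high = daily_training_instance_hours(4, 3.0)
print(f"{low}-{high} ml.g4dn instance-hours per day")
```

Under these assumptions the range is 0.5 to 12 instance-hours per day. To get a rough daily cost, multiply by the on-demand SageMaker training rate for ml.g4dn.xlarge in your region.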
How do we minimize egress cost?
Most Glean-relevant traffic is ingress (incoming data). AWS generally does not charge for ingress.
Storage - RDS
Which database are you using?
We’re using AWS RDS for MySQL: https://aws.amazon.com/rds/mysql/
How often are SQL backups taken?
Once a day.
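This FAQ does not say which backup mechanism is used. If the daily backups are RDS automated backups (an assumption here), you can inspect each instance's retention period and daily backup window. The sketch below expects `client` to behave like `boto3.client("rds")` and uses the `describe_db_instances` paginator. A retention period of 0 means automated backups are disabled for that instance. The function name is hypothetical.

```python
def backup_settings(client) -> dict[str, tuple[int, str]]:
    """Map each RDS instance ID to (BackupRetentionPeriod in days, PreferredBackupWindow in UTC).

    `client` is expected to behave like boto3.client("rds").
    """
    settings = {}
    for page in client.get_paginator("describe_db_instances").paginate():
        for db in page["DBInstances"]:
            settings[db["DBInstanceIdentifier"]] = (
                db["BackupRetentionPeriod"],
                # Daily window such as "03:00-03:30"; empty if not reported.
                db.get("PreferredBackupWindow", ""),
            )
    return settings
```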
Storage - S3
Do buckets have Inventory enabled?
No, we don’t enable Inventory.
Are S3 buckets accessible publicly or from Glean Central?
No.
Is S3 configured for cross-region replication?
No, we don’t configure cross-region replication and in practice have not had a strong reason to.
Lambda
The private lambdas (serverless functions) are separate from the EKS cluster. What is their purpose?
These lambdas are used for:
- Setup & deployment (Bootstrap configuration template)
- Maintenance operations and cron jobs, e.g. restarting or upgrading node pools
Are the lambdas configured to be publicly accessible?
No. None of them are publicly accessible.
Do you add layers to lambdas, and if so, are they accessible from outside the organization?
No, Glean doesn’t add layers to lambdas.
Do you use lambda function URLs?
No, they are disabled.
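To spot-check that no function in a region has a function URL configured, you can list the functions and query each one's URL configurations. The sketch below expects `client` to behave like `boto3.client("lambda")` and uses the `list_functions` paginator and `list_function_url_configs`. An empty result is consistent with the answer above. The function name is hypothetical.

```python
def functions_with_urls(client) -> list[str]:
    """Return names of Lambda functions in the region that have any function URL configured.

    `client` is expected to behave like boto3.client("lambda").
    """
    names_with_urls = []
    for page in client.get_paginator("list_functions").paginate():
        for fn in page["Functions"]:
            name = fn["FunctionName"]
            # Covers URLs on the unqualified function and on its aliases.
            configs = client.list_function_url_configs(FunctionName=name).get("FunctionUrlConfigs", [])
            if configs:
                names_with_urls.append(name)
    return names_with_urls
```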
Disaster Recovery
How does Glean handle Disaster Recovery?
Please refer to:
- Glean Business Continuity & Disaster Recovery Policy (Trust Portal Access Required)
Feature Support
Does Glean on AWS support vanity URLs?
Yes. Vanity URLs, i.e. companyname.glean.com, are supported on AWS.
Does Glean on AWS have feature parity with other Glean deployment methods?
Glean on AWS has feature parity with other Glean deployment methods, with the following exceptions:
- DLP / Sensitive Content Reporting: This feature depends on GCP's DLP service. AWS has a similar service that Glean could use, but it works in a fundamentally different way (it requires a full export of all data to S3), which is incompatible with our platform. We are investigating alternative options to bring this capability to our AWS customers.
- OCR: OCR is an optional feature that requires a subscription uplift. Glean uses GCP Cloud Vision for OCR capabilities. We are currently investigating the use of a multi-modal local LLM to provide this capability instead.