If you run into the following error when connecting to the new Claude 3.5 Sonnet 2 models from AWS - 400 Invocation of model ID anthropic.claude-3-5-sonnet-20241022-v2:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.
You can fix this using the following config:
amazon.titan-embed-text-v2:0
as your embeddings model.
cohere.rerank-v3-5:0
as your reranking model, you may also use amazon.rerank-v1:0
.
promptCaching: true
under defaultCompletionOptions
in your model
configuration.
Prompt caching is generally available for:
~/.aws/credentials
under a configured profile (e.g. “bedrock”).