Score:1

AWS Architecture Advice - multiple EC2 instances with shared database / file system with dynamic start and stop


I am very new to cloud architecture but have decent application development experience. Right now, I am in the process of making a large computational pipeline more accessible to 5-10 users via a web application, and I am setting this all up in AWS.

My current implementation is a lightweight React web app backed by two APIs and a MySQL database; it allows users to queue up jobs with parameters and access the end results through the web app or from emails sent to users after a run is done.

In the middle of this pipeline is a dependency on a proprietary piece of software that needs a very hefty machine to compute these steps (64GB RAM, 16 cores, 1TB HDD) and can run for up to 1.5 days for just this one step. This is the biggest bottleneck of the entire pipeline.

To save on costs as much as possible, I am trying to make the bottleneck/service piece scalable and cost-effective by having multiple EC2 instance "agents" available to be turned on, run the steps, send an email, write to the web app database, and then stop, with the starts and stops handled by AWS Lambda functions triggered by actions from the web app.
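Something like this pair of boto3 Lambda handlers is roughly what I have in mind for the start/stop control; this is only a sketch, and the tag name (Role=compute-agent) and the event fields are placeholders I made up:

```python
# start_agent.py - sketch of the Lambda invoked when the web app queues a job.
# The Role=compute-agent tag is a made-up convention for marking agent instances.
import boto3

ec2 = boto3.client("ec2")

def start_handler(event, context):
    # Look for a stopped agent instance that can be reused for this job.
    stopped = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Role", "Values": ["compute-agent"]},
            {"Name": "instance-state-name", "Values": ["stopped"]},
        ]
    )
    instance_ids = [
        i["InstanceId"]
        for reservation in stopped["Reservations"]
        for i in reservation["Instances"]
    ]
    if not instance_ids:
        return {"started": None, "reason": "no idle agent available"}

    ec2.start_instances(InstanceIds=[instance_ids[0]])
    return {"started": instance_ids[0]}

def stop_handler(event, context):
    # Invoked once the heavy step reports it is finished.
    ec2.stop_instances(InstanceIds=[event["instance_id"]])
    return {"stopped": event["instance_id"]}
```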

I am planning on hosting the web app, the two APIs, and the MySQL server on a single EC2 instance, since concurrency/scalability on this piece is very small. I will also have another 1-3 instances for the bottleneck services to share concurrent runs from the 5-10 users, which could allow up to 3 runs of the heavy step going at the same time.

Since the bottleneck services require similar files to run the programs, and the inputs to these steps can sometimes be 150GB in size, I am thinking of using either EFS or S3 to hold the inputs. That way I would only have to worry about transferring the input files to one place that can be shared across the EC2 instances, and I wouldn't need to make sure the instances are started to do the transfer step. This transfer is one manual piece that I haven't figured out a good way to automate, since the file sizes are so large.
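For moving the inputs to one shared place, the rough idea I have is a scripted upload to S3 with boto3, which splits large files into multipart uploads automatically; the bucket name and key layout below are placeholders:

```python
# upload_input.py - sketch of pushing a large input file to a shared S3 bucket.
# Bucket name and key layout are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Larger parts and more concurrency so a ~150GB file is transferred efficiently.
config = TransferConfig(
    multipart_threshold=64 * 1024 * 1024,   # switch to multipart above 64 MB
    multipart_chunksize=256 * 1024 * 1024,  # 256 MB parts
    max_concurrency=8,
)

def upload_input(local_path: str, job_id: str) -> str:
    key = f"inputs/{job_id}/input.txt"
    s3.upload_file(local_path, "pipeline-inputs-bucket", key, Config=config)
    return key

if __name__ == "__main__":
    print(upload_input("/data/run-42/input.txt", "run-42"))
```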

My questions are: does my setup sound reasonable, and do you see any holes in my implementation ideas? Currently I am using EBS storage for the service instances, but I want to minimize the number of locations involved in the 150GB transfers and their maintenance. I am also unsure of the difference between S3 and EFS, since they both seem to be multi-instance mountable; which one should I use? And does it make sense to keep the web app, APIs, and database on one EC2 instance if I need the service instances to be able to write to the database after they are done? That instance would be on all the time.

Thank you for your help and forgive me if I have said anything naively.

Score:0
Tim

In general in the cloud it's good to try to use services rather than servers. You have to keep an eye on cost, but it can make solutions more robust, faster, and more compliant.

I have a couple of thoughts about your workload:

  • Can you use an orchestrator like AWS Step Functions calling many AWS Lambda functions to do the computation? I do note that Lambda is probably the most expensive compute time on AWS, so maybe not ideal. With limits set right and a suitable workload, maybe you could start 10,000 Lambdas and do the job in parallel in 15 minutes.
  • Instead of EFS / S3, how about creating a golden EC2 image / AMI, then for every job spinning up a spot / dynamic EC2 instance large enough to do the processing for that one job, and shutting it down when it's done (see the sketch after this list)? Lambda could maybe orchestrate the job based on events of some type. That would avoid data transfer charges - though I'm not sure whether those are charged to EBS / S3 or not. Spot compute is quite cheap, and if you choose your region / AZ / instance size correctly, interruptions should be rare. Interrupted instances are shut down and the EBS volume is kept, so this would work better if your job writes to disk regularly and can be restarted.
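A rough boto3 sketch of what that per-job launch could look like from a Lambda; the AMI ID, instance type, and tag are placeholders you would swap for your own:

```python
# launch_job_instance.py - sketch of a Lambda that launches one spot instance
# from a golden AMI per job. AMI ID, instance type, and tags are placeholders.
import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # golden AMI with the software baked in
        InstanceType="m5.4xlarge",         # 16 vCPU / 64 GB - size to fit the job
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions={
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "JobId", "Value": event["job_id"]}],
        }],
    )
    return {"instance_id": response["Instances"][0]["InstanceId"]}
```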

I'd probably also put some time into optimizing that huge job.

Score:0

Your setup does sound reasonable. I might suggest you look into using API Gateway to "host" your API, and give it some thought to see whether it works for you. You could also consider putting your heavy-load EC2 instances in an Auto Scaling group and having your control Lambda interact with the group instead of with the instances directly.
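A minimal sketch of what that control Lambda could do with boto3; the group name and the cap of 3 instances are made up for illustration:

```python
# scale_agents.py - sketch of a Lambda that adjusts an Auto Scaling group's
# desired capacity instead of starting/stopping individual instances.
# The group name and the cap of 3 are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

GROUP_NAME = "heavy-step-agents"

def handler(event, context):
    # event["pending_jobs"] would come from the web app or its job queue.
    desired = min(int(event.get("pending_jobs", 0)), 3)
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=GROUP_NAME,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )
    return {"desired_capacity": desired}
```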

S3 and EFS are different data storage solutions. S3 is object storage while EFS is file storage. S3 is not exactly mountable, though it may be presented as if it were through various utilities. Whether it's right to use S3 or EFS depends on how you're using the files you keep there.

For your database you might consider moving to RDS, perhaps using a burstable instance class or one of the serverless options, but this will depend on your budget and use case.

Nonchalahnt
Thank you so much for this; these are all good ideas for me to further explore and get clarification on. Regarding the S3/EFS distinction, I have researched these a bit, and my current understanding is that object storage is great for larger websites to hold asset-type resources like pictures / videos. My input files are effectively just very large formatted text files (~60GB), and the heavy-load programs read from these to do their work. That makes me think EFS is preferred, since that space can be shared; however, some articles say object storage is better with larger files? Thanks!