Ideal architecture for ML training -> API service workflow, with multiple models/services?

by CaptainPlanet   Last Updated July 12, 2019 23:05 PM

I'm planning to build a workflow/environment for training and serving NLP classifiers that follows roughly these steps:

  1. The model training system takes in annotated documents from some subset of the preconfigured data sources, along with a set of user-defined parameters controlling how the model is trained (e.g. which n-gram features to generate, whether to apply negation or lemmatization, etc.)
  2. The training system outputs a model file to an S3 bucket
  3. A Flask-based API service loads the model from S3 on startup and uses it to serve real-time predictions
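Steps 2–3 could be sketched as a minimal Flask service that pulls a pickled model from S3 once at startup. The bucket/key names are placeholders, and I'm assuming a scikit-learn-style `predict` interface:

```python
import pickle

from flask import Flask, jsonify, request


def load_model(s3_client, bucket, key):
    """Fetch a pickled model from S3 (step 2's output) and deserialize it."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return pickle.loads(body)


def create_app(model):
    """Wrap a fitted model in a minimal prediction endpoint (step 3)."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        doc = request.get_json()["text"]
        return jsonify({"label": model.predict([doc])[0]})

    return app


if __name__ == "__main__":
    import boto3  # only needed when actually talking to S3

    # Bucket/key are hypothetical; the point is to load once at startup,
    # not on every request.
    model = load_model(boto3.client("s3"), "my-model-bucket",
                       "classifiers/demo/model.pkl")
    create_app(model).run()
```

Loading at startup keeps request latency low, which matters more than memory here given the low call volume.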

However, there are a few caveats:

  • The training workflow will feed into multiple stand-alone services, not just one
  • Each service might have multiple models attached to it (so an incoming POST of a document would receive a response with multiple classifications, based on multiple different model predictions)
  • The calls per minute per service would be relatively low (maybe one call to a service every few minutes)

I've looked into existing offerings like SageMaker, but that's limited to one API service per model. It's also seemingly designed for API services that would receive thousands of calls per second, which is not at all cost-effective for my needs.

As such, here's my plan:

Pre-/Post-processing Package. A code repo containing all the pre- and post-processing methods the classifier pipeline might call, both in training and in prediction. Each method's behavior varies heavily based on its input parameters. This code is not deployed anywhere on its own.
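A minimal sketch of what such a parameter-driven method might look like. The parameter names (`strip_numbers`, `ngram_range`) are invented for illustration; the real package would expose whichever switches the classifiers need:

```python
import re


def tokenize(text):
    """Lowercase and split into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text, params):
    """Apply the preprocessing steps selected by `params`.

    The keys used here are hypothetical examples of the kind of
    logic variance the package would encapsulate.
    """
    tokens = tokenize(text)
    if params.get("strip_numbers"):
        tokens = [t for t in tokens if not t.isdigit()]
    lo, hi = params.get("ngram_range", (1, 1))
    feats = []
    for n in range(lo, hi + 1):
        feats.extend(ngrams(tokens, n))
    return feats
```

Because both the trainer and the API services import this one package, a document is guaranteed to be featurized identically in training and at prediction time.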

Training service. A high-resource EC2 instance which imports the above pre-/post-processing package, has input connectors to all possible data sources, and outputs to an S3 bucket. Data scientists would input a set of params and data source(s) and run the training on this instance.
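One detail worth pinning down on the training side: bundling the training params into the uploaded artifact, so the API services can replay the exact same pre-/post-processing. A sketch (the function name, payload shape, and sidecar file are my own convention, not a fixed API):

```python
import json
import pickle


def save_model(model, params, s3_client, bucket, key):
    """Upload a fitted model to S3 together with the params used to
    train it, plus a human-readable JSON sidecar for auditing."""
    payload = pickle.dumps({"model": model, "params": params})
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)
    s3_client.put_object(Bucket=bucket, Key=key + ".params.json",
                         Body=json.dumps(params).encode())
```

Without this, a service could silently featurize documents differently than the trainer did.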

Model storage. Trained models are stored in S3 buckets, organized by some structure reflecting the data source and classifier type.
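If the key layout is generated by a shared helper rather than hand-typed, the trainer and the services can't disagree about where a model lives. The source/type/version layout below is just one possible convention:

```python
def model_key(source, classifier_type, version):
    """Build the S3 key for a trained model.

    Layout (hypothetical convention): <source>/<classifier_type>/v<version>/model.pkl
    """
    return f"{source}/{classifier_type}/v{version}/model.pkl"
```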

API services. A series of low-resource Flask-based API services that use config files to dictate which models to load from S3. These also need to import the pre-/post-processing package, so they can apply those methods to incoming documents at prediction time.
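The multi-model part of this could look like the sketch below: the config lists every model the service owns, and one incoming document gets one classification per model. The config shape is an assumption on my part:

```python
import pickle


def load_models(config, s3_client):
    """Load every model listed in the service's config.

    Expected (hypothetical) config shape:
        {"models": [{"name": "negation", "bucket": "nlp-models",
                     "key": "ehr/negation/v1/model.pkl"}, ...]}
    """
    models = {}
    for entry in config["models"]:
        body = s3_client.get_object(Bucket=entry["bucket"],
                                    Key=entry["key"])["Body"].read()
        models[entry["name"]] = pickle.loads(body)
    return models


def classify(doc, models):
    """Run every loaded model on one document: one POST in,
    one classification per model out."""
    return {name: model.predict([doc])[0] for name, model in models.items()}
```

A POST handler would then just return `jsonify(classify(doc, models))`, giving the caller all classifications in a single response.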

So, my questions:

Does this general architecture make sense? Or are there sections I should rethink?

Are there existing cost-effective systems I should be looking into which would handle this better than constructing the entire ecosystem myself?
