Hey Bryan, thanks for taking the time to read through and offer thoughts on the piece! I appreciate you calling out these open source libraries and the challenge of building vs. buying vs. importing. The cop-out answer is that we saw Metaflow and MLflow as new enough that we wanted to see how they developed before committing. Both offer stronger, cleaner abstractions for experimentation and artifact tracking out of the box, and I appreciate how attaching Python decorators within these frameworks moves model code from development to production with fewer code changes. I'm definitely following the development of these projects with interest and may very well migrate to one of them in the future.
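To make the decorator point concrete, here's a toy sketch of the pattern (hypothetical names, not Metaflow's or MLflow's actual API): a decorator registers each function as a pipeline step, so the same code can run locally during development and be picked up by an orchestrator in production.

```python
# Toy illustration of the decorator pattern these frameworks use.
# REGISTRY, step, preprocess, and train are all made-up names for
# the sake of the sketch, not real framework symbols.
REGISTRY = []

def step(fn):
    """Register fn as a pipeline step and return it unchanged."""
    REGISTRY.append(fn.__name__)
    return fn

@step
def preprocess(rows):
    return [r * 2 for r in rows]

@step
def train(rows):
    return sum(rows) / len(rows)

def run_pipeline(data):
    # In development you call the functions directly; in production an
    # orchestrator could walk REGISTRY instead. Either way the model
    # code itself is untouched.
    return train(preprocess(data))

print(REGISTRY)
print(run_pipeline([1, 2, 3]))
```

The appeal is that promotion to production is a change in who invokes the steps, not a rewrite of the steps themselves.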
The tools we ended up selecting were mostly a byproduct of existing infrastructure and the org chart. We already had a robust Airflow deployment with well-understood, mature ETL services controlling data extraction and transformation. There was also a desire to onboard ML models onto our Kubernetes cluster in the future, which made a Docker image per model a more attractive proposition.
The net new code we had to write mostly centered on the SQLAlchemy ORMs that lay down the experiment-tracking data model and on the API calls needed to deploy custom Docker images to AWS SageMaker. As an ML engineer, my primary goal was to let a data scientist bring a set of SQL scripts, a preprocessing Python script, and a training Python script, and get back an Airflow pipeline orchestrating their model training and deployment. This does end up concentrating more operational burden on the data engineers and ML engineers who support this lightweight orchestration framework.
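For a sense of what that ORM layer can look like, here's a minimal sketch of an experiment-tracking table in SQLAlchemy's declarative style, run against an in-memory SQLite database. The table and column names (`training_runs`, `model_name`, `docker_image`, etc.) are illustrative assumptions, not our actual schema.

```python
# Hedged sketch of an experiment-tracking data model with SQLAlchemy.
# All names here are hypothetical; the real schema tracks more metadata.
import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class TrainingRun(Base):
    __tablename__ = "training_runs"

    id = Column(Integer, primary_key=True)
    model_name = Column(String, nullable=False)
    docker_image = Column(String)  # image tag deployed for this run
    status = Column(String, default="pending")
    started_at = Column(DateTime, default=datetime.datetime.utcnow)

# In-memory SQLite keeps the example self-contained; production would
# point at a real database URL instead.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(TrainingRun(model_name="churn_model", docker_image="churn:v1"))
    session.commit()
    fetched = session.query(TrainingRun).filter_by(model_name="churn_model").one()
    print(fetched.docker_image)
```

An Airflow task can then write a row like this at the start of each training run and update its status as the pipeline progresses.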
I see libraries like Metaflow and MLflow increasingly empowering data scientists to self-service their own production model deployment and monitoring needs. To me, that level of responsibility and full-stack ownership implies higher-skill data scientists who might be difficult for startups to reliably hire and retain. These are purely my own opinions, though; I think a lot of this is still more art than science. My hesitation about being prescriptive with ML tooling is also why the article focuses less on specific frameworks and more on higher-level best practices for ML pipeline management.