Explore DVC, CML, and MLflow to improve your machine learning development

As a data scientist🧑🏻‍💻, I spend much of my time building models for a wide variety of tasks. Building a machine learning model is a complex process: you start by cleaning the data, then you build the representation, and finally you train the model. Along the way, you have to answer many questions, such as:

  • What are the right features for my problem?
  • Is this model better than the one before?
  • What are the best hyperparameters?
  • I have a good model. How can I deploy it to production?
  • How can I monitor the results in production?
  • I…

As a data scientist, my work usually consists of developing machine learning or deep learning models for a wide variety of tasks. I often start in a Jupyter notebook, exploring and experimenting with the data. Once I have settled on the model I am going to use, I write some deployment scripts. Normally, these scripts consist of APIs and Docker images that allow fast deployments in the cloud or on on-premises systems.

In this article, I am going to provide a simple way to develop and deploy an API for an NLP task. In this case, I will create…

About a year ago, while working on my bachelor's thesis, I discovered the world of embeddings.

The idea is simple: as the distributional semantics hypothesis proposes, words with similar distributions have similar meanings. Take, for example, the words comb, hair, and elephant. The words comb and hair share a semantic context, so, according to the distributional hypothesis, their representations will be similar. In contrast, neither comb nor hair has any semantic relationship with elephant, and for this reason its representation will be different.
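
The comb, hair, and elephant example can be sketched in a few lines of Python. This is only a toy illustration, not the method used in the article: the context words and co-occurrence counts are invented, and cosine similarity is assumed as the way to compare representations.

```python
import math

# Hypothetical co-occurrence counts, invented for illustration only.
# Each position counts how often the word appears near one context word.
CONTEXT_WORDS = ["brush", "shampoo", "zoo", "trunk"]
COUNTS = {
    "comb": [8, 5, 0, 0],
    "hair": [6, 9, 0, 1],
    "elephant": [0, 0, 7, 9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(word_a, word_b):
    """Compare two words through their toy co-occurrence vectors."""
    return cosine_similarity(COUNTS[word_a], COUNTS[word_b])


if __name__ == "__main__":
    for pair in [("comb", "hair"), ("comb", "elephant"), ("hair", "elephant")]:
        print(pair, round(similarity(*pair), 3))
```

With these made-up counts, comb and hair score close to 1, while either word paired with elephant scores close to 0, which is the behavior the distributional hypothesis predicts.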

So, is it possible to take advantage of the distributional semantics hypothesis…

Marcos Esteve

Data Scientist & Machine Learning Engineer. Multimodality (text+image) research

