
HackerRank Top 30 MAANG Interview Questions: Minimum configurations required on Databricks to make Spark-NLP work

Amber Ivanna Trujillo
2 min read · Mar 2, 2023


If you are trying to get your own embeddings model working with Spark-NLP on Databricks, there is a small set of minimum configurations required to make it work.

Though the open-source documentation is clear and easy to follow, I often find it useful to have the full set of configurations ready in one place.

There are only two things that absolutely need to be included in your cluster:

  1. Libraries
  • Install spark-nlp with the latest coordinates from the open-source Maven repository.
  • Install the spark-nlp Python library using pip.

Install `tensorflow` and `tensorflow-hub` if you need them for prototyping, but you can live without them as well.
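As a concrete sketch of this first step: the Maven coordinate goes under Libraries > Install new > Maven on the cluster, and the Python packages can be installed from a notebook cell. The Scala suffix and version below (`2.12`, `x.y.z`) are placeholders; replace them with the latest spark-nlp release from Maven Central.

```python
# Cluster library (Maven coordinate, added via Libraries > Install new > Maven),
# with x.y.z replaced by the latest spark-nlp release on Maven Central:
#   com.johnsnowlabs.nlp:spark-nlp_2.12:x.y.z

# Python side, from a Databricks notebook cell
# (tensorflow / tensorflow-hub are optional, only for prototyping):
%pip install spark-nlp tensorflow tensorflow-hub
```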

  2. Spark configurations
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 2000M
spark.jsl.settings.pretrained.cache_folder /<custom dbfs location where you can write>
spark.driver.maxResultSize 0
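With the libraries installed and the four Spark configurations above applied (and the cluster restarted), a quick smoke test from a notebook confirms everything is wired up. This is a minimal sketch; `explain_document_dl` is just a convenient pretrained pipeline chosen here to exercise the download path into the custom cache folder.

```python
# Minimal smoke test for a Spark-NLP Databricks cluster (hedged sketch).
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

# On Databricks the cluster's Spark session already exists; sparknlp.start()
# attaches to it when the spark-nlp jar is installed on the cluster.
spark = sparknlp.start()
print("Spark NLP version:", sparknlp.version())

# Downloading a pretrained pipeline exercises the
# spark.jsl.settings.pretrained.cache_folder location set above.
pipeline = PretrainedPipeline("explain_document_dl", lang="en")

result = pipeline.annotate("Spark NLP is configured correctly on Databricks.")
print(result["token"])
```

If the pipeline downloads into your custom DBFS cache folder and the tokens print, the minimum configuration is working.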
