Minimum configurations required on Databricks to make Spark NLP work
When you try to get your own embeddings model working with Spark NLP on Databricks, there is a small set of minimum configurations required to make it run.
Though the open-source documentation is clear and easy to follow, I often find it useful to have the full set of configurations ready in one place.
There are only two things that absolutely need to be set up on your cluster:
- Libraries
- Install spark-nlp with the latest coordinates from the open-source Maven repository.
- Install the spark-nlp library using pip.
- Install `tensorflow` and `tensorflow-hub` if you need them for prototyping, but you can live without them as well.
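As a quick sketch of the install steps above (the Maven version number is an example, not a recommendation; check Maven Central for the latest coordinates):

```shell
# Maven coordinate for the cluster's Libraries tab
# (example version only; pick the latest from Maven Central):
#   com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.4

# PyPI package, installed as a cluster library or via a notebook:
pip install spark-nlp

# Optional, only if you prototype with TensorFlow/TF Hub models:
pip install tensorflow tensorflow-hub
```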
- Spark configurations.
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 2000M
spark.jsl.settings.pretrained.cache_folder /<custom dbfs location where you can write>
spark.driver.maxResultSize 0
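If you prefer to set these programmatically rather than in the cluster's Spark config UI, the same settings can be passed on the session builder. This is a sketch that assumes a Spark runtime with the spark-nlp jar already on the classpath; the cache-folder path is a placeholder you must replace with your own writable DBFS location:

```python
from pyspark.sql import SparkSession

# Mirror of the cluster-level Spark configs listed above.
spark = (
    SparkSession.builder
    .appName("spark-nlp-minimal-config")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "2000M")
    # Placeholder path: use any DBFS location your cluster can write to.
    .config("spark.jsl.settings.pretrained.cache_folder",
            "dbfs:/tmp/spark_nlp_cache")
    # 0 removes the driver result-size limit entirely.
    .config("spark.driver.maxResultSize", "0")
    .getOrCreate()
)
```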