Minimum configurations required on Databricks to make Spark NLP work
When you try to get your own embeddings model working with Spark NLP on Databricks, there is a small set of minimum configurations required to make it run.
Though the open-source documentation is clear and easy to follow, I often find it useful to have the full set of configurations ready in one place.
There are only two things that absolutely need to be set up on your cluster:
- Libraries
- Install spark-nlp with the latest coordinates from the open-source Maven repository.
- Install the spark-nlp library using pip.
- Install `tensorflow` and `tensorflow-hub` if you need them for prototyping, but you can live without them as well.
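As a quick sketch of the install steps above (the Maven version number is an example, not a recommendation; check Maven Central for the latest coordinates):

```shell
# Maven coordinate for the cluster's Libraries tab
# (example version only; pick the latest from Maven Central):
#   com.johnsnowlabs.nlp:spark-nlp_2.12:5.1.4

# PyPI package, installed as a cluster library or via a notebook:
pip install spark-nlp

# Optional, only if you prototype with TensorFlow/TF Hub models:
pip install tensorflow tensorflow-hub
```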
- Spark configurations.
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 2000M
spark.jsl.settings.pretrained.cache_folder /<custom dbfs location where you can write>
spark.driver.maxResultSize 0
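If you prefer to set these programmatically rather than in the cluster's Spark config UI, the same settings can be passed on the session builder. This is a sketch that assumes a Spark runtime with the spark-nlp jar already on the classpath; the cache-folder path is a placeholder you must replace with your own writable DBFS location:

```python
from pyspark.sql import SparkSession

# Mirror of the cluster-level Spark configs listed above.
spark = (
    SparkSession.builder
    .appName("spark-nlp-minimal-config")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "2000M")
    # Placeholder path: use any DBFS location your cluster can write to.
    .config("spark.jsl.settings.pretrained.cache_folder",
            "dbfs:/tmp/spark_nlp_cache")
    # 0 removes the driver result-size limit entirely.
    .config("spark.driver.maxResultSize", "0")
    .getOrCreate()
)
```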