Need to save a JSON string variable as a .json file

Hi Everyone,

Hope you can help.

I’m new to Scala and struggling to find a solution to my problem.
I'm working on a Databricks project and need to produce a .json file from a Spark DataFrame or a string variable.
The Spark dataFrame.write.json("dataLake\Folder\file.json") call produces a folder named "file.json" containing several files describing the status of the process, plus the JSON data itself under a different name ("part-00000").
What I'm after is a single output file whose name I can specify.

Many Thanks in advance
Mariusz

LMGTFY.

TL;DR: use coalesce(1) before saving, but it will still produce a folder, just with a single data file inside.
Remember to use coalesce instead of repartition since, as the documentation explains, it avoids a full shuffle.
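
For illustration, a minimal sketch of what that looks like (the DataFrame and the output path are just placeholders for whatever you have in your notebook):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().getOrCreate()

// df stands in for whatever DataFrame you already have
val df: DataFrame = spark.read.json("dataLake/Folder/input")

// coalesce(1) collapses the data into a single partition without a full
// shuffle, so the write produces one part-00000 file inside the folder
df.coalesce(1)
  .write
  .mode("overwrite")
  .json("dataLake/Folder/file.json")
```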

Note that outputting multiple files is the correct behaviour, because Spark is intended for distributed computation. If your use case doesn't need multiple machines and isn't for educational purposes, DO NOT use Spark (and for educational purposes, simple quirks like ending up with a folder containing a single file named part-00000 shouldn't be a problem).

Hi Balmung,

Thanks for your reply,
However, I need to be able to output a single file with a specific name, e.g. model.json or manifest.json (CDM files), so the file can be utilised by other applications like Power BI.

Is there no way of defining a file name?

Thanks
Mariusz

You would need to do some custom processing of the file yourself.
But then again, are you sure Spark was the right fit for your use case?
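
If all you really need is to dump a JSON string variable into a single file with a name of your choosing, plain JVM I/O is enough and Spark doesn't have to be involved at all. A minimal sketch, assuming the JSON already lives in a string and that a /dbfs/... path is the right target on Databricks (both the path and jsonString are placeholders):

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// jsonString stands in for whatever JSON you have built in your program
val jsonString: String = """{"name": "model", "entities": []}"""

// Write the string to exactly one file with exactly the name you choose.
// On Databricks, a path under /dbfs/... writes straight into DBFS.
Files.write(
  Paths.get("/dbfs/dataLake/Folder/model.json"),
  jsonString.getBytes(StandardCharsets.UTF_8)
)
```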

Hi Balmung,

I’m open to any solution, it does not have to be spark, as long as it delivers the results.
As mentioned in the original post I’m new to Scala.

Thanks again for your reply.
Mariusz

This SO answer shows a way to rename the file after it has been saved by Spark.
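
The general idea, roughly sketched: write with coalesce(1) into a temporary folder, then use the Hadoop FileSystem API to locate the single part file and rename it to the name you need. The paths here are placeholders, and spark is the SparkSession already available in a Databricks notebook:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Assumes the DataFrame was already written with coalesce(1) into tmpDir
val tmpDir = new Path("dataLake/Folder/tmp_output")
val target = new Path("dataLake/Folder/model.json")

val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Locate the single part-* file Spark produced and rename it
val partFile = fs.globStatus(new Path(tmpDir, "part-*"))(0).getPath
fs.rename(partFile, target)

// Clean up the temporary folder (status files like _SUCCESS included)
fs.delete(tmpDir, true)
```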
