I’m new to Scala and struggling to find a solution to my problem.
I'm working on a Databricks project and need to produce a .json file from a Spark DataFrame or a string variable.
The dataFrame.write.json("dataLake\Folder\file.json") call produces a folder named "file.json" containing status files from the write process plus the actual JSON data, but under a different name, "part-00000".
What I'm after is a single output file whose name I can specify.
TL;DR: use coalesce(1) before saving; it will still produce a folder, but containing a single part file.
Remember to use coalesce rather than repartition since, as the documentation explains, it avoids a full shuffle.
Note that outputting multiple files is the correct behaviour, because Spark is designed for distributed computation. If your use case doesn't need multiple machines and isn't educational, DO NOT use Spark (and for educational purposes, a minor quirk like a folder containing a single file named part-00000 shouldn't be a problem).
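A minimal sketch of the coalesce(1) approach described above (the input path, output path, and schema are placeholders, not from the question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
// Hypothetical source data; replace with your own DataFrame.
val df = spark.read.json("dataLake/Folder/input")

// coalesce(1) merges all partitions into one, so Spark writes a single
// part file. The output is still a directory ("file.json") containing a
// part-00000-*.json data file plus _SUCCESS marker files.
df.coalesce(1)
  .write
  .mode("overwrite")
  .json("dataLake/Folder/file.json")
```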
Thanks for your reply.
However, I need to output a single file with a specific name, such as model.json or manifest.json (CDM files), so the file can be consumed by other applications like Power BI.
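One common workaround for this (a sketch, not part of the answer above; all paths are hypothetical) is to write the coalesced output to a temporary directory, then rename the single part file to the desired name using the Hadoop FileSystem API, which is available inside a Databricks notebook:

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Step 1: write a single part file into a temporary directory.
val tmpDir = "dataLake/Folder/_tmp_model"  // hypothetical temp path
df.coalesce(1).write.mode("overwrite").json(tmpDir)

// Step 2: locate the part file and rename it to the target name.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val partFile = fs.globStatus(new Path(s"$tmpDir/part-*.json"))(0).getPath
fs.rename(partFile, new Path("dataLake/Folder/model.json"))

// Step 3: remove the temporary directory and its marker files.
fs.delete(new Path(tmpDir), true)
```

Note this only works when the data fits on a single executor, since coalesce(1) forces everything into one partition.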