Where to put autogenerated data in the Maven directory structure

jimka · November 14, 2019, 6:00pm

I’d like to put some autogenerated reference data (not input to any program, just example output) somewhere inside the project area. This will be useful for documentation purposes: when I explain to someone which git repo to clone, I want to be able to say where in the cloned repo to find the example data that was generated by the program. Normally the code generates it in /tmp.

What’s the correct place in the directory structure to put this non-input reference data?

sangamon · November 14, 2019, 7:39pm

Build tools only care about files they process and generate, and they only prescribe folder structure for those files - with sbt that’d be src, project and target. Everything else is up to you. I’m not aware of any authoritative convention for Java/Scala projects beyond the build.

If these files are generated once by you and checked into VCS, I’d put them in some doc or reference folder (and refer to them in README.md). If the user is supposed to run the program themselves locally in order to generate the files, you could hardwire it to some out folder below the project root and exclude that in .gitignore, but I’d prefer to ask the user to provide a target directory outside the project root when invoking the program.

This assumes that your audience are developers for whom checking out from VCS is the natural way of accessing the program. If it’s users who are only supposed to run the program, this becomes an entirely different question of assembly/deployment, and the original project structure doesn’t really matter.
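The option of having the user pass a target directory when invoking the program could look roughly like the following. This is only a minimal sketch, not code from the project under discussion: the object name `ReferenceData`, the helpers `resolveOutputDir` and `writeSample`, the file name `sample.csv`, and the CSV contents are all hypothetical placeholders. It assumes the first command-line argument names the output directory and falls back to a temporary directory when none is given.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object ReferenceData {
  // Hypothetical helper: use the first command-line argument as the output
  // directory (creating it if needed); otherwise create a temporary directory.
  def resolveOutputDir(args: Array[String]): Path =
    args.headOption match {
      case Some(dir) => Files.createDirectories(Paths.get(dir))
      case None      => Files.createTempDirectory("reference-data")
    }

  // Hypothetical generator: writes a small placeholder CSV file into outDir
  // and returns the path of the written file. The values are dummy data.
  def writeSample(outDir: Path): Path = {
    val csv = "n,value\n1,10\n2,20\n"
    Files.write(outDir.resolve("sample.csv"), csv.getBytes(StandardCharsets.UTF_8))
  }

  def main(args: Array[String]): Unit = {
    val written = writeSample(resolveOutputDir(args))
    println(s"Wrote $written")
  }
}
```

With sbt this might be invoked as `sbt "run /some/dir/outside/the/repo"`, which keeps generated files out of the working tree, so nothing needs to be listed in .gitignore.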

jimka · November 14, 2019, 9:56pm

Thanks. No, the user should not regenerate the files in this special directory.

This is a paper I’m submitting to the symposium on Trends in Functional Programming.

The files are there so that users who do not care to run the program (perhaps they are not Scala experts) can still see the raw data I used for the plots in my paper. I’m including the gnuplot files, the .csv files (containing parallel information), and the .png files which are the output of gnuplot. If the user cares to, he can run the Scala program in the same git repo to regenerate the .csv and gnuplot files in his own environment (CPU, OS, etc.) and verify or falsify my results.

I really prefer keeping this sample data with the code, as I can change them together as the program progresses while the revision history remains intact.