Typically you wouldn't want to compute a single spectrum for the whole 350 seconds' worth of data.
Quick quiz: what is the relationship between the length of a time sample and the lowest frequency it can resolve? What is the lower limit of human hearing? So what lowest frequency do we want, and what does that imply about the size of the Hamming window?
For another thing, the frequencies in a typical sound sample change over time, so taking a long time sample means you get lots of fairly different spectra averaged together.
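One way to check your quiz answers and see the averaging problem is a small NumPy sketch. It is not the code from this notebook: the 44100 Hz sample rate, the 20 Hz hearing limit, the synthetic 440 Hz / 880 Hz signal, and the helper name `peak_frequency` are all assumptions made for illustration.

```python
import numpy as np

sample_rate = 44100      # Hz; assumed, a common rate for audio files
lowest_audible = 20.0    # Hz; the usual textbook lower limit of human hearing

# A segment lasting T seconds has FFT bins spaced 1/T Hz apart, so the
# lowest non-zero frequency it can resolve is 1/T.
min_duration = 1.0 / lowest_audible                   # seconds
window_len = int(round(min_duration * sample_rate))   # samples per window

freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
lowest_resolved = freqs[1]   # first non-zero bin, equal to the bin spacing
print(f"{window_len} samples resolve down to {lowest_resolved:.1f} Hz")

# Synthetic one-second signal (hypothetical test data): 440 Hz for the
# first half, 880 Hz for the second half.
t = np.arange(sample_rate) / sample_rate
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 440 * t),
                  np.sin(2 * np.pi * 880 * t))


def peak_frequency(segment, sample_rate):
    """Return the frequency (Hz) of the strongest FFT bin in the segment."""
    spectrum = np.abs(np.fft.rfft(segment))
    seg_freqs = np.fft.rfftfreq(len(segment), d=1.0 / sample_rate)
    return seg_freqs[np.argmax(spectrum)]


half = sample_rate // 2
first_peak = peak_frequency(signal[:half], sample_rate)
second_peak = peak_frequency(signal[half:], sample_rate)
print(f"first half peaks near {first_peak:.0f} Hz, "
      f"second half near {second_peak:.0f} Hz")
```

A spectrum of the whole second would contain both peaks but say nothing about when each tone was playing, which is the reason for analysing short segments instead.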
In this code I read in an MP3 audio file, convert it to WAV, and plot its amplitude as a function of time. I then compare plots of the power spectrum using just the FFT vs. using Hanning windowing plus the FFT. Finally, I create a spectrogram showing intensity as a function of frequency and time.
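To give a feel for the last two steps without needing the audio file, here is a minimal NumPy-only sketch on synthetic tones. It is an illustration under stated assumptions rather than the code used here: the sample rate, the 4096-sample window, the 2048-sample hop, the test tones, and the helper names `to_db` and `spectrogram` are all hypothetical choices.

```python
import numpy as np

sample_rate = 44100   # Hz; assumed
n = 4096              # samples per analysis window; illustrative choice

# Part 1: plain FFT vs. Hanning window plus FFT on a single tone.
t = np.arange(n) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)   # 1000 Hz falls between FFT bins here

freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
power_raw = np.abs(np.fft.rfft(tone)) ** 2
power_hann = np.abs(np.fft.rfft(tone * np.hanning(n))) ** 2


def to_db(power):
    """Power relative to the spectrum's own peak, in dB."""
    return 10 * np.log10(power / power.max() + 1e-30)


# Leakage: how much power shows up far away from the true frequency.
far = freqs > 5000
leak_raw = to_db(power_raw)[far].max()
leak_hann = to_db(power_hann)[far].max()
print(f"worst leakage above 5 kHz: raw {leak_raw:.0f} dB, "
      f"Hanning {leak_hann:.0f} dB")


# Part 2: a basic spectrogram built from overlapping windowed FFTs.
def spectrogram(samples, sample_rate, n=4096, hop=2048):
    """Return (times, freqs, power), with power shaped (n_frames, n_freqs)."""
    window = np.hanning(n)
    starts = range(0, len(samples) - n + 1, hop)
    frames = np.array([samples[s:s + n] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    times = (np.array(starts) + n / 2) / sample_rate   # frame centres, seconds
    frame_freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return times, frame_freqs, power


# Hypothetical test signal: 440 Hz for half a second, then 880 Hz.
t2 = np.arange(sample_rate) / sample_rate
two_tone = np.where(t2 < 0.5,
                    np.sin(2 * np.pi * 440 * t2),
                    np.sin(2 * np.pi * 880 * t2))

times, spec_freqs, power = spectrogram(two_tone, sample_rate)
dominant = spec_freqs[np.argmax(power, axis=1)]   # strongest bin per frame
print(f"first frame peaks near {dominant[0]:.0f} Hz, "
      f"last frame near {dominant[-1]:.0f} Hz")
```

The windowed spectrum should show much less power far from the tone than the plain FFT does. To view the spectrogram as an image, something like matplotlib's `pcolormesh(times, spec_freqs, to_db(power).T)` would work; `scipy.signal.spectrogram` is a ready-made alternative to the hand-rolled helper.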