Sample rate for frequency, bit depth for dynamic range.
Both are “limited” in human hearing – 44.1kHz is overkill, but it gives a bit of room for a decent analogue filter to exclude Nyquist-violating frequencies, so that’s good. 48kHz allows better syncing with video – that’s the obvious advantage, an integer number of samples per video frame.
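To make the video-sync point concrete, here is a rough sketch that divides each sample rate by a few common frame rates. The frame rates (24, 25 and 30 fps) and the helper name `samples_per_frame` are my own choices for illustration, not anything standardised.

```python
from fractions import Fraction

def samples_per_frame(sample_rate_hz, frames_per_second):
    """Exact number of audio samples that fall within one video frame."""
    return Fraction(sample_rate_hz) / Fraction(frames_per_second)

# Assumed frame rates for illustration: film (24), PAL (25), and a plain 30 fps.
for rate in (44100, 48000):
    for fps in (24, 25, 30):
        n = samples_per_frame(rate, fps)
        kind = "integer" if n.denominator == 1 else "fractional"
        print(f"{rate} Hz @ {fps} fps: {float(n):g} samples/frame ({kind})")
```

With these rates, 48kHz comes out whole every time (2000, 1920 and 1600 samples), while 44.1kHz at 24 fps lands on 1837.5, which is the awkward case.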
The dynamic range of the human ear is about 15 bits – 16 is mild overkill, but very convenient for computing…
Taking 8 extra bits (so 24-bit) gives a huge dynamic range, which we can exploit for an insanely low noise floor (assuming clean signals of course) or an extremely high transient capacity.
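For a feel of the numbers, a quick sketch using the usual rule of thumb that each bit is worth about 6 dB, i.e. 20·log10(2^N) for N bits. This assumes an ideal quantiser and ignores dither and the small extra term that shows up for a full-scale sine, so treat the figures as approximate. The function name `dynamic_range_db` is just illustrative.

```python
import math

def dynamic_range_db(bits):
    """Approximate dynamic range of an ideal N-bit quantiser, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (15, 16, 24):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB")

# Gain from the 8 extra bits when going from 16-bit to 24-bit.
extra = dynamic_range_db(24) - dynamic_range_db(16)
print(f"8 extra bits: +{extra:.1f} dB")
```

So 16 bits is roughly 96 dB, and the 8 extra bits add about another 48 dB on top of that.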