That filter is why there is a significant CPU cost. I’ve never quite got my head around where (in the frequency spectrum) things get folded, but I do know that a decent resampler will kill them dead (i.e. filter them out before the resampling).
If you upsample (as per the OP) then you don’t get the folded audio, because the signal is already band-limited. The subsequent downsampling (because they’ll presumably want 44.1 kHz back) should then also be perfect (I think).
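On the "where do things get folded" question, here is a minimal sketch of the usual rule of thumb: with no filter in front, a tone lands at its distance from the nearest multiple of the sample rate, so everything reflects back below Nyquist (half the sample rate). The function name `folded_frequency` is just something I made up for illustration, and it assumes an ideal sampler with no anti-alias filter at all.

```python
def folded_frequency(f_hz, sample_rate_hz):
    """Where a tone at f_hz shows up after sampling at sample_rate_hz
    with NO anti-alias filter (hypothetical helper, ideal sampler assumed)."""
    # Sampling cannot tell f apart from f + k * sample_rate, so wrap first.
    f = f_hz % sample_rate_hz
    # Anything in the upper half reflects back about Nyquist (sample_rate / 2).
    return min(f, sample_rate_hz - f)


# A 30 kHz component in 96 kHz material, naively dropped to 44.1 kHz:
alias = folded_frequency(30000, 44100)   # 44100 - 30000 = 14100 Hz, well inside the audible band
# A 10 kHz component is already below Nyquist, so it stays where it is:
unchanged = folded_frequency(10000, 44100)
```

That 14.1 kHz result is the sort of thing the resampler's filter removes before the rate change. Going up from 44.1 kHz there is nothing above 22.05 kHz to fold in the first place.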

I think we’re probably singing from a very similar hymn sheet. The main point is that resampling isn’t “guesswork”, it’s an exact science. Reducing the bit depth does lose a little accuracy, but we’re so far above the point where it makes a difference (assuming a good dither) that no one will hear that either. We still keep the extra accuracy whilst mixing, because we’re doing all sorts of interesting maths on the sound first.
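To put a rough number on the bit depth point, here is a small sketch of reducing a floating-point signal to 16 bits with TPDF (triangular) dither. It is only an illustration under my own assumptions: samples are floats in the range -1.0 to 1.0, the dither is plain TPDF with no noise shaping, and `quantize_with_tpdf_dither` is a hypothetical name, not any particular DAW's or library's routine.

```python
import math
import random


def quantize_with_tpdf_dither(samples, bits=16, rng=None):
    """Round float samples in [-1.0, 1.0) to `bits`-bit steps with TPDF dither.
    Hypothetical helper for illustration; no noise shaping is applied."""
    rng = rng or random.Random(0)      # fixed seed so the sketch is repeatable
    scale = 2 ** (bits - 1)            # one LSB is 1 / scale
    out = []
    for x in samples:
        # Difference of two uniform values gives a triangular PDF in (-1, 1) LSB.
        dither = rng.random() - rng.random()
        q = round(x * scale + dither)
        q = max(-scale, min(scale - 1, q))   # clamp to the signed integer range
        out.append(q / scale)                # back to float for comparison
    return out


# One second of a 1 kHz sine at half of full scale, sampled at 44.1 kHz.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]
reduced = quantize_with_tpdf_dither(tone, bits=16)

# Worst-case error is bounded by 1.5 LSB (up to 1 LSB dither + 0.5 LSB rounding).
worst_error = max(abs(a - b) for a, b in zip(tone, reduced))
error_db = 20 * math.log10(worst_error)      # relative to full scale
```

With that 1.5 LSB bound the worst single-sample error at 16 bits sits below roughly -86 dB relative to full scale, which is why it only matters once you start stacking up lots of processing. Hence keeping the higher precision until the final export.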

I.e. don’t be scared of using a decent resampler. Do check for band-pass filters and dither settings…