Fix parallel code gen in RT backend
- Properly use `Arc<Mutex<Option<JoinHandle<_, _>>>>` for task-parallel calls.
- Include `Arc` clones right before opening the async closure (for async calls and for fork-joins).
- Make data-parallel fork-joins no longer task-parallel by default (they can still be outlined for task parallelism).
- Fix edge case in function inlining.
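The sketch below illustrates the shared-handle pattern from the first two bullets in plain Rust. It rests on a few assumptions. It uses `std::thread::JoinHandle<T>`, which has a single type parameter, instead of whatever async runtime the RT backend actually targets. The names `TaskSlot`, `spawn_into_slot`, `join_slot`, and `expensive_call` are hypothetical, and none of this is the backend's generated code.

```rust
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in for the work a task-parallel call performs.
fn expensive_call(x: u64) -> u64 {
    (1..=x).sum()
}

// Assumption: std threads stand in for the RT backend's async tasks.
// The Arc lets several owners share the slot, the Mutex serializes access,
// and the Option lets the consumer `take()` the handle exactly once.
type TaskSlot = Arc<Mutex<Option<JoinHandle<u64>>>>;

fn spawn_into_slot(slot: &TaskSlot, input: &Arc<Vec<u64>>) {
    // Clone the Arc right before the `move` closure, so the closure owns
    // its own reference and the caller keeps the original.
    let input_clone = Arc::clone(input);
    let handle = thread::spawn(move || {
        input_clone.iter().map(|&x| expensive_call(x)).sum::<u64>()
    });
    *slot.lock().unwrap() = Some(handle);
}

fn join_slot(slot: &TaskSlot) -> Option<u64> {
    // Take the handle out (leaving None). The lock guard is a temporary
    // dropped at the end of this statement, so the lock is not held while joining.
    let handle = slot.lock().unwrap().take()?;
    Some(handle.join().expect("task panicked"))
}

fn main() {
    let slot: TaskSlot = Arc::new(Mutex::new(None));
    let input = Arc::new(vec![1, 2, 3, 4]);
    spawn_into_slot(&slot, &input);
    // `input` is still usable here because only a clone moved into the closure.
    println!("input len = {}", input.len());
    println!("{:?}", join_slot(&slot));
    // A second join finds the slot empty.
    println!("{:?}", join_slot(&slot));
}
```

Wrapping the handle in `Option` means a second consumer gets `None` and does not try to join twice. Cloning the `Arc` immediately before the closure keeps the original reference available to the spawning code.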
Edited by rarbore2