Job Category: Internship

Job Type: Full Time

Job Location: India - Gurgaon; India - Kolkata; Pakistan - Islamabad

You’ll profile the existing C/C++ codebase, design a GPU strategy (CUDA, SYCL, or OpenMP offload), implement kernels, and validate numerical parity against CPU baselines. You’ll use modern AI coding tools (Cursor, GitHub Copilot) to speed up refactors, paired with rigorous reviews and benchmarking to ensure quality.

What You’ll Do

Assess & Plan:
- Build and run the current C/C++ code; map dependencies and hot paths.
- Create a migration plan (targets, expected speedups, parity criteria, risks).
Accelerate:
- Implement GPU kernels and parallel patterns (tiling, shared memory, coalesced access).
- Choose and justify the approach: CUDA (preferred), SYCL/oneAPI, HIP, or OpenMP offload based on portability needs.
- Optimize memory layout (AoS→SoA), minimize host↔device transfers, and use existing libraries (e.g., cuBLAS, cuFFT, Thrust) where applicable.
Validate & Benchmark:
- Build a CPU↔GPU parity test harness with numerical tolerances; develop unit/integration tests.
- Benchmark with representative Radio Planning workloads; iterate using Nsight/VTune/rocprof.
- Document speedups, bottlenecks, and trade-offs.
Engineer for Maintainability:
- Add comments, diagrams, and a GPU Migration Guide (build/run, architecture, profiling notes).
- Set up CI jobs for linting (clang-tidy), tests, and benchmark smoke checks.
- Use Cursor and GitHub Copilot responsibly to draft code/tests, then refine via manual review.

Minimum Qualifications

Strong C/C++ skills and comfort reading legacy code.
Coursework or hands-on experience with GPU programming (CUDA, SYCL, HIP, or OpenMP offload).
Familiarity with parallel computing concepts (threads, warps, memory hierarchy, vectorization).
Experience with profiling & optimization (e.g., Nsight Compute/Systems, nvprof, perf, VTune).
Comfortable with CMake, Git, and unit testing (e.g., GoogleTest).
Practical use of Cursor and/or GitHub Copilot (or similar AI pair-programming tools).

Nice to Have

Knowledge of numerical methods/linear algebra; experience with cuBLAS/cuFFT/Thrust.
Exposure to telecom/RF propagation or scientific computing on large grids.
Python for quick experiments/plots; basic PyTorch/NumPy for validation scripts.
Docker/containers for reproducible development environments; understanding of licensing and IP hygiene for AI-assisted code.
