In this paper, we employ the deep Q-network (DQN) algorithm to study user pairing in a downlink (D/L) non-orthogonal multiple access (NOMA) system with multiple user equipments (UEs). Independent and identically distributed (i.i.d.) fading links are considered, and the channel is time-varying owing to node mobility. We first investigate reinforcement learning (RL)-based user pairing and optimal power allocation algorithms in NOMA systems. To simplify the analysis and reduce computational complexity, a Q-learning-based scheme is used to investigate the user pairing and optimal power allocation problems jointly. For real-time propagation conditions with numerous users, DQN is employed to perform user pairing and power allocation simultaneously. With a learning rate of 0.2, the DQN method converges faster but does not attain the maximum throughput (average sum rate). In our simulations with eight UEs, it has been observed that near-user/far-user (N-F) pairing yields a better sum rate. The simulation curves show that the symbol error rate (SER) performance degrades dramatically as node velocity increases, because higher node mobility causes the channel to change very quickly. The simulation results confirm the derived analytical expressions.
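The Q-learning-based power-allocation step for one N-F pair can be sketched as below. This is a minimal illustrative sketch, not the paper's exact formulation: the state quantization, the candidate power fractions, the Rayleigh-fading gain statistics, and the reward (two-user downlink NOMA sum rate with successive interference cancellation at the near user) are all assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not the paper's exact setup):
P, N0 = 1.0, 0.1                          # total transmit power, noise power
actions = np.array([0.1, 0.2, 0.3, 0.4])  # candidate power fractions for the near user
n_states = 8                              # quantized far-user channel-gain levels
alpha, gamma, eps = 0.2, 0.9, 0.1         # learning rate 0.2, as in the text

Q = np.zeros((n_states, len(actions)))    # tabular Q-function

def sum_rate(a_n, g_near, g_far):
    """Two-user downlink NOMA sum rate; near user cancels the far user's signal (SIC)."""
    a_f = 1.0 - a_n
    r_near = np.log2(1 + a_n * P * g_near / N0)
    r_far = np.log2(1 + a_f * P * g_far / (a_n * P * g_far + N0))
    return r_near + r_far

def quantize(g_far):
    """Map a far-user gain in [0, 1) to a discrete state index."""
    return min(int(g_far * n_states), n_states - 1)

state = 0
for step in range(20000):
    # Rayleigh-fading power gains; the near user is statistically stronger
    g_near = rng.exponential(2.0)
    g_far = min(rng.exponential(0.5), 0.999)
    # epsilon-greedy action selection
    a = rng.integers(len(actions)) if rng.random() < eps else int(Q[state].argmax())
    r = sum_rate(actions[a], g_near, g_far)
    next_state = quantize(g_far)
    # standard Q-learning temporal-difference update
    Q[state, a] += alpha * (r + gamma * Q[next_state].max() - Q[state, a])
    state = next_state

best = actions[Q.mean(axis=0).argmax()]   # learned near-user power fraction
```

The DQN used in the paper replaces the table `Q` with a neural-network approximator, which is what makes the joint pairing-and-allocation problem tractable when the state space grows with the number of UEs.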