In this paper, we propose a deep reinforcement learning (DRL)-based algorithm that generates policies for Baseband Function (BBF) placement and routing. To evaluate the proposed algorithm under practical conditions, we simulate an online scenario with completely random requests, considering both C-RAN and NG-RAN architectures. In addition, an Integer Linear Programming (ILP) model is formulated to obtain the optimal solution as a benchmark. The simulation results show that the DRL-based algorithm converges in a short time and that its performance is close to the optimal ILP benchmark in terms of latency and bandwidth in the online scenarios. The policies generated by the DRL-based algorithm are also compared with a classic heuristic, the first-fit algorithm, and outperform it in both latency and bandwidth. The fast convergence and near-optimal performance indicate that the DRL-based algorithm is a promising approach for BBF placement and routing in the RAN for 5G and beyond.