This paper presents a hierarchical neuro-fuzzy network system for segmenting continuous speech into syllables. The system formulates speech segmentation as a two-phase procedure. In the first phase, a hybrid neuro-fuzzy network (HNFN) classifies each frame of the speech signal into one of three signal types. The HNFN combines a distributed representation of a fuzzy system (DRF) with a hyperrectangular composite neural network (HRCNN), so that each component may offset the weaknesses of the other. The parameters of the trained HNFN are used to extract both crisp and fuzzy classification rules. In the second phase, a self-tuning back-propagation neural network (STBNN) resolves the coarticulation effects of vowel-vowel (V-V) concatenation. In our experiments, a database of continuous reading-rate Mandarin speech recorded from newscasts was used to evaluate the proposed speaker-independent segmentation system, and the experimental results confirm its effectiveness.
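The two-phase structure described above can be sketched in simplified form. This is only an illustrative outline, not the authors' method: the frame classifier and boundary detector below are trivial energy-threshold stand-ins for the actual HNFN and STBNN models, and the frame types, thresholds, and function names are all assumptions introduced for illustration.

```python
from typing import List

# Phase 1 stand-in for the HNFN: label each frame as one of three
# hypothetical signal types based on short-time energy. The thresholds
# and type names are illustrative assumptions, not from the paper.
def classify_frames(energies: List[float],
                    low: float = 0.1, high: float = 0.5) -> List[str]:
    types = []
    for e in energies:
        if e < low:
            types.append("silence")
        elif e < high:
            types.append("consonant")
        else:
            types.append("vowel")
    return types

# Phase 2 stand-in for the STBNN: mark a candidate syllable boundary
# wherever the frame label changes. The real STBNN instead refines
# boundaries within vowel-vowel (V-V) concatenations.
def segment_boundaries(types: List[str]) -> List[int]:
    return [i for i in range(1, len(types)) if types[i] != types[i - 1]]

# Toy frame-energy sequence: silence -> consonant -> vowel -> silence.
energies = [0.05, 0.06, 0.3, 0.7, 0.8, 0.2, 0.04]
types = classify_frames(energies)
print(types)                       # per-frame labels
print(segment_boundaries(types))   # indices where the label changes
```

In the actual system, both stages are learned models rather than fixed thresholds, and the HNFN's trained parameters additionally yield interpretable crisp and fuzzy rules.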