As the number of transistors on a single chip continues to grow, it is important to think beyond the traditional approaches of compiler optimization for deeper pipelines and wider instruction issue units to improve performance. This single-threaded execution model limits these approaches to exploiting only the relatively small amount of instruction-level parallelism available in application programs. While integrating an entire multiprocessor onto a single chip is feasible, this architecture is limited to exploiting only relatively coarse-grained parallelism. We propose a concurrent multithreaded architecture, called the superthreaded architecture, as an alternative. As a hybrid of a wide-issue superscalar processor and a multiprocessor-on-a-chip, this new concurrent multithreading architecture can leverage the best of existing and future parallel hardware and compilation technologies. By combining compiler-directed thread-level speculation for control and data dependences with run-time checking of data dependences, the superthreaded architecture can exploit the multiple granularities of parallelism available in general-purpose application programs to reduce the execution time of a single program.