Gene families are sets of proteins that have been clustered based on sequence similarity. In Ensembl Genomes, they provide a way to explore similar proteins across a wide range of bacterial genomes for which the standard peptide comparative pipeline cannot be run. Gene families are displayed in the web interface and can also be accessed using the Ensembl Compara Perl API.
In Ensembl Bacteria, gene families are populated with proteins from all bacterial genomes using the HAMAP and PANTHER classifications provided by InterPro. Note that although this approach uses the same database schema and API as Ensembl, it does not use the same gene family pipeline. Gene families are not available for any other Ensembl Genomes division.
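
As a rough illustration of the idea of populating families from classification hits, the following Python sketch groups proteins by the HAMAP or PANTHER accession they are annotated with. It is not the Ensembl Bacteria pipeline: the protein identifiers, the annotation rows, and the build_families helper are all hypothetical, and letting a protein join one family per accession is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical (protein ID, source database, family accession) rows.
# These values are invented for illustration, not taken from Ensembl or InterPro.
ANNOTATIONS = [
    ("protA", "HAMAP", "MF_00001"),
    ("protB", "HAMAP", "MF_00001"),
    ("protC", "PANTHER", "PTHR10000"),
    ("protA", "PANTHER", "PTHR10000"),
    ("protD", "Pfam", "PF00001"),  # ignored: not a HAMAP or PANTHER hit
]

# Only these classifications are used to define families in this sketch.
FAMILY_SOURCES = {"HAMAP", "PANTHER"}


def build_families(annotations):
    """Group protein IDs by family accession (hypothetical helper)."""
    families = defaultdict(set)
    for protein_id, source, accession in annotations:
        if source in FAMILY_SOURCES:
            families[accession].add(protein_id)
    # Sort members so the result is deterministic.
    return {acc: sorted(members) for acc, members in families.items()}


families = build_families(ANNOTATIONS)
for accession, members in sorted(families.items()):
    print(accession, members)
```

In this toy data set, protA carries both a HAMAP and a PANTHER hit and so appears in two families, while protD has neither and is left out.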