Guide to Ansible Facts and Fact Gathering

It’s tempting to think Ansible knows what operating system (OS) you’re running solely through the modules you use. For instance, apt implies Debian, and dnf implies you are on a Fedora-like distribution. This is true to some extent, but it doesn’t paint the whole picture.

For Ansible to perform tasks such as conditionally installing packages based on the OS, dynamically configuring services with different names across distributions, or templating configuration files with system-specific paths and values, it needs to gather its facts accurately.

In this article, we discuss Ansible facts and how fact gathering works. We’ll review how to use these facts and when to use them.

Prerequisites

To follow along and get the most out of this guide, you need to have the following:

Ansible installed on your control machine (version 2.9 or later recommended)
Access to a target server (either a local virtual machine (VM) or a remote server) that Ansible can connect to for executing commands

What are Ansible facts?

Ansible facts are pieces of information that Ansible discovers about your target systems. By default, fact gathering is the first thing that happens in every play: it runs before any of your tasks, unless you explicitly disable it.

These facts include data such as:

Operating system type and version
Hardware specifications (CPU, memory, disk space)
Network configuration (IP addresses, network interfaces)
Package manager and service manager in use (the list of installed packages requires the separate package_facts module)
System architecture and kernel information

If you have used Ansible for a while, you will have seen this step in the console output: it appears as the Gathering Facts task at the start of each play.

How fact gathering works

When Ansible connects to a target machine, it doesn’t immediately start executing your tasks. Instead, it first runs the setup module, which inspects the system (for example, by reading system files and running standard utilities) and reports what it finds.

All of this happens over the same connection that Ansible uses for everything else (SSH by default), which means you don’t need to configure anything extra. Under the hood, Ansible runs small Python scripts on the target machine that collect system details.

Such details include network interfaces, disk usage, environment variables, and the package manager in use. These details are then sent back to Ansible in JSON format, making them easy to store, query, and use later in your playbook.

Once collected, Ansible facts become available as variables throughout your entire playbook. You can reference them in tasks, use them in conditionals, or even display them for debugging purposes.
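To make the mechanism concrete, here is a minimal sketch that turns off the automatic step and calls the setup module as an ordinary task. The play and task names are only illustrative; the point is that the facts become available after the setup task has run.

```yaml
---
- name: Gather facts explicitly instead of implicitly
  hosts: all
  gather_facts: false   # skip the automatic "Gathering Facts" step
  tasks:
    # Running setup by hand does the same work as the automatic step
    - name: Run the setup module as a regular task
      setup:

    # From this point on, the collected facts can be used like any variable
    - name: Show the distribution name and version
      debug:
        msg: "{{ ansible_facts['distribution'] }} {{ ansible_facts['distribution_version'] }}"
```

You would rarely write a play this way in practice, but it can help when you want facts gathered only at a specific point in a play.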

How to use Ansible facts

To use Ansible facts, reference them within playbooks or templates using the ansible_facts dictionary or shorthand variables like ansible_hostname or ansible_distribution.
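As a quick illustration of the two forms, the sketch below reads the same fact both ways. It assumes the inject_facts_as_vars setting is left enabled, since the shorthand variables depend on it.

```yaml
---
- name: Compare the two ways of referencing a fact
  hosts: all
  tasks:
    # Dictionary form: the key has no "ansible_" prefix
    - name: Read the hostname from the ansible_facts dictionary
      debug:
        msg: "Hostname via dictionary: {{ ansible_facts['hostname'] }}"

    # Shorthand form: the same value, exposed as a prefixed variable
    - name: Read the same value from the shorthand variable
      debug:
        msg: "Hostname via shorthand: {{ ansible_hostname }}"
```

The rest of this guide uses the dictionary form, which keeps working even if variable injection is turned off.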

1. Display all gathered facts

Sometimes you may want to view all the information Ansible has gathered about a system in one place. The simplest way to do this is to print the entire ansible_facts dictionary, which contains all gathered information.

Consider the following playbook:

# facts/all.yaml
---
- name: Display all facts
  hosts: all
  tasks:
    - name: Print all gathered facts
      debug:
        var: ansible_facts

As previously stated, this playbook performs one task: displaying all facts stored in ansible_facts.

It’s worth noting that this playbook is not particularly useful on its own. It dumps every fact Ansible has collected, which can be overwhelming and hard to parse.

2. Conditional package installation by OS

Sometimes, you may want to use facts to install packages conditionally, based on your operating system. This use case is helpful if the distribution varies across a fleet of machines or if packages are named slightly differently on a specific distribution:

---
- name: Install web server based on OS
  hosts: all
  tasks:
    - name: Install Apache on Ubuntu/Debian
      apt:
        name: apache2
        state: present
      when: ansible_facts['os_family'] == "Debian"
    
    - name: Install Apache on CentOS/RHEL
      yum:
        name: httpd
        state: present
      when: ansible_facts['os_family'] == "RedHat"

In the playbook above, the os_family key yields the family the operating system belongs to (for example, Debian for Ubuntu and Debian, or RedHat for CentOS and RHEL). Combining this with the when keyword, you can conditionally install packages based on that family.

When you run this playbook against a Debian-based target, the Apache installation for CentOS/RHEL is reported as skipped, because the os_family fact on that machine is Debian.
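If the only difference between distributions is the package name, one alternative is to look the name up by OS family and let the generic package module pick the package manager. This is a sketch rather than a drop-in replacement: the apache_package_by_family variable is a hypothetical name, and it only covers the two families used in this example, so hosts from any other family would fail on the lookup.

```yaml
---
- name: Install web server with a single task
  hosts: all
  become: true
  vars:
    # Hypothetical lookup table: OS family -> Apache package name
    apache_package_by_family:
      Debian: apache2
      RedHat: httpd
  tasks:
    - name: Install Apache using the detected package manager
      package:
        name: "{{ apache_package_by_family[ansible_facts['os_family']] }}"
        state: present
```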

3. Dynamic configuration based on memory

Another common Ansible facts use case is dynamically configuring services or files based on system resources. Applications like OpenSearch require you to set the JVM heap size appropriately for optimal performance. This value should typically be 50% of the total available RAM on the node.

If you were running a self-hosted OpenSearch cluster across multiple servers with varying memory configurations, manually calculating and setting these values for each node would be tedious and prone to error.

The memtotal_mb fact becomes incredibly useful as it allows you to dynamically calculate the appropriate heap size for each server based on its actual available memory.

Below is an example playbook:

---
- name: Configure OpenSearch heap size based on system memory
  hosts: all
  tasks:
    - name: Calculate heap size (50% of RAM)
      set_fact:
        heap_size_mb: "{{ [ansible_facts['memtotal_mb'] // 2, 31744] | min }}"
    
    - name: Print calculated heap size
      debug:
        msg: "Setting heap size to {{ heap_size_mb }}MB for this {{ ansible_facts['memtotal_mb'] }}MB system"

As discussed above, this playbook determines the appropriate Java heap size for OpenSearch by allocating half of the system’s total memory, but never more than 31,744MB. It then displays a message showing both the calculated heap size and the machine’s total memory, so you can confirm the configuration.

In a real-world scenario, you would use lineinfile with a regexp to insert or update the actual JVM configuration files.
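A rough sketch of that step could look like the following. The jvm.options path is an assumption and will differ depending on how OpenSearch was installed, and jvm_options_path is a hypothetical variable name.

```yaml
---
- name: Apply the calculated heap size to OpenSearch
  hosts: all
  become: true
  vars:
    # Assumed location; adjust to where your installation keeps jvm.options
    jvm_options_path: /etc/opensearch/jvm.options
  tasks:
    - name: Calculate heap size (50% of RAM, capped at 31744MB)
      set_fact:
        heap_size_mb: "{{ [ansible_facts['memtotal_mb'] // 2, 31744] | min }}"

    # Replaces an existing -Xms/-Xmx line, or appends one if none is found
    - name: Set initial and maximum heap size
      lineinfile:
        path: "{{ jvm_options_path }}"
        regexp: "^{{ item }}"
        line: "{{ item }}{{ heap_size_mb }}m"
      loop:
        - "-Xms"
        - "-Xmx"
```

Setting both flags to the same value keeps the initial and maximum heap equal, which is the usual recommendation for JVM-based search engines.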


4. Validating system resources with assertions

Ansible facts can also be used to determine if a workload can run on a machine before you kick off your deployment. This is useful if you are not directly responsible for allocating resources to a fleet of machines.

To check if a host has the right amount of memory to run a workload, you can leverage the assert module to fact-check:

---
- name: deploy
  hosts: all
  tasks:
    - name: Ensure at least 4GB RAM
      assert:
        that:
          - ansible_facts['memtotal_mb'] >= 4096
        fail_msg: "Not enough memory to run this workload."

The playbook above checks that every target host has at least 4GB of RAM before deployment. If a host doesn’t meet the requirement, it fails with the message: “Not enough memory to run this workload.”

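On a host that reports less than 4096MB of memory, the play fails at this task and prints the message above. Keep in mind that memtotal_mb reflects the memory visible to the operating system, so a machine provisioned with 4GB typically reports slightly less than 4096; you may want to set the threshold a little lower.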

5. Tagging hosts dynamically by OS family

Continuing with the theme of managing a fleet of machines, another common use case for Ansible facts is tagging machines based on their distribution. This can be used alongside the conditional install.

The main advantage is that you do not need to tag potentially hundreds of machines individually.

Here’s an example playbook:

---
- name: Group hosts by OS family
  hosts: all
  tasks:
    - name: Tag hosts
      group_by:
        key: "os_{{ ansible_facts['os_family'] }}"

# Later use:
- name: Run only on Debian hosts
  hosts: os_Debian
  tasks:
    - name: Only runs on Debian hosts
      debug:
        msg: "Debian detected"

The group_by module creates in-memory groups named after each host’s OS family, such as os_Debian. A later play in the same playbook run can then target that group. For this example, the second play simply prints out that it detected a Debian host.

6. Limiting facts with gather_subset

By default, Ansible collects as much information as it can find on a target machine, which you might not necessarily need on every run. Thankfully, you can restrict collection to specific categories of facts with the gather_subset keyword.

---
- name: Configure firewall with minimal fact gathering
  hosts: all
  gather_facts: true
  gather_subset:
    - network
  tasks:
    - name: Display only network facts
      debug:
        var: ansible_facts['default_ipv4']
    
    - name: Configure iptables rule
      debug:
        msg: "Allow traffic to {{ ansible_facts['default_ipv4']['address'] }}"

The playbook above uses gather_subset to limit collection to network-related facts, in addition to the minimal set Ansible always gathers. This is useful when writing a playbook that focuses solely on networking.

The first task prints the default_ipv4 dictionary, which holds details of the host’s default IPv4 interface, such as its address, gateway, and netmask. For the full list of supported subsets, refer to the setup module section of the Ansible documentation.
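Listing a subset on its own still collects a minimal baseline of facts alongside it. If you want network facts and nothing else, a sketch like the one below excludes both the full set and the baseline explicitly. Note that facts outside the network subset, such as os_family, will not be available in this play.

```yaml
---
- name: Gather only network facts
  hosts: all
  gather_facts: true
  gather_subset:
    - "!all"    # drop the default "collect everything" behavior
    - "!min"    # also drop the minimal baseline
    - network   # then add back just the network subset
  tasks:
    - name: Show the default IPv4 address
      debug:
        var: ansible_facts['default_ipv4']['address']
```

The exclamation marks need quoting because YAML would otherwise treat them as tag indicators.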

For situations where you need to deploy a binary on a specific architecture or access hardware-specific information, Ansible also allows you to access this through a hardware subset.

You can print out all hardware-related facts using:

---
- name: hardware info
  hosts: all
  gather_facts: true
  gather_subset:
    - hardware
  tasks:
    - name: hardware facts 
      debug:
        var: ansible_facts

The output includes hardware-related facts such as memory, processor, and mounted filesystem details.

If you wanted to leverage the architecture fact, here’s how you’d use it to copy a file based on the target architecture:

---
- name: Deploy hardware-optimized settings
  hosts: all
  gather_facts: true
  gather_subset:
    - hardware
  tasks:
    - name: Copy optimized config for amd64
      copy:
        src: files/db-config-amd64.conf
        dest: /etc/mydb/config.conf
        mode: '0644'
      when: ansible_facts['architecture'] == "x86_64"

Combining the when conditional with the architecture fact lets you copy files based on the target’s CPU architecture.

7. Bonus: Fact caching

Ansible always caches facts, but the default memory plugin keeps them only for the duration of the current run.

In a production environment, you want facts to be kept in more stable storage. This becomes particularly important when you’re running multiple playbooks against the same hosts or when you want to persist facts between Ansible runs.

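The quickest way to switch cache plugins is with environment variables. The jsonfile plugin also needs a directory to write to, which you set through the connection option (the path below is only an example):

export ANSIBLE_CACHE_PLUGIN=jsonfile
export ANSIBLE_CACHE_PLUGIN_CONNECTION=/tmp/ansible_fact_cache

Or in the ansible.cfg file:

[defaults]
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_fact_cache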

Supported cache plugins include jsonfile, redis, and memcached. Each offers different benefits: jsonfile is simple and requires no additional infrastructure, whereas redis provides better performance and can be shared across multiple Ansible control nodes.

Ansible also allows you to control how long facts are cached for. This is useful in scenarios where machines might change but have the same hostname, or you do not want to run the risk of having stale facts.

[defaults]
fact_caching = jsonfile
fact_caching_timeout = 7200

The default is 86400s or 24 hours. You can lower this value to refresh the data automatically.

For cache plugins like Redis, you can specify the connection details with the following configuration option:

fact_caching_connection = localhost:6379:0

The connection format is host:port:db:password; the example above omits the password.

Fact caching can significantly improve playbook performance: with the smart gathering policy (gathering = smart in ansible.cfg), Ansible skips fact gathering for hosts whose cached facts are still valid. It also enables more sophisticated workflows where facts collected in one playbook can be used by another.
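As a sketch of that second workflow, the playbook below gathers nothing itself and relies on facts that an earlier run left in a persistent cache. It assumes a cache plugin such as jsonfile is already configured and that the cached entries have not expired; the assertion is there to give a clear error when that assumption does not hold.

```yaml
---
- name: Reuse cached facts without gathering again
  hosts: all
  gather_facts: false
  tasks:
    # Guard against an empty or expired cache before relying on it
    - name: Check that cached facts are available
      assert:
        that:
          - ansible_facts['os_family'] is defined
        fail_msg: "No cached facts for this host; run a play with gather_facts enabled first."

    - name: Use a fact collected by an earlier run
      debug:
        msg: "{{ inventory_hostname }} belongs to the {{ ansible_facts['os_family'] }} family"
```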

When to use or skip Ansible facts?

Facts are incredibly useful, but they’re not always necessary.

If you’re writing a simple playbook that just copies files or restarts services without any conditional logic, you might not need facts at all. In these cases, you can disable fact gathering entirely by setting gather_facts to false, which can help speed up execution.
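For example, a play like the following has no use for facts, so it skips gathering altogether. The nginx service name is only a placeholder for whatever service you manage.

```yaml
---
- name: Restart a service without gathering facts
  hosts: all
  gather_facts: false   # nothing below depends on facts
  become: true
  tasks:
    - name: Restart the web server
      service:
        name: nginx   # placeholder service name
        state: restarted
```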

On the other hand, facts become essential when making decisions based on the target system’s characteristics.

For instance, consider the Apache installation example, where the OS family determined which package to install, or the OpenSearch memory configuration that adapted to each server’s available RAM.

When you do need facts, gather_subset is your best friend if you know exactly what information you’ll be using.

When you just need network details for firewall configuration, you can use gather_subset: network.
When working with memory-sensitive applications, use gather_subset: hardware.

This means you retrieve only the facts you need without the overhead of collecting everything.

Why use Spacelift for your Ansible projects?

Spacelift’s vibrant ecosystem and excellent GitOps flow are helpful for managing and orchestrating Ansible. By introducing Spacelift on top of Ansible, you can easily create custom workflows based on pull requests and apply any necessary compliance checks for your organization.

Another advantage of using Spacelift is that you can manage infrastructure tools like Ansible, Terraform, OpenTofu, Pulumi, AWS CloudFormation, and even Kubernetes from the same place and combine their stacks to build workflows across tools.

You can bring your own Docker image and use it as a runner to speed up deployments that leverage third-party tools. Spacelift’s official runner image can be found here.

Our latest Ansible enhancements solve three of the biggest challenges engineers face when they are using Ansible:

Having a centralized place in which you can run your playbooks
Combining IaC with configuration management to create a single workflow
Getting insights into what ran and where

Provisioning, configuring, governing, and even orchestrating your containers can be performed with a single workflow, separating the elements into smaller chunks to identify issues more easily.

If you want to learn more about using Spacelift with Ansible, check our documentation, read our Ansible guide, or book a demo with one of our engineers.

Key points

Ansible facts allow you to make more intelligent decisions when writing playbooks or developing roles.

By default, the setup module runs at the start of every play, gathering system information that lets your tasks adapt to different environments automatically, whether you’re conditionally installing Apache based on the OS family or dynamically calculating heap sizes based on available memory.

By using gather_subset to collect only the Ansible facts you need, or disabling fact gathering entirely for simple tasks, you can significantly reduce playbook execution time. For production environments running multiple playbooks, implementing persistent fact caching with plugins like jsonfile or redis prevents redundant system discovery and enables facts to be shared across different automation workflows.


Frequently asked questions

How do I print all Ansible facts for a host?
To print all Ansible facts for a host, run: ansible <hostname> -m ansible.builtin.setup

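To narrow the output to a single fact, such as the default IPv4 details, use the setup module’s filter option: ansible <hostname> -m ansible.builtin.setup -a 'filter=ansible_default_ipv4'

The filter matches top-level fact names only, so it cannot select a nested key such as the address on its own. To print just the IP, reference ansible_facts['default_ipv4']['address'] in a debug task instead.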
How do I disable or skip gathering facts to speed up a play?
To disable fact gathering in an Ansible playbook, set gather_facts: false at the play level. This prevents Ansible from running the setup module, which collects system facts and can slow down execution, especially when executed across multiple hosts.
What’s the difference between set_fact and facts from the setup module?
set_fact creates custom variables during a playbook run, scoped to the current host and available for later tasks. These are dynamic and defined at runtime.

Facts from the setup module are system facts gathered automatically by Ansible (e.g., IP addresses, OS, memory). They’re static unless refreshed and provide environment-specific details.
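The short sketch below shows the two side by side: a gathered fact feeds a custom variable created with set_fact, and a later task uses both. The is_small_host name and the 4096MB threshold are purely illustrative.

```yaml
---
- name: Contrast gathered facts with set_fact variables
  hosts: all
  tasks:
    # memtotal_mb comes from the setup module; is_small_host is defined here
    - name: Derive a custom value from a gathered fact
      set_fact:
        is_small_host: "{{ ansible_facts['memtotal_mb'] < 4096 }}"

    - name: Use both in a later task
      debug:
        msg: "{{ ansible_facts['hostname'] }} small host: {{ is_small_host }}"
```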
Why is Ansible stuck on “Gathering Facts,” and how can I fix it?
Ansible often gets stuck on “Gathering Facts” due to SSH connectivity issues, slow or misconfigured remote nodes, or custom fact-gathering scripts that hang. By default, Ansible runs setup to collect system info before executing tasks, which can cause delays if the remote system is unresponsive.

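To fix it:
- Test SSH manually: ssh user@host
- Disable fact gathering: gather_facts: false
- Limit facts: use gather_subset to collect only what you need
- Increase timeouts: raise the connection timeout with ANSIBLE_TIMEOUT (or timeout in ansible.cfg), or the fact-gathering timeout with gather_timeout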
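Two of these fixes can be combined by gathering facts through an explicit setup task. The sketch below is one way to do it; the 30-second timeout and the choice of the network subset are arbitrary examples, not recommended values.

```yaml
---
- name: Gather a reduced fact set with a longer timeout
  hosts: all
  gather_facts: false   # replace the automatic step with the task below
  tasks:
    - name: Collect only what is needed, allowing slow collectors more time
      setup:
        gather_subset:
          - "!all"
          - network
        gather_timeout: 30
```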
