It’s tempting to think Ansible knows what operating system (OS) you’re running solely through the modules you use. For instance, apt implies Debian, and dnf implies you are on a Fedora-like distribution. This is true to some extent, but it doesn’t paint the whole picture.
For Ansible to perform tasks such as conditionally installing packages based on the OS, dynamically configuring services with different names across distributions, or templating configuration files with system-specific paths and values, it needs to gather its facts accurately.
In this article, we discuss Ansible facts and how fact gathering works. We’ll review how to use these facts and when to use them.
Prerequisites
To follow along and get the most out of this guide, you need to have the following:
- Ansible installed on your control machine (version 2.9 or later recommended)
- Access to a target server (either a local virtual machine (VM) or a remote server) that Ansible can connect to for executing commands.
Ansible facts are pieces of information that Ansible discovers about your target systems. Every time you run an Ansible playbook, fact gathering is the first thing that happens in each play (unless you explicitly disable it), before any of your tasks are executed.
These facts include data such as:
- Operating system type and version
- Hardware specifications (CPU, memory, disk space)
- Network configuration (IP addresses, network interfaces)
- Installed software and package versions (collected by the separate package_facts module rather than by default)
- System architecture and kernel information
If you have used Ansible long enough, you will have noticed the “Gathering Facts” task at the top of the console output when you execute a playbook.
When Ansible connects to a target machine, it doesn’t immediately start executing your tasks. Instead, it first runs what’s called the setup module. The setup module uses various methods to gather this information.
All of this happens through the same SSH connection that Ansible uses for everything else, which means you don’t need to configure anything extra. Under the hood, Ansible runs small Python scripts on the target machine that collect system details.
Such details include network interfaces, disk usage, and environment variables; information about installed packages comes from the separate package_facts module. These details are then sent back to Ansible in JSON format, making them easy to store, query, and use later in your playbook.
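To make the flow above concrete, here is a minimal sketch that turns off implicit gathering and calls the setup module as an ordinary task. The variable name setup_result is a placeholder chosen for this example; it holds the JSON structure the module returns.

```yaml
---
- name: Run the setup module explicitly
  hosts: all
  gather_facts: false          # skip the implicit "Gathering Facts" task
  tasks:
    - name: Gather facts on demand
      ansible.builtin.setup:
      register: setup_result   # placeholder name for the returned JSON

    - name: Show how many facts came back
      ansible.builtin.debug:
        msg: "setup returned {{ setup_result['ansible_facts'] | length }} top-level facts"
```

Calling the module yourself is mostly useful for seeing what the implicit step does; in everyday playbooks the automatic gathering is usually enough.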
Once collected, Ansible facts become available as variables throughout your entire playbook. You can reference them in tasks, use them in conditionals, or even display them for debugging purposes.
To use Ansible facts, reference them within playbooks or templates using the ansible_facts dictionary or shorthand variables like ansible_hostname or ansible_distribution.
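As a quick illustration of the two styles, the sketch below prints the same information twice. The shorthand form depends on the inject_facts_as_vars setting being enabled, so the dictionary form is generally the safer choice.

```yaml
---
- name: Compare the two ways of referencing a fact
  hosts: all
  tasks:
    - name: Use the ansible_facts dictionary (no ansible_ prefix on the key)
      ansible.builtin.debug:
        msg: "{{ ansible_facts['hostname'] }} runs {{ ansible_facts['distribution'] }}"

    - name: Use the shorthand variables (ansible_ prefix)
      ansible.builtin.debug:
        msg: "{{ ansible_hostname }} runs {{ ansible_distribution }}"
```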
1. Display all gathered facts
Sometimes you may want to view all the information Ansible has gathered about a system in one place. The simplest way to do this is to print the entire ansible_facts dictionary, which contains all gathered information.
Consider the following playbook:
# facts/all.yaml
---
- name: Display all facts
  hosts: all
  tasks:
    - name: Print all gathered facts
      debug:
        var: ansible_facts
As previously stated, this playbook performs one task: displaying all facts stored in ansible_facts.
It’s worth noting that this playbook is not particularly useful on its own. It dumps every fact Ansible has collected, which can be overwhelming and hard to parse.
2. Conditional package installation by OS
Sometimes, you may want to use facts to install packages conditionally, based on your operating system. This use case is helpful if the distribution varies across a fleet of machines or if packages are named slightly differently on a specific distribution:
---
- name: Install web server based on OS
  hosts: all
  tasks:
    - name: Install Apache on Ubuntu/Debian
      apt:
        name: apache2
        state: present
      when: ansible_facts['os_family'] == "Debian"

    - name: Install Apache on CentOS/RHEL
      yum:
        name: httpd
        state: present
      when: ansible_facts['os_family'] == "RedHat"
When you run this against a Debian target, the Apache installation for CentOS/RHEL is reported as skipped because its condition does not match.
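An alternative to writing one task per distribution is to look the package name up by OS family. The sketch below is one possible shape, not the only one: the variable apache_package_by_family is a name invented for this example, and it uses the generic ansible.builtin.package module, which delegates to the host's package manager.

```yaml
---
- name: Install web server with a per-family package name
  hosts: all
  become: true
  vars:
    # Hypothetical lookup table: OS family -> Apache package name
    apache_package_by_family:
      Debian: apache2
      RedHat: httpd
  tasks:
    - name: Install Apache using the generic package module
      ansible.builtin.package:
        name: "{{ apache_package_by_family[ansible_facts['os_family']] }}"
        state: present
      # Only run on families that have an entry in the table
      when: ansible_facts['os_family'] in apache_package_by_family
```

This keeps the task list short as more families are added, at the cost of hiding distribution-specific module options.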
3. Dynamic configuration based on memory
Another common Ansible facts use case is dynamically configuring services or files based on system resources. Applications like OpenSearch require you to set the JVM heap size appropriately for optimal performance. This value should typically be 50% of the total available RAM on the node.
If you were running a self-hosted OpenSearch cluster across multiple servers with varying memory configurations, manually calculating and setting these values for each node would be tedious and prone to error.
The memtotal_mb fact becomes incredibly useful as it allows you to dynamically calculate the appropriate heap size for each server based on its actual available memory.
Below is an example playbook:
---
- name: Configure OpenSearch heap size based on system memory
  hosts: all
  tasks:
    - name: Calculate heap size (50% of RAM)
      set_fact:
        heap_size_mb: "{{ [ansible_facts['memtotal_mb'] // 2, 31744] | min }}"

    - name: Print calculated heap size
      debug:
        msg: "Setting heap size to {{ heap_size_mb }}MB for this {{ ansible_facts['memtotal_mb'] }}MB system"
As discussed above, this playbook determines the appropriate Java heap size for OpenSearch by allocating half of the system’s total memory, but never more than 31,744MB. It then displays a message showing both the calculated heap size and the machine’s total memory, so you can confirm the configuration.
In a real-world scenario, you would use lineinfile with a regexp to insert or update the actual JVM configuration files.
Running the playbook above prints, for each host, the calculated heap size alongside the total memory of that system.
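The lineinfile approach mentioned above could look roughly like the following sketch. The file path is an assumption (adjust it to wherever your installation keeps its JVM options), and the variable name jvm_options_path is made up for this example.

```yaml
---
- name: Apply the calculated heap size to the JVM options file
  hosts: all
  become: true
  vars:
    # Assumed location; not taken from the article
    jvm_options_path: /etc/opensearch/jvm.options
  tasks:
    - name: Calculate heap size (50% of RAM, capped at 31744MB)
      ansible.builtin.set_fact:
        heap_size_mb: "{{ [ansible_facts['memtotal_mb'] // 2, 31744] | min }}"

    - name: Set the initial heap size
      ansible.builtin.lineinfile:
        path: "{{ jvm_options_path }}"
        regexp: '^-Xms'
        line: "-Xms{{ heap_size_mb }}m"

    - name: Set the maximum heap size
      ansible.builtin.lineinfile:
        path: "{{ jvm_options_path }}"
        regexp: '^-Xmx'
        line: "-Xmx{{ heap_size_mb }}m"
```

Each lineinfile task replaces the existing line that matches the regexp, or appends the line if none matches, so the play stays idempotent across runs.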
4. Validating system resources with assertions
Ansible facts can also be used to determine if a workload can run on a machine before you kick off your deployment. This is useful if you are not directly responsible for allocating resources to a fleet of machines.
To check if a host has the right amount of memory to run a workload, you can leverage the assert module to fact-check:
---
- name: deploy
  hosts: all
  tasks:
    - name: Ensure at least 4GB RAM
      assert:
        that:
          - ansible_facts['memtotal_mb'] >= 4096
        fail_msg: "Not enough memory to run this workload."
The playbook above checks that every target host has at least 4GB of RAM before deployment. If a host doesn’t meet the requirement, it fails with the message: “Not enough memory to run this workload.”
On a host with less than four gigabytes of memory, the play stops for that host at the assertion and prints this message.
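One practical wrinkle: memtotal_mb usually reports slightly less than the installed RAM because the kernel reserves some memory, so a machine sold as "4GB" can fail a strict 4096 check. The sketch below works around this with a threshold variable and adds a CPU check; the names min_memory_mb and min_vcpus and their values are illustrative assumptions, not recommendations.

```yaml
---
- name: Validate resources before deploying
  hosts: all
  vars:
    min_memory_mb: 3800   # hypothetical threshold, leaves headroom below 4096
    min_vcpus: 2          # hypothetical requirement
  tasks:
    - name: Ensure the host meets the minimum requirements
      ansible.builtin.assert:
        that:
          - ansible_facts['memtotal_mb'] >= min_memory_mb
          - ansible_facts['processor_vcpus'] >= min_vcpus
        fail_msg: "Host is below the minimum memory or CPU requirement."
        success_msg: "Host has enough memory and CPU for this workload."
```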
5. Tagging hosts dynamically by OS family
Continuing with the theme of managing a fleet of machines, another common use case for Ansible facts is tagging machines based on their distribution. This can be used alongside the conditional install.
The main advantage is that you do not need to tag potentially hundreds of machines individually.
Here’s an example playbook:
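---
- name: Group hosts by OS family
  hosts: all
  tasks:
    - name: Tag hosts
      group_by:
        key: "os_{{ ansible_facts['os_family'] }}"

# Later use: a second play that targets the dynamically created group
- hosts: os_Debian
  tasks:
    - name: Only runs on Debian hosts
      debug:
        msg: "Debian detected"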
The group_by module can be used to build a group of hosts that run Debian. Later in your playbook, you can target that group to make decisions. For this example, the playbook simply prints out that it detected a Debian host.
6. Gather subset keyword
By default, Ansible collects as much information as it can find on a target machine, which you might not necessarily need on every run. Thankfully, you can limit gathering to specific categories by using the gather_subset keyword, which is a play-level setting rather than a module.
---
- name: Configure firewall with minimal fact gathering
  hosts: all
  gather_facts: true
  gather_subset:
    - network
  tasks:
    - name: Display only network facts
      debug:
        var: ansible_facts['default_ipv4']

    - name: Configure iptables rule
      debug:
        msg: "Allow traffic to {{ ansible_facts['default_ipv4']['address'] }}"
The playbook above uses gather_subset to limit gathering to network-related facts, plus the small minimal set Ansible collects by default. This is useful when writing a playbook that focuses solely on networking.
Running the playbook above prints the default_ipv4 fact for each host, followed by the message built from its address. For a more comprehensive list of supported subsets, refer to the setup module section of the Ansible documentation.
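If you want to go further and drop even the minimal default set, one option is to call the setup module as a task and exclude subsets explicitly. This is a sketch under the assumption that your hosts have a default IPv4 route; hosts without one will not have the default_ipv4 fact.

```yaml
---
- name: Gather only network facts, on demand
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect the network subset and nothing else
      ansible.builtin.setup:
        gather_subset:
          - '!all'      # exclude the full set
          - '!min'      # exclude the minimal default set as well
          - network

    - name: Show the default IPv4 address
      ansible.builtin.debug:
        var: ansible_facts['default_ipv4']['address']
```

Excluding the minimal set means facts such as os_family will not be available, so this only suits plays that truly need network data alone.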
For situations where you need to deploy a binary on a specific architecture or access hardware-specific information, Ansible also allows you to access this through a hardware subset.
You can print out all hardware-related facts using:
---
- name: hardware info
  hosts: all
  gather_facts: true
  gather_subset:
    - hardware
  tasks:
    - name: hardware facts
      debug:
        var: ansible_facts
After running this playbook, the output lists processor, memory, and storage device details in addition to the minimal facts Ansible collects by default.
If you wanted to leverage the architecture fact, here’s how you’d use it to copy a file based on the target architecture:
---
- name: Deploy hardware-optimized settings
  hosts: all
  gather_facts: true
  gather_subset:
    - hardware
  tasks:
    - name: Copy optimized config for amd64
      copy:
        src: files/db-config-amd64.conf
        dest: /etc/mydb/config.conf
        mode: '0644'
      when: ansible_facts['architecture'] == "x86_64"
Combining the conditional when and the Ansible fact architecture enables you to copy files based on the target.
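When more than one architecture is in play, a lookup table can replace a chain of when conditions. The following is only a sketch: the variable arch_suffix_by_fact and the file files/db-config-arm64.conf are hypothetical and exist purely to illustrate the pattern.

```yaml
---
- name: Deploy an architecture-specific config
  hosts: all
  gather_facts: true
  gather_subset:
    - hardware
  vars:
    # Hypothetical mapping: architecture fact -> file name suffix
    arch_suffix_by_fact:
      x86_64: amd64
      aarch64: arm64
  tasks:
    - name: Copy the config that matches the target architecture
      ansible.builtin.copy:
        src: "files/db-config-{{ arch_suffix_by_fact[ansible_facts['architecture']] }}.conf"
        dest: /etc/mydb/config.conf
        mode: '0644'
      # Skip hosts whose architecture has no entry in the table
      when: ansible_facts['architecture'] in arch_suffix_by_fact
```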
7. Bonus: Fact caching
Fact caching is enabled by default in Ansible; however, the default cache plugin is memory, so facts are only kept for the duration of the current run.
In a production environment, you want facts to be kept in more stable storage. This becomes particularly important when you’re running multiple playbooks against the same hosts or when you want to persist facts between Ansible runs.
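The quickest way to switch your cache is by setting environment variables. The jsonfile plugin also needs a directory to write to, so set the connection as well (the path below is only an example):
export ANSIBLE_CACHE_PLUGIN=jsonfile
export ANSIBLE_CACHE_PLUGIN_CONNECTION=/tmp/ansible_fact_cache
Or in the ansible.cfg file:
[defaults]
fact_caching = jsonfile
# Example directory; jsonfile requires a path to store its files
fact_caching_connection = /tmp/ansible_fact_cache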
Supported cache plugins include jsonfile, redis, and memcached. Each offers different benefits: jsonfile is simple and requires no additional infrastructure, whereas redis provides better performance and can be shared across multiple Ansible control nodes.
Ansible also allows you to control how long facts are cached for. This is useful in scenarios where machines might change but have the same hostname, or you do not want to run the risk of having stale facts.
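[defaults]
fact_caching = jsonfile
# Example directory; jsonfile requires a path to store its files
fact_caching_connection = /tmp/ansible_fact_cache
fact_caching_timeout = 7200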
The default is 86400s or 24 hours. You can lower this value to refresh the data automatically.
For cache plugins like Redis, you can specify the connection details with the following configuration option:
fact_caching_connection = localhost:6379:0
With the connection format being host:port:db:password.
Using fact caching can significantly improve playbook performance because, with the gathering setting switched to smart, Ansible can skip fact gathering for hosts whose facts are already cached. It also enables more sophisticated workflows where facts collected in one playbook can be used by another.
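As a rough sketch of the cross-playbook workflow, the play below disables gathering and relies on whatever a previous run left in the cache. It assumes a persistent cache plugin is already configured and that an earlier play gathered facts for these hosts; without both, the first task fails on purpose.

```yaml
---
- name: Reuse cached facts without gathering again
  hosts: all
  gather_facts: false
  tasks:
    - name: Stop early if the cache holds nothing for this host
      ansible.builtin.assert:
        that:
          - ansible_facts['os_family'] is defined
        fail_msg: "No cached facts found; run a play with gather_facts enabled first."

    - name: Use a fact collected by an earlier run
      ansible.builtin.debug:
        msg: "Cached OS family: {{ ansible_facts['os_family'] }}"
```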
Facts are incredibly useful, but they’re not always necessary.
If you’re writing a simple playbook that just copies files or restarts services without any conditional logic, you might not need facts at all. In these cases, you can disable fact gathering entirely by setting gather_facts to false, which can help speed up execution.
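A minimal sketch of such a play follows. The source file and destination path are hypothetical; the point is only that nothing in the task depends on a fact, so gathering can be switched off safely.

```yaml
---
- name: Copy a static file without gathering facts
  hosts: all
  gather_facts: false        # no "Gathering Facts" task runs for this play
  become: true
  tasks:
    - name: Copy the message of the day
      ansible.builtin.copy:
        src: files/motd      # hypothetical source file
        dest: /etc/motd
        mode: '0644'
```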
On the other hand, facts become essential when making decisions based on the target system’s characteristics.
For instance, consider the Apache installation example, where the OS family determined which package to install, or the OpenSearch memory configuration that adapted to each server’s available RAM.
When you do need facts, gather_subset is your best friend if you know exactly what information you’ll be using.
- When you just need network details for firewall configuration, you can use gather_subset: network.
- When working with memory-sensitive applications, use gather_subset: hardware.
This means you retrieve only the facts you need without the overhead of collecting everything.
Spacelift’s vibrant ecosystem and excellent GitOps flow are helpful for managing and orchestrating Ansible. By introducing Spacelift on top of Ansible, you can easily create custom workflows based on pull requests and apply any necessary compliance checks for your organization.
Another advantage of using Spacelift is that you can manage infrastructure tools like Ansible, Terraform, OpenTofu, Pulumi, AWS CloudFormation, and even Kubernetes from the same place and combine their stacks to build workflows across tools.
You can bring your own Docker image and use it as a runner to speed up deployments that leverage third-party tools. Spacelift also provides an official runner image.
Our latest Ansible enhancements solve three of the biggest challenges engineers face when they are using Ansible:
- Having a centralized place in which you can run your playbooks
- Combining IaC with configuration management to create a single workflow
- Getting insights into what ran and where
Provisioning, configuring, governing, and even orchestrating your containers can be performed with a single workflow, separating the elements into smaller chunks to identify issues more easily.
If you want to learn more about using Spacelift with Ansible, check our documentation, read our Ansible guide, or book a demo with one of our engineers.
Ansible facts can allow you to make more intelligent decisions when writing playbooks or developing roles.
The setup module runs at the start of every play unless you disable it, gathering system information that enables your tasks to adapt to different environments automatically. Whether you’re conditionally installing Apache based on the OS family or dynamically calculating heap sizes based on available memory, facts are what make it possible.
By using gather_subset to collect only the Ansible facts you need, or disabling fact gathering entirely for simple tasks, you can significantly reduce playbook execution time. For production environments running multiple playbooks, implementing persistent fact caching with plugins like jsonfile or redis prevents redundant system discovery and enables facts to be shared across different automation workflows.
Frequently asked questions
How do I print all Ansible facts for a host?
To print all Ansible facts for a host, run:
ansible <hostname> -m ansible.builtin.setup
To narrow the output, pass a filter to the setup module. For example, to show only the default IPv4 details:
ansible <hostname> -m ansible.builtin.setup -a 'filter=ansible_default_ipv4'
The filter matches top-level fact names only, so a nested key such as ansible_default_ipv4.address will not match. To get just the IP, read ansible_facts['default_ipv4']['address'] in a playbook task instead.
How do I disable or skip gathering facts to speed up a play?
To disable fact gathering in an Ansible playbook, set gather_facts: false at the play level. This prevents Ansible from running the setup module, which collects system facts and can slow down execution, especially when executed across multiple hosts.
What’s the difference between set_fact and facts from the setup module?
set_fact creates custom variables during a playbook run, scoped to the current host and available for later tasks. These are dynamic and defined at runtime.
Facts from the setup module are system facts gathered automatically by Ansible (e.g., IP addresses, OS, memory). They’re static unless refreshed and provide environment-specific details.
Why is Ansible stuck on “Gathering Facts,” and how can I fix it?
Ansible often gets stuck on “Gathering Facts” due to SSH connectivity issues, slow or misconfigured remote nodes, or custom fact-gathering scripts that hang. By default, Ansible runs setup to collect system info before executing tasks, which can cause delays if the remote system is unresponsive.
To fix it:
- Test SSH manually: ssh user@host
- Disable fact gathering: gather_facts: no
- Limit facts: use the setup module with filters
- Increase timeout: use ANSIBLE_TIMEOUT or edit ansible.cfg