Linux: “Oops, forgot to copy directory ACLs in an rsync…”

Let’s say you want to move a (relatively large) filesystem, with an application’s data files (including a lot of subdirectories, many of them with specific ACLs), to a faster disk (for instance). The old FS is mounted as /PROD, while the new disk is mounted as /NEW; the idea is to stop the application, rsync the data, and switch mount points. Easy, you think:

rsync -av /PROD/ /NEW/

So, the application guys stop the app, you enter the command above, then switch mount points so the old /NEW is now /PROD, and the old /PROD is now /OLD . Everything looks fine, the application guys start their thing again…

… and then complain about access problems. Oops, the rsync didn’t copy the ACLs — you needed the -A argument to include them (e.g. rsync -aAv).

But they took a while to notice it, so that there’s a lot of new data in the new filesystem, so simply doing a new rsync from /OLD to /PROD is unacceptable. You’d simply like to copy directory ACLs (for simplification’s sake, let’s assume there are no file ACLs here — but see below). But there are a couple thousand directories, and of course you don’t want to do it manually…

The solution I came up with was:

cd /OLD
find . -type d | cut -d '.' -f 2- > /tmp/directories.txt
IFS=$'\n' ; for i in `cat /tmp/directories.txt`; do getfacl "/OLD$i" | setfacl --set-file=- "/PROD$i" ; done

Basically, it stores a list of directories in /OLD , then, for each of them, it copies the ACL from /OLD/<directory> to /PROD/<directory> . setfacl’s “–set-file=-” means use the standard input as a “file”, which comes from getfacl‘s output. The “IFS=$’\n’” bit means that the for loop cycles through entire lines, not “words” — otherwise, it would try to split paths with spaces in them.

A limitation of this is that any newly created directories in the new /PROD filesystem won’t be affected, but hopefully they won’t be too many.

What if you wanted to include files as well? Just remove the “-type d” in the find command. Note that, in this case, it’s much more likely that there are new files in the new /PROD FS that won’t have their ACLs corrected.

Today I learned: root’s cron (including stuff like logrotate) doesn’t run if root’s password is expired

… which can be hard to spot, since in most places you never use root’s password anywhere (you “sudo su” to root using your user‘s password).

Today’s story:

  1. server has a logfile of several GB;
  2. head logfile shows it hasn’t been rotated in more than a year;
  3. running logrotate /etc/logrotate.conf manually works;
  4. /var/log/cron includes entries like:

Mar 22 14:10:01 xxxxxxxx crond[26561]: (root) PAM ERROR (Authentication token is no longer valid; new one required)
Mar 22 14:10:01 xxxxxxxx crond[26561]: (root) FAILED to authorize user with PAM (Authentication token is no longer valid; new one required)

  1. chage -l root shows that its password has expired…

Now, why did root have password expiration enabled? It’s a mystery 🙂 — probably someone ran a script configuring password expiration for all users and forgot to add some exceptions to it, root among them. Anyway,

chage -E -1 -M -1 root ; passwd root

solved the problem. Hope this is useful. 🙂

Note to Self #1: resolv.conf uses “nameserver”, not “server”

(Welcome to “Note to Self”, a new series of hopefully short posts where I attempt to remind my future self of the correct way of doing things I often sometimes get wrong, even after decades.)

The correct syntax for specifying nameservers in /etc/resolv.conf is:

nameserver 1.2.3.4

not:

server 1.2.3.4

Worse, the system won’t complain; it’ll just ignore those entries, so you may wonder for a while about why name resolution is not working.

Of course, when adding an entry to an existing file, you probably won’t make this basic mistake, as any current entries will use the correct “nameserver” syntax. But when creating a new file from scratch… not that it happened to me yesterday, oh no. 🙂

Linux: How to increase the size of an ext2/3/4 filesystem (using LVM)

(Insert obligatory disclaimer about this being basic stuff, but this blog being — among other things — about documenting stuff I’m asked about/help others with at work, because it may be useful to other users, etc. etc.)

If you work, or have worked, as a sysadmin, you’ve probably had to do this in the past, but, even on your own system — if you were wise enough to select “use LVM” during installation — you may find yourself facing a common problem: needing to grow a filesystem, such as / (the root filesystem), /opt, etc..

First, in many cases, you may not actually need to grow an existing filesystem: if you will need to store a lot of data in a specific directory (let’s say /opt/data, but /opt is currently almost full), then simply creating a new partition and mounting it (possibly mounting it first in a temporary location so that you can move the existing directory into it) as (for the above example) /opt/data 1, may well solve your problem. In this case, by the way, you don’t even need to be using LVM. You’ve just freed space on /opt and (depending on the new partition’s size) you now have a lot more room to grow on /opt/data as well.

But, since we want to challenge ourselves at least a little bit, let’s say we really need to grow that filesystem. As a requisite, the current disk must be a Logical Volume (LV) of an existing Volume Group (VG), made up of one or more Physical Volumes (PVs). Let’s also assume the filesystem you want to grow is /opt, that it’s an ext4 filesystem 2, and that it’s an LV named “lv_opt“, which is part of a VG called “vg_group1“, which currently has no free/unused space (if it did have enough unused space for your needs, you could skip to step 4 below.)

The process has several steps:

1- Add the new disk (or disks, but let’s say it’s just one for simplicity’s sake) to the system. This may mean adding a physical disk to a machine, or adding a new drive to a VM, or the company’s storage team presenting a new LUN, or… Anyway, the details of the many possibilities go beyond the scope of this tutorial, so let’s just assume that you have a new disk on your machine, and that Linux sees it (possibly after a reboot, in the case of a non-hotpluggable physical disk), as something like /dev/sdc (which we’ll use for the rest of the examples.)

2- Create the new PV: if you’re going to add the entire sdc disk to the VG, just enter:

pvcreate /dev/sdc

Otherwise, use fdisk to create a new partition inside it (sdc1), with the desired size, and change it to type “LVM” (it’s “8e” on fdisk). After exiting fdisk (saving the current configuration), enter (this is apparently not mandatory on modern Linux versions, but there’s no harm in it):

pvcreate /dev/sdc1

3- Add the new PV to the existing VG:

vgextend vg_group1 /dev/sdc # or /dev/sdc1, if the PV is just a partition instead of the entire disk

If you enter a “vgs” command now, you should see the added free space on the VG.

4- Extend the LV: if you want to add the entirety of the VG’s currently unused space, enter:

lvextend -l +100%FREE lv_opt

If, instead, you want to specify how much space to add (say, 10 GB):

lvextend -L +10G lv_opt

Note that one version of the command uses “-l” (lower case L) and the other uses “-L“. That’s because one refers to extents, while the other refers directly to size.

Now the LV has been extended, but wait! You still need to…

5- Extend the filesystem itself: just enter:

resize2fs /opt

Note that it’s a “2” in the command name, regardless of whether the filesystem is ext2, 3 or 4. This operation may take a while if the filesystem is large enough, but after it’s completed you should be able to see the new filesystem’s larger size (and added free space) with a “df” command.

Any questions or suggestions, feel free to add a comment.

Networking definitions: Allowing traffic, Port forwarding, NAT, and Routing

I’m more of a systems administrator than a network admin (though I’ve also worked as the latter, in the past), but, of course, one can’t be a sysadmin without knowing at least something about networking. And yet (and like my previous post about compiling stuff), I’ve found that it’s possible to do a perfectly good job as a sysadmin, and yet still mix up a few networking concepts from time to time.

Therefore, I wanted to write about four different, but related, concepts, that I’ve noticed people sometimes confuse.

1. Allowing traffic

This just means that a network interface allows traffic to a specific destination and/or port, possibly restricted to a specific source (an IP address, or a network). Any traffic that is not allowed is simply refused.

Note that this by itself doesn’t mean that the interface (of a server, a router, etc.) will do anything (or anything desirable, at least) with the received traffic. For instance, it may not have something listening on that port, or it may not be configured to route that traffic to somewhere useful. This just means that it doesn’t instantly block the connection.

This is typically controlled by a local software firewall such as iptables.

2. Port forwarding

Port forwarding just means: if you receive a packet on interface A, port B, then redirect it to IP address C, port D. “D” may be the same as “B”, and “C” may be on the same host or at the other end of the planet.

Again, note that the mere fact that there’s a matching redirect rule for the initial destination interface and port doesn’t mean that the host actually accepts redirecting the traffic (see 1.) or that it knows how to route it to its final destination (see 4.)

3. NAT

NAT, or Network Address Translation, can be seen as a special case of two-way port forwarding (see 2.). Basically, a router (which may well be a simple server with two or more (physical or virtual) network interfaces, it doesn’t have to be a “router” bought in a store) accepts traffic from a (typically private) network, then translates it so that it goes to the destination address in the (typically public) network, with (and this is the important part) the source “masked” as the router’s public IP address, and a different source port that the router “remembers”, and then knows how to handle the returning traffic and “untranslate” it so that it goes to the original source.

In short, NAT allows many private hosts in a network to access the Internet using a single public IP, and a single connection. (It can have other similar uses, even some not related to the public Internet, of course; this is just the most common one.)

4. Routing

Routing is simply a host knowing that a packet intended for IP address X should be directed to IP address Y 1.

Again, the obligatory caveats: this doesn’t mean that the traffic is even accepted before attempting to route it (see 1.), or that the host actually knows how to reach IP address Y itself (it may not have a local route to it). It may also be that the routing is correct, but there is no address translation (see 3.), so the destination host receives a packet in a public interface that claims to be from a private address, and refuses it. Finally, the destination host itself needs a route to the original source, and it may not have one configured (or have a misconfigured one, causing asynchronous routing and possibly making the source refuse the returning traffic).

Thoughts? Corrections? Clarifications? Yes, I know this is relatively basic stuff. 🙂

Linux: Detecting potential disk space problems before going home

As is typical in sysadmin work, one member of my team is always on call, having to be available during the night and on weekends. Sometimes the problems are big (especially hardware failures), sometimes they’re trivial, and sometimes they’re false alarms, but one thing is always true: it’s not fun to be woken up by a ringing phone, especially when you’re already not having enough sleep.

One of the most common reasons for being called at night (or during weekends) is when a disk’s occupation exceeds a threshold (let’s say 95%, although that value varies in reality). Logs get larger (especially when some error is constantly occurring), old logs aren’t compressed/rotated/deleted automatically, users leave huge debug files somewhere and forget to delete them for years… all of these happen, it’s mostly inevitable.

But, if we’re woken up at 4 A.M. because a /var partition reached 95% utilization, isn’t it true that in most cases the increase was gradual, and the value was already abnormally high — let’s say 93 or 94% — at 4 P.M.?

And wouldn’t it have been much better (for our well-earned rest, among other things) for the person on call to have fixed the problem during the day, before going home?

I thought so, too. Which is why I wrote a script some months ago to detect which servers (from more than a thousand) will need attention soon.

(Looking at that script now, I’m a bit ashamed to share it, as it’s pretty lazily coded, repeating code one time for each filesystem instead of having some kind of function… but it was made in a bit of a hurry some time ago, so because of laziness… I mean, historical accuracy 🙂 I’ll share it as it is (other than translating variable names and output messages from my native Portuguese)).

So, here it is. Don’t laugh too hard, OK? 🙂

#!/bin/bash

# check / partition
ROOT=`df -hl / | tail -1 | awk '{ print $5 }' | cut -d '%' -f 1`
case $ROOT in
    ''|*[!0-9]*) ROOT=`df -hl / | tail -1 | awk '{ print $4 }' | cut -d '%' -f 1` ;;
esac
case $ROOT in
    ''|*[!0-9]*) ROOT=`df -hl / | tail -1 | awk '{ print $3 }' | cut -d '%' -f 1` ;;
esac

TOP=$ROOT
WORST="/"

# check /var partition
VAR=`df -hl /var | tail -1 | awk '{ print $5 }' | cut -d '%' -f 1`
case $VAR in
    ''|*[!0-9]*) VAR=`df -hl /var | tail -1 | awk '{ print $4 }' | cut -d '%' -f 1` ;;
esac

if [ "$VAR" -gt "$TOP" ]; then
        TOP=$VAR
        WORST="/var"
fi

# check /home partition
# comment out this entire block if you don't want to monitor /home
HOME=`df -hl /home | tail -1 | awk '{ print $5 }' | cut -d '%' -f 1`
case $HOME in
    ''|*[!0-9]*) HOME=`df -hl /home | tail -1 | awk '{ print $4 }' | cut -d '%' -f 1` ;;
esac

if [ "$HOME" -gt "$TOP" ]; then
        TOP=$HOME
        WORST="/home"
fi


# check /tmp partition
TMP=`df -hl /tmp | tail -1 | awk '{ print $5 }' | cut -d '%' -f 1`
case $TMP in
    ''|*[!0-9]*) TMP=`df -hl /tmp | tail -1 | awk '{ print $4 }' | cut -d '%' -f 1` ;;
esac

if [ "$TMP" -gt "$TOP" ]; then
        TOP=$TMP
        WORST="/tmp"
fi


# check /opt partition
OPT=`df -hl /opt | tail -1 | awk '{ print $5 }' | cut -d '%' -f 1`
case $OPT in
    ''|*[!0-9]*) OPT=`df -hl /opt | tail -1 | awk '{ print $4 }' | cut -d '%' -f 1` ;;
esac

if [ "$OPT" -gt "$TOP" ]; then
        TOP=$OPT
        WORST="/opt"
fi


#echo "TOP = $TOP"

if [ "$TOP" -ge 70 ] && [ "$TOP" -le 80 ]; then
	echo "Worst partition: ($WORST) at $TOP"
        exit 7
fi


if [ "$TOP" -ge 80 ] && [ "$TOP" -le 90 ]; then
	echo "Worst partition: ($WORST) at $TOP"
        exit 8
fi

if [ "$TOP" -ge 90 ] && [ "$TOP" -le 95 ]; then
	echo "Worst partition: ($WORST) at $TOP"
        exit 9
fi

if [ "$TOP" -ge 95 ] && [ "$TOP" -lt 98 ]; then
	echo "Worst partition: ($WORST) at $TOP"
        exit 10
fi

if [ "$TOP" -eq 98 ]; then
	echo "Worst partition: ($WORST) at $TOP"
        exit 11
fi

if [ "$TOP" -eq 99 ]; then
	echo "Worst partition: ($WORST) at $TOP"
        exit 12
fi


if [ "$TOP" -gt 99 ]; then
	echo "Worst partition: ($WORST) at $TOP :("
        exit 20
fi

echo "Everything OK: worst partition: ($WORST) at $TOP"
exit 0

The goal is to run it on every server you’re responsible for (we use an HP automation tool where I work, but it could be done in many ways, including with a central machine who could ssh “passwordless-ly” into all other servers — it wouldn’t need root access for that, since it doesn’t change anything anywhere. Afterwards, you’re supposed to sort the output by exit codes; anything other than 0 might indicate a problem, and the higher, the worse it is; exit code 20 means a partition at 100%. Again, that HP tool does that sorting easily, but it’s far from being the only way — if you’re reading this, you’ve probably already thought of some.

And then, naturally, you’re supposed to go through the worst cases, either by deleting obvious trash, or by contacting the application teams so that they clean up their stuff.

Depending on your company’s policies, you may want to comment out the block that refers to the /home partition, if that is completely the responsibility of the users.

I don’t have statistics, of course, but I can tell you that being called because of a(n almost) full filesystem, which was relatively frequent a year or so ago, is now extremely rare (only happening when — after work hours — an application goes really wrong (typically because an untested settings change) and starts dumping error messages/debug logs as fast as it can). If only hardware problems didn’t exist, either…

MySQL/MariaDB easy optimization tutorial

Introduction:

MySQL and MariaDB (note: for brevity, I’ll be referring to “MySQL” from now on, but everything here applies to both and, indeed, these days I use MariaDB exclusively on my own servers) are very popular as database servers, but in my experience are rarely properly configured to take advantage of the server’s resources. While I’m not a database administrator (DBA), as a sysadmin I often have to diagnose performance problems in servers I maintain, and MySQL is a common culprit; it seems that most MySQL DBAs are well-versed in SQL, but the my.cnf configuration file seems to be a mostly unexplored mystery to them: either it’s untouched from the default configuration (which is conservatively set to work on the barest of machines), or they limit the memory usage to some 3% of the server’s available RAM(!) 1, or else they go in the other direction and try to use more resources than are available.

Without, of course, attempting to be a full MySQL guide (or even a full optimization guide), this tutorial will provide a few guidelines, rules-of-thumb, and settings suggestions so that a MySQL non-expert can still make his/her server run a lot more smoothly.

A vital tool:

MySQLTuner is an essential tool for MySQL administration, and, indeed, you should run it on your MySQL system before any configuration changes (whether suggested here or not), and also after implementing them. Its results, however, can be a bit intimidating, and (as its documentation warns) its recommendations should never be implemented blindly 2.

So, *do* install and use MySQLTuner (and, by following this guide, you should be seeing a lot more “green” and less “red” in its results), but don’t blindly follow its suggestions without understanding their point.

Oh, and keep your MySQLTuner script mostly up-to-date (assuming you installed it from its site, instead of your distro’s packaging system); don’t just keep using the version you downloaded some 4 years ago. Software evolves, you know. 🙂

About memory:

Apart from other “always-a-good-idea” settings that we’ll see later, the amount of memory you set MySQL to use is probably the most important configuration.

Before we begin: I still keep seeing co-workers (both at my job and the previous one) thinking of free memory as something that is desirable for a production server to have a lot of. They see a machine with, say, 90% of its RAM in use and get worried.

Guys and girls, Linux is not MS-DOS! (Really, you don’t have to obsess about freeing conventional memory anymore! 🙂 ) There’s such a thing as virtual memory, and also another thing called disk caching! Linux actually uses most of its free memory to cache disk access (and recent versions of top actually show that as still “available”, but for decades it said “cached”). Memory that’s truly free is simply being wasted (and a server with a large amount of it even after a week of uptime is a server that should have some of its RAM removed to be better used elsewhere).

Having more than half of a machine’s RAM as disk cache is certainly not as bad as having it unused/wasted, but it would have been far better to let applications (in this case, MySQL) actually, you know, use it. Which we’ll do here.

OK, end of rant. 🙂

So, what’s a good rule of thumb for how much memory to give to MySQL? I’d say 80% of the system’s unused RAM.

Note the “unused”. If the server just has MySQL on it, then, sure, give MySQL 80% of the free memory after loading up the OS. But if it runs other services (e.g. a web server, some other database, etc.), then see how much RAM you have after all of them are running, and give MySQL 80% of that.

An exception to this rule might be a very, very small database, say, with a single table with just a couple of entries, and rarely or never updated. In this case, giving it so much memory might be overkill… but in such a case, you wouldn’t be here reading an optimization guide, right?

Memory configuration:

(Note: there are other memory-related settings, such as join_buffer_size or innodb_log_buffer_size, but their defaults should be good enough. Also note that defaults may change (they’re usually increased) between MySQL/MariaDB versions, which is one more reason to keep your database server updated as much as possible.)

(Note 2: if you’re using very old versions of MySQL or MariaDB, it’s possible that some settings below don’t “exist” yet. If the server refuses to (re)start and complains about an unknown setting, simply remove it/comment it out. Better yet, start planning a software update…)

First, back up your database. Really. Nothing here should be dangerous, but better safe than sorry. And, if possible, try out these changes on a test machine before moving to the real, production one.

For extra fun, run MySQLTuner now and save the results (e.g. mysqltuner.pl > ~/mysql-pre-optimization.txt), so you can compare them later…

So, edit (after backing it up, of course) your /etc/my.cnf (or /etc/mysql/my.cnf in Ubuntu, or /etc/my.cnf.d/server.cnf in Fedora), and look for the [mysqld] section.

You can add the following entries to the end of that section, they’ll replace their settings if they’ve already been previously set.

If you’re using InnoDB (and, between it and MyISAM, you certainly should):

innodb_buffer_pool_size = MEM1
innodb_log_file_size = MEM2
innodb_buffer_pool_instances = NUMBER
  • MEM1 should be about 80% of your available RAM (see “About memory” above). You can specify it as a number followed by “M” for megabytes, “G” for gigabytes, etc.;
  • MEM2 should be one eighth (1/8) of MEM1. For example, for 1 GB of MEM1, set MEM2 to 128M;
  • NUMBER should be the number of gigabytes (rounded down) of MEM1. For instance, for 4 GB, set NUMBER to 4. Note that this value doesn’t have a unit (M, G, etc.)

If you’re not using InnoDB at all (why?), then don’t add any of the above, of course.

For MyISAM:

key_buffer_size = MEM3

If your databases are all InnoDB, then leave this value small (e.g. 4M), as the MySQL user tables and such are still MyISAM. If you have many and/or large MyISAM tables, on the other hand, set MEM3 to 25% of the available RAM (see “About memory” above), and reduce the InnoDB memory by that amount (see MEM1 above). If you have only MyISAM tables, then you can raise MEM3 to 50%-80% of available memory.

If you’re using Aria tables, the setting is aria_pagecache_buffer_size; follow the key_buffer_size recommendations.

Other settings:

I find the following settings to work pretty well in most cases (and MySQLTuner seems to agree):

query_cache_type = 0 # recommended to be off, these days
query_cache_size = 0
thread_cache_size = 128
table_open_cache = 2048
low_priority_updates = 1 # MyISAM only, but no harm
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
tmp_table_size = 64M
max_heap_table_size = 64M

And that’s it! Restart MySQL, and you should enjoy much better performance, both for MySQL itself and for the rest of the server.

Linux: create a Volume Group with all newly added disks

Let’s say you’ve just added one or more disk drives to a (physical or virtual) Linux system, and you know you want to create a volume group named “vgdata” with all of them — or add them to that VG if it already exists.

For extra fun, let’s also say you want to do it to a lot of systems at the same time, and they’re a heterogeneous bunch — some of them may have the “vgdata” VG already, while some don’t; some of them may have had just one new disk added to it, while others got several. How to script it?

#!/bin/bash

# create full-size LVM partitions on all drives with no partitions yet; also create PVs for them
for i in b c d e f g h i j k l m n o p q r s t u v w x y z; do sfdisk -s /dev/sd$i >/dev/null 2>&1 && ( sfdisk -s /dev/sd${i}1 >/dev/null 2>&1 || ( parted /dev/sd$i mklabel msdos && parted -a optimal /dev/sd$i mkpart primary ext4 "0%" "100%" && parted -s /dev/sd$i set 1 lvm on && pvcreate /dev/sd${i}1 ) ) ; done

# if the "vgdata" VG exists, extend it with all unused PVs...
vgs | grep -q vgdata && pvs --no-headings -o pv_name -S vg_name="" | sed 's/^ *//g' | xargs vgextend vgdata

# ... otherwise, create it with those same PVs
vgs | grep -q vgdata || pvs --no-headings -o pv_name -S vg_name="" | sed 's/^ *//g' | xargs vgcreate vgdata

As always, you can use your company’s automation system to run it on a bunch of servers, or use pssh, or a bash “for” cycle, or…

Linux: find users with full sudo access on many machines

Disclaimer: there are surely many, far better ways to do this — feel free to add them in the comments. This was just a quick and dirty script I came up with yesterday, after a co-worker wondered if there was an easy way to do this on all the servers we administer.

The situation: you administer 1000 or more servers, you and your team are the only users who are supposed to be able to sudo to root (unlike simply running certain specific commands, which is typically OK), but sometimes you have to grant temporary full sudo access to a particular user or group of users who, for instance, are doing the initial application installations, but who are supposed to lose that access when the server enters production.

The problem: it’s easy to forget about those, and so the temporary access becomes permanent (yes, there are other ways around that, such as using a specific syntax for those accesses that includes a comment that you then use a script, called by the “at” daemon, to remove later, but bear with me for now). Wouldn’t it be useful to be able to look at a group of some, or even all, of the servers you administer, and find those unwanted, forgotten sudo accesses?

Continue reading “Linux: find users with full sudo access on many machines”

Red Hat / CentOS: How to list RPMs installed on year X

Paranoid-minded auditor: “I need a list of all RPMs that were installed this year on these servers, stat!”

You: “OK, let’s see here…”

On a single machine:

for i in `rpm -qa`; do rpm -qi $i | grep Install | grep -q 2017 && echo $i; done

If you need to check several servers, you can use a bash “for” cycle, or pssh, or…

How to update Red Hat or CentOS without changing minor versions

If you’re a Linux system administrator or even a “mere” user, you’ve probably noticed that, when using a Red Hat-like system, if you do “yum update” it may well raise the minor version level (e.g. 6.7 to 6.9). In fact, it should move your system to the latest minor version (the number after the dot) of your current major version (the number before it).

You may, then, have wondered if it is possible to update your system and yet remain on your current version. You may even have been asked to do so by a very, very timid boss, or some development/application team (“this is supported only on Red Hat 7.1, we can’t move to 7.2!”).

Before I go on, I have to say that there is absolutely no technical reason to do this (EDIT: not necessarily true any longer, at least for 7.x, see CertDepot’s comment and link. Still true in most cases; the reason for this demand is almost always ignorance, fear, and laziness, not knowledge of any actual change causing incompatibilities). I really hope you’ve arrived here just because a boss, project manager or developer is demanding it (and, sadly, you don’t work at a place where you can say “no, that’s stupid, I won’t do it”… yet 😉 ), or simply because of scientific curiosity, not because you actually think that doing this is a good idea.

Red Hat (or CentOS) minor versions aren’t really” versions” in the usual sense, where new versions of software packages, libraries, etc. are included. Instead, (with a few desktop-related exceptions, such as web browsers) they take pains to only fix security problems and other bugs. If you look at a particular package’s versions, whether you’re on Red Hat Enterprise Linux 7.0 or 7.3, those always stay the same, only the “Red Hat” number (e.g. file-5.11-21.el7) increases. Therefore, there is never (EDIT: see above edit) any question of “compatibility”; it may, however, be a question of “officially supported”, which is code for “we tested our product with this version, and can’t be bothered to test it with any others.”

Sorry about the rant. 🙂 So, since you’re obviously a competent sysadmin, I’ll assume you’re being forced to do it. Here’s how:

With Satellite:

To see which releases you have available:

subscription-manager release --list

Example:

# subscription-manager release --list
+-------------------------------------------+
 Available Releases
+-------------------------------------------+
5.11
5Server
6.2
6.7
6.8
6.9
6Server
7.0
7.1
7.2
7.3
7Server

To lock on a release (e.g. 7.1):

subscription-manager release --set=7.1

And to unlock it:

subscription-manager release --unset

(or maybe –set=7Server)

Without Satellite:

For a single update, add –releasever=x.y to your yum command; for instance:

yum --releasever=7.1 update

To set it permanently, add:

distroverpkg=x.y

to the [main] section in your /etc/yum.conf file.

Notes: at least on CentOS, since CentOS 7.x, versions aren’t just “x.y”, they also include a third number, apparently the year and date of release. Browsing on http://vault.centos.org/centos/ , for instance, you see you have these versions available:

[DIR] 6.7/ 21-Jan-2016 13:22 - 
[DIR] 6.8/ 24-May-2016 17:36 - 
[DIR] 6.9/ 10-Apr-2017 12:48 - 
[DIR] 6/ 10-Apr-2017 12:48 - 
[DIR] 7.0.1406/ 07-Apr-2015 14:36 - 
[DIR] 7.1.1503/ 13-Nov-2015 13:01 - 
[DIR] 7.2.1511/ 18-May-2016 16:48 - 
[DIR] 7.3.1611/ 20-Feb-2017 22:23 - 
[DIR] 7/ 20-Feb-2017 22:23 -

and, yes, you have to specify the third number in your command/config file.

You may also have to enable the several entries in your /etc/yum.repos.d/CentOS-Vault.repo file (change enabled=0 to 1).

Sources: 1