I observed this slowness while setting up the sync for two directories in a production cluster. Important notes: the directories are fairly large, around 150 GB in size, with a file count above 10,000,000.
So we can guess it is related to the size and file count itself. However, we cannot just assume that; we need proof.
Tried different scenarios:
- Replaced the actual directories with two dummy directories containing fewer files. It worked.
- Tried different settings in the rsync section of the config (see the sketch after this list); no change.
- Started the Lsyncd daemon with all logging enabled. This actually helped to explain the situation.
- Started it under strace (see the example after this list).
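For reference, the rsync tweaks were along these lines. This is only a sketch: the target host, paths, and values are placeholders, not the production configuration.

settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

sync {
    default.rsync,
    source = "/var/lib/docker/volumes/dir",
    target = "backuphost:/var/lib/docker/volumes/dir",
    delay  = 15,
    rsync  = {
        archive  = true,
        compress = true,
    },
}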
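The strace run looked roughly like this; -f follows child processes, -o writes the trace to a file (the output path is just an example), and -nodaemon keeps Lsyncd in the foreground so the trace stays attached:

strace -f -o /tmp/lsyncd.trace lsyncd -nodaemon /etc/lsyncd.conf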
Initially, in the Lsyncd log I was only seeing this message:
Tue Jul 14 06:22:22 2020 Normal: --- TERM signal, fading ---
Tue Jul 14 06:22:28 2020 Normal: --- Startup ---
The daemon started and the service status showed it as up, but no further information appeared. I restarted it under strace, and the process was exiting with an okay (zero) code. So I stopped the process, started checking the Lsyncd configuration in detail, and tried tweaking it.
Finally, I started the daemon manually with all logging enabled:
lsyncd -log all /etc/lsyncd.conf
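The same -nodaemon flag used with strace above also helps here, since it keeps the process in the foreground and the log lines appear directly on the terminal:

lsyncd -log all -nodaemon /etc/lsyncd.conf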
This identified the actual issue and the state of the process: Lsyncd was adding an inotify watch for every subdirectory under the directories listed in the Lsyncd configuration file. Remember that the size and file/directory count are high, so registering all those watches made the initial sync startup take a long time.
Mon Jul 13 17:47:38 2020 Inotify: addwatch( /var/lib/docker/volumes/dir )-> 134390
Mon Jul 13 17:47:38 2020 Inotify: addwatch( /var/lib/docker/volumes/dir )-> 134391
Mon Jul 13 17:47:38 2020 Function: Inotify.addWatch( /var/lib/docker/volumes/dir)
Mon Jul 13 17:47:38 2020 Inotify: addwatch( /var/lib/docker/volumes/dir )-> 134392
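Since inotify places one watch per directory, it is worth counting the directories up front and checking the kernel's per-user watch limit. A rough check, using the source path from the logs (the sysctl names are standard Linux; the value at the end is only a placeholder):

# Count the directories Lsyncd will need to watch (one watch each)
find /var/lib/docker/volumes/dir -type d | wc -l

# Check the current kernel limit on inotify watches per user
sysctl fs.inotify.max_user_watches

# Raise the limit if the directory count exceeds it (placeholder value)
sudo sysctl -w fs.inotify.max_user_watches=1048576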
So, after a service restart or a server reboot, it will always take some time for the sync to start while all those watches are re-registered.
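One way to confirm the startup is still progressing rather than hung is to count the addwatch lines accumulating in the Lsyncd log (the log path below is an assumption; use whatever logfile your config points to):

watch "grep -c 'Inotify: addwatch' /var/log/lsyncd/lsyncd.log"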