We recommend using a custom script to remove the Avro files, since MapR (like any Hadoop distribution) does not provide a built-in feature to adjust the maximum retention period.
The script below can be scheduled on one of the nodes in the cluster to run on a daily or weekly basis:
# cat /opt/mapr/server/retention.sh
#!/bin/bash
usage="Usage: retention.sh [days]"

# The retention period, in days, is required as the first argument.
if [ -z "$1" ]; then
    echo "$usage"
    exit 1
fi

now=$(date +%s)

# List every .avro file under /rsasoc (skipping .meta files) and delete
# any file whose modification date is older than the given number of days.
hadoop fs -lsr /rsasoc | grep -E "avro$" | grep -v "meta" | while read -r f; do
    # Field 6 of the listing is the file's modification date.
    dir_date=$(echo "$f" | awk '{print $6}')
    # Age of the file in whole days.
    difference=$(( ( now - $(date -d "$dir_date" +%s) ) / (24 * 60 * 60) ))
    if [ "$difference" -gt "$1" ]; then
        echo "$f"
        # Field 8 is the full path; remove the file and log the result.
        result=$(hadoop fs -rm "$(echo "$f" | awk '{print $8}')")
        echo "$(date) $result" >> /opt/mapr/logs/retention.log
    fi
done
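For reference, each file line produced by hadoop fs -lsr has eight whitespace-separated fields: permissions, replication factor, owner, group, size, modification date, modification time, and path. The script reads field 6 (the date) for the age check and field 8 (the path) for the delete. The line below is purely illustrative; the owner, size, and path under /rsasoc will differ in your deployment:

-rw-r--r--   3 mapr mapr   1048576 2014-01-15 10:23 /rsasoc/example/path/file.avro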
For example, the cron job below runs the retention cleanup every Thursday at 5:15 PM and removes data older than 180 days. Add it with crontab -e:
15 17 * * 4 sh /opt/mapr/server/retention.sh 180
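Before relying on the scheduled job, you can verify the script by running it manually and checking the log; the commands below are just an illustration using the same 180-day value:

# sh /opt/mapr/server/retention.sh 180
# tail /opt/mapr/logs/retention.log

Each file that exceeds the retention period is echoed to the terminal, and the result of each delete is appended to /opt/mapr/logs/retention.log.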
If there are any questions regarding the above script, please contact RSA NetWitness Technical Support.