Detecting Expense Fraud
You are given two lists of company expenses, one of which contains fake expense amounts. There are nearly a hundred line items to analyze. Rather than checking each one, you will run descriptive statistics on each list and compare their properties with those of the Benford distribution. Use the properties of Benford’s Law below to analyze the transactional data: compare the mean, variance, skewness, and kurtosis of each list’s first digits with those of the Benford distribution to see which list matches it more closely.
Transaction-level data such as disbursements or sales are generally expected to follow Benford’s Law. See this summary of how Benford’s Law can help identify fraudulent amounts: http://investexcel.net/benfords-law-excel/
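As a minimal sketch of the first step, assuming the expense amounts are available as Python numbers, the leading digit of each amount can be extracted and tallied into proportions. These proportions are then set beside the Benford expectation P(d) = log10(1 + 1/d). The function names `first_digit`, `first_digit_proportions`, and `compare_with_benford` are hypothetical. The sample amounts are illustrative only and are not the assignment data.

```python
import math
from collections import Counter


def first_digit(amount: float) -> int:
    """Return the leading nonzero digit (1-9) of a nonzero amount."""
    if amount == 0:
        raise ValueError("0 has no leading nonzero digit")
    # Scientific notation puts the leading nonzero digit first,
    # e.g. 1234.5 -> "1.23450000000000e+03".
    return int(f"{abs(amount):.14e}"[0])


def first_digit_proportions(amounts):
    """Share of nonzero amounts whose first digit is 1, 2, ..., 9."""
    counts = Counter(first_digit(a) for a in amounts if a != 0)
    n = sum(counts.values())
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


def compare_with_benford(amounts):
    """Rows of (digit, observed proportion, Benford expected proportion)."""
    observed = first_digit_proportions(amounts)
    return [(d, observed[d], math.log10(1 + 1 / d)) for d in range(1, 10)]


# Illustrative amounts only (hypothetical, not the Expenses1/Expenses2 data).
sample = [1250.00, 184.37, 23.10, 310.99, 95.40]
for digit, observed, expected in compare_with_benford(sample):
    print(f"{digit}: observed {observed:.3f}  Benford {expected:.3f}")
```

With only about a hundred line items per list, observed proportions will scatter around the Benford values even for legitimate data. Small gaps on individual digits should therefore be read alongside the overall pattern rather than on their own.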
Learning Objectives
To apply an understanding of distribution laws to identify irregular behavior, and to confirm that irregular behavior using descriptive statistics
Statistics Needed
- Understanding of the Benford’s Law distribution and its properties
- Line chart / Bar chart
- Descriptive Statistics
Data Source
Benford distribution:
mean = 3.44, variance = 6.057, skewness = 0.796, kurtosis = -0.548 (properties of the first digit, 1 to 9; the kurtosis figure is excess kurtosis)
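The Benford properties above can be reproduced from the first-digit probabilities P(d) = log10(1 + 1/d), d = 1, ..., 9. This sketch uses hypothetical names and computes the population mean, variance, skewness, and excess kurtosis (fourth standardized moment minus 3) of that distribution. The printed values should agree with the figures above to within rounding.

```python
import math

# Benford's Law: probability that the first digit is d, for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def moments(pmf):
    """Population mean, variance, skewness, and excess kurtosis of a discrete pmf."""
    mean = sum(d * p for d, p in pmf.items())
    var = sum((d - mean) ** 2 * p for d, p in pmf.items())
    skew = sum((d - mean) ** 3 * p for d, p in pmf.items()) / var ** 1.5
    # Excess kurtosis subtracts 3 so that a normal distribution scores 0.
    kurt = sum((d - mean) ** 4 * p for d, p in pmf.items()) / var ** 2 - 3
    return mean, var, skew, kurt


mean, var, skew, kurt = moments(BENFORD)
print(f"mean = {mean:.3f}  variance = {var:.3f}  "
      f"skewness = {skew:.3f}  kurtosis = {kurt:.3f}")
```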
Tasks:
Use the six steps of the business analytics process as a framework for answering the questions below, then write a report explaining your final decision and reasoning.
Step 1. Recognizing the problem
- Is Benford’s Law applicable?
- What background knowledge is needed to use it effectively?
Step 2. Defining the problem
- How can we use Benford’s Law to identify the fake transactions?
- What questions do I need to ask to describe the distributions?
- What would sufficient answers look like?
Step 3. Structuring the problem
- Does it make sense to base our conclusions on Benford’s Law?
- Is the data provided sufficient?
Step 4. Analyzing the problem
- What is the best way to plot a distribution?
- What are the assumptions we need to make?
- Are the assumptions reasonable? How will they affect our models?
Step 5. Interpreting Results and Making a Decision
- How confident can we be in our results?
- What assumptions were made and how do they affect these results?
Step 6. Implementing the solution
- What is the reasoning behind the decision?
- What resources or limitations do we need to consider?
Questions to answer:
1. Create a bar graph of the Benford distribution, or reference an external source that shows the Benford distribution. Describe this distribution.
2. Create line graphs of the first-digit distributions for the two expense lists, Expenses1 and Expenses2. Compare each list’s line graph with the Benford distribution.
3. Determine how well each distribution fits the Benford distribution.
4. Interpret what this may imply about the expense data.
5. Use the descriptive statistics and the properties of Benford’s Law to confirm your findings from (3). Compare the descriptive statistics of each expense list with the following properties of the Benford distribution:
   mean = 3.44, variance = 6.057, skewness = 0.796, kurtosis = -0.548
   - How closely do the summary statistics of the expense lists match the Benford distribution? Indicate how confident you are, based on the models and descriptive statistics, and why.
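For questions 3 and 5, one possible approach (an illustration, not prescribed by the assignment) combines two checks. The first is a Pearson chi-square goodness-of-fit test of each list’s first-digit counts against Benford’s Law. The second is the same four summary statistics computed on the first digits. This sketch assumes the first digits of Expenses1 and Expenses2 have already been extracted into lists of integers from 1 to 9, and all function names are hypothetical. The critical value 15.507 is the 95th percentile of the chi-square distribution with 8 degrees of freedom (nine digits minus one).

```python
import math
from collections import Counter

# Expected first-digit probabilities under Benford's Law: P(d) = log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Benford properties quoted in the assignment (kurtosis is excess kurtosis).
BENFORD_PROPERTIES = {"mean": 3.44, "variance": 6.057,
                      "skewness": 0.796, "kurtosis": -0.548}

# 95th percentile of the chi-square distribution with 9 - 1 = 8 degrees of freedom.
CHI2_CRITICAL_8DF_5PCT = 15.507


def chi_square_vs_benford(digits):
    """Pearson chi-square statistic comparing first-digit counts with Benford's Law."""
    n = len(digits)
    counts = Counter(digits)
    return sum((counts.get(d, 0) - n * p) ** 2 / (n * p) for d, p in BENFORD.items())


def fits_benford(digits):
    """True if the chi-square test does not reject Benford's Law at the 5% level."""
    return chi_square_vs_benford(digits) < CHI2_CRITICAL_8DF_5PCT


def describe(digits):
    """Population mean, variance, skewness, and excess kurtosis of a list of first digits."""
    n = len(digits)
    mean = sum(digits) / n
    var = sum((x - mean) ** 2 for x in digits) / n
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in digits) / n / sd ** 3
    kurt = sum((x - mean) ** 4 for x in digits) / n / var ** 2 - 3
    return {"mean": mean, "variance": var, "skewness": skew, "kurtosis": kurt}


def differences_from_benford(digits):
    """Observed statistic minus the Benford value, for each of the four properties."""
    stats = describe(digits)
    return {name: stats[name] - BENFORD_PROPERTIES[name] for name in BENFORD_PROPERTIES}
```

Pass each list’s first digits to `chi_square_vs_benford()` and `differences_from_benford()`. The list with the markedly larger chi-square statistic and larger differences is the weaker fit. With about one hundred items, the expected count for digit 9 is below 5, so the chi-square result should be treated as approximate. Spreadsheet functions such as Excel’s VAR.S, SKEW, and KURT apply small-sample corrections, so their results will differ slightly from these population formulas.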
Report
Write a professional report for the company, as if you were a hired consultant or employee. The report should be to the point and give specific, actionable advice based on the data and analytics. Avoid non-essential technical details and terminology, and avoid any speculation not supported by the data. Keep the report concise; it should be understandable without lengthy explanations.
There is no minimum or maximum page length, since charts and tables can take up a highly variable amount of space. However, any charts or tables you include must be understandable to a layperson at first glance (labeled and captioned as needed). The models, interpretations, and advice are your choice, but be prepared to explain or defend them.
Use this as an outline for the report:
A. Description of the business problem
- What are we looking for?
- What are the key pieces of information that we need to determine which is the fraudulent list?
- Why are your findings important?
- What questions will be answered and how do these explicitly help address the decision?
B. Data, methods, models, and results
- Discuss the basic approach used to analyze the data and any concerns about the integrity and quality of the data used.
- Briefly describe the models used (formulas, tables, graphs) and what each is used for, avoiding technical details, e.g. “These graphs plot the distribution of numbers for expenses 1 & 2.”
- State all “important” assumptions and why you think they are reasonable. An assumption is important if you need it to obtain a result, e.g. “the person who created the fraudulent expense sheet does not know about Benford’s Law”, or if it would significantly affect your results were it wrong or invalid, e.g. “Benford’s Law is applicable to our legitimate expense sheet”.
- Do not list technical assumptions used for statistical analysis, e.g. “We assume the sales data is normally distributed.”
C. Decision making
- Explain specifically how you used the models and results to make your decisions. This can be stated directly, e.g. “Expense sheet number x is so dissimilar to the Benford distribution that we must assume it contains fraudulent transactions.”
- Detail special considerations or issues to watch out for, e.g. “Fraudulent transactions could be faked to closely match Benford’s Law.”
- Describe how the accuracy of the decision can be measured or verified.