When on-boarding a new user, one of the questions we are often asked is, “can I import my old unit price contracts from Excel spreadsheets?”. Well that’s actually a complicated question. From a technical perspective, reading data from a spreadsheet is easy, providing the program doing the reading knows what to expect. The problem is that everyone has different preferences on how their contracts are structured. There’s a certain amount of pride in having a distinctive yet tasteful spreadsheet template. However, for importing data, boring is better.
The Downloadable Spreadsheet Template for Unit Price Contracts – An Early Approach
Structured spreadsheets with well-defined columns are much easier to import using a program. So our initial approach for importing unit price contracts was based on this idea. We would provide a spreadsheet template for download, into which our customers’ contract could be copied. Then we could import the contract the “easy way”.
Problems with this Approach
The problem with this approach was that copying data from one spreadsheet to another while changing the format was an arduous task. We didn’t want our customers (who, ironically are trying to move away from spreadsheets) to be forced to spend hours manipulating data in spreadsheets as their first experience using ContractComplete. So we knew pretty quickly that this was not really a sustainable approach in the log run.
A second problem with this approach is establishing the hierarchy. Spreadsheets are inherently flat. But ContractComplete supports hierarchical unit price contracts. Our rudimentary solution to this problem was to require the user to use a specific numbering format which allowed us to infer the hierarchy of items. For example,
- 1 Site Preparation
- 1.1 Grade leveling
- 1.2 Access road
- 2 Planting
- 2.1 Deciduous Trees
- 2.1.1 Maple
- 2.1.2 Elm
- 2.1.3 Oak
- 2.2 Coniferous Trees
- 2.2.1 Pine
- 2.2.2 Spruce
- 2.2.3 Cedar
- 2.1 Deciduous Trees
This approach proved to be quite tedious since it often required users to change their numbering system. For example, many users use different numbering systems such as
- A), b), c)
- I, II, III, IV
- Part A, Part B, Part C
Or different variations of the above; often for different levels in the hierarchy.
Introducing the Smart Importer for Unit Price Contracts
We knew we had to find a way to make our importer more adaptive. We were under no illusions that our importer could work for every possible spreadsheet format, but what about just decently structured ones? Well it turns out that we could build an AI to have a reasonable success rate with a wide variety of spreadsheets. Providing these spreadsheets had their data in columns. If we assumed that all spreadsheets cleared this bar, then we could break down the problem into 2 parts.
- How do we decide which spreadsheet column maps to which column in ContractComplete so we can conform to ContractComplete’s structure?
- How do we establish a hierarchy in ContractComplete from a spreadsheet that is inherently flat?
AI, Fuzzy Logic, and Hungarians
The first problem can be presented as a classic computer science problem known as the “assignment problem”. The assignment problem is typically demonstrated by imagining a scenario in which you have N tasks, or jobs, and N people/workers available to perform those tasks. Each person can perform one task and one task only. But each person performs each task with a different speed. So in an assignment problem, we are looking for the optimal (lowest cost) assignment for these jobs. As it turns out, our problem is quite similar. We have a number of different “tasks” (columns in ContractComplete), and a number of different “workers” (columns in the Excel spreadsheet) that we can use to perform those tasks.
Problem Formulation
Structuring the problem in this way simplifies things. Now, we just need to establish the “cost” of each spreadsheet column being assigned to each ContractComplete column. Well, our AI can do this using a simple points system. We evaluate each Excel column for viability against each ContractComplete column. Some of that criteria includes, but is not limited to the following:
- Does the column have currency formatting applied and is therefore likely a price column?
- Does the column have numeric/decimal formatting applied and is therefore a numeric column?
- Is the column header something similar to “Number”, “#”, “Name”, “Description”, “Unit”, “Unit Price”?
- Does the column have things that look like units? Eg. sqm, sqft, m, ft, yd, each
- Are the values in the column really long? Are the values in the column really short?
- Does the column have lots of blank cells?
This is known as fuzzy logic. We can use the above to come up with a weighting matrix, much like the following:
# | Name | Description | Unit | Unit Price | Quantity | |
Excel Column 1 | 4 | 56 | 81 | 57 | 28 | 22 |
Excel Column 2 | 94 | 31 | 24 | 2 | 83 | 56 |
Excel Column 3 | 82 | 72 | 51 | 12 | 11 | 19 |
Excel Column 4 | 96 | 41 | 19 | 73 | 22 | 61 |
Excel Column 5 | 61 | 22 | 23 | 61 | 64 | 9 |
Excel Column 6 | 29 | 90 | 89 | 48 | 74 | 72 |
The Hungarian Method
We can find an optimal (lowest total cost) assignment for this matrix using a procedure affectionately known as the Hungarian Algorithm. The algorithm is based on the work of two Hungarian mathematicians and is currently the least computationally expensive way to solve the assignment problem. Applying the Hungarian Algorithm to the above matrix yields the following optimal solution. Now we know which spreadsheet columns to use!
# | Name | Description | Unit | Unit Price | Quantity | |
Excel Column 1 | 4 | 56 | 81 | 57 | 28 | –22– |
Excel Column 2 | 94 | 31 | 24 | –2– | 83 | 56 |
Excel Column 3 | 82 | 72 | 51 | 12 | –11– | 19 |
Excel Column 4 | 96 | 41 | –19– | 73 | 22 | 61 |
Excel Column 5 | 61 | –22– | 23 | 61 | 64 | 9 |
Excel Column 6 | –29– | 90 | 89 | 48 | 74 | 72 |
Building a Hierarchy
Once we have our columns established, we need to figure out which rows are section headers and which are line items. We support arbitrary grouping of sections so this can be challenging. As in our original strategy, we need to rely on a numbering system. But we wanted to be able to use the user’s own numbering system without requiring a rigid convention that was used in the original spreadsheet template.
Given the following flat list, how would you go about establishing a hierarchy?
- Site A
- Part 1
- 1
- 1.1
- 1.2
- a
- b
- c
- 1.3
- 2
- 2.1
- 2.2
- a
- b
- Part 2
- Site B
- …
Well, most of us would probably look at the first 2 lines and say it’s illogical for “Site A” and and “Part 1” to both be top level headings, so “Part 1” is probably a subheading inside the “Site A” heading. The smart importer does the same thing. If the smart importer sees a change in naming convention, it can infer that we probably need to create a new subheading level below. The same logic could be applied in saying “1” is probably a subheading of “Part 1”. But what about “1.1”?
Pattern Recognition
The smart importer has a set of pre-defined common patterns that it can recognize and in order to infer depths just from looking at the number. The most important example is numbering systems that follow the format “X.X.X….”, such as 1.1.A, or 1.4.5. So we can usually say that if we see the number 1.2, that this number has a depth of 2 beyond the level at which this numbering system was introduced. So, from our example, we can infer:
- …
- Part 1
- 1
- 1.1
- 1.2
- 1
So in a nutshell, we can build a contract hierarchy by repeating the following steps:
- For each item …
- Does the item have a numbering system that looks different from the other numbering systems that have been seen so far in this contract?
- If yes, indent
- If no, then we can infer the indentation by looking at where that numbering system was used earlier in the contract
This procedure leads to the following:
- Site A
- Part 1
- 1
- 1.1
- 1.2
- a
- b
- c
- 1.3
- 2
- 2.1
- 2.2
- a
- b
- 1
- Part 2
- Part 1
- Site B
- …
Putting it All Together
The strategies used above often work well, but they aren’t perfect. Sometimes they produce an imported contract that isn’t indented quite right, or that is missing a header. To help with these situations, we’ve introduced an “import preview” mode in which contracts can be easily manipulated before finalizing the import. It’s great for things like indenting multiple rows at a time or deleting extra rows.
The future is exciting for Smart Importer. New heuristics that we add will be able to handle a wider range of spreadsheets, for example, ones in which multiple logical columns are combined into a single column. A typical example is combining units and quantities into cells that look like “45 feet” or “75 SQM”. Smart importer is and will always be a work in progress, but one day we hope to be able to handle even the worst spreadsheets.
Want to learn how easy it is to import your unit price contracts into ContractComplete so you can benefit? Schedule a demo!