Billing accuracy standards have never caught on worldwide, and if you take a dispassionate look at the few national accuracy regulations that have been adopted (and sometimes abandoned) you will understand why. No sensible business wants to waste money complying with a standard that is not fit to deliver the promised results. The history of complaints confirms customers are right to pay more attention to the error-strewn bills they receive than to the national accuracy certificates that telcos hang on the wall. A robust international accuracy standard would be a blessing, because it could help us move beyond the bureaucratic politics, vested interests, and idiosyncratic personalities that have shaped national regulations. We would all benefit from the creation of a valid international focal point for work that improves billing accuracy. However, the TS 102 845 billing accuracy standard created by the European Telecommunications Standards Institute (ETSI) has largely been ignored by telcos. That has not discouraged ETSI Specialist Task Force 375 from attempting to reboot the TS 102 845 Technical Specification by issuing an updated version. How good is the revised standard, and are telcos likely to adopt it?
The Long and Short of It
Though the new standard represents a significant amount of work, a major flaw will be apparent to anyone who glances at the page count: the document is too long for many people to want to read it straight through. This reduces the likelihood that telco managers will champion it. A tightly written executive summary would have encouraged more readers. The value of the standard would be magnified by opening with a short precis of the approach the authors advocate, supporting it with simple arguments in favor of verifying charging accuracy, and then expanding the commentary in a structured way that allows different readers to obtain the level of detail they need. Senior managers should only need to understand the merits of adopting the standard, without wading through technical specifics that are more appropriate for the staff who will implement it. On the other hand, some useful details may have been omitted precisely because the authors did not want the document to grow any longer.
Presenting an overlong and inadequately structured standard is a shame because its key principles are sound and they could have been adequately outlined in a few pages. Chief amongst those principles is that telcos should perform tests that emulate the user’s experience of consuming a service, then compare the amounts charged in practice to the amounts that should have been charged. This is a sound and worthy basis for assuring accuracy, despite the inevitable anti-mathematical heckling that originates with vendors who have a vested interest in opposing sample-based assurance methods.
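To make that principle concrete, here is a minimal sketch of the end-to-end comparison in Python. It assumes a per-second tariff that charges every started second in full, and that each test record holds the duration experienced at the test handset (in integer milliseconds) alongside the whole seconds shown on the bill. The names `ProbeCall`, `expected_billed_seconds` and `find_discrepancies` are hypothetical and are not taken from the standard.

```python
from dataclasses import dataclass


@dataclass
class ProbeCall:
    """One hypothetical end-to-end test call, as seen by the 'mystery shopper'."""
    event_id: str
    experienced_ms: int   # duration experienced at the test handset, in milliseconds
    billed_seconds: int   # duration shown on the bill, in whole seconds


def expected_billed_seconds(experienced_ms: int) -> int:
    """Assumed tariff rule: per-second charging, rounded up to the next whole second."""
    return (experienced_ms + 999) // 1000


def find_discrepancies(events):
    """Return (event_id, expected, billed) for every test whose bill differs from expectation."""
    discrepancies = []
    for event in events:
        expected = expected_billed_seconds(event.experienced_ms)
        if event.billed_seconds != expected:
            discrepancies.append((event.event_id, expected, event.billed_seconds))
    return discrepancies


events = [
    ProbeCall("t1", 34_100, 35),  # 34.1s rounds up to 35s: correct
    ProbeCall("t2", 34_100, 36),  # billed one second too many
    ProbeCall("t3", 12_000, 12),  # exactly 12s: correct
]
print(find_discrepancies(events))  # [('t2', 35, 36)]
```

The comparison deliberately ignores how the network recorded the call; only what the customer experienced and what the bill says enter the check.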
It is hardly surprising that a test-oriented approach is endorsed by manufacturers of test call generators. There is a problem with bias in this standard, but the bias is not manifest in the core principles. Though they do not state it this way, the authors are correct to present an automated version of mystery shopping as the safest and best way to measure accuracy impartially. Experiencing both the network and the resulting charge from the customer's point of view gives a proper end-to-end metric that straddles all the potential sources of error. It also avoids the more serious kinds of bias that are introduced when professionals create composite measures of accuracy by combining separate sets of data from differing sources, each of which covers only some of the potential kinds of error.
Even a relatively small sample of end-to-end tests will produce a more reliable measure of accuracy than some of the artificial composite metrics I have seen telcos adopt in practice. For example, taking a measure of the accuracy of the clock used to determine when a call starts, then adding some data about the checks performed to validate rating reference data, and topping that all with some figures drawn from customer complaints will only give you a measure of the things you measured. It will ignore every type of error that would not be captured by data from any of these sources.
The Devil in the Detail
As is typical of standards that must be approved by multiple contributors, TS 102 845 comes up with definitions which are long enough to provide a distraction, but so hollow that they fail to address the supposed purpose of the work. Consider the following excerpt:
7.1.4 Definition of Duration
The Service Provider shall define a clear method to compute the duration of electronic communications charged to customers. The duration shall be determined as the difference of time between two well defined trigger points.
On the one hand, this is utterly superficial. The duration of something is the difference between two points in time. Who needs to be told that? That is just what the word ‘duration’ means anyway. On the other hand, this definition is fundamentally wrong. The duration of the service provided to the customer is not the difference between one arbitrary ‘trigger point’ and another arbitrary ‘trigger point’. It is the difference between the moment in time when the customer starts receiving the service, and the moment in time when the customer stops receiving the service. If you do not base your accuracy measure on what the customer actually receives then you are not measuring the accuracy of the charge that is supposed to be based on the service that was provided.
It is easy to understand the rationale for such an abstract definition of duration. Telcos cannot practically determine the exact moment when a service becomes available to a customer, or the moment when it stops being available. The points in time used by telcos reflect when processing occurs in a network element. There will always be some lag between the moments when a service begins or ends, and the moments when a record of the service is logged. If you want an accurate charge for the service, you must understand and manage the lag so that customers are not overcharged. If you define accuracy so that it ignores the lag, then you make it easier to be accurate by your own definition. But that only raises the question of why you should not pick some other arbitrary start and end points which increase the duration and the revenues, yet would also be deemed ‘accurate’ per this definition.
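The effect of the choice of trigger points can be sketched in a few lines. This assumes, purely for illustration, that each trigger point is logged a fixed lag after the customer starts or stops receiving service, and that charging rounds up to the next whole second; the function names and lag values are hypothetical. Both pairs of trigger points are equally "well defined", yet one adds a second to the charge.

```python
def recorded_duration_ms(service_start_ms, service_end_ms, start_lag_ms, end_lag_ms):
    """Duration between two trigger points logged by a network element.

    Each trigger point is assumed to be logged some lag after the moment
    the customer actually starts or stops receiving the service.
    """
    start_trigger = service_start_ms + start_lag_ms
    end_trigger = service_end_ms + end_lag_ms
    return end_trigger - start_trigger


def billed_seconds(duration_ms):
    """Assumed tariff rule: per-second charging, rounded up."""
    return (duration_ms + 999) // 1000


# The customer receives 45.400s of service in both cases.
experienced = 45_400
small_lag = recorded_duration_ms(0, 45_400, start_lag_ms=50, end_lag_ms=120)
large_lag = recorded_duration_ms(0, 45_400, start_lag_ms=50, end_lag_ms=700)
print(billed_seconds(experienced), billed_seconds(small_lag), billed_seconds(large_lag))  # 46 46 47
```

Nothing in a definition based only on trigger points would flag the 47s charge as inaccurate, even though the customer received 45.4s of service.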
The definition of data volume illustrates the same conceptual defect even more clearly.
7.1.5 Definition of Data Volume
The Service Provider shall define a clear method to define volume of data exchanged during an electronic communication.
Volume metering method, defined in [the first clause above], shall define the way electronic communication is exchanged: upload, download, multicast, etc.
Volume metering method, defined in [the first clause above], shall define if protocol overhead information added on top of user information is included or not in the metering.
Consider the third clause, which allows protocol overheads either to be included in, or excluded from, the volume charged to the customer, provided the metering method says which. This is technically correct: telcos choose whether to include overheads in charges, and the choice reflects the reality of how the technology works. It may not be practical to generate a pure measure that includes only the content provided to the customer (the message, video or song they actually want) whilst excluding the overhead (the data added to packets so the technology works as it should). But a pricing plan which transparently includes the cost of overheads does not thereby eliminate the risk that customers are overcharged for those overheads, because the overhead itself may be excessive. Just as a timing lag may lead to a significant variance between the duration charged and the duration of service actually experienced, the overhead may represent a non-trivial addition to the customer's charge which they cannot control or manage, and which may ultimately be caused by abusive business practices.
I do not know if any telco has done this in practice, but there is no technical law that would stop a business from needlessly inflating the overheads for data traffic. Overheads might be systematically inflated for all customers, or telcos could devise systems that arbitrarily increase overheads for specific customers. Such behavior would rightly be considered overcharging, which means a complete accuracy standard needs to recognize that inaccuracy can be caused by an inappropriate design decision, and not just by technology failing to work as designed. We therefore need to measure and review overheads too, to verify they are not exploitative, even if the customer tariffs state that overheads are included in the charge.
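One way to review overheads is to push test content of a known size through the network and compare it with the volume the charging system metered. The sketch below assumes such controlled test sessions exist; `DataSession`, `overhead_ratio`, `flag_excessive_overhead` and the 10 percent review threshold are hypothetical, and what counts as an acceptable overhead would need to be justified for each technology rather than taken from this example.

```python
from dataclasses import dataclass


@dataclass
class DataSession:
    """A hypothetical controlled test session with content of known size."""
    session_id: str
    payload_bytes: int   # size of the test content the customer actually wanted
    metered_bytes: int   # volume the charging system recorded for the session


def overhead_ratio(session: DataSession) -> float:
    """Share of the metered volume that is not payload (protocol overhead and anything else)."""
    if session.metered_bytes <= 0:
        raise ValueError(f"{session.session_id}: metered volume must be positive")
    return (session.metered_bytes - session.payload_bytes) / session.metered_bytes


def flag_excessive_overhead(sessions, max_ratio):
    """Return the ids of sessions whose overhead share exceeds an assumed review threshold."""
    return [s.session_id for s in sessions if overhead_ratio(s) > max_ratio]


sessions = [
    DataSession("s1", payload_bytes=1_000_000, metered_bytes=1_050_000),  # about 4.8% overhead
    DataSession("s2", payload_bytes=1_000_000, metered_bytes=1_400_000),  # about 28.6% overhead
]
print(flag_excessive_overhead(sessions, max_ratio=0.10))  # ['s2']
```

Repeating the same test content across customer segments would also show whether overheads are being inflated for specific customers rather than for everyone.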
How Soon Is Now?
The most serious defect in this standard is also found in actual practice worldwide. The problem is located in Annex A of the standard, which describes (but seemingly does not mandate) how to design a test sample. As is common worldwide, all the duration-based tests are described as lasting a whole number of seconds. There is no mention of any need to perform tests where the duration is not a whole number of seconds. In other words, it envisages a world where phone calls can last 1.0s, 5.0s, 90.0s, 147.0s but they never last 14.2s, or 70.85s, or 146.999s, or 147.001s. In this important respect the samples are skewed; actual customers on actual networks do not start and end calls that last a whole number of seconds. Only machines behave like that.
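The difference between the two kinds of sample is easy to show. The sketch below contrasts planned durations in whole seconds with a hypothetical stratified design that spreads the sub-second part of the planned durations evenly across the second. Durations are held in integer milliseconds to avoid floating-point rounding at exactly the boundaries that matter here; the function names and the 1s to 180s range are assumptions for illustration, not part of Annex A.

```python
import random


def whole_second_sample(planned_seconds):
    """A sample in which every planned test lasts a whole number of seconds (in ms)."""
    return [s * 1000 for s in planned_seconds]


def stratified_fractional_sample(n, min_s, max_s, seed=0):
    """Planned durations in ms, with sub-second parts spread evenly over [0, 1000).

    The whole-second part is drawn at random from [min_s, max_s - 1]; the
    fractional part of test i is (i * 1000) // n ms, so every band of the
    second is covered.
    """
    rng = random.Random(seed)
    sample = []
    for i in range(n):
        whole_s = rng.randint(min_s, max_s - 1)
        fraction_ms = (i * 1000) // n
        sample.append(whole_s * 1000 + fraction_ms)
    rng.shuffle(sample)  # avoid running the tests in order of their fractional part
    return sample


sample = stratified_fractional_sample(10, 1, 180)
print(sorted(d % 1000 for d in sample))  # [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
```

Device and network lag will still shift each planned fraction, but a stratified sample at least stops assuming that the fraction is always zero.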
This common skew in test design has important (but poorly understood) ramifications for the precision of tests for per second charging. You cannot confidently state that you know the accuracy of a measurement device which produces readings in whole seconds if you only compare its output to test data which is also stated in whole seconds. If a call is charged per second, then we need to understand the distinction between a 45.49 second call being rounded down to 45 seconds, a 45.51 second call being rounded down to 45 seconds, a 45.49 second call being rounded up to 46 seconds, and a 45.51 second call being rounded up to 46 seconds. This is why sloppiness around the definition of duration, and carelessness about lag, undermine the purpose of supposedly impartial accuracy testing. The problem would be more apparent if test samples were routinely designed to cover a realistic spread of durations, consistent with actual user and network behavior, instead of an idealized sample which assumes tests can only last a whole number of seconds. That assumption simplifies the design and interface of test systems, but it does nothing for the stated goal of measuring accuracy.
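A short sketch shows why whole-second test inputs hide the rounding rule. It applies three plausible per-second rules (truncation, rounding to the nearest second, and rounding up) to the durations discussed above, held in integer milliseconds; the rule names are my own labels, not terms from the standard.

```python
def truncate(ms):
    """Drop any fraction of a second."""
    return ms // 1000


def round_nearest(ms):
    """Round to the nearest second, with exactly half a second rounding up."""
    return (ms + 500) // 1000


def round_up(ms):
    """Charge any started second in full."""
    return (ms + 999) // 1000


RULES = {"truncate": truncate, "nearest": round_nearest, "round_up": round_up}


def charged_by_rule(duration_ms):
    """Charged whole seconds for one duration under each candidate rule."""
    return {name: rule(duration_ms) for name, rule in RULES.items()}


for ms in (45_000, 45_490, 45_510):
    print(ms, charged_by_rule(ms))
# 45000 {'truncate': 45, 'nearest': 45, 'round_up': 45}
# 45490 {'truncate': 45, 'nearest': 45, 'round_up': 46}
# 45510 {'truncate': 45, 'nearest': 46, 'round_up': 46}
```

At exactly 45.000s all three rules agree, so a sample made only of whole-second durations cannot tell them apart; at 45.49s and 45.51s they diverge.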
Whatever logic is applied to manipulate the recorded duration of a call, whether we truncate to the whole second or consciously choose to round up every call, there will always be real instances where a difference of a millisecond in the actual duration of a service determines whether the charged duration should be incremented by a whole second. One possible explanation for this omission is naivety about the speed at which computers work, and the speed at which signals are transmitted over a network. The user might have the impression these things occur instantaneously, but a non-trivial amount of time, measurable in milliseconds, will actually pass. This lag will be influenced by load on the network: is the network element handling very many calls at the same time? Lag is also influenced by distance: was the call ended by a B-party on the far side of the planet from the A-party? Network engineers know this, which is why they have designed systems that consciously make allowances for lag, so as to protect customers from overcharging. If network elements are designed with allowances worth a fraction of a second, it is contradictory to adopt a naive testing methodology which assumes the recorded test duration, stated in whole seconds, should be the same as the charged duration, also stated in whole seconds. Relevant data is clearly being lost through a lack of precision, and the loss is obscured by focusing on the post-rounding durations without examining what the pre-rounding durations were.
A worked example should help to explain the seriousness of the problem. A test call programmed to last 34 seconds will, in practice, typically last 34.1 seconds because of lags in the operation of the test device itself, and lags in how it interacts with the telecoms network. So if a call that is programmed for x seconds actually lasts x.1 seconds, then the network allowance for processing variations and the rounding logic used for charging will conspire to make this test either more or less sensitive to the duration measurement errors that are most likely to occur in real life. Suppose that all durations are rounded up, as stated in the customer's contract. If all test calls have a duration between x.1s and x.2s then they are very unlikely to yield an error, because even if the pre-rounded measurement of duration included a positive variance of 0.8s, all the calls would still yield a billable duration of x+1s. If, however, we conducted tests that last a little over x.7s and up to x.8s, a positive variance in network measurement of just 0.3s would be enough for all of these test calls to be presented on the bill as lasting x+2s.
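The worked example can be checked with a small simulation. It assumes the round-up rule and a constant overmetering of 300ms, with durations in integer milliseconds and x = 34 as in the example. The high band starts at x.701s because a call of exactly x.700s plus 0.3s lands on x+1.000s, which round-up charging still bills as x+1s. The function names are hypothetical.

```python
def ceil_seconds(ms):
    """Round-up rule from the worked example: any started second is charged."""
    return (ms + 999) // 1000


def detection_rate(durations_ms, overmetering_ms):
    """Share of tests in which an overmetering variance adds a chargeable second."""
    detected = sum(
        1 for d in durations_ms
        if ceil_seconds(d + overmetering_ms) > ceil_seconds(d)
    )
    return detected / len(durations_ms)


x = 34
low_band = [x * 1000 + f for f in range(100, 201)]   # x.100s to x.200s
high_band = [x * 1000 + f for f in range(701, 801)]  # x.701s to x.800s

print(detection_rate(low_band, 300))   # 0.0
print(detection_rate(low_band, 800))   # 0.0
print(detection_rate(high_band, 300))  # 1.0
```

The low band misses even a 0.8s variance, while every test in the high band exposes a 0.3s variance.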
In this example, very many tests lasting between x.1s and x.2s will not yield any examples of overcharging, whilst tests lasting a little over x.7s would have identified a systematic problem with overcharging. Furthermore, a real customer is as likely to make a call lasting x.7s as one lasting x.2s, so the fractional parts of real durations are spread evenly across the second. An average 0.3s overmetering variance would therefore result in an extra second being charged for 30 percent of all calls, but none of this would be apparent if the telco relied on an inappropriately designed test sample.
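The 30 percent figure follows from a short argument, assuming the fractional part of real call durations is uniformly distributed over the second and independent of the overmetering variance. Let x be the whole seconds, F the fractional part and v a constant overmetering with 0 ≤ v < 1, charged by rounding up; the case F = 0 has probability zero and is ignored.

```latex
\begin{aligned}
\lceil x + F \rceil &= x + 1, && 0 < F < 1 \\
\lceil x + F + v \rceil &=
  \begin{cases}
    x + 1, & 0 < F \le 1 - v \\
    x + 2, & 1 - v < F < 1
  \end{cases} \\
\Pr[\text{extra second}] &= \Pr[F > 1 - v] = v
\end{aligned}
```

With v = 0.3s this gives 30 percent. If the variance V differs from call to call but stays in [0, 1) and is independent of F, the same argument gives Pr[F > 1 − V] = E[V], so an average variance of 0.3s also yields 30 percent.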
The basic concepts of the ETSI TS 102 845 standard are sound, and all telcos should adopt them in practice. Sadly, it is unlikely that the standard will be widely read. Its reach might have been extended by producing a more structured document, or series of documents, in which the main ideas are expressed succinctly alongside the key reasons to measure charging accuracy.
Where the standard fails, it is because of a blind spot amongst the authors, who rightly favor the use of test technology but do not address key issues with how such technology is implemented and used in practice. The length of a call, or the amount of data consumed by a customer, is not solely a matter of defining how systems work. We must be mindful that a system may work the way it was intended, but still be configured to cheat customers by overcharging them. An impartial test oriented around the service that was actually provided to the customer would also highlight problems with the design of the charging method.
By not paying sufficient attention to important details, the standard ignores the most common deficiency in the design of duration-based tests, which is that they attempt to assure the accuracy of per second charges by only performing tests which supposedly last a whole number of seconds. In reality this creates a skew in the sample that is not reflective of actual network traffic, and this skew may make it significantly less likely that the tests will detect real examples of relatively small metering variances which nevertheless lead to an extra second being added, and charged, for a large proportion of all events.
Version 2.0 of the ETSI TS 102 845 standard was published in October 2018. You can download it for free from the ETSI website.