NaNs and zeros in the data (interpretation). Time window. Regularity.
In my opinion:
- NaNs in "telecom" data (data volume - zone2): there was no activity.
- Zeros in "telecom" data: there were calls or network activity, but less than the threshold (which is in fact 1). For example, ON_NET 0.0 means there was a call (or calls), but shorter than 1 minute (if values are truncated) or shorter than 30 seconds (if values are rounded). Supporting evidence: these columns contain no fractional values, so seconds are not recorded and the data has been rounded.
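The two statements above can be checked per column. The sketch below is only an illustration: the helper `missing_zero_summary` is hypothetical, the column name `DATA_VOLUME` is assumed, and `toy` is made-up data standing in for the real frame.

```python
import numpy as np
import pandas as pd


def missing_zero_summary(df, columns):
    """Per column: NaN count, zero count, and whether all non-null values are whole numbers."""
    rows = {}
    for col in columns:
        s = df[col]
        non_null = s.dropna()
        rows[col] = {
            "n_nan": int(s.isna().sum()),
            "n_zero": int((s == 0).sum()),  # NaN == 0 is False, so NaNs are not counted here
            "all_whole": bool((non_null % 1 == 0).all()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy data only (not the real dataset); column names are assumptions.
toy = pd.DataFrame({
    "ON_NET": [0.0, 12.0, np.nan, 3.0],
    "DATA_VOLUME": [np.nan, 0.0, 150.0, np.nan],
})
summary = missing_zero_summary(toy, ["ON_NET", "DATA_VOLUME"])
print(summary)
```

If `all_whole` is True for every telecom column in the real data, that supports the rounding hypothesis; a single fractional value would contradict it.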
I am not sure about these statements:
- NaNs in "nontelecom" data (revenue etc.) mean that the values could not be calculated because of low activity, or maybe that the tariff from a previous time window was used.
- There are no zeros in "nontelecom" data, so every column can be calculated from the telecom values present in the time window.
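One way to probe these two guesses is to cross-tabulate "no telecom activity at all" against "revenue is missing" and to count zeros in the revenue column. This is a rough sketch on made-up data; the column names `ON_NET`, `DATA_VOLUME` and `REVENUE` and the choice of telecom columns are assumptions.

```python
import numpy as np
import pandas as pd

# Toy data only (not the real dataset); column names are assumptions.
toy = pd.DataFrame({
    "ON_NET": [np.nan, 5.0, 0.0, np.nan],
    "DATA_VOLUME": [np.nan, np.nan, 20.0, np.nan],
    "REVENUE": [np.nan, 300.0, 150.0, 90.0],
})
telecom_cols = ["ON_NET", "DATA_VOLUME"]

# A row has "no telecom activity" when every telecom column is NaN.
activity = toy[telecom_cols].isna().all(axis=1).map({True: "none", False: "some"})
revenue = toy["REVENUE"].isna().map({True: "nan", False: "value"})

table = pd.crosstab(activity, revenue, rownames=["telecom_activity"], colnames=["revenue"])
n_zero_revenue = int((toy["REVENUE"] == 0).sum())

print(table)
print("zeros in REVENUE:", n_zero_revenue)
```

Rows with revenue present but no telecom activity (the `none` / `value` cell) would argue against the idea that revenue NaNs come only from low activity.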
About top pack: I think the NaNs can have these explanations:
a) there was no tariff activity in this time window: the tariff was selected before the window and used unchanged during it, so it was the "top pack" of a previous time window;
b) identification problems, like the NaNs in the region column;
c) two or more tariffs were chosen at the same time.
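Version b) can be tested roughly by measuring how often a missing top pack coincides with a missing region. A minimal sketch on made-up data, assuming the columns are named `TOP_PACK` and `REGION`:

```python
import numpy as np
import pandas as pd

# Toy data only (not the real dataset); column names and values are assumptions.
toy = pd.DataFrame({
    "TOP_PACK": [np.nan, "pack_a", np.nan, "pack_b"],
    "REGION": [np.nan, "north", "south", np.nan],
})

top_pack_nan = toy["TOP_PACK"].isna()
region_nan = toy["REGION"].isna()

# Share of rows with missing TOP_PACK that also have missing REGION.
share_both = (top_pack_nan & region_nan).sum() / top_pack_nan.sum()
print(share_both)
```

A share close to the overall NaN rate of the region column would suggest the two kinds of missingness are unrelated.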
Questions:
- Is our time window 90 days? (The only evidence for this is the existence of the arpu segment column.)
- Why is the minimum regularity value 1 and the maximum 62? Does 62 mean that the time window is 62 days? Why are there no zeros or NaNs in this column?
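The regularity question rests on a few summary numbers, which can be recomputed like this. The helper `range_summary` is hypothetical and the series is made-up data that only mirrors the observed minimum of 1 and maximum of 62.

```python
import pandas as pd


def range_summary(s):
    """Minimum, maximum, NaN count, zero count and number of distinct values of a series."""
    return {
        "min": s.min(),
        "max": s.max(),
        "n_nan": int(s.isna().sum()),
        "n_zero": int((s == 0).sum()),
        "n_unique": int(s.nunique()),
    }


# Toy data only (not the real column); the name REGULARITY is an assumption.
toy_regularity = pd.Series([1, 17, 62, 40, 3], name="REGULARITY")
stats = range_summary(toy_regularity)
print(stats)
```

If the real column takes every integer value from 1 to 62, a count of active days within a 62-day window is a plausible reading, though this alone does not confirm it.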
Technically, we cannot make use of the time window anyway, because we don't have any time series data.