statistics

    Smart progress bars

    Progress bars suck at predicting how long things will take. I’ll tell you what I want (what I really really want): a system-wide resource that receives a description of what the progress bar will be measuring and uses it to make an informed estimate of the entire process’s duration. For example, suppose that an application installer will do several things in series, one after another. Perhaps a description of that process could be written in a machine-readable format like this:

    vendor: Foo Corp
    name: My Cool App installer
    stages:
    - Downloading files:
      - resource: internet
        size: 1000  # Number of MB to download
    - Extracting files:
      - resource: disk_read
        size: 1000  # Size of the downloaded archive file, in MB
      - resource: disk_write
        size: 2000  # Size of the extracted archive file, in MB
    - Copying files into place:
      - resource: disk_read
        size: 2000  # Now we read the extracted files...
      - resource: disk_write
        size: 2000  # and copy them elsewhere.
    - Configuring:
      - resource: cpu
        size: 100  # Expected CPU time in some standard-ish unit
    

    Because I’ve used the progress bar resource before, it knows roughly how long each of those things might take:

    • Since I’m currently on my fast home Internet, that download will probably last about 20 seconds.
    • I have a fast SSD, so the “Extracting files” step might be 6 seconds long.
    • “Copying files into place” will run at about the same speed, for another 8 seconds.
    • My shiny new CPU can chew through 100 CPU units in 10 seconds.

    Ta-da! The whole installation should run about 44 seconds. When the installer runs, instead of updating the progress bar manually like

    update_progress_bar(percent=23)
    

    it would tell the resource how far it had gotten in its work with a series of updates like

    update_progress_bar('Downloading files', internet=283)
    ...
    update_progress_bar('Copying files into place', disk_read=500)
    update_progress_bar('Copying files into place', disk_write=500)
    ...
    update_progress_bar('Configuring', cpu=30)
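
    Here’s a rough sketch of what the resource might do with those updates. Everything in it is made up for illustration: the ProgressEstimator class, the RATES table, and the assumption that the resource costs within a stage simply add up. The rates are picked to match my guesses above (50 MB/s Internet, 500 MB/s disk, 10 CPU units per second).

```python
# Hypothetical throughput this machine has learned, in units per second.
# Chosen to match the guesses above; a real system would measure these.
RATES = {"internet": 50.0, "disk_read": 500.0, "disk_write": 500.0, "cpu": 10.0}

# The installer's description from above as plain Python data.
# Stage order matters: stages run one after another.
STAGES = {
    "Downloading files": {"internet": 1000},
    "Extracting files": {"disk_read": 1000, "disk_write": 2000},
    "Copying files into place": {"disk_read": 2000, "disk_write": 2000},
    "Configuring": {"cpu": 100},
}


class ProgressEstimator:
    """Turns per-resource progress reports into time estimates."""

    def __init__(self, stages, rates):
        self.stages = stages
        self.rates = rates
        # Units of each resource finished so far, per stage.
        self.done = {name: dict.fromkeys(work, 0) for name, work in stages.items()}

    def update_progress_bar(self, stage, **progress):
        """Record progress in `stage`; earlier stages are assumed finished."""
        if stage not in self.stages:
            raise KeyError(stage)
        for name, work in self.stages.items():
            if name == stage:
                break
            self.done[name] = dict(work)
        for resource, units in progress.items():
            # Clamp so an over-report can't push the estimate negative.
            self.done[stage][resource] = min(units, self.stages[stage][resource])

    def total_seconds(self):
        # Assumption: costs within a stage add up (no overlap between resources).
        return sum(
            size / self.rates[resource]
            for work in self.stages.values()
            for resource, size in work.items()
        )

    def remaining_seconds(self):
        return sum(
            (size - self.done[name][resource]) / self.rates[resource]
            for name, work in self.stages.items()
            for resource, size in work.items()
        )

    def percent(self):
        total = self.total_seconds()
        return 100.0 * (total - self.remaining_seconds()) / total


bar = ProgressEstimator(STAGES, RATES)
print(bar.total_seconds())  # 44.0
bar.update_progress_bar("Downloading files", internet=283)
print(round(bar.remaining_seconds(), 2))  # 38.34
```

    The percentage falls out of the time estimate instead of the other way around, which is the whole point: the installer only reports work done, and the system decides what that means in seconds.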
    

    The app itself would not be responsible for knowing what percent of the way along it is. How could it? It knows nothing about my system! Furthermore, statistical modeling could lead to more accurate predictions with observations like “Foo Corp always underestimates how many CPU units something will take compared to every other vendor, so add 42% to their CPU numbers” or “Bar, Inc.’s website downloads are always slow, so cap the Internet speed at 7 MB/s for them.” Hardware vendors could ship preconfigured numbers for new systems based on their disk and CPU speeds so that the system can make decent estimates right out of the box. But once a new system is deployed, it gathers observations about its real performance to make better predictions that evolve as it’s used.
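
    The per-vendor corrections could start out very simple. The sketch below is one guess at it, not a real design: VendorModel and its methods are hypothetical names, and a plain average of actual-to-declared ratios stands in for whatever statistical model a real implementation would use.

```python
from collections import defaultdict


class VendorModel:
    """Learns how far off each vendor's declared numbers tend to be."""

    def __init__(self):
        # (vendor, resource) -> list of actual/declared ratios seen so far.
        self._ratios = defaultdict(list)
        # (vendor, resource) -> fastest rate ever worth assuming for them.
        self._rate_caps = {}

    def observe(self, vendor, resource, declared, actual):
        """Record one finished job: what the vendor claimed vs. what it used."""
        self._ratios[(vendor, resource)].append(actual / declared)

    def correction(self, vendor, resource):
        """Average ratio for this vendor, or 1.0 if we know nothing yet."""
        ratios = self._ratios.get((vendor, resource))
        return sum(ratios) / len(ratios) if ratios else 1.0

    def adjusted_size(self, vendor, resource, declared):
        return declared * self.correction(vendor, resource)

    def cap_rate(self, vendor, resource, max_rate):
        self._rate_caps[(vendor, resource)] = max_rate

    def effective_rate(self, vendor, resource, machine_rate):
        """The machine's rate, limited by any cap recorded for the vendor."""
        cap = self._rate_caps.get((vendor, resource))
        return machine_rate if cap is None else min(machine_rate, cap)


model = VendorModel()
# Foo Corp keeps underestimating CPU work (made-up observations).
model.observe("Foo Corp", "cpu", declared=100, actual=140)
model.observe("Foo Corp", "cpu", declared=50, actual=72)
print(round(model.correction("Foo Corp", "cpu"), 2))  # 1.42

# Bar, Inc.'s downloads never go faster than 7 MB/s, whatever my line can do.
model.cap_rate("Bar, Inc.", "internet", 7.0)
print(model.effective_rate("Bar, Inc.", "internet", 50.0))  # 7.0
```

    An estimator would then divide the adjusted size by the effective rate instead of trusting the vendor’s raw numbers.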

    We should be able to do a much better job of guessing how long it’s going to take to install an app. This solution needs to exist.