Home > Data Lab > Data Set
  • Last_Mile_Rush

    Providers : CaiNiao Networks

    Posted : 2016.10.27

    #Participants : 0

Data Set Description

Document (You can download after you login)

Format

new_1.csv

.csv (3KB)

new_2.csv

.csv (240KB)

new_3.csv

.csv (14KB)

new_4.csv

.csv (178KB)

new_5.csv

.csv (103KB)

new_6.csv

.csv (5KB)

Background

As e-commerce grows internationally wise at unprecedented pace, we have observed that a large portion of express orders is generated by e-commerce. In China, for example, it is estimated that over 60 percent of express orders are due to e-commerce requests. The most challenging issue in e-commerce shipping business is the last mile delivery where packages are delivered from local branches of express companies to individual customers. The last mile delivery also includes the scenario of intra city O2O delivery. O2O business is attracting increasing amount of investments as it aims to effectively integrate online and offline service and thus expand the boundary of conventional e-commerce business. Both the last mile delivery in e-commerce and O2O delivery for intra city service play a very important role in nowadays shipping industry in China. The competition proposed by Cainiao Network aims to address the last mile delivery problem from the viewpoint of big data and intelligent computation. It requires the participants to find effective solutions that are able to simultaneously improve the efficiency and reduce the cost.

Description
In this competition, all the participants are required to provide solutions to couriers for two types of packages as we mentioned in the background, i.e. e-commerce package and intra city O2O package. For e-commerce packages, couriers will pick up packages at local branches of express companies and deliver to individual customers, while for intra city O2O packages, couriers will pick up packages at O2O shops and deliver to customers with time requirement.
In this competition we will focus on one express company that owns 124 service branches in Shanghai and is responsible for all e-commerce delivery requests in Shanghai. We note that no two local branches have overlapping in their service range. All e-commerce packages arrive at service branches before 8:00am. All couriers start working at 8:00am and are required to deliver e-commerce packages to customers before 20:00. At the same time, these couriers also deliver intra city O2O package (e.g. takeout orders). They pick up takeout orders from O2O shops at the designated time and need to deliver the orders to customers no later than the designated time.
To simplify the delivery model, we make the following assumptions:
1. We take 9214 POIs (Point of Interest) to represent all the customers’ addresses for both e-commerce and intra city O2O orders, and simplify the delivery process as the process of delivering packages to individual POIs. We only provide latitude and longitude for all POIs. Note that each POI could request multiple packages.
2. Since there is no overlapping in service range between any two local branches, each POI will be served by only one local branch.
3. No courier is allowed to carry more than 140 packages at any time.
4. Since one POI could request multiple e-commerce packages, couriers are required to pick up all e-commerce packages requested by an POI and deliver them to the POI at one time.
5. O2O packages from 598 large O2O shops in Shanghai will be used in this competition. Since each O2O order could be comprised of multiple packages, couriers are required to pick up, from O2O shops, all packages of a single order at the same time. 
6. Each O2O order is associated with a pickup time and a delivery time. Couriers are required to arrive at O2O shops no later than the designated pickup time, and have to wait if they come earlier. Couriers are also required to deliver packages to customers no later than the designated delivery time, and do not have to wait if they come earlier. 
7. Both the latitude and longitude information of all the locations, including service branches, POIs and O2O shops, are provided. The distance between two locations is given in (1). The traveling speed of any courier is assumed to be 15km per hour.
8. The processing time of couriers at POIs depends on the number of packages and is given in (2).
9. 1000 couriers are included in this study. We note that each courier could serve more than one branch. The number of couriers could be less than 1000 in the solution that the participant provided.
10. Couriers start their works sharply at 8:00 from service branches and end their days at the last POI.

All participants are required to provide dispatch plans for all couriers, including courier id, arrival and departure time, and the number of packages processed in each location.

DataSet
1. Location information for service branches

Description

Site_id

Service branch codee.g. A001

Lng

Longitude of a service   branch

Lat

Latitude of a service   branch

2.     Location information for 9214 POIs

Description

Spot_id

POI id e.g. B0001

Lng

Longitude of an POI

Lat

Latitude of an POI

3.     Location information for 598 O2O shops

Description

shop_id

O2O shop id e.g. S001

Lng

Longitude of the O2O   shop

Lat

Latitude of the O2O   shop

4.     Information for e-commerce packages

Description

Order_id

e-commerce order id

Spot_id

POI id

Site_id

Service branch code

Num

Number of packages in   the order to be delivered from the given service branch to the given POI

5.     Information for intra city O2O orders

Description

Order_id

Intra city O2O order id

Spot_id

POI id

Shop_id

O2O shop id

Pickup_time

Pickup time at the O2O shop

Delivery_time

Delivery time to   customers

Num

Number of packages in   the order

6.     Courier id

Description

Courier_id

Courier ide.g. D0001

Evaluation
All the participants need to submit their dispatch plans for couriers as follows

Description

Courier_di

Delivery man id (e.g. D0001)

Addr

Branch code or POI id or O2O shop id

Arrival_time

Arrival minutes (minutes from 08:00 a.m. to arrival   time. e.g. arrival time is 11:00, the value is 180)

Departure_time

Departure minutes (minutes from 08:00 a.m. to departure   time. e.g. departure time is 15:00, the value is 420)

Amount

The number of packages processed at the location by the   courier. the value is positive at pickup node, negative at   delivery node

Order_id

Delivery order id

The performance is measured by the total amount of time spent by all couriers, and the team with the shortest time will be the winner of the competition. The total amount of time spent by one courier includes his traveling time between different places and the time for him to process packages at different locations, i.e.

where Ti represents all the travel time of the ith courier and Pi represents all the package processing time of all the POIs that the ith courier visits.
Note 1. The arrival and departure time of the same place should satisfy assumption 8. For each courier, the departure time of the last place and the arrival time of the next place should satisfy the travel speed given in the assumption 7. 10 times the error time, which is explained below by two examples, will be added to the total time spent if any violation of assumptions occurs.
Example 1: Consider a case when a courier arrives at an POI at 9:00am, and takes 23 minutes to processes 36 packages at the POI, he is supposed to leave at 9:23am according to assumption 8. If the departure time presented in the solution is 9:21 instead of 9:23, the error time is 2 minutes  (e.g |9:21-9:23|=2 minutes) in this case.
Example 2: Consider a courier is leaving POI A for POI B at 9:00am. If the two POIs are 10km away, according to assumption 7, it will take the courier 40 minutes to travel from POI A to POI B, and therefore he will arrive at POI B at 9:40am. If the arrival time presented in the solution is 9:35am, the error time will be 5 minutes (e.g |9:40-9:35|=5 minutes) in this case.

Note 2. If a courier arrives at an O2O shop later than the designate time or deliver packages to customers later than the designate time, 5 times the latency (i.e. the number of minutes later than the designated time) will be added to the total time spent.
Note 3. All the time information is accurate up to minutes.


Sample:

D0545

A083

0

0

34

F6344

D0545

A083

0

0

11

F6360

D0545

A083

0

0

19

F6358

D0545

A083

0

0

12

F6353

D0545

A083

0

0

63

F6354

D0545

B5800

7

29

-34

F6344

D0545

B7555

30

59

-63

F6354

D0545

B7182

62

77

-12

F6353

D0545

B8307

79

97

-19

F6358

D0545

B8461

102

117

-11

F6360

D0545

A083

124

124

46

F6349

D0545

A083

124

124

53

F6325

D0545

A083

124

124

39

F6314

D0545

B6528

132

157

-46

F6349

D0545

S245

160

257

1

E0895

D0545

B3266

259

267

-1

E0895

D0545

B3266

267

294

-53

F6325

D0545

B2337

296

320

-39

F6314

D0545

A083

324

324

36

F6366

D0545

A083

324

324

27

F6345

D0545

A083

324

324

36

F6346

D0545

A083

324

324

33

F6308

D0545

S294

340

508

1

E1088

D0545

B1940

525

547

-33

F6308

D0545

B6104

550

573

-36

F6346

D0545

B8926

577

585

-1

E1088

D0545

B9072

587

610

-36

F6366

D0545

B6103

612

633

-27

F6345

The dispatch plan of 545th courier as follows: the courier starts from branch A083 at 08:00, gets packages of order F6344, F6360, F6358, F6353, F6354, at this time, the number of carrying packages is 139<140, the Arrival_time is 0 equal to 08:00 minus 08:00, the service time is 0, the Departure_time = 0 = the Arrival_time + the service time. Then, the courier delivers packages from order F6344, F6354, F6353, F6358, F6360 to poi B5800, B7555, B7182, B8307, B8461. The service times meet the formula (2), are 22, 29, 15, 18, 15 respectively. The Arrival_time = the Departure_time of previous point + the distance between previous point and this point / the speed of courier, are 7, 30, 62, 79, 102 respectively, at this time, the number of carrying packages is 0. The Departure_time = the Arrival_time + the service time, is 29, 59, 77, 97, 117 respectively. Similarly, the courier gets packages of order F6349, F6325, F6314 from A083. At this time, the number of carrying packages is 138<140 and the service time is 0. Then, the courier delivers the package of order F6349 to B6528, and the computation mode of Arrival_time and Departure_time is the same as delivering package of F6344 to poi B5800. Then, the courier gets package of O2O order E0895. The computation mode of Arrival_time is as same as delivering package of F6344 to poi B5800, but the Departure_time = max (the Arrival_time, the Pickup_time minutes), the Pickup_time minutes = the Pickup_time – 08:00, so, the Departure_time of this point = max(160, 12:17-08:00) = 257. Then, the courier delivers package of O2O order E0895 to poi B3266, and the computation mode of Arrival_time and Departure_time is the same as delivering package of F6344 to poi B5800. The follows are as above. It’s unnecessary to go into details.

--------------------------------------------------------以下是中文描述--------------------------------------------------------

赛题背景
电商的蓬勃发展使得目前很大一部分的物流包裹均来源于线上电商订单。在中国,该比例超过了60%。这些包裹在配送的最后环节,是由快递员将包裹从网点送到消费者手中。另一方面,随着互联网逐渐向线下渗透,涌现出了越来越多的同城包裹配送需求,如外卖订单或鲜花蛋糕等等同城订单。这两类包裹的配送是目前中国最后一公里配送中最典型的场景。这次菜鸟算法大赛希望能够通过大数据对中国物流最后一公里提供智能的配送方案,通过全局优化来提升效率及降低成本。

赛题内容:
最后一公里极速配送大赛主要针对赛题背景中提到的两类包裹提供最优的快递员配送方案。第一类是电商包裹,快递员需要从网点提取并配送至消费者,第二类是同城O2O包裹,快递员需要在指定时间去商户提取并在指定时间内配送至消费者。
上海市有124个网点,这些网点负责全部上海电商包裹的配送,大概每天229000件。每个网点的配送范围两两不重合并完全覆盖了整个上海。每天8点前,所有上海的电商包裹都抵达网点。快递员在8点开始从网点进行派送,快递员需要在晚上8点前将所有的电商包裹送至消费者手中。在配送电商包裹的同时,快递员还要配送同城O2O包裹(如外卖订单等),对于这类包裹,快递员需要在指定时间之前到达商户领取包裹,并在指定时间之前送达消费者手中。参赛选手需要提供所有快递员的调度计划,即快递员在网点、配送点和商户的到达时间,离开时间,取/送订单及包裹量。
为了简化模型,我们做了如下假设:
1. 我们将上海所有要派送的包裹(含电商包裹和同城O2O包裹)地址汇聚9214个配送点上,每个配送点上都有若干包裹。快递员把包裹送至消费者手中可抽象为快递员把包裹送到离消费者最近的配送点,并在配送点处理完成。因此,我们将不会提供消费者的地址信息,只提供这9214个配送点的经纬度。
2. 由于每个网点的配送范围两两不重合,所以每个配送点仅会被一个网点服务到。
3. 每个快递员任何时刻携带的包裹量不得大于140件。
4. 每个配送点的电商包裹只能一次配送完毕,不能分多次配送。
5. 我们选取上海较大的O2O商户(含外卖等)598家。这些同城O2O包裹将由快递员到商户取走并送到我们的配送点给消费者。每笔同城O2O订单可能包含若干包裹,从消费者体验考虑,只可在满足总包裹量不大于140件的情况下一次取走,不可分批取走。
6. 每笔同城O2O订单,有商户的领取时间限制,快递员不得晚于该时间到达商户,如果快递员早于该时间到达,则快递员需要等待至领取时间。同时,每笔订单还有消费者最晚收货时间,即快递员不得晚于该时间送达至配送点,如果快递员早于该时间送达至配送点,快递员不需等待,可直接进行配送。
7. 我们提供所有地点的经纬度信息并假定两辆之间的距离符合计算公式(1),并且快递员的平均时速是15公里/小时.
8. 快递员在配送点的停留(处理)时间,这里停留时间和配送地点该次所要配送的包裹量相关,并满足计算公式(2)。
9. 快递员数量上限1000个,有配送任务的快递员数量可以少于1000,快递员可以服务多个网点,编号从D0001D1000
10. 作为起始条件,参赛选手可以指定快递员8点从任何一个网点出发,快递员如果完成当天的任务,则在最后一个配送点结束。

参赛的选手需要提供所有快递员的派送计划,含快递员id,每个地点(网点、递送点或者是商户)的到达时间和离开时间,每个地点的取/送订单id及包裹量。

两点间距离计算公式

赛题数据集
1. 网点id及经纬度,供124个网点

字段

说明

Site_id

网点ide.g. A001

Lng

网点经度

Lat

网点纬度

2.     配送点id及经纬度,共9214个配送点

字段

说明

Spot_id

配送点id e.g. B0001

Lng

配送点经度

Lat

配送点纬度

3.     商户id及经纬度,共598个商户

字段

说明

Shop_id

商户id e.g. S001

Lng

商户经度

Lat

商户纬度

4.     电商订单,共9214笔电商订单,总包裹量为229780

字段

说明

Order_id

订单ide.g. F0001

Spot_id

配送点id

Site_id

网点id

Num

网点需要送至改配送点的电商包裹量

5.     同城O2O订单,3273O2O订单,总包裹量为8856

字段

说明

Order_id

订单ide.g. E0001

Spot_id

配送点id

Shop_id

商户id

Pickup_time

到商户的领取时间(e.g. 11:00

Delivery_time

送达至消费者的最晚时间(e.g. 20:00

Num

订单所含包裹量

6.     快递员id列表,最多1000位小件员

字段

说明

Courier_id

快递员ide.g. D0001

赛题评估标准:
选手需提交所有快递员的调度计划:

字段

说明

Courier_id

快递员id

Addr

网点或配送点或商户id

Arrival_time

到达时长(距离08:00时长分钟数. e.g. 到达时刻为11:00,则到达时间为180

Departure

离开时长(距离08:00时长分钟数. e.g. 离开时刻为15:00,则离开时间为420

Amount

/送货量(取为+,送为-

Order_id

订单id

所有快递员总耗时,最短的队伍获胜。所有快递员的总耗时包括所有快递员的行驶时间和所有配送点的停留处理时间:

其中Ti为快递员i的所有行驶时间,Pi为快递员i在他所经过的所有配送点的停留处理时间。
1. 请按arrival_time 排序,注意同一个配送地点的到达时间和离开时间需满足假设8,前一个地点的离开时间和下一个地点的到达时间需满足假设7的行驶时间.如果同一个配送地点的到达时间和离开时间不满足假设8或前一个地点的离开时间和下一个地点的到达时间不满足假设7的行驶时间,则产生偏差的时间部分10倍计入总耗时。
示例1. 假定快递员在一个配送点要处理36个包裹,如果他在9点到达,则根据假设8,他的处理时间是23分钟,因此他在9:23离开。如果参赛者提供的时间不为9:23(比如9:25),那么就产生了误差时间(|9:25-9:23|=2 minutes),10*2=20分钟将计入总耗时。
示例2. 假定快递员从配送点A至配送点B,如果两点距离为10km,根据假设7, 行驶时间为40分钟。如果快递员9:00离开A,则他到达B的时间为9:40。如果参赛者提供的到达时间不为9:40(比如9:35),那么就产生了误差时间(|9:40-9:35|=5 minutes),10*5=50分钟将计入总耗时。

2. 如果出现 O2O 到达商户的时间超过商户要求的领取时间,或送达用户的时间超过用户要求的送达时间,则超出的时间5倍计入总耗时。
3. 所有时间精确到分钟


实例:

D0545

A083

0

0

34

F6344

D0545

A083

0

0

11

F6360

D0545

A083

0

0

19

F6358

D0545

A083

0

0

12

F6353

D0545

A083

0

0

63

F6354

D0545

B5800

7

29

-34

F6344

D0545

B7555

30

59

-63

F6354

D0545

B7182

62

77

-12

F6353

D0545

B8307

79

97

-19

F6358

D0545

B8461

102

117

-11

F6360

D0545

A083

124

124

46

F6349

D0545

A083

124

124

53

F6325

D0545

A083

124

124

39

F6314

D0545

B6528

132

157

-46

F6349

D0545

S245

160

257

1

E0895

D0545

B3266

259

267

-1

E0895

D0545

B3266

267

294

-53

F6325

D0545

B2337

296

320

-39

F6314

D0545

A083

324

324

36

F6366

D0545

A083

324

324

27

F6345

D0545

A083

324

324

36

F6346

D0545

A083

324

324

33

F6308

D0545

S294

340

508

1

E1088

D0545

B1940

525

547

-33

F6308

D0545

B6104

550

573

-36

F6346

D0545

B8926

577

585

-1

E1088

D0545

B9072

587

610

-36

F6366

D0545

B6103

612

633

-27

F6345

545号快递员调度计划如下:该快递员08:00(到达时长=08:00-08:00=0,服务时长=0,离开时长=到达时长+服务时长=0+0=0)从网点A083开始取货,分别取订单编号为F6344, F6360, F6358, F6353, F6354的包裹(此时快递员携带包裹量为139<140),然后分别去订单编号为F6344, F6354, F6353, F6358, F6360的配送点B5800, B7555, B7182, B8307, B8461送货,服务时长(满足公式(2))分别为223sqrt(34)+5, 29 (3sqrt(63)+5), 15, 18, 15,到达时长=上一地点的离开时长+上一地点与该地点的距离/速度,分别为708:07-08:00, 3008:30-08:00, 62, 79, 102,由于配送点配送的都是电商订单,所以,离开时长=到达时长+服务时长,分别为297+22, 5929+30, 77, 97, 117(目前,快递员携带包裹量为0)。类似,快递员去网点A083取订单编号为F6349, F6325, F6314的包裹(快递员携带包裹量138<140,服务时长=0),然后去配送点B6528配送订单编号为F6349的包裹,到达时间与离开时间计算方式同上,而后,快递员去店铺S245O2O订单编号为E0895的包裹,到达时长=上一地点离开时长+上一地点与该地点的距离/速度=157+3=160,离开时长=max(到达时长,到商户的领取时长=到商户的领取时间-08:00=max160, 257=12:17-08:00=257,接着快递员去配送点B3266配送O2OE0895,到达时长=257+2=259,离开时长=到达时长+服务时长=259+8=267.后续配送计划计算逻辑如上对照,不做详述。

解题思路
首先,需要定位这是什么问题?显然是典型的VRPVehicle Routing Problem)问题,维基上关于VRP问题的介绍,https://en.wikipedia.org/wiki/Vehicle_routing_problem,在网上还可以搜到很多比较成熟的参考资料以及常用求解工具。

下一步,就需要对实际问题进行建模,搞清楚决策函数及约束条件。题目要求快递员总体配送时间最短,即为决策函数。对于约束条件,VRP中常见的约束无外乎最大容量约束(CVRP)、时间窗口约束(VRPTW)、先pickupdelivery约束(VRPPD),该题目全包含了。题目中描述所有电商单的配送需要到网点取后才能送,所有的O2O单要到店铺取后才能送,这是先pickupdelivery约束;题目中要求O2O单的取送货时间要满足要求的最早取货时间和最晚配送时间,这是时间窗口约束;题目中要求快递员任意时刻携带的包裹量不能超过140,这是最大容量约束。至此,该问题就建模完成,宝宝就可以很开心的调用求解工具或者自己coding采用贪婪的方式去求解问题了。

但是,比较popular的工具对问题规模都是有限制的,超过限制性能就会出现问题,比如求解时间会成倍增加。楼主了解到,目前上百个点的规划基本是不成问题的,速度很快,但最大规模也仅限于几百个点,像最后一公里中上万点的规模是相当大的,一般的工具应该是hold不住。我们只能退而求其次,由最优转为次优,将问题规模缩小,分治求解,采用什么样的分治方法,大家可以多尝试,比如配送点聚类后分给各小件员,问题规模可以一下缩很小,可行解很快就可以出来。但有时分配不合理就会出现找不到可行解的情况,仔细阅读一下题目,时间窗口约束较弹性,可以不满足,加惩罚就ok了,所以,加上松弛变量就大致ok了,甚至可以将约束去掉~,大不了cost大点,也是有效的解。

至此,一个还过得去的解就出炉啦~

提交成绩
根据目前为止楼主较长一段时间的测评发现,大家提交的无效解存在的问题都很类似,楼主在此总结下,还望大家在发现成绩无的时候不至于惊慌失措。存在问题:
一,快递员到同一地点取/送多笔订单时,未分开计算到达时间与离开时间。大家往往会忽视一点,快递员到达配送点,取/送多笔订单时,不同的订单还是要分开处理的。举例说明,快递员D000111点到达点B0001,配送2笔订单F0001,F0002,包裹量分别是10,20,则配送计划需要写成:

D0001,B0001,180,194,-10,F0001

D0001,B0001,194,212,-20,F0002

或者

D0001,B0001,180,198,-20,F0002

D0001,B0001,198,212,-10,F0001

二,两点间行驶时间(当前点到达时间-上一点离开时间)计算不正确。奇怪的是,出现这种情况往往不是整体出现,而是局部,所以,楼主也百思不得其解,为什么一些算对了,一些又不对,还望亲们好好检查下,不行就写个代码,将时间整体调整下。记得一定要采用四舍五入的方式哦~,是round不是int!突然想起来了,可能大家的行驶时间中还加入了等待时间,请大家谨记,等待时间只有可能发生在店铺取订单时,其他任何时候的等待都是非法等待,是要计入作弊次数的哈(累计10次以上就无效成绩了哈!)

三,订单的服务时间(离开时间-到达时间)计算不正确。对于网点来说,服务时间一定是0的哈,电商单的取货点是没有服务时间的,否则成绩直接无效哈;对于店铺来说,服务时间的有无决定于到达店铺的时间与商家要求的最早时间,如果早于要求的最早时间到达,是需要等待至最早取货时间才可以出发的哈,否则成绩直接无效,而不是惩罚哈;对于配送点,服务时间是严格遵守公式(2)的哈,记得同样是四舍五入(任何时候精确方式都采用四舍五入),计算方式应该是round(3*sqrt(x)+5)